Apache Ambari Metrics Collector issues in Azure HDInsight

This article describes troubleshooting steps and possible resolutions for issues when interacting with Azure HDInsight clusters.

Scenario: OutOfMemoryError or Unresponsive Apache Ambari Metrics Collector

Background

The Ambari Metrics Collector is a daemon that runs on a specific host in the cluster and receives data from the registered publishers, Monitors, and Sinks.

Issue

  • You may receive a critical "Metrics Collector Process" alert in the Ambari UI with a message similar to: Connection failed: timed out to <headnode fqdn>:6188
  • The Ambari Metrics Collector may be restarting frequently on the headnode
  • Some Apache Ambari metrics may not show up in the Ambari UI or Grafana. For example, NAMENODE shows Started instead of Active/Standby status. The 'No Data Available' message might appear in the Ambari Dashboard

Cause

The following scenarios are possible causes of these issues:

An out of memory exception happens frequently

Check the Apache Ambari Metrics Collector log /var/log/ambari-metrics-collector/ambari-metrics-collector.log*.

              19:59:45,457 ERROR [325874797@qtp-2095573052-22] log:87 - handle failed java.lang.OutOfMemoryError: Java heap space
              19:59:45,457 FATAL [MdsLoggerSenderThread] YarnUncaughtExceptionHandler:51 - Thread Thread[MdsLoggerSenderThread,5,main] threw an Error. Shutting down now... java.lang.OutOfMemoryError: Java heap space
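
A quick way to confirm heap-space failures is to grep the current and rotated collector logs on the headnode that hosts AMS. This is a minimal sketch that uses only the log path and error string shown above:

      # List the collector log files that contain heap-space failures.
      grep -l "java.lang.OutOfMemoryError: Java heap space" /var/log/ambari-metrics-collector/ambari-metrics-collector.log*

      # Show the most recent occurrences across all rotated logs.
      grep -h "OutOfMemoryError" /var/log/ambari-metrics-collector/ambari-metrics-collector.log* | tail -n 20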

Busy garbage collection

  1. Apache Ambari Metrics Collector isn't listening on port 6188, as shown in the hbase-ams log /var/log/ambari-metrics-collector/hbase-ams-master-hn*.log

                      2021-04-13 05:57:37,546 INFO  [timeline] timeline.HadoopTimelineMetricsSink: No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
  2. Get the Apache Ambari Metrics Collector pid and check GC performance

                      ps -fu ams | grep 'org.apache.ambari.metrics.AMSApplicationServer'                                  
  3. Check the garbage collection status using jstat -gcutil <pid> 1000 100. If you see FGCT increase significantly within a few seconds, it indicates that the Apache Ambari Metrics Collector is busy with full GC and unable to process other requests. A combined sketch of these checks follows this list.
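
Putting the checks above together: a minimal sketch (assuming the collector runs as the ams user, as in the command above) that verifies whether anything is listening on port 6188, finds the collector pid, and samples GC activity:

      # 1. Is anything listening on the metrics collector port?
      ss -ltnp | grep 6188 || echo "nothing listening on 6188"

      # 2. Find the collector pid (the process runs as the ams user).
      AMS_PID=$(ps -fu ams | grep 'org.apache.ambari.metrics.AMSApplicationServer' | grep -v grep | awk '{print $2}')

      # 3. Sample GC counters every second, 100 times; FGC/FGCT climbing quickly indicates constant full GC.
      jstat -gcutil "$AMS_PID" 1000 100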

Resolution

To avoid these issues, consider using one of the following options:

  1. Increase the heap memory of Apache Ambari Metrics Collector from Ambari > Ambari Metrics > CONFIGS > Advanced ams-env > Metrics Collector Heap Size

    Screenshot of editing Ambari Metric Service configuration properties.

  2. Follow these steps to clean up Ambari Metrics service (AMS) data.

    Note

    Cleaning up the AMS data removes all the historical AMS data available. If you need the history, this may not be the best option.

    1. Log in to the Ambari portal
    2. Set AMS to maintenance mode
    3. Stop AMS from Ambari
    4. Identify the following from the AMS Configs screen
      1. hbase.rootdir (Default value is file:///mnt/data/ambari-metrics-collector/hbase)
      2. hbase.tmp.dir (Default value is /var/lib/ambari-metrics-collector/hbase-tmp)
    5. SSH into the headnode where the Apache Ambari Metrics Collector runs. As superuser:
    6. Remove the AMS ZooKeeper data by backing up and removing the contents of <hbase.tmp.dir>/zookeeper
    7. Remove any Phoenix spool files from the <hbase.tmp.dir>/phoenix-spool folder
    8. (It's worthwhile to skip this step initially and try restarting AMS to see if the issue is resolved. If AMS still fails to come up, try this step.)
      AMS data is stored in the hbase.rootdir identified above. Use regular OS commands to back up and remove the files. Example: tar czf /mnt/backupof-ambari-metrics-collector-hbase-$(date +%Y%m%d-%H%M%S).tar.gz /mnt/data/ambari-metrics-collector/hbase (a consolidated sketch of steps 6 through 8 follows this list)
    9. Restart AMS using Ambari.
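
    The backup-and-remove work in steps 6 through 8 can be scripted roughly as follows. This is a minimal sketch that assumes the default paths from step 4 (with the file:// prefix stripped from hbase.rootdir); the backup file names are only examples. Substitute your own configuration values, and run it as superuser only after AMS has been stopped:

        # Paths from the AMS Configs screen -- adjust if your cluster uses non-default values.
        HBASE_TMP_DIR=/var/lib/ambari-metrics-collector/hbase-tmp
        HBASE_ROOT_DIR=/mnt/data/ambari-metrics-collector/hbase
        TS=$(date +%Y%m%d-%H%M%S)

        # Step 6: back up, then remove, the AMS ZooKeeper data.
        tar czf /mnt/backup-ams-zookeeper-$TS.tar.gz "$HBASE_TMP_DIR/zookeeper"
        rm -rf "$HBASE_TMP_DIR/zookeeper"

        # Step 7: remove any Phoenix spool files.
        rm -f "$HBASE_TMP_DIR/phoenix-spool/"*

        # Step 8 (optional; removes all historical metrics): back up, then remove, the hbase.rootdir contents.
        tar czf /mnt/backupof-ambari-metrics-collector-hbase-$TS.tar.gz "$HBASE_ROOT_DIR"
        rm -rf "$HBASE_ROOT_DIR"/*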

For Kafka clusters, if the above solutions don't help, consider the following solutions.

  • Ambari Metrics Service needs to handle lots of Kafka metrics, so it's a good idea to enable only the metrics in the allowlist. Go to Ambari > Ambari Metrics > CONFIGS > Advanced ams-env, and set the property below to true. After this modification, restart the impacted services in the Ambari UI as required.

    Screenshot of editing Ambari Metric Service allowlisted metrics properties.

  • Handling lots of metrics for standalone HBase with limited memory impacts HBase response time, and as a result metrics can become unavailable. If the Kafka cluster has many topics and still generates a lot of allowed metrics, increase the heap memory for HMaster and RegionServer in Ambari Metrics Service. Go to Ambari > Ambari Metrics > CONFIGS > Advanced hbase-env > HBase Master Maximum Memory and HBase RegionServer Maximum Memory and increase the values. Restart the required services in the Ambari UI. (A sketch for checking memory pressure on these processes follows this list.)

    Screenshot of editing Ambari Metric Service hbase memory properties.
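
    Before (or after) raising these limits, it helps to confirm that the AMS HBase processes are actually under memory pressure. A minimal sketch, assuming AMS runs under the ams user with its embedded HBase (process names may differ in distributed mode):

        # Find the AMS HBase master (and, in distributed mode, region server) processes.
        ps -fu ams | grep -E 'HMaster|HRegionServer' | grep -v grep

        # Sample GC counters for a given pid; FGC/FGCT growing rapidly suggests the heap is too small.
        jstat -gcutil <pid> 1000 30

        # Show the -Xmx values currently applied to processes owned by ams.
        ps -u ams -o args= | grep -oE '[-]Xmx[0-9]+[mMgG]'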

Next steps

If you didn't see your problem or are unable to solve your issue, visit one of the following channels for more support:

  • Get answers from Azure experts through Azure Community Support.

  • Connect with @AzureSupport - the official Microsoft Azure account for improving customer experience. Connecting the Azure community to the right resources: answers, support, and experts.

  • If you need more help, you can submit a support request from the Azure portal. Select Support from the menu bar or open the Help + support hub. For more detailed information, review How to create an Azure support request. Access to Subscription Management and billing support is included with your Microsoft Azure subscription, and Technical Support is provided through one of the Azure Support Plans.