
Exposing Hadoop Metrics

by Soft2share.com

Introduction:

Big data is a fast-growing technology used to handle large data sets, whether to analyse trends about a product or to capture useful information. Hadoop is a framework that plays a major role in handling huge data files, from terabytes to petabytes, using distributed computing.

The Hadoop ecosystem includes many services such as HDFS, Hive, Oozie, Solr, etc. Each tool plays a significant role in the collection and processing of tons of data from various sources.

Managing all the Hadoop services together in a cluster is a challenge. For instance, Hive is one of the Hadoop services that uses MapReduce, a disk-intensive process that consumes considerable system resources to process petabytes of data.

Handling multiple workloads with MapReduce and Solr indexing is an expensive operation to manage. It is important to gather system-level metrics to understand and analyse the resource utilization trend of the Hadoop services.

Distributors like Cloudera and Hortonworks provide a good UI to capture the system metrics for each service individually. Over a period of time, usage may vary depending on data size, which forces you to tune the service configurations without historical knowledge, as CDH and Hortonworks are limited to storing the captured data in databases like PostgreSQL and MySQL.

One would require more metrics to analyse and fine-tune the system. Hence, we go for an external metric system by exposing metrics to InfluxDB/Graphite and analysing them with Grafana.

In this blog, you will see:

  • how to measure the Hadoop metrics using the Hadoop Metrics2 framework,
  • how to configure CDH with Graphite,
  • how to configure Graphite with InfluxDB (convenient for SQL users), and
  • how to configure and monitor InfluxDB using Grafana for graphical representation.

Basically, Hadoop metrics are a collection of information, events and measurements which help us tune the performance of a production cluster.

Hadoop Metrics2 and Architecture

The Hadoop Metrics2 framework plays three major roles, as shown in the diagram below.

  • Metrics sources generate the metrics,
  • Metrics sinks consume the metrics generated by the sources, and
  • The metrics system regularly polls the metrics sources and passes the metrics records to the sinks.

The main advantage of the Hadoop Metrics2 framework is the provision of multiple metrics output plugins that can be used in parallel. It allows dynamic reconfiguration of metrics plugins without having to restart the server, provides metrics filtering, and allows all metrics to be exported via Java Management Extensions (JMX).
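As a sketch of how parallel sinks look in practice, a hadoop-metrics2.properties fragment can route the same metrics to both a file sink and a Graphite sink. The filename and Graphite host below are hypothetical placeholders:

```properties
# Poll all sources every 10 seconds
*.period=10

# Sink 1: write NameNode metrics to a local file (FileSink ships with Hadoop)
*.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
namenode.sink.file.filename=namenode-metrics.out

# Sink 2: forward the same metrics to a Graphite endpoint in parallel
*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
namenode.sink.graphite.server_host=graphite.example.com
namenode.sink.graphite.server_port=2003
namenode.sink.graphite.metrics_prefix=namenode
```

Because both sinks are registered, each metrics record is delivered to both outputs without restarting the NameNode.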

How to Expose Hadoop Metrics

The Hadoop core services and HBase support writing their metrics into Graphite. The complete process is pictured below.

Process Diagram

What is the Role of Graphite?

In this case, the Graphite protocol alone is used for collecting the metrics from the Hadoop ecosystem, configured via Cloudera Manager using org.apache.hadoop.metrics2.sink.GraphiteSink. The services export their metrics every 10 seconds (configurable interval), and the actual data is stored in InfluxDB.

What is InfluxDB?

InfluxDB is an open-source time series database built on NoSQL flavours, which allows quick database schema modifications. InfluxDB is mostly used with Grafana for graphical representation.

Why should we prefer InfluxDB?

There are two reasons for preferring InfluxDB. One is familiarity: most people in the organization are acquainted with SQL-like queries. The other is that, instead of using multiple applications, a single handy application can be used.

It also makes it easy for the support team to monitor the production cluster. InfluxQL supports regular expressions, arithmetic expressions, and time-series-specific functions to speed up data processing.

It provides various input plugins, such as the UDP, Graphite, OpenTSDB and CollectD plugins. Every input plugin (CollectD, Graphite, JSON, Logfmt, CSV, etc.) has a data_format option that can be used to select the desired parser.
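For example, the data_format option as described here belongs to Telegraf-style input plugins; a minimal sketch, assuming a Telegraf collector sits in front of InfluxDB and receives Graphite plaintext on port 2003:

```toml
# Hypothetical Telegraf input: listen for Graphite plaintext on TCP 2003
# and parse it with the graphite data format.
[[inputs.socket_listener]]
  service_address = "tcp://:2003"
  data_format = "graphite"
```

Swapping data_format to "json", "logfmt" or "csv" selects the corresponding parser for the same listener.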

What is Grafana?

Grafana is an open-source analytics and visualization suite. It is mainly used for visualizing time series data for infrastructure and application analytics, but it does not store time series data itself. In this case, we are using Grafana for graphical representation.

How to enable Graphite in InfluxDB?

After installing InfluxDB, enable Graphite with the following steps in a Linux environment.

Step 1: Go to the etc directory and open influxdb.conf

cd /etc/influxdb
vi influxdb.conf

Step 2: Verify the InfluxDB meta directory.

Step 3: Verify the InfluxDB data directory.
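Steps 2 and 3 amount to checking the dir settings in influxdb.conf. A self-contained sketch is below; the /var/lib/influxdb paths are the usual defaults, but your installation may differ, and on a real host you would grep /etc/influxdb/influxdb.conf directly instead of the sample file written here:

```shell
# Write a sample influxdb.conf fragment so the grep below is reproducible
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
[meta]
  dir = "/var/lib/influxdb/meta"
[data]
  dir = "/var/lib/influxdb/data"
  wal-dir = "/var/lib/influxdb/wal"
EOF

# Show the meta and data directories InfluxDB will use
grep 'dir' "$CONF"
```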

Step 4: Enable the Graphite listener

[[graphite]]
  enabled = true
  bind-address = ":2003"  # user-defined port
  protocol = "tcp"

Step 5: Bind InfluxDB as the route destination in the relay configuration

[routes.influxdb]
  patt = ""
  addr = "influxdb:2003"
  spool = true
  pickle = false

Once the configuration is done, verify whether it succeeds or fails using the steps below.
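One way to verify, assuming the Graphite listener is on localhost:2003 and the InfluxDB HTTP API is on its default port 8086, is to send a test metric in the Graphite plaintext format and query it back. The metric name below is made up for the test:

```shell
# Build a metric line in the Graphite plaintext protocol: <path> <value> <timestamp>
METRIC="hadoop.test.metric 42 $(date +%s)"
echo "$METRIC"

# On a live host, send it to the listener and query it back (uncomment to run):
# echo "$METRIC" | nc -q1 localhost 2003
# curl -G 'http://localhost:8086/query' --data-urlencode 'db=graphite' \
#      --data-urlencode 'q=SELECT * FROM "hadoop.test.metric"'
```

If the point comes back from the query, the listener, database and relay path are all working.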

Hadoop Metrics2 Graphite Configuration in Cloudera Manager:

Follow the steps below to configure Hadoop Metrics2 with Graphite.

Step 1: CDH (Cloudera) home page -> Configuration -> Advanced Configuration Snippets -> search for "Metrics"; the matching snippets will be shown (see Image 3).


Step 2: Configure HDFS, YARN and HBase using the Hadoop Metrics2 Advanced Configuration Snippet (Safety Valve). See the sample below for the HBase Master and RegionServer.

*.period=10

*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink

hbase.sink.graphite.server_host=10.150.193.28

hbase.sink.graphite.server_port=2003

hbase.sink.graphite.metrics_prefix=hbase

Step 3: Similarly, configure HDFS (DataNode, NameNode, SecondaryNameNode) and YARN (NodeManager, ResourceManager, JobHistory Server).
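Following the HBase snippet pattern, a DataNode safety-valve entry would look similar. The host and port match the HBase example; the "datanode" prefix is a naming choice, not a requirement:

```properties
*.period=10
*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink
datanode.sink.graphite.server_host=10.150.193.28
datanode.sink.graphite.server_port=2003
datanode.sink.graphite.metrics_prefix=datanode
```

Repeat the same pattern with the namenode, nodemanager, resourcemanager and jobhistoryserver contexts for the remaining roles.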

InfluxDB Grafana Configuration:

Step 1: Before configuration, install Grafana and connect to it in a web browser using http://localhost:3000 (or the hostname/IP of your Ubuntu server on port 3000). Log in with admin/admin as the username and password.

Step 2: After logging in, click on Data Sources in the left menu, and then on Add New in the top menu to add a new data source.

Step 3: Choose the following options and click Add.

            Name             : statesdemo

            Type             : InfluxDB

            Url              : http://localhost:8086/

            Database         : statesdemo

            User             : admin

            Password         : admin

After adding the data source as above, a Test Connection button appears at the bottom; use it to verify that your settings are correct.

Step 4: Click on the Dashboards link in the left menu, then the Home menu in the top to get a list of dashboards. Click the Create New button at the bottom to create a new dashboard.

Step 5: To add a graph, select the graph button in the panel filter. The New Dashboard dialogue box shown below will appear.

Then, to edit the graph, click on the panel title and select the Edit button:

This will open the graph configuration panel at the bottom of the display.

Select one or more values from statesdemo (the database name) and then click the Add query button. In the SELECT row, specify the fields and functions to be used. If you group by time, you will require an aggregation function.
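For instance, a grouped-by-time InfluxQL query over one of the DataNode series in this cluster pairs GROUP BY time() with an aggregation such as mean(); the 1-minute interval and 1-hour window below are arbitrary choices:

```sql
SELECT mean("value")
FROM "hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlockReportsNumOps"
WHERE time > now() - 1h
GROUP BY time(1m)
```

Without the mean() (or another aggregation), InfluxDB cannot collapse the raw points inside each 1-minute bucket and the query is rejected.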

For example 1, a query over DataNode fields exposed to Graphite through InfluxDB is shown below.

I have selected two DataNode fields:

1) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlockReportsNumOps

2) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlockChecksumOpAvgTime

The result/output of the above queries is,

For example 2, the HBase JVM metrics query is shown below.

1) hbase.jvm.JvmMetrics.Context=jvm.ProcessName=Master.SessionId=.Hostname=lcmaster.karix.local.GcCount

The output of this query is:

For example 3, the DataNode BlocksCached, BlocksRead and BlocksWritten metrics:

1) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlocksCached

2) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlocksRead

3) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlocksWritten

The output of these queries is:

Conclusion:

The Hadoop Metrics2 system gives real-time and historical data that help monitor and debug problems associated with the Hadoop services and jobs.

It helps to check and compare issues that occurred in the past, recently and frequently in the production cluster, and thus helps tune the performance of the production cluster accordingly.

It also helps to solve numerous issues, such as G1GC tuning. Note: in our case, after resolving the G1GC issue, the performance of the production cluster improved significantly.
