    Exposing Hadoop Metrics

    By Soft2share.com, 17 September 2019, 8 Mins Read

    Introduction:

    Big data is a fast-growing field concerned with handling large data sets in order to analyse product trends or extract useful information. Hadoop is a framework that plays a major role in handling data files ranging from terabytes to petabytes by using distributed computing.

    The Hadoop ecosystem includes many services such as HDFS, Hive, Oozie and Solr. Each tool plays a significant role in collecting and processing large volumes of data from various sources.

    Managing all of these Hadoop services together in one cluster is a challenge. For instance, Hive is a Hadoop service that uses MapReduce, a disk-intensive process that consumes considerable system resources when processing petabytes of data.

    Handling multiple workloads with MapReduce and Solr indexing is expensive to manage. It is therefore important to gather system-level metrics to understand and analyse the resource utilization trends of the Hadoop services.

    Distributions such as Cloudera (CDH) and Hortonworks provide a good UI for capturing system metrics for each service individually. Over time, usage may vary with data size, which can force you to tune the service configurations without knowledge of earlier behaviour, because CDH and Hortonworks are limited to storing the captured data in a database such as PostgreSQL or MySQL.

    More metrics are needed to analyse and fine-tune the system. Hence, we use an external metrics system, exposing the metrics to InfluxDB/Graphite and analysing them with Grafana.

    In this blog, you will see:

    • how to measure Hadoop metrics using the Hadoop Metrics2 framework,
    • how to configure CDH with Graphite,
    • how to configure Graphite with InfluxDB, which suits users familiar with SQL, and
    • how to configure Grafana to monitor InfluxDB with graphical representations.

    Basically, Hadoop metrics are a collection of information, events and measurements that help us tune the performance of a production cluster.

    Hadoop Metrics2 and Architecture

    The Hadoop Metrics2 framework has three major components, as shown in the diagram below.

    • Metrics sources generate the metrics.
    • Metrics sinks consume the metrics generated by the sources.
    • The metrics system periodically polls the metrics sources and passes the metrics records to the sinks.

    The main advantage of the Hadoop Metrics2 framework is that multiple metrics output plugins can be used in parallel. It also allows dynamic reconfiguration of metrics plugins without restarting the server, provides metrics filtering, and allows all metrics to be exported via Java Management Extensions (JMX).
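To make the three roles concrete, here is a toy Python model of the source, metrics system and sink flow. It is only a sketch: the class and function names (ListSink, MetricsSystem, heap_source) are hypothetical and are not part of the Hadoop API, and the metric value is made up for the demo.

```python
from typing import Callable, Dict, List

# Illustrative stand-in for a metrics record; not a Hadoop type.
MetricsRecord = Dict[str, float]


class ListSink:
    """A sink that simply remembers every record it is handed."""

    def __init__(self) -> None:
        self.records: List[MetricsRecord] = []

    def put_metrics(self, record: MetricsRecord) -> None:
        self.records.append(record)


class MetricsSystem:
    """Polls registered sources and forwards each snapshot to all sinks."""

    def __init__(self) -> None:
        self.sources: List[Callable[[], MetricsRecord]] = []
        self.sinks: List[ListSink] = []

    def poll_once(self) -> None:
        for source in self.sources:
            record = source()
            # Every sink receives the same record, so sinks run side by side.
            for sink in self.sinks:
                sink.put_metrics(record)


def heap_source() -> MetricsRecord:
    # Hypothetical source returning a fixed value for the demo.
    return {"MemHeapUsedM": 128.0}


system = MetricsSystem()
sink_a, sink_b = ListSink(), ListSink()
system.sources.append(heap_source)
system.sinks.extend([sink_a, sink_b])
system.poll_once()  # in a real system this would run once per period
```

Registering two sinks on one system mirrors the point about using several output plugins in parallel.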

    How to Expose Hadoop Metrics

    The Hadoop core services and HBase support writing their metrics to Graphite. The complete process is pictured below.

    Process Diagram

    Role of Graphite?

    In this case, Graphite is used only for collecting the metrics from the Hadoop ecosystem. The services are configured through Cloudera Manager to use org.apache.hadoop.metrics2.sink.GraphiteSink, which exports their metrics every 10 seconds (the interval is configurable), and the actual data is stored in InfluxDB.
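For orientation, Graphite's plaintext protocol carries one metric per line in the form "path value timestamp", with the timestamp in epoch seconds. The sketch below builds such a line in Python; the helper name graphite_line and the sample path are illustrative assumptions, not something taken from the cluster described here.

```python
import time
from typing import Optional


def graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one metric in Graphite's plaintext form: 'path value timestamp'."""
    # Default to the current time in whole epoch seconds.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


# Hypothetical sample: a GC counter reported at a fixed timestamp.
sample = graphite_line("hbase.jvm.GcCount", 42, 1568700000)
```

A sink that exports every 10 seconds would emit one such line per metric on each period.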

    What is InfluxDB?

    InfluxDB is an open source time series database. It follows a NoSQL-style design that allows quick changes to the database schema. InfluxDB is often used with Grafana for graphical representation.

    Why should we prefer InfluxDB?

    There are two reasons for preferring InfluxDB. First, familiarity: most people in the organization are acquainted with SQL queries, and InfluxDB offers a SQL-like query language (InfluxQL). Second, a single handy application can be used instead of multiple applications.

    It is also easy for the support team to monitor the production cluster. InfluxQL supports regular expressions, arithmetic expressions, and time series specific functions to speed up data processing.

    It provides various input plugins such as the UDP, Graphite, OpenTSDB and CollectD plugins. Every input plugin (CollectD, Graphite, JSON, Logfmt, CSV, etc.) has a data_format option that can be used to select the desired parser.

    What is Grafana?

    Grafana is an open source analytics and visualization suite. It is mainly used for visualizing time series data for infrastructure and application analytics, and it does not store time series data itself. In this case, we are using Grafana for graphical representation.

    How to enable Graphite in InfluxDB?

    After installing InfluxDB, enable the Graphite input with the following steps in a Linux environment.

    Step 1: Go to the InfluxDB directory under /etc and open influxdb.conf.

    cd /etc/influxdb

    vi influxdb.conf

    Step 2: Verify the InfluxDB meta directory.

    Step 3: Verify the InfluxDB data directory.

    Step 4: Enable Graphite.

    [[graphite]]

      enabled = true

      bind-address = ":2003"  # user-defined

      protocol = "tcp"

    Step 5: Bind the InfluxDB route.

    [routes.influxdb]

      patt = ""

      addr = "influxdb:2003"

      spool = true

      pickle = false

    Once the configuration is done, verify that it works before moving on to the steps below.
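One simple check, not necessarily the one the author used, is to confirm that something is accepting TCP connections on the Graphite port. The Python sketch below does only that; the function name graphite_listener_up is hypothetical, and the default port 2003 is taken from the bind-address shown above.

```python
import socket


def graphite_listener_up(host: str, port: int = 2003, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, calling graphite_listener_up("localhost") on the InfluxDB host after a restart indicates whether the listener came up. It does not prove that data is being written, so a follow-up query in InfluxDB is still worthwhile.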

    Hadoop Metrics2 Graphite Configuration in Cloudera Manager:

    Follow the steps below to configure the Hadoop Metrics2 Graphite sink.

    Step 1: On the CDH (Cloudera Manager) home page, go to Configuration -> Advanced Configuration Snippets and search for "Metrics". The matching snippets are then shown (see Image 3).

    Step 2: Configure HDFS, YARN and HBase using the Hadoop Metrics2 Advanced Configuration Snippet (Safety Valve). A sample for the HBase Master and RegionServer follows.

    *.period=10

    *.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink

    hbase.sink.graphite.server_host=10.150.193.28

    hbase.sink.graphite.server_port=2003

    hbase.sink.graphite.metrics_prefix=hbase

    Step 3: Similarly, configure HDFS (DataNode, NameNode, SecondaryNameNode) and YARN (NodeManager, ResourceManager, JobHistory Server).
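Because the same few properties are repeated for every role, a small helper can generate the snippet text. This is a sketch under assumptions: the function name is hypothetical, and it assumes each role uses its lowercase daemon name (for example namenode or datanode) as the property prefix and as the metrics_prefix, which you should confirm for your distribution before pasting the output into a safety valve.

```python
from typing import Iterable, List


def graphite_sink_properties(
    prefixes: Iterable[str], host: str, port: int = 2003, period: int = 10
) -> str:
    """Build hadoop-metrics2 style properties for a Graphite sink."""
    lines: List[str] = [
        f"*.period={period}",
        "*.sink.graphite.class=org.apache.hadoop.metrics2.sink.GraphiteSink",
    ]
    for prefix in prefixes:
        # One block per daemon prefix, mirroring the HBase sample above.
        lines.append(f"{prefix}.sink.graphite.server_host={host}")
        lines.append(f"{prefix}.sink.graphite.server_port={port}")
        lines.append(f"{prefix}.sink.graphite.metrics_prefix={prefix}")
    return "\n".join(lines)


# Assumed prefixes for the HDFS roles; the host is the one used in the sample.
snippet = graphite_sink_properties(["namenode", "datanode"], "10.150.193.28")
print(snippet)
```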

    InfluxDB Grafana Configuration:

    Step 1: Install Grafana first, then open "http://localhost:3000" in a web browser to connect to it. You can also use the hostname or IP of your Ubuntu server with port 3000. Log in with the user name and password admin/admin.

    Step 2: After logging in, click Data Sources in the left menu, then Add New in the top menu to add a new data source.

    Step 3: Choose the following options and click Add.

                Name             : statesdemo

                Type               : InfluxDB

                Url                   : http://localhost:8086/

                Database       : statesdemo

                User                : admin

                Password      : admin

    After adding the data source as above, a Test Connection button appears at the bottom, which you can use to verify that your settings are correct.

    Step 4: Click the Dashboards link in the left menu, then the Home menu at the top to get a list of dashboards. Click the Create New button at the bottom to create a new dashboard.

    Step 5: To add a graph, select the Graph button in the panel filter. The New dashboard dialogue box shown below then appears.

    Then, to edit the graph, click the panel title and select the Edit button:

    This opens the graph configuration panel at the bottom of the display.

    Select one or more values from statesdemo (the database name) and then click the Add query button. In the SELECT row, specify which fields and functions to use. If you group by time, an aggregation function is required.
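The query that the editor builds is plain InfluxQL, so it can also be written by hand. The Python sketch below assembles one such query with mean() as the aggregation. It assumes the Graphite input stores each metric path as a measurement with a single field named "value"; the function name and the default time window and bucket size are illustrative choices.

```python
def influxql_mean(measurement: str, window: str = "1h", bucket: str = "10s") -> str:
    """Build an InfluxQL query averaging a measurement over time buckets."""
    # Escape double quotes so the measurement name stays one quoted identifier.
    escaped = measurement.replace('"', '\\"')
    return (
        f'SELECT mean("value") FROM "{escaped}" '
        f"WHERE time > now() - {window} GROUP BY time({bucket})"
    )


query = influxql_mean(
    "hadoop.datanode.dfs.datanode.Context=dfs."
    "Hostname=lcslave1.karix.local.BlockReportsNumOps"
)
```

Dropping the mean() call while keeping GROUP BY time(...) would be rejected, which is why the aggregation function is required.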

    Example 1: the query for the DataNode fields exposed from Hadoop through Graphite to InfluxDB is shown below.

    I have selected two DataNode fields:

    1) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlockReportsNumOps

    2) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlockChecksumOpAvgTime

    (Figure: output of the above queries.)

    Example 2: the HBase JVM metrics query is shown below.

    1) hbase.jvm.JvmMetrics.Context=jvm.ProcessName=Master.SessionId=.Hostname=lcmaster.karix.local.GcCount

    (Figure: output of the above query.)

    Example 3: DataNode BlocksCached, BlocksRead and BlocksWritten:

    1) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlocksCached

    2) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlocksRead

    3) hadoop.datanode.dfs.datanode.Context=dfs.Hostname=lcslave1.karix.local.BlocksWritten

    (Figure: output of the above queries.)
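The metric names in the examples above appear to follow one pattern: a dotted prefix, a run of Name=Value tags, and the metric name last. Assuming that pattern holds, the Python sketch below splits such a name into its parts, which is handy when filtering by host. The function name and the regular expression are illustrative and are not part of Hadoop, Graphite or InfluxDB.

```python
import re
from typing import Dict, Tuple

# A tag is Name=Value; the value ends right before the next ".Name=" or
# before the final ".MetricName". Values may contain dots (hostnames).
TAG_RE = re.compile(r"(\w+)=(.*?)(?=\.\w+=|\.\w+$)")


def split_metric_path(path: str) -> Tuple[str, Dict[str, str], str]:
    """Split a metric path into (prefix, tags, metric name)."""
    head, _, metric = path.rpartition(".")
    tags = {m.group(1): m.group(2) for m in TAG_RE.finditer(path)}
    first = TAG_RE.search(path)
    # Everything before the first tag is the dotted prefix.
    prefix = path[: first.start()].rstrip(".") if first else head
    return prefix, tags, metric


prefix, tags, metric = split_metric_path(
    "hadoop.datanode.dfs.datanode.Context=dfs."
    "Hostname=lcslave1.karix.local.BlocksRead"
)
```

Here tags["Hostname"] holds the full host name even though it contains dots, because a tag value only ends where the next Name= or the final metric name begins.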

    Conclusion:

    The Metrics2 system for Hadoop provides real-time and historical data that help monitor and debug problems associated with Hadoop services and jobs.

    It is helpful for comparing issues that occurred in the past, recently, or frequently in the production cluster, and thus helps tune the performance of the production cluster accordingly.

    It also helps to solve issues such as G1GC problems. Note: in our case, after resolving the G1GC issue, the performance of the production cluster improved considerably.
