Hadoop has a distributed master-slave architecture: for example, one name node, one secondary name node, and eight data nodes.

We were using Hadoop to handle a large amount of streaming data from smartphones for one of the leading telecom companies in India. It was an eight-node cluster with Cloudera CDH 5.3.0.

The project was at a critical stage. We were facing a problem: the files “/pdfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.curr” and “dncp-block-verification.log.prev” kept growing to hundreds of GBs within hours, which slowed down the machines and led to data node service outages.

It was an HDFS bug (HDFS-7430). We needed help working out how to resolve it. After a good discussion with the Hadoop experts at Emergys, I was able to solve the issue. We had two options.

Option 1

Stop the data node service and delete the dncp-block-verification files manually. This approach requires continuous monitoring, as the log files may grow again on any data node (even on the same node after they have been deleted).
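The monitoring that option 1 depends on could be sketched roughly as follows. This is a minimal illustration, not the script we ran: the function name `find_oversized_logs`, the scan root, and the size threshold are all assumptions; only the two log file names come from the problem described above.

```python
import os

# File names taken from the post; everything else here is a hypothetical helper.
LOG_NAMES = (
    "dncp-block-verification.log.curr",
    "dncp-block-verification.log.prev",
)


def find_oversized_logs(data_dir, max_bytes):
    """Return sorted (path, size) pairs for verification logs larger than max_bytes."""
    oversized = []
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name not in LOG_NAMES:
                continue
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # The file may be rotated or removed while we are scanning.
                continue
            if size > max_bytes:
                oversized.append((path, size))
    return sorted(oversized)
```

A check like this could be scheduled on every data node and wired to an alert. It only reports files; the deletion itself should still happen after the data node service has been stopped, as described above.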

Option 2

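Although slightly drastic, the second option was to turn off the block scanner entirely by setting the key dfs.datanode.scan.period.hours to 0 in the HDFS DataNode configuration (the default is 504 hours). The downside is that the data nodes would no longer auto-detect corrupted block files. Check the value semantics for your release before relying on this: later Apache Hadoop documentation states that a negative value disables the scanner, while 0 falls back to the 504-hour default.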
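For illustration, here is a small sketch of how that key could be written into an hdfs-site.xml document. The helper `set_property` is hypothetical, and the value 0 simply follows the description above; on a cluster managed by Cloudera Manager the change would normally be made through its configuration interface rather than by editing the file directly.

```python
import xml.etree.ElementTree as ET

# Key name taken from the post; the helper below is an illustrative assumption.
SCAN_KEY = "dfs.datanode.scan.period.hours"


def set_property(hdfs_site_xml, name, value):
    """Return hdfs-site.xml text with property `name` set to `value`, adding it if absent."""
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            val = prop.find("value")
            if val is None:
                val = ET.SubElement(prop, "value")
            val.text = str(value)
            break
    else:
        # No matching property was found, so append a new one.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_property("<configuration></configuration>", SCAN_KEY, 0))
```

The data node service has to be restarted for a change like this to take effect.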

After considering the pros and cons, we went ahead with option 1. Once it was implemented, the service was up and running as expected. Hopefully, this issue will be resolved in the next version, CDH 5.4.x.

It was a big relief, and I felt proud, as it saved a lot of cluster downtime.
