Hadoop uses a distributed master-slave architecture. A cluster might consist of, for example, one name node, one secondary name node, and eight data nodes.

We were using Hadoop to handle a large amount of streaming data from smartphones for one of the leading telecom companies in India. It was an eight-node cluster with Cloudera CDH 5.3.0.

The project was at a critical stage when we ran into a problem: the files “/pdfs/dn/current/Bp-12345-IpAddress-123456789/dncp-block-verification.log.curr” and “dncp-block-verification.log.prev” kept growing to hundreds of GBs within hours, which slowed down the machine and led to data node service outages.

This turned out to be a known HDFS bug (HDFS-7430), and we needed help deciding how to resolve it. After a good discussion with the Hadoop experts at Emergys, I was able to solve the issue. We had two options.

Option 1

Stop the data node service and delete the dncp block verification files manually. This approach requires continuous monitoring, because the log files can grow again on any data node (even on the same node after they have been deleted).
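The monitoring that option 1 calls for could be sketched roughly as follows. This is a minimal illustration, not the script we ran: the function name `find_oversized_logs`, the size limit, and the directory you pass in are all assumptions, and the pattern accepts both hyphen and underscore spellings of the log file name because the spelling may differ on your installation. It only reports large files. Deleting them should still be done by hand after the data node service has been stopped.

```python
import os
import re
from typing import Iterator, Tuple

# Block-verification log names; '-' and '_' are both accepted (assumption:
# the exact spelling can differ between installations).
LOG_NAME = re.compile(r"^dncp[-_]block[-_]verification\.log\.(curr|prev)$")


def find_oversized_logs(data_dir: str, limit_bytes: int) -> Iterator[Tuple[str, int]]:
    """Yield (path, size_in_bytes) for verification logs larger than limit_bytes."""
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if not LOG_NAME.match(name):
                continue
            path = os.path.join(root, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # the file was rotated or removed while we were scanning
            if size > limit_bytes:
                yield path, size
```

Run from cron on each data node, something like `list(find_oversized_logs(data_dir, 10 * 1024 ** 3))` would list every verification log above 10 GiB under a data directory of your choosing, which is the signal to schedule the manual cleanup.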

Option 2

Turn off the block scanner entirely, which is slightly drastic, by setting the key dfs.datanode.scan.period.hours to 0 in the HDFS DataNode configuration (the default is 504 hours, i.e. three weeks). The downside is that DNs would no longer auto-detect corrupted block files.
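For reference, the setting described in option 2 would look roughly like this in hdfs-site.xml on each data node. This is a sketch only. The value 0 is the one discussed above, but some Hadoop releases document that 0 falls back to the default period and that only a negative value disables the scanner, so check the hdfs-default.xml description for your version before relying on it.

```xml
<!-- hdfs-site.xml on each data node; restart the data node service afterwards -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <!-- value taken from this article; verify its meaning for your Hadoop/CDH release -->
  <value>0</value>
</property>
```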

After considering the pros and cons, we went ahead with option 1. After implementation, as expected, the service was up and running again. Hopefully, this issue will be resolved in the next version, CDH 5.4.x.

It was a big relief, and I felt proud, as it saved a lot of cluster downtime.
