Tagged

Hadoop

13 posts

Data Engineering · Big Data

Upgrading Large Hadoop Cluster

A detailed account of upgrading a large Telco Hadoop cluster from HDP 2.6.4 to 3.1.5, covering practice runs, planning strategies, and lessons from executing the upgrade during COVID remote work.

18 May 2020 · 6 min read
Big Data · YARN

Why my Hive Sqoop job is failing

A practical troubleshooting guide for common Hive and Sqoop job failures on YARN, covering memory tuning, CPU allocation, and where to find the right logs.

24 Apr 2015 · 3 min read
Architecture · Data Engineering

Design and Architecture considerations for storing time series data

Key design considerations for storing time series data, including access pattern analysis, windowed storage strategies, and the trade-offs between granularity and performance.

16 Jun 2014 · 2 min read
Data Engineering · Big Data

Migrating Large Hadoop Cluster

Lessons learned from migrating a large Hadoop cluster over a single weekend -- planning, data migration with distcp, code migration of 300+ Oozie jobs, and HBase migration.

14 Jun 2014 · 6 min read
Data Engineering · Big Data

Hadoop 2.3 Centralized Cache Feature Comparison to Spark RDD

A comparison of the new HDFS centralized cache management feature in Hadoop 2.3 with Spark RDDs, and why Spark still held the edge for in-memory processing.

28 Feb 2014 · 1 min read
Data Engineering · Big Data

Handle Schema Changes and Evolution in Hadoop

Approaches for handling schema evolution in Hadoop using Avro and ORC file formats, including a practical workflow for managing schema changes with Hive.

30 Mar 2013 · 2 min read
Java · Data Engineering

Chain Mapper Example

How to use the ChainMapper class in Hadoop to call multiple mappers in sequence, with a working example and key points about configuration and type compatibility.

16 Feb 2013 · 3 min read
Data Engineering · Big Data

Merging Small Files in Hadoop

The small files problem in Hadoop and five approaches to solve it: HDFSConcat, IdentityMapper/Reducer, FileUtil.copyMerge, Hadoop File Crush, and Hive concatenate.

26 Jan 2013 · 3 min read
Data Engineering · Big Data

Hadoop cluster benchmarks

Benchmarking a Hadoop cluster is an ongoing process as we use it inside the organization. The main thing we don't know when we buy a new cluste...

23 Oct 2012 · 3 min read
Data Engineering · Big Data

Cloudera Hadoop certification now available worldwide 1 May 2012

At last, it's 1 May 2012: Cloudera has opened certifications through VUE to people worldwide. Details are as follows from Developer Exam Exam...

1 May 2012 · 3 min read
Java · Data Engineering

Hadoop Development Environment in Eclipse

How to set up a Hadoop development environment in Eclipse with the WordCount MapReduce example.

24 Nov 2011 · 2 min read
Data Engineering · Big Data

Hadoop Windows Cygwin error tmp folder file permissions

Error: java.io.IOException: Failed to set permissions of path: file:/tmp/hadoopjj/mapred/staging/jj1931875024/.staging to 0700. The details for my Hadoop we...

26 Oct 2011 · 1 min read
Data Engineering · Big Data

Hadoop Setup in Windows

Quick guide to installing Hadoop on Windows with Cygwin, including Java configuration and verification steps.

5 Oct 2011 · 1 min read