Big Data · YARN · Data Engineering · Hadoop

Why my Hive/Sqoop job is failing

A practical troubleshooting guide for common Hive and Sqoop job failures on YARN, covering memory tuning, CPU allocation, and where to find the right logs.

24 April 2015 · 3 min read

Start with the Basics

Before troubleshooting, learn a few fundamentals about your cluster configuration from your administrator.

Sample conversation:

Q: How many nodes does the cluster have, and what is the configuration?

A: Each node has 120 GB RAM. Of that, about 80 GB is available for our jobs. Each DataNode has 14 CPU cores, of which a maximum of 8 are available for processing — the rest are reserved for the OS, Hadoop daemons, and monitoring services.

When you run any MapReduce job, the minimum RAM allocation per container is 2 GB, and the maximum any single task can request is 80 GB (the full per-node available capacity).

If you are running large one-off loads, request more RAM for your job (in increments of 1024 MB). You can also request additional CPU cores, up to the 8 available per node.
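To see why these numbers matter, here is a back-of-envelope calculation (a sketch using the figures quoted above; the 4 GB per-container request is illustrative) of how many containers a single node can run at once:

```shell
# Containers per node, bounded by whichever resource runs out first.
# Figures from the conversation above: 80 GB usable RAM, 8 usable vcores.
NODE_RAM_MB=$((80 * 1024))
NODE_VCORES=8
MAP_MB=4096        # per-container memory request (illustrative)
MAP_VCORES=1       # per-container vcore request

BY_RAM=$((NODE_RAM_MB / MAP_MB))
BY_CPU=$((NODE_VCORES / MAP_VCORES))

# The node can run only the smaller of the two.
if [ "$BY_RAM" -lt "$BY_CPU" ]; then LIMIT=$BY_RAM; else LIMIT=$BY_CPU; fi
echo "containers per node: $LIMIT (RAM allows $BY_RAM, CPU allows $BY_CPU)"
# prints: containers per node: 8 (RAM allows 20, CPU allows 8)
```

Note that with these requests the vcore limit, not memory, caps concurrency, which is why asking for the extra cores can matter as much as asking for RAM.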

Hive Jobs

The relevant tuning parameters are:

  • mapreduce.map.memory.mb
  • mapreduce.reduce.memory.mb
  • mapreduce.map.java.opts
  • mapreduce.reduce.java.opts
  • mapreduce.map.cpu.vcores
  • mapreduce.reduce.cpu.vcores

If your job fails while running a Hive INSERT query, check whether you need to tune the memory parameters. Hive INSERT jobs are reduce-heavy, and inserting large amounts of data in one go often leads to memory overruns.
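For a large one-off INSERT, it is usually better to pass per-job overrides than to change cluster defaults. A sketch, assuming the hive CLI is on the path; the table names and the 8 GB figure are illustrative, not values from this cluster:

```shell
# Build the command as a string so the overrides are easy to inspect.
# The heap (-Xmx) is set to roughly 80% of the container size.
HIVE_CMD='hive \
  --hiveconf mapreduce.reduce.memory.mb=8192 \
  --hiveconf mapreduce.reduce.java.opts=-Xmx6553m \
  -e "INSERT INTO TABLE sales_archive SELECT * FROM sales_staging"'
echo "$HIVE_CMD"
```

Because the overrides travel with the command, the next job on the cluster still gets the defaults.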

Always check the logs — the reason for the failure is almost always there.

Sqoop Jobs

Sqoop jobs spawn only map tasks. If a Sqoop job is not making progress, the most likely cause is a memory shortage in its map containers.

Add the following parameters to your Sqoop command:

-Dmapreduce.map.memory.mb=5120 -Dmapreduce.map.speculative=false

Tune the 5120 value based on your needs.
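Putting it together, a full import might look like the following. This is a sketch; the JDBC URL, credentials, table, and mapper count are placeholders, not values from this cluster:

```shell
# The -D flags must come right after the tool name (import),
# before any tool-specific options.
SQOOP_CMD='sqoop import \
  -Dmapreduce.map.memory.mb=5120 \
  -Dmapreduce.map.speculative=false \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username loader -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 8'
echo "$SQOOP_CMD"
```

Disabling speculative execution avoids launching duplicate map attempts that would double the memory pressure on an already struggling job.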

Where to See Logs and Job Status

  • Resource Manager: http://resource:8088/cluster
  • Ambari: http://ambari:8080 (ask your administrator for read-only credentials)

Check the current default values in Ambari:

mapreduce.map.java.opts=-Xmx5012m
mapreduce.reduce.java.opts=-Xmx6144m
mapreduce.map.memory.mb=4096
mapreduce.reduce.memory.mb=8192

Keep in mind that java.opts sets the JVM heap inside the container defined by memory.mb, so each -Xmx should stay below its matching container size (roughly 80% is a common rule of thumb). As listed above, the map heap of 5012 MB would not actually fit in its 4096 MB container, which is exactly the kind of mismatch worth flagging to your administrator.

What If My Job Is Not Even Accepted by the Cluster?

You are requesting resources that exceed what the cluster can provide. Check what your job is actually asking for and compare it against the cluster’s capacity limits.
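The check the scheduler performs can be sketched as follows; the 81920 MB figure assumes yarn.scheduler.maximum-allocation-mb matches the 80 GB quoted earlier, and the 96 GB request is purely illustrative:

```shell
# If a single container request exceeds the scheduler's maximum allocation,
# the ResourceManager rejects the request instead of queueing it.
MAX_ALLOC_MB=81920      # yarn.scheduler.maximum-allocation-mb (assumed)
REQUESTED_MB=98304      # a 96 GB request, for illustration
if [ "$REQUESTED_MB" -gt "$MAX_ALLOC_MB" ]; then
  VERDICT="rejected: ${REQUESTED_MB} MB > max allocation ${MAX_ALLOC_MB} MB"
else
  VERDICT="schedulable"
fi
echo "$VERDICT"
```

If your request is below the maximum but the job still sits in ACCEPTED, the cluster is simply busy; only requests above the maximum are never schedulable at all.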

Why Is My Job Being Killed?

If your job exceeds the resource limit it originally requested from the ResourceManager, YARN will kill the container. You will see something like this in the logs:

Killing container....
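The full message typically looks like the following (a representative YARN log line, not copied from this cluster; the pid, container ID, and sizes are placeholders):

```
Container [pid=12345,containerID=container_1429968976163_0001_01_000002] is
running beyond physical memory limits. Current usage: 4.2 GB of 4 GB physical
memory used; 5.1 GB of 8.4 GB virtual memory used. Killing container.
```

The "Current usage" figures tell you which limit was breached, and therefore which memory parameter to raise.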

Remember: Google Is Your Friend

When you see a job failure, grab the error from the logs and search for it. Look for which parameters others have suggested changing for similar issues.
