Java · Data Engineering · Big Data · Hadoop
Hadoop Development Environment in Eclipse
How to set up a Hadoop development environment in Eclipse with the WordCount MapReduce example.
24 November 2011 · 2 min read
The steps to set up a Hadoop development environment in Eclipse are:
- Create a Java Project.
- In the build path, add the Hadoop JAR files that come with the Hadoop binary download from the Apache website. It is also recommended that you link the Javadoc and source files for those JARs.
- Create a sample program based on the famous WordCount example.
The example in the official documentation for version 0.20.203 is based on the old API. For an example using the latest API, see this detailed guide, which also walks through the full Eclipse setup:
Hadoop Development Environment with Eclipse
Sample WordCount Code
Create an input folder within your project workspace folder, then run the program:
package com.hadoop;
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
public class WordCount1 {
public static class SimpleMapper extends Mapper {
private Text word = new Text();
private static final IntWritable one = new IntWritable(1);
public void map(Object key, Text value, Context context)
throws IOException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
try {
context.write(word, one);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
public static class SampleReducer extends Reducer {
private IntWritable result = new IntWritable();
protected void reduce(Text key, Iterator values,
Context context) throws IOException, InterruptedException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
context.write(key, new IntWritable(sum));
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
GenericOptionsParser g = new GenericOptionsParser(conf, args);
String[] otherArgs = g.getRemainingArgs();
Job job = new Job(conf, "Example Hadoop 0.20.1 WordCount");
job.setJarByClass(WordCount1.class);
job.setMapperClass(SimpleMapper.class);
job.setReducerClass(SampleReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path("input"));
FileOutputFormat.setOutputPath(job, new Path("output"));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}