Now we will set up the Hadoop plugin in Eclipse. (Hadoop 2.6.0 | OS: CentOS 7)
I covered how to install Eclipse on your OS in the last blog post.
All the paths here are based on what I did before, so if you have trouble with any path configuration, please see the previous blog posts.
NOW START!
Step 1. Download the plugin files. Visit the site below and download them.
https://github.com/winghc/hadoop2x-eclipse-plugin
After downloading it as a zip file, unzip it and open a terminal. This plugin works with Hadoop 2.4.1 and 2.6.0.
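Alternatively, if git is installed, you can clone the same repository from the terminal instead of downloading the zip:
$ git clone https://github.com/winghc/hadoop2x-eclipse-plugin.git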
Step 2. Build the plugin JAR.
[hdpusr@demo hadoop2x-eclipse-plugin]$ cd src/contrib/eclipse-plugin
# Assume the Hadoop installation directory is /usr/share/hadoop; in our case it is /home/hadoop/hadoop.
[hdpusr@apclt eclipse-plugin]$ ant jar -Dversion=2.6.0 -Dhadoop.version=2.6.0 -Declipse.home=/opt/eclipse -Dhadoop.home=/home/hadoop/hadoop
# If your Hadoop version is 2.4.1, change the version numbers accordingly.
The final JAR will be generated at:
${hadoop2x-eclipse-plugin}/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.6.0.jar
$ cp ${hadoop2x-eclipse-plugin}/build/contrib/eclipse-plugin/hadoop-eclipse-plugin-2.6.0.jar /home/hadoop/Desktop/
This copies the freshly built hadoop-eclipse-plugin-2.6.0.jar to the Desktop.
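To be safe, confirm the copy actually landed there before moving on:
$ ls -l /home/hadoop/Desktop/hadoop-eclipse-plugin-2.6.0.jar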
Step 3. Start your single-node Hadoop cluster and copy the Hadoop Eclipse plugin into Eclipse's plugins directory.
$ start-all.sh
$ su
# cp /home/hadoop/Desktop/hadoop-eclipse-plugin-2.6.0.jar /opt/eclipse/plugins
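Before launching Eclipse, it is worth checking that the daemons actually came up. On a single-node Hadoop 2.6.0 setup started with start-all.sh, jps should list roughly the following processes (PIDs omitted here and will differ on your machine):
$ jps
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps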
Step 4. Run Eclipse. From this point on, screenshots may help you follow along.
Click the Open Perspective icon in the top-right corner and choose Map/Reduce.
Now we need to set up a connection to our HDFS.
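In the Map/Reduce Locations view, right-click and choose New Hadoop location. The exact values depend on your configuration files; what matters most is that the DFS Master port matches fs.defaultFS in core-site.xml, which is hdfs://localhost:9000 in this setup (the same address the example code below uses). A sketch of the dialog values:
Location name: localhost (any label works)
Map/Reduce Master — Host: localhost, Port: the dialog's default is fine for a pseudo-distributed setup (this field mattered for MRv1's JobTracker)
DFS Master — Host: localhost, Port: 9000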
Once the location is saved, DFS Locations in the Project Explorer shows the HDFS directories. All the setup is finished, so let's run an example on it. The code below is a variation on word count: for every two words that appear on the same line, it counts how many times the pair co-occurs.
import java.io.IOException;
import java.util.ArrayList;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HW1 {

    // Emits the ordered pairs <a,b> and <b,a> with a count of 1 for every
    // two words that appear together on the same input line.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Normalize the line: lower-case it and strip everything but letters and spaces.
            String line = value.toString().toLowerCase().replaceAll("[^a-z ]", "");
            StringTokenizer itr = new StringTokenizer(line);
            ArrayList<String> seen = new ArrayList<String>();
            while (itr.hasMoreTokens()) {
                String str = itr.nextToken();
                // Pair the current word with every word already seen on this line,
                // emitting both orderings so <a,b> and <b,a> are counted separately.
                for (String prev : seen) {
                    word.set("<" + str + "," + prev + ">");
                    context.write(word, one);
                    word.set("<" + prev + "," + str + ">");
                    context.write(word, one);
                }
                seen.add(str);
            }
        }
    }

    // Sums the 1s emitted for each pair; also used as the combiner.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(HW1.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output live on HDFS; adjust these paths to your cluster.
        FileInputFormat.setInputPaths(job, new Path("hdfs://localhost:9000/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/out"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
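A quick note on one design choice in main(): IntSumReducer doubles as the combiner. That is safe here because summing counts is associative and commutative, so pre-aggregating partial sums on the map side changes only the network traffic, not the final result.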
You need to set the input and output paths in your code (here hdfs://localhost:9000/input and hdfs://localhost:9000/out). Now you can check the result easily as follows:
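For example, assuming a text file named sample.txt (a placeholder name) that you want to analyze, the full round trip from the terminal looks roughly like this:
$ hdfs dfs -mkdir /input
$ hdfs dfs -put sample.txt /input
# run the job from Eclipse (Run As > Run on Hadoop if the plugin menu is available)
# note: the /out directory must not exist before the run, or the job will fail
$ hdfs dfs -cat /out/part-r-00000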
Today, I studied how to set up the Hadoop plugin in Eclipse and run test code on it.