    public static void main(String[] args) throws IOException,
            ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -fs, ...) and keep the job's own arguments
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        // The reducer is reused as combiner to pre-aggregate counts on the map side
        job.setCombinerClass(WordCountReducer.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Block until the job finishes; exit code 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
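Once the input/wordcount directory and the output directory exist, we can run the job directly from Eclipse. Note that the job creates output/wordcount itself and fails if that directory already exists.
Just create a run configuration of type Java Application and add these Arguments: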
input/wordcount output/wordcount
Then add this variable on the Environment tab:
HADOOP_HOME=/opt/hadoop
or pass the equivalent JVM system property as a VM argument:
-Dhadoop.home.dir=/opt/hadoop
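To check which of the two settings the run configuration actually passes, a small helper can print the resolved value before the job starts. This is only a sketch: HadoopHomeCheck is a hypothetical class, not part of Hadoop, and the precedence it uses (system property first, then environment variable) is my assumption about how Hadoop looks up its home directory.

```java
import java.util.Optional;

// Sketch only: HadoopHomeCheck is a hypothetical helper, not part of Hadoop.
final class HadoopHomeCheck {

    // Returns the first non-blank candidate; the system property is assumed
    // to take precedence over the environment variable.
    static Optional<String> resolve(String systemProperty, String environmentVariable) {
        if (systemProperty != null && !systemProperty.trim().isEmpty()) {
            return Optional.of(systemProperty.trim());
        }
        if (environmentVariable != null && !environmentVariable.trim().isEmpty()) {
            return Optional.of(environmentVariable.trim());
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Optional<String> home = resolve(
                System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"));
        System.out.println(home.map(h -> "Hadoop home: " + h)
                .orElse("Neither hadoop.home.dir nor HADOOP_HOME is set"));
    }
}
```

Running it with the same run configuration as the job shows whether Eclipse really hands the setting over to the JVM.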
To run it on a multi-machine cluster, I need to build the jar with Maven:
>mvn clean install
First put the jar under /opt/hadoop/share/custom. Here is how it runs on the local machine:
>hadoop jar /opt/hadoop/share/custom/easyhadoop-1.0.jar wordcount input output
On the ubuntu-master, place the jar under the /opt/hadoop/share/custom directory.
Start all the servers.
>sbin/start-dfs.sh
>sbin/start-yarn.sh
>sbin/mr-jobhistory-daemon.sh start historyserver
I have already put my files into HDFS:
>hadoop fs -mkdir -p /data/worldcount
>hadoop fs -put /opt/hadoop/etc/hadoop/*.xml /data/worldcount/
Now I can run my jar directly:
>hadoop jar /opt/hadoop/share/custom/easyhadoop-1.0.jar wordcount /data/worldcount /output/worldcount2
This shows the result:
>hadoop fs -cat /output/worldcount2/*
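To get a feeling for what that output looks like, here is a minimal plain-Java sketch of the same computation without any Hadoop dependency. LocalWordCount is a hypothetical name, and I assume the mapper splits lines on whitespace as in the classic WordCount example; the tab between key and value is the default separator of Hadoop's text output.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Sketch only: LocalWordCount is a hypothetical stand-in for the Hadoop job.
final class LocalWordCount {

    // "Map" step: split each line on whitespace; "reduce" step: sum 1 per occurrence.
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // keys kept sorted, like reducer output
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Render as "word<TAB>count", one pair per line.
    static String format(Map<String, Integer> counts) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            out.append(entry.getKey()).append('\t').append(entry.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two sample lines; "hello" appears twice, the other words once
        System.out.print(format(count(Arrays.asList("hello hadoop", "hello yarn"))));
    }
}
```

On the XML files uploaded above, the real job output has the same shape, only with tokens such as tag fragments as keys.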
Actually, I only wanted to get to know Hadoop and the MapReduce framework; in the end, I expect to use HBase and Spark. So I did not try mapping from and reducing to a database.
References on a mapper reading from and a reducer writing to a DB:
http://archanaschangale.wordpress.com/2013/09/26/database-access-with-apache-hadoop/
http://shazsterblog.blogspot.com/2012/11/storing-hadoop-wordcount-example-with.html