Hadoop: Read a Local File, Process It, and Write the Result Back Locally
A few days ago I posted a helper class for operating on the Hadoop file system. Today let's put it to practical use: read a file from the local file system, run a computation on it, and write the result back to the local file system. Without further ado, here is the code:
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class mywordcount {

    public static class wordcountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreElements()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class wordcountReduce
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable str : values) {
                sum += str.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String args[]) throws Exception {
        // First define two temporary HDFS paths. A random suffix could be
        // appended to the names to make collisions unlikely.
        // dstFile is the destination of the upload (the job's input);
        // srcFile is the source of the download (the job's output).
        String dstFile = "temp_src";
        String srcFile = "temp_dst";
        // Create the file-operations object (the HDFS helper class from
        // the earlier post).
        HDFS_File file = new HDFS_File();
        Configuration conf = new Configuration();
        // Upload from the local file system to HDFS; the source may be a
        // file or a directory. (args[0] is assumed to be the local input
        // path passed on the command line.)
        file.PutFile(conf, args[0], dstFile);
        System.out.println("up ok");

        Job job = new Job(conf, "mywordcount");
        job.setJarByClass(mywordcount.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(wordcountMapper.class);
        job.setReducerClass(wordcountReduce.class);
        job.setCombinerClass(wordcountReduce.class);
        // Note: both the input and the output here are files or
        // directories on HDFS.
        FileInputFormat.setInputPaths(job, new Path(dstFile));
        FileOutputFormat.setOutputPath(job, new Path(srcFile));
        // Run the job.
        job.waitForCompletion(true);

        // Copy the result from HDFS back to the local file system.
        // (args[1] is assumed to be the local output path.)
        file.GetFile(conf, srcFile, args[1]);
        System.out.println("down ok");
        // Delete the temporary HDFS files/directories.
        file.DelFile(conf, dstFile, true);
        file.DelFile(conf, srcFile, true);
        System.out.println("del ok");
    }
}
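The HDFS_File class above is the helper from the earlier post and is not reproduced here. For readers who don't have that post handy, a minimal sketch of the three methods used in main might look like the following; the signatures are assumptions inferred from the call sites, built on the standard org.apache.hadoop.fs.FileSystem API:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: the real class from the earlier post may differ in details.
public class HDFS_File {
    // Upload a local file or directory to HDFS.
    public void PutFile(Configuration conf, String local, String hdfs) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        fs.copyFromLocalFile(new Path(local), new Path(hdfs));
    }

    // Download a file or directory from HDFS to the local file system.
    public void GetFile(Configuration conf, String hdfs, String local) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        fs.copyToLocalFile(new Path(hdfs), new Path(local));
    }

    // Delete a file or directory on HDFS; 'recursive' allows deleting
    // non-empty directories.
    public void DelFile(Configuration conf, String hdfs, boolean recursive) throws Exception {
        FileSystem fs = FileSystem.get(conf);
        fs.delete(new Path(hdfs), recursive);
    }
}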
Finally, note that the file and directory paths passed on the command line should be absolute paths, to avoid errors.
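For example, assuming the job is packaged as mywordcount.jar (the jar name and paths here are hypothetical), an invocation might look like:

hadoop jar mywordcount.jar mywordcount /home/user/input.txt /home/user/wordcount_out

where the first argument is the absolute local input path and the second is the absolute local path to save the result to.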