1397535668 posted on 2018-10-28 12:23:34

Session 2018-07-07: Configuring Hadoop Local Run Mode

  Source: 05 - Configuring Hadoop Local Run Mode
  The detailed steps for running Hadoop in local mode from a Windows development environment are as follows:
  1. Install the JDK and Hadoop 2.4.1 locally and configure the JAVA_HOME, HADOOP_HOME, and Path environment variables (it is best to restart the machine afterwards so they take effect; a restart-free alternative is sketched after this list).
  2. Replace the bin directory of the local Hadoop 2.4.1 installation with the bin directory from hadoop-common-2.2.0-bin-master, because the stock Hadoop 2.x release does not ship the two files hadoop.dll and winutils.exe.
  If hadoop.dll and winutils.exe are missing, the program throws exceptions such as:
  java.io.IOException: Could not locate executable D:\hadoop-2.4.1\bin\winutils.exe in the Hadoop binaries.
  java.lang.Exception: java.lang.NullPointerException
  Replacing the local Hadoop 2.4.1 bin directory with the one from hadoop-common-2.2.0-bin-master is therefore a necessary step.
  Note: copying only hadoop.dll and winutils.exe from the hadoop-common-2.2.0-bin-master bin directory into the Hadoop 2.4.1 bin directory also works, but replacing the entire bin directory is preferable.
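  If you would rather not restart the machine for the environment variables to take effect, a commonly used alternative (a sketch, relying on Hadoop 2.x consulting the hadoop.home.dir system property as a stand-in for HADOOP_HOME) is to set the property at the very top of main, before any Hadoop class needs winutils.exe:
  // Point Hadoop at the local installation whose bin directory contains
  // winutils.exe and hadoop.dll; this mirrors the HADOOP_HOME variable.
  System.setProperty("hadoop.home.dir", "D:\\hadoop-2.4.1");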
  Once these two steps are done we can run a program in Hadoop's local run mode.
  First, choose paths on the Windows file system for both the input and the output. The code is as follows:
  package MapReduce;

  import java.io.IOException;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
  import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
  import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

  public class WordCount {
      public static String path1 = "file:///C:\\word.txt"; // read data from the local Windows file system
      public static String path2 = "file:///D:\\dir";      // write output to the local Windows file system

      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fileSystem = FileSystem.get(conf);
          // Delete the output directory if it already exists; otherwise the job fails.
          if (fileSystem.exists(new Path(path2))) {
              fileSystem.delete(new Path(path2), true);
          }
          Job job = Job.getInstance(conf);
          job.setJarByClass(WordCount.class);
          FileInputFormat.setInputPaths(job, new Path(path1));
          job.setInputFormatClass(TextInputFormat.class);
          job.setMapperClass(MyMapper.class);
          job.setMapOutputKeyClass(Text.class);
          job.setMapOutputValueClass(LongWritable.class);
          job.setNumReduceTasks(1);
          job.setPartitionerClass(HashPartitioner.class);
          job.setReducerClass(MyReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(LongWritable.class);
          job.setOutputFormatClass(TextOutputFormat.class);
          FileOutputFormat.setOutputPath(job, new Path(path2));
          job.waitForCompletion(true);
      }

      public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
          @Override
          protected void map(LongWritable k1, Text v1, Context context)
                  throws IOException, InterruptedException {
              // Split each line on tabs and emit <word, 1> for every token.
              String[] splited = v1.toString().split("\t");
              for (String string : splited) {
                  context.write(new Text(string), new LongWritable(1L));
              }
          }
      }

      public static class MyReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
          @Override
          protected void reduce(Text k2, Iterable<LongWritable> v2s, Context context)
                  throws IOException, InterruptedException {
              // Sum the counts collected for each word.
              long sum = 0L;
              for (LongWritable v2 : v2s) {
                  sum += v2.get();
              }
              context.write(k2, new LongWritable(sum));
          }
      }
  }
  Checking the running Java processes from a DOS prompt, 28568 is the Eclipse process started on Windows.
  Next, let's look at the run results.
  The contents of part-r-00000 are as follows:
  hello   2
  me      1
  you     1
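  For reference, an input consistent with these counts (the original post does not show word.txt) would be two lines of tab-separated words, for example:
  hello	me
  hello	you
  The mapper splits each line on tabs, so hello is emitted twice and me and you once each.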
  Next, keep the input path on the local Windows file system but switch the output path to HDFS. Only the two path declarations change; the rest of the code is identical to the listing above:
  public static String path1 = "file:///C:\\word.txt";     // read data from the Windows file system
  public static String path2 = "hdfs://hadoop20:9000/dir"; // write the output to HDFS
  This time the program throws an exception: with a default Configuration, FileSystem.get(conf) returns the local file system, while path2 now points at HDFS, so the client cannot operate on the hdfs:// path. The fix is to set the default file system before obtaining the FileSystem instance:
  Configuration conf = new Configuration();
  conf.set("fs.defaultFS", "hdfs://hadoop20:9000/");
  FileSystem fileSystem = FileSystem.get(conf); // now returns the FileSystem instance for HDFS
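  An equivalent approach (a sketch, not from the original post; the overload FileSystem.get(URI, Configuration) is part of the standard Hadoop API) is to pass the HDFS URI explicitly instead of changing the default file system:
  import java.net.URI;
  // ...
  FileSystem fileSystem = FileSystem.get(URI.create("hdfs://hadoop20:9000/"), conf);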
  Check the run results:
  # hadoop fs -cat /dir/part-r-00000
  hello   2
  me      1
  you     1
  Good. That wraps up Hadoop's local run mode. Note the following points:
  1. file:/// denotes the local file system, while hdfs:// denotes the HDFS distributed file system.
  2. Local run mode under Linux is straightforward, but under Windows it requires the configuration described above.
  3. It does not matter where the files used by MapReduce live (the local Windows file system, the local Linux file system, or HDFS); they are ultimately accessed through a FileSystem instance, as the sketch below illustrates.
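  A minimal sketch (not from the original post) of how the scheme prefix of a Path decides which FileSystem implementation is used; hadoop20 is the example cluster host used above:
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class SchemeDemo {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Each Path carries a scheme; getFileSystem resolves it to an implementation.
          FileSystem localFs = new Path("file:///C:/word.txt").getFileSystem(conf);
          FileSystem hdfs = new Path("hdfs://hadoop20:9000/dir").getFileSystem(conf);
          System.out.println(localFs.getClass().getSimpleName()); // LocalFileSystem
          System.out.println(hdfs.getClass().getSimpleName());    // DistributedFileSystem
      }
  }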
  If you spot any problems, feel free to leave a comment pointing them out!
  Note: if you are running Hadoop 1.0 in local mode on Windows, you only need to set HADOOP_HOME and the Path variable; no other configuration is required.
  Another common error:
  Exception: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
  access0 is the Windows-only native method Hadoop uses to check the current process's access rights on a given path. To make that check always succeed, we patch the source ourselves so that it returns true: download the matching Hadoop source archive (hadoop-2.7.3-src.tar.gz), unpack it, and copy NativeIO.java from hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio into the corresponding Eclipse project.
  That is, modify the highlighted access check in the copied NativeIO.java so that it returns true, and the problem is solved.
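  A sketch of the commonly circulated patch, assuming the method layout of NativeIO.java in Hadoop 2.7.3, where Windows.access wraps the native access0 call:
  // Inside the NativeIO.Windows class of the copied NativeIO.java:
  public static boolean access(String path, AccessRight desiredAccess)
          throws IOException {
      // Original body: return access0(path, desiredAccess.accessRight());
      // Patched to skip the native permission check on Windows.
      return true;
  }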
  To summarize the fix:
  Step 1: download the hadoop.dll and winutils.exe built for Hadoop 2.7.3, copy them over the local Hadoop bin directory, and also copy them into C:\Windows\System32 (overwriting any existing files).
  Step 2: in your project, create the package org.apache.hadoop.io.nativeio with a class NativeIO containing the patched source (the project copy takes precedence over the class inside the Hadoop jar), then run the Hadoop program from Eclipse on Windows again. Done.
