Posted by hege on 2016-12-7 07:36:07

Finally ran my first Hadoop program successfully

  Before this I had only ever used Windows, but for Hadoop's sake I had to move over to Ubuntu~
  


  A record of my journey:
  1. Install Ubuntu from within Windows
The method is to install via Wubi: fully automated, completely foolproof, perfect for a Linux newbie like me!
Wubi can be downloaded directly from the official site. I didn't know that at the time, so I used the copy extracted from the ISO instead, haha.
Download link: http://www.ubuntu.com/download/desktop/windows-installer
Pick whichever version suits you.
 
 
2. Do all the configuration, including the JDK and Hadoop, and of course installing a Pinyin input method....
For installing the JDK and configuring Hadoop, I followed this guide exactly: http://os.iyunv.com/art/201211/364167.htm
  Note: only after walking through those steps did I realize this is actually a pseudo-distributed setup~
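
For reference, the heart of a pseudo-distributed setup is pointing Hadoop's default filesystem at a local HDFS instance. A minimal sketch of conf/core-site.xml might look like this (the hdfs://localhost:9000 address matches the one that shows up in the error log; the exact file location and property set vary across Hadoop 1.x installs, so treat this as illustrative):

```xml
<!-- conf/core-site.xml — minimal pseudo-distributed sketch.
     hdfs://localhost:9000 matches the address in the logs below;
     exact paths/properties vary by Hadoop 1.x version. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```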
 
3. Time to run an example program~
Hadoop's bundled example programs live right in the extracted hadoop folder, but at first I kept hitting a very strange problem. The error was:
  13/07/06 20:10:59 ERROR security.UserGroupInformation: PriviledgedActionException as:hadoop cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/usr/local/hadoop/input
  org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/usr/local/hadoop/input
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:235)
  at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:252)
  at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
  at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1071)
  at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:983)
  at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:415)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
  at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
  at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
  at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
  at org.apache.hadoop.examples.WordCount.main(WordCount.java:82)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
  at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:606)
  at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Only after a lot of googling did I learn that the input folder hadn't been uploaded to HDFS yet.
Run this command (note that the relative destination path resolves to the user's HDFS home directory, /user/hadoop, as the listing below shows):
  hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -put input input
After that, the input folder finally shows up via dfs's -ls command:
  hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls
  Found 1 items
  drwxr-xr-x - hadoop supergroup 0 2013-07-06 20:12 /user/hadoop/input
 
At this point, running the example program again no longer throws the earlier exception:
  Input path does not exist: hdfs://localhost:9000/usr/local/hadoop/input
The log from the successful run:
  hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.2.0.jar wordcount input wc_output2
  13/07/06 20:12:39 INFO input.FileInputFormat: Total input paths to process : 1
  13/07/06 20:12:39 INFO util.NativeCodeLoader: Loaded the native-hadoop library
  13/07/06 20:12:39 WARN snappy.LoadSnappy: Snappy native library not loaded
  13/07/06 20:12:40 INFO mapred.JobClient: Running job: job_201307061937_0006
  13/07/06 20:12:41 INFO mapred.JobClient: map 0% reduce 0%
  13/07/06 20:12:46 INFO mapred.JobClient: map 100% reduce 0%
  13/07/06 20:12:53 INFO mapred.JobClient: map 100% reduce 33%
  13/07/06 20:12:55 INFO mapred.JobClient: map 100% reduce 100%
  13/07/06 20:12:55 INFO mapred.JobClient: Job complete: job_201307061937_0006
  13/07/06 20:12:56 INFO mapred.JobClient: Counters: 29
  13/07/06 20:12:56 INFO mapred.JobClient: Job Counters
  13/07/06 20:12:56 INFO mapred.JobClient: Launched reduce tasks=1
  13/07/06 20:12:56 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4785
  13/07/06 20:12:56 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
  13/07/06 20:12:56 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
  13/07/06 20:12:56 INFO mapred.JobClient: Launched map tasks=1
  13/07/06 20:12:56 INFO mapred.JobClient: Data-local map tasks=1
  13/07/06 20:12:56 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8917
  13/07/06 20:12:56 INFO mapred.JobClient: File Output Format Counters
  13/07/06 20:12:56 INFO mapred.JobClient: Bytes Written=24591
  13/07/06 20:12:56 INFO mapred.JobClient: FileSystemCounters
  13/07/06 20:12:56 INFO mapred.JobClient: FILE_BYTES_READ=34471
  13/07/06 20:12:56 INFO mapred.JobClient: HDFS_BYTES_READ=46619
  13/07/06 20:12:56 INFO mapred.JobClient: FILE_BYTES_WRITTEN=180119
  13/07/06 20:12:56 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=24591
  13/07/06 20:12:56 INFO mapred.JobClient: File Input Format Counters
  13/07/06 20:12:56 INFO mapred.JobClient: Bytes Read=46510
  13/07/06 20:12:56 INFO mapred.JobClient: Map-Reduce Framework
  13/07/06 20:12:56 INFO mapred.JobClient: Map output materialized bytes=34471
  13/07/06 20:12:56 INFO mapred.JobClient: Map input records=561
  13/07/06 20:12:56 INFO mapred.JobClient: Reduce shuffle bytes=34471
  13/07/06 20:12:56 INFO mapred.JobClient: Spilled Records=4992
  13/07/06 20:12:56 INFO mapred.JobClient: Map output bytes=77170
  13/07/06 20:12:56 INFO mapred.JobClient: Total committed heap usage (bytes)=219152384
  13/07/06 20:12:56 INFO mapred.JobClient: CPU time spent (ms)=2450
  13/07/06 20:12:56 INFO mapred.JobClient: Combine input records=7804
  13/07/06 20:12:56 INFO mapred.JobClient: SPLIT_RAW_BYTES=109
  13/07/06 20:12:56 INFO mapred.JobClient: Reduce input records=2496
  13/07/06 20:12:56 INFO mapred.JobClient: Reduce input groups=2496
  13/07/06 20:12:56 INFO mapred.JobClient: Combine output records=2496
  13/07/06 20:12:56 INFO mapred.JobClient: Physical memory (bytes) snapshot=280190976
  13/07/06 20:12:56 INFO mapred.JobClient: Reduce output records=2496
  13/07/06 20:12:56 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2009939968
  13/07/06 20:12:56 INFO mapred.JobClient: Map output records=7804
 
 
Finally, once the run completes, view the results:
  bin/hadoop dfs -cat wc_output2/* | more
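
For intuition, what the WordCount example computes can be sketched as a single-process Python analogue. This is illustrative only, not Hadoop's actual Java implementation; in the real job the combiner runs per map task before the shuffle, which is why the counters above show Combine input records=7804 collapsing to 2496 reduce input records:

```python
from collections import Counter

def map_phase(lines):
    # Mapper: emit a (word, 1) pair for every token in every input line.
    for line in lines:
        for word in line.split():
            yield word, 1

def combine_and_reduce(pairs):
    # Combiner/reducer: sum the counts for each distinct word.
    # In Hadoop the combiner pre-aggregates per map task and the
    # reducer merges across tasks; on one machine the two collapse.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

def wordcount(lines):
    return combine_and_reduce(map_phase(lines))

print(wordcount(["hello hadoop", "hello world"]))
# → {'hello': 2, 'hadoop': 1, 'world': 1}
```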
 
Notes:
A few web UIs for administration [the ones I know of so far]:
http://localhost:50070/dfshealth.jsp  (HDFS NameNode status)
http://localhost:50030/jobtracker.jsp (JobTracker status)