$ bin/hadoop jar hadoop-0.20.1-index.jar -inputPaths input/input -outputPath index_msg_out_020 -indexPath index_040 -numShards 3 -numMapTasks 3 -conf conf/index-config.xml
11/12/29 11:39:19 INFO main.UpdateIndex: inputPaths = input/input
11/12/29 11:39:19 INFO main.UpdateIndex: outputPath = index_msg_out_020
11/12/29 11:39:19 INFO main.UpdateIndex: shards = null
11/12/29 11:39:19 INFO main.UpdateIndex: indexPath = index_040
11/12/29 11:39:19 INFO main.UpdateIndex: numShards = 3
11/12/29 11:39:19 INFO main.UpdateIndex: numMapTasks= 3
11/12/29 11:39:19 INFO main.UpdateIndex: confPath = conf/index-config.xml
11/12/29 11:39:20 INFO main.UpdateIndex: sea.index.updater = org.apache.hadoop.contrib.index.mapred.IndexUpdater
11/12/29 11:39:20 INFO mapred.IndexUpdater: mapred.input.dir = hdfs://localhost:18888/user/kelo-dichan/administrator/input/input
11/12/29 11:39:20 INFO mapred.IndexUpdater: mapred.output.dir = hdfs://localhost:18888/user/kelo-dichan/administrator/index_msg_out_020
11/12/29 11:39:20 INFO mapred.IndexUpdater: mapred.map.tasks = 3
11/12/29 11:39:20 INFO mapred.IndexUpdater: mapred.reduce.tasks = 3
11/12/29 11:39:20 INFO mapred.IndexUpdater: 3 shards = -1@index_040/00000@-1,-1@index_040/00001@-1,-1@index_040/00002@-1
11/12/29 11:39:20 INFO mapred.IndexUpdater: mapred.input.format.class = org.apache.hadoop.contrib.index.example.LineDocInputFormat
11/12/29 11:39:20 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/12/29 11:39:20 INFO mapred.FileInputFormat: Total input paths to process
11/12/29 11:39:20 INFO mapred.JobClient: Running job: job_201112291106_0009
11/12/29 11:39:21 INFO mapred.JobClient: map 0% reduce 0%
11/12/29 11:39:30 INFO mapred.JobClient: map 33% reduce 0%
11/12/29 11:39:34 INFO mapred.JobClient: map 100% reduce 0%
11/12/29 11:39:40 INFO mapred.JobClient: map 100% reduce 7%
11/12/29 11:39:43 INFO mapred.JobClient: map 100% reduce 14%
11/12/29 11:39:46 INFO mapred.JobClient: map 100% reduce 40%
11/12/29 11:39:49 INFO mapred.JobClient: map 100% reduce 66%
11/12/29 11:39:52 INFO mapred.JobClient: map 100% reduce 100%
11/12/29 11:39:54 INFO mapred.JobClient: Job complete: job_201112291106_0009
11/12/29 11:39:54 INFO mapred.JobClient: Counters: 18
11/12/29 11:39:54 INFO mapred.JobClient: Job Counters
11/12/29 11:39:54 INFO mapred.JobClient: Launched reduce tasks=3
11/12/29 11:39:54 INFO mapred.JobClient: Launched map tasks=3
11/12/29 11:39:54 INFO mapred.JobClient: Data-local map tasks=3
11/12/29 11:39:54 INFO mapred.JobClient: FileSystemCounters
11/12/29 11:39:54 INFO mapred.JobClient: FILE_BYTES_READ=2648
11/12/29 11:39:54 INFO mapred.JobClient: HDFS_BYTES_READ=161
11/12/29 11:39:54 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3279
11/12/29 11:39:54 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=1025
11/12/29 11:39:54 INFO mapred.JobClient: Map-Reduce Framework
11/12/29 11:39:54 INFO mapred.JobClient: Reduce input groups=2
11/12/29 11:39:54 INFO mapred.JobClient: Combine output records=3
11/12/29 11:39:54 INFO mapred.JobClient: Map input records=3
11/12/29 11:39:54 INFO mapred.JobClient: Reduce shuffle bytes=948
11/12/29 11:39:54 INFO mapred.JobClient: Reduce output records=2
11/12/29 11:39:54 INFO mapred.JobClient: Spilled Records=6
11/12/29 11:39:54 INFO mapred.JobClient: Map output bytes=1350
11/12/29 11:39:54 INFO mapred.JobClient: Map input bytes=161
11/12/29 11:39:54 INFO mapred.JobClient: Combine input records=3
11/12/29 11:39:54 INFO mapred.JobClient: Map output records=3
11/12/29 11:39:54 INFO mapred.JobClient: Reduce input records=3
11/12/29 11:39:54 INFO main.UpdateIndex: Index update job is done
11/12/29 11:39:54 INFO main.UpdateIndex: Elapsed time is 33s
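From the log lines above, conf/index-config.xml supplies at least the updater class (the `sea.index.updater` property is echoed by main.UpdateIndex). A minimal sketch of that file might look like the following; the property name and value are taken straight from the log, but the surrounding file layout is an assumption:

```xml
<configuration>
  <!-- Updater class, as reported by main.UpdateIndex in the log above -->
  <property>
    <name>sea.index.updater</name>
    <value>org.apache.hadoop.contrib.index.mapred.IndexUpdater</value>
  </property>
</configuration>
```

Other settings echoed by mapred.IndexUpdater (such as the LineDocInputFormat input format) may come from this file or from defaults; the log alone does not say which.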
Here the file system is HDFS, the Hadoop distributed file system:
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:8888</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem. file:/// hdfs://localhost:8888</description>
</property>
Using the local file system instead also works well; set this in core-site.xml:
<property>
<name>fs.default.name</name>
<value>file:///</value>
<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem. file:/// hdfs://localhost:8888</description>
</property>
Below is an example using the local file system. testdata/input is a local relative directory (the full path is D:/hadoop/run/testdata/input, where D:/hadoop/run is the install path):
bin/hadoop jar hadoop-0.20.1-examples.jar wordcount testdata/input output-dir1
Starting with the debug script (after running the script below, debug the same program in Eclipse using remote debugging, which is configurable):
./bin/hddebug jar hadoop-0.20.2-examples.jar wordcount input/input output-di
Listening for transport dt_socket at address: 28888
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://127.0.0.1:8888/user/kelo-dichan/administrator/input/input
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
        at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
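The InvalidInputException above simply means the job's input directory does not exist in HDFS yet. A sketch of one way to fix it before re-running (sample.txt is a hypothetical placeholder for whatever local file you want to index or count):

```shell
# Create the input directory in HDFS; relative paths resolve under the
# user's home directory, matching the hdfs://.../user/... path in the error
bin/hadoop fs -mkdir input

# Copy a local file into it (sample.txt is a made-up placeholder name)
bin/hadoop fs -put sample.txt input/input

# Verify the path now exists
bin/hadoop fs -ls input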
Detailed steps (for reference):
1. First get Hadoop configured and generally usable under Windows 7.
2. Make a copy of the bin/hadoop script and rename it hddebug.
3. Add one line to hddebug, near the `if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then` line:
HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,address=28888,server=y,suspend=y"
4. Run:
./bin/hddebug jar hadoop-0.20.2-examples.jar wordcount input/input output-di
You should see: Listening for transport dt_socket at address: 28888
5. Start Eclipse and debug the wordcount code: in the Debug menu, set up a remote debug configuration, and you can then step through the program.
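The steps above boil down to a one-line change in the copied script. A sketch of the relevant region of hddebug (the surrounding JAVA_LIBRARY_PATH block is reproduced from the stock bin/hadoop of that era; treat its exact contents as an assumption):

```shell
# Added line: make the JVM open a JDWP socket on port 28888 and wait;
# suspend=y pauses startup until a debugger attaches
HADOOP_OPTS="$HADOOP_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,address=28888,server=y,suspend=y"

# Existing block used as the landmark for where to insert the line above
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
  HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
```

In Eclipse, create a Remote Java Application debug configuration with host localhost and port 28888; because of suspend=y, the job will not start running until the debugger attaches.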