Hadoop MapReduce Commands

Posted by huijial on 2016-12-6 08:18:59

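Overview

　　All MapReduce commands are invoked through the bin/mapred script. Running the mapred script without any arguments prints a description of all commands.
　　Usage: mapred [--config confdir] COMMAND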

$ mapred
Usage: mapred [--config confdir] COMMAND
where COMMAND is one of:
pipes             run a Pipes job
job                manipulate MapReduce jobs
queue             get information regarding JobQueues
classpath          prints the class path needed for running
mapreduce subcommands
historyserver    run job history servers as a standalone daemon
distcp <srcurl> <desturl> copy file or directories recursively
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
hsadmin          job history server admin interface
Most commands print help when invoked w/o parameters.
User Commands
　　Commands useful for users of a Hadoop cluster:

archive
　　See: Hadoop之命令指南 (Hadoop Commands Guide)
　　

classpath
　　Prints the class path needed to get the Hadoop jars and the required libraries. The hdfs and yarn scripts provide the same command.
　　Usage: mapred classpath
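
　　The class path is printed as a single colon-separated string, which is hard to read. A minimal sketch of splitting it into one entry per line follows; the helper name list_classpath_entries and the sample paths are made up for illustration, and on a real cluster the string would come from mapred classpath.

```shell
#!/usr/bin/env bash
# Split a colon-separated class path into one entry per line.
# list_classpath_entries is a hypothetical helper, not part of Hadoop.
list_classpath_entries() {
    # tr turns every ':' into a newline; grep drops empty entries.
    printf '%s\n' "$1" | tr ':' '\n' | grep -v '^$'
}

# Sample string with made-up paths; on a cluster pass "$(mapred classpath)".
sample='/opt/hadoop/etc/hadoop:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/mapreduce/*'
list_classpath_entries "$sample"
```

　　On a cluster node the call would be list_classpath_entries "$(mapred classpath)".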
　　

distcp
　　Copies files or directories recursively. For examples, see: Hadoop之命令指南 (Hadoop Commands Guide).
　　

job
　　The job command is used to interact with MapReduce jobs.
　　Usage: mapred job [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]

　　

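Option                                          Description
-submit <job-file>                              Submits the job.
-status <job-id>                                Prints the map and reduce completion percentages and all job counters.
-counter <job-id> <group-name> <counter-name>   Prints the counter value.
-kill <job-id>                                  Kills the job with the given job-id.
-events <job-id> <from-event-#> <#-of-events>   Prints the details of the events received by the jobtracker for the given range (see the example below for usage).
-history <jobOutputDir>                         Prints job details, including the reasons for failed and killed tasks. Further details about a job, such as successful tasks and the attempts made for each task, can be viewed by specifying an additional option.
-list [all]                                     Prints the jobs that are currently running. With all, prints all jobs.
-kill-task <task-id>                            Kills the task. Killed tasks are not counted against failed attempts.
-fail-task <task-id>                            Fails the task. Failed tasks are counted against failed attempts. By default a task is attempted at most 4 times, so failing the same task four times with -fail-task stops any further attempts and causes the whole job to fail.
-set-priority <job-id> <priority>               Changes the priority of the job. Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW.

　　Examples: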

$ mapred job -events job_1437364567082_0109 0 100
15/08/13 15:10:53 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Task completion events for job_1437364567082_0109
Number of events (from 0) are: 1
SUCCEEDED attempt_1437364567082_0109_m_000016_0 http://hadoopcluster83:13562/tasklog?plaintext=true&attemptid=attempt_1437364567082_0109_m_000016_0
$ mapred job -kill-task attempt_1437364567082_0111_m_000000_4
15/08/13 15:51:25 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Killed task attempt_1437364567082_0111_m_000000_4
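
　　The task attempt IDs in the output above embed the ID of the job they belong to: attempt_1437364567082_0109_m_000016_0 is part of job_1437364567082_0109. The following is a small sketch of recovering the job ID from an attempt ID, for example to pass it on to mapred job -status. The function name job_id_from_attempt is an assumption made for illustration.

```shell
#!/usr/bin/env bash
# Derive the job ID from a task attempt ID.
# Assumed ID layout: attempt_<cluster-timestamp>_<job-seq>_<m|r>_<task-seq>_<attempt-no>
job_id_from_attempt() {
    case "$1" in
        attempt_*_*_[mr]_*_*) ;;
        *) echo "not a task attempt ID: $1" >&2; return 1 ;;
    esac
    # Fields 2 and 3 (split on '_') are the timestamp and the job sequence number.
    printf 'job_%s\n' "$(printf '%s\n' "$1" | cut -d_ -f2,3)"
}

job_id_from_attempt attempt_1437364567082_0111_m_000000_4
# prints job_1437364567082_0111
```

　　A possible use is mapred job -status "$(job_id_from_attempt <task-id>)" before deciding whether to kill or fail a task.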

pipes
　　Runs a pipes job. For more on pipes, see: Hadoop pipes编程 (Hadoop Pipes Programming)

　　Hadoop pipes lets C++ programmers write MapReduce programs. Users can mix C++ and Java implementations of five components: RecordReader, Mapper, Partitioner, Reducer, and RecordWriter.

　　Usage: mapred pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>] [-program <executable>] [-reduces <num>]
　　

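Option                                    Description
-conf <path>                              Path to the job's configuration file.
-jobconf <key=value>, <key=value>, ...    Adds to or overrides the job configuration.
-input <path>                             Input path.
-output <path>                            Output path.
-jar <jar file>                           JAR file name.
-inputformat <class>                      InputFormat class.
-map <class>                              Java Map class.
-partitioner <class>                      Java Partitioner.
-reduce <class>                           Java Reduce class.
-writer <class>                           Java RecordWriter.
-program <executable>                     URI of the executable.
-reduces <num>                            Number of reduces.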
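
　　To show how the options listed above fit together, here is a rough dry-run sketch that assembles a mapred pipes command line and prints it instead of running it. The helper name build_pipes_command and every path are hypothetical, and a real job may need further options such as -conf or -jobconf.

```shell
#!/usr/bin/env bash
# Assemble a "mapred pipes" command line and print it (dry run, nothing is executed).
# build_pipes_command is a hypothetical helper; all paths below are made up.
build_pipes_command() {
    local input="$1" output="$2" program="$3" reduces="${4:-1}"
    local cmd=(mapred pipes
               -input "$input"
               -output "$output"
               -program "$program"
               -reduces "$reduces")
    # "${cmd[*]}" joins the array elements with single spaces.
    printf '%s\n' "${cmd[*]}"
}

build_pipes_command /user/demo/in /user/demo/out /user/demo/bin/wordcount 2
```

　　Printing the command first makes it easy to check the paths before submitting the job for real.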

　　

　　

queue
　　This command is used to interact with Job Queues and view information about them.
　　Usage: mapred queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]

　　

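Option                                 Description
-list                                  Gets the list of Job Queues configured in the system, along with their scheduling information.
-info <job-queue-name> [-showJobs]     Displays the information and scheduling information of the given Job Queue. With the -showJobs option, also displays the list of jobs currently running in it.
-showacls                              Displays the queue names and the queue operations allowed for the current user. Only queues the current user can access are listed.

Examples: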

$ mapred queue -list
15/08/13 14:25:30 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 47.5
$ mapred queue -info default
15/08/13 14:28:45 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5
$ mapred queue -info default -showJobs
15/08/13 14:29:08 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 72.5
Total jobs:1
JobId  State  StartTime  UserName  Queue  Priority  UsedContainers  RsvdContainers  UsedMem  RsvdMem  NeededMem  AM info
job_1437364567082_0107  RUNNING  1439447102615  root  default  NORMAL  28  0  29696M  0M  29696M  http://hadoopcluster79:8088/proxy/application_1437364567082_0107/
$ mapred queue -showacls
15/08/13 14:31:44 INFO client.RMProxy: Connecting to ResourceManager at hadoopcluster79/10.0.1.79:8032
Queue acls for user : hadoop
Queue Operations
=====================
root ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
default ADMINISTER_QUEUE,SUBMIT_APPLICATIONS
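
　　The queue listing above is meant for people, but it is regular enough to be filtered. As a sketch under that assumption, the awk filter below prints each queue name with its CurrentCapacity value. The function name queue_capacities is made up, and the sample input is copied from the mapred queue -list output shown above.

```shell
#!/usr/bin/env bash
# Print "<queue name> <current capacity>" for each queue found on stdin.
# queue_capacities is a hypothetical helper that parses `mapred queue -list` output.
queue_capacities() {
    awk '
        /^Queue Name :/ { name = $4 }
        /^Scheduling Info :/ {
            # "CurrentCapacity: " is 17 characters long; the number follows it.
            if (match($0, /CurrentCapacity: [0-9.]+/)) {
                print name, substr($0, RSTART + 17, RLENGTH - 17)
            }
        }
    '
}

# Sample input taken from the example output in this post.
queue_capacities <<'EOF'
======================
Queue Name : default
Queue State : running
Scheduling Info : Capacity: 100.0, MaximumCapacity: 100.0, CurrentCapacity: 47.5
EOF
# prints: default 47.5
```

　　On a cluster the same filter could be fed directly: mapred queue -list | queue_capacities.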

Administration Commands
　　The following commands are useful for administrators of a Hadoop cluster.

historyserver
　　Starts the JobHistoryServer service.
　　Usage: mapred historyserver
　　The JobHistoryServer can also be started or stopped with sbin/mr-jobhistory-daemon.sh start|stop historyserver.

hsadmin
　　Runs hsadmin to execute JobHistoryServer administrative commands.
　　Usage: mapred hsadmin [-refreshUserToGroupsMappings] | [-refreshSuperUserGroupsConfiguration] | [-refreshAdminAcls] | [-refreshLoadedJobCache] | [-refreshLogRetentionSettings] | [-refreshJobRetentionSettings] | [-getGroups [username]] | [-help]

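Option                                   Description
-refreshUserToGroupsMappings             Refreshes the user-to-groups mappings.
-refreshSuperUserGroupsConfiguration     Refreshes the superuser proxy groups mappings.
-refreshAdminAcls                        Refreshes the ACLs for administration of the JobHistoryServer.
-refreshLoadedJobCache                   Refreshes the loaded job cache of the JobHistoryServer.
-refreshJobRetentionSettings             Refreshes the job history retention period and the job cleaner settings.
-refreshLogRetentionSettings             Refreshes the log retention period and the log retention check interval.
-getGroups [username]                    Gets the groups the given user belongs to.
-help                                    Displays help.

Example: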
$ mapred hsadmin -getGroups hadoop
hadoop : clustergroup
