[Experience Share] hadoop yarn-distributedshell

Posted 2016-12-04 10:07:12
  As its name suggests, this application runs a shell command on distributed nodes (containers), so it is easy to use; let's go ahead.
  1. run the 'ls' command in containers
  2. which path does the command run in?
  3. how to run meaningful, node-specific commands
  1. run the 'ls' command in containers
  

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_command ls -num_containers 1 -container_memory 300 -master_memory 400
  so the 'ls' command will run in some container, and the result will look like this:

more userlogs/application_1433385109839_0001/container_1433385109839_0001_01_000002/stdout
container_tokens
default_container_executor.sh
launch_container.sh
tmp

  why does the working directory contain these files? You can look into the <nodemanager.log>:

2015-06-04 15:55:10,424 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/user/userxx/DistributedShell/application_1433403689317_0001/AppMaster.jar(->/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/filecache/10/AppMaster.jar) transitioned from DOWNLOADING to LOCALIZED
2015-06-04 15:55:10,502 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource: Resource hdfs://localhost:9000/user/userxx/DistributedShell/application_1433403689317_0001/shellCommands(->/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/filecache/11/shellCommands) transitioned from DOWNLOADING to LOCALIZED
2015-06-04 15:55:10,644 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: launchContainer: [nice, -n, 0, bash, /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001/container_1433403689317_0001_01_000001/default_container_executor.sh]
  you will see a script named 'default_container_executor.sh' placed in the working dir (named after the current container), so the result of the command is correct.
  2. which path does the command run in?
  yes, the result is absolutely right, but how do we verify that the current working directory lies in 'container_1433385109839_0001_01_000001'?
  of course, it's simple too: use 'pwd' instead of 'ls' for the -shell_command param.

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_command pwd -num_containers 1 -container_memory 300 -master_memory 400
  now, check out the stdout file; the result will look like this:

/usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0002/container_1433403689317_0002_01_000002

  but this time the dir differs a bit from point 1, as this is the second app ;)
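To collect such results without hunting for directories by hand, a small loop over the NodeManager's userlogs tree can print every container's stdout. This is only a sketch: the LOG_ROOT path and application id below are illustrative, matching the layout shown above.

```shell
# Print the stdout of every container of one application, assuming the
# default NodeManager userlogs layout seen in this post (paths illustrative).
LOG_ROOT=${LOG_ROOT:-userlogs}
APP=application_1433403689317_0002
for f in "$LOG_ROOT/$APP"/container_*/stdout; do
  [ -f "$f" ] || continue   # skip when the glob matched nothing
  echo "== $f =="
  cat "$f"
done
```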
  3. how to run meaningful, node-specific commands
  but if you want to use a *custom script* (with some params in the command) that is *node-specific* (i.e. produces a different result on different nodes), you can use a script file to achieve this:

hadoop jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar org.apache.hadoop.yarn.applications.distributedshell.Client -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.5.1.jar -shell_script ls-command.sh -num_containers 1 -container_memory 300 -master_memory 400
  and the file 'ls-command.sh' is simple:

ls -al /tmp/
  note that this file must be executable, so do that before running the command above:

chmod +x ls-command.sh
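As a hypothetical example of a node-specific script (content is my own sketch, using only standard Unix tools), each container could report the host it runs on, so the output differs from node to node:

```shell
#!/bin/sh
# Hypothetical node-specific script: each container reports the host it
# runs on and its working directory, so output differs across nodes.
echo "node: $(hostname)"
echo "cwd:  $(pwd)"
ls -al /tmp/
```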
      
  appendix:
  A. from the <nodemanager.log>, we find this info:

2015-06-04 15:55:17,223 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Application application_1433403689317_0001 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2015-06-04 15:55:17,223 INFO org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Deleting absolute path : /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/application_1433403689317_0001

  so if you check the appcache dir afterwards, nothing will be there:

ll /usr/local/hadoop/data-2.5.1/tmp/nm-local-dir/usercache/userxx/appcache/
total 0

  B. the AM is responsible for setting up the containers; in the end, the NM starts them up:

more userlogs/application_1433385109839_0001/container_1433385109839_0001_01_000001/AppMaster.stderr
15/06/04 12:26:09 INFO distributedshell.ApplicationMaster: Initializing ApplicationMaster
15/06/04 12:26:09 INFO distributedshell.ApplicationMaster: Application master for app, appId=1, clustertimestamp=1433385109839, attemptId=1
2015-06-04 12:26:09.755 java[1261:1903] Unable to load realm info from SCDynamicStore
15/06/04 12:26:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/06/04 12:26:10 INFO impl.TimelineClientImpl: Timeline service is not enabled
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Starting ApplicationMaster
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Executing with tokens:
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (org.apache.hadoop.yarn.security.AMRMTokenIdentifier@7950d786)
15/06/04 12:26:10 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8030
15/06/04 12:26:10 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
15/06/04 12:26:10 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-nodemanagers-proxies : 500
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max mem capabililty of resources in this cluster 8192
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max vcores capabililty of resources in this cluster 32
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Received 0 previous AM's running containers on AM registration.
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:300, vCores:1>]Priority[0]
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Requested container ask: Capability[<memory:300, vCores:1>]Priority[0]
15/06/04 12:26:12 INFO impl.AMRMClientImpl: Received new token for : localhost:52226
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1433385109839_0001_01_000002, containerNode=localhost:52226, containerNodeURI=localhost:8042, containerResourceMemory1024, containerResourceVirtualCores1
15/06/04 12:26:12 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1433385109839_0001_01_000002
15/06/04 12:26:12 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1433385109839_0001_01_000002
15/06/04 12:26:12 INFO impl.ContainerManagementProtocolProxy: Opening proxy : localhost:52226
15/06/04 12:26:12 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1433385109839_0001_01_000002
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1433385109839_0001_01_000002, state=COMPLETE, exitStatus=0, diagnostics=
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1433385109839_0001_01_000002
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, allocatedCnt=1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Launching shell command on a new container., containerId=container_1433385109839_0001_01_000003, containerNode=localhost:52226, containerNodeURI=localhost:8042, containerResourceMemory1024, containerResourceVirtualCores1
15/06/04 12:26:13 INFO distributedshell.ApplicationMaster: Setting up container launch container for containerid=container_1433385109839_0001_01_000003
15/06/04 12:26:13 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_1433385109839_0001_01_000003
15/06/04 12:26:13 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_1433385109839_0001_01_000003
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=1
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1433385109839_0001_01_000003, state=COMPLETE, exitStatus=0, diagnostics=
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Container completed successfully., containerId=container_1433385109839_0001_01_000003
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Application completed. Stopping running containers
15/06/04 12:26:14 INFO impl.ContainerManagementProtocolProxy: Closing proxy : localhost:52226
15/06/04 12:26:14 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM
15/06/04 12:26:14 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
15/06/04 12:26:15 INFO distributedshell.ApplicationMaster: Application Master completed successfully. exiting
  and the AM is always started first, in the first container, before the others.
  C. questions: my MacBook Pro has 8 GB RAM and a dual-core i5 (2.4 GHz) CPU, yet the log above reports 32 vcores:

15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max mem capabililty of resources in this cluster 8192
15/06/04 12:26:10 INFO distributedshell.ApplicationMaster: Max vcores capabililty of resources in this cluster 32

  does anyone know why? I decided to dig into it the next day.
  after I recreated the job on a big cluster (32 GB mem, 8 CPUs), the values stayed the same, so I figured these are config values set in code or XML.
  today, I dug into 'CapacityScheduler#getMaximumAllocation()':


public Resource getMaximumAllocation() {
    int maximumMemory = getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB);
    int maximumCores = getInt(
        YarnConfiguration.RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_VCORES);
    return Resources.createResource(maximumMemory, maximumCores);
}

public Resource getMinimumAllocation() {
    int minimumMemory = getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_MB,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_MB);
    int minimumCores = getInt(
        YarnConfiguration.RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES,
        YarnConfiguration.DEFAULT_RM_SCHEDULER_MINIMUM_ALLOCATION_VCORES);
    return Resources.createResource(minimumMemory, minimumCores);
}
    
case | property                                | default in code | default in xml | description
-----|-----------------------------------------|-----------------|----------------|------------
max  | xx.scheduler.maximum-allocation-mb      | 8g              | 8g             | max ram per container. The maximum allocation for every container request at the RM, in MBs. Memory requests higher than this won't take effect, and will get capped to this value.
     | xx.scheduler.maximum-allocation-vcores  | 4 cores         | 32 cores       | max vcores per container. The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.
min  | xx.scheduler.minimum-allocation-mb      | 1g              | 1g             |
     | xx.scheduler.minimum-allocation-vcores  | 1 core          | 1 core         |
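These properties can be overridden in yarn-site.xml; a sketch (the values below are illustrative, not recommendations):

```xml
<!-- Illustrative overrides; tune to your hardware. -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-vcores</name>
  <value>4</value>
</property>
```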
  of course, some questions remain:
  1. if a node is configured with 4 GB, then max-allocation-mb should be at most 4 GB; but what if my task needs 5 GB to run? I think that node will never run the task, so a capping fix is necessary, e.g.:

// A resource ask cannot exceed the max.
if (amMemory > maxMem) {
    LOG.info("AM memory specified above max threshold of cluster. Using max value."
        + ", specified=" + amMemory
        + ", max=" + maxMem);
    amMemory = maxMem;
}
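The capping behaviour above can be illustrated outside Java as well; a minimal shell sketch (the numbers are made up):

```shell
# Minimal illustration of capping a resource ask at the cluster max
# (values are made up; YARN does this in the client as shown above).
requested=5120   # MB the task asks for
max=4096         # cluster maximum-allocation-mb
if [ "$requested" -gt "$max" ]; then
  echo "ask $requested MB above max, capping to $max MB"
  requested=$max
fi
echo "granted: $requested MB"
```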
  D. the container id does not strictly follow the app attempt id, but the app id
  container id

container_1433385109839_0001_01_000003
  app attempt id

appattempt_1433385109839_0001_000001
  app id

application_1433385109839_0001
  since one app may contain multiple attempts, the container must bind to the app id instead of the attempt id for its umbilical relationship.
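The app id can be recovered from a container id by simple field splitting, as a sketch (this assumes the standard container_&lt;clusterTimestamp&gt;_&lt;appSeq&gt;_&lt;attempt&gt;_&lt;containerNum&gt; form shown above):

```shell
# Derive the application id from a container id by splitting on '_'
# (assumes the standard container id layout shown above).
container_id="container_1433385109839_0001_01_000003"
app_id=$(echo "$container_id" | awk -F_ '{print "application_" $2 "_" $3}')
echo "$app_id"   # → application_1433385109839_0001
```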
  ref:
  http://dongxicheng.org/mapreduce-nextgen/how-to-run-distributedshell/
