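8.3 When the Application Master requests resources from the Resource Manager, the request also states how much memory is needed. By default, each Map task and each Reduce task is allocated 1 GB of memory; this value can be changed through the parameters mapreduce.map.memory.mb and mapreduce.reduce.memory.mb.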
[Hadoop: The Definitive Guide, Chapter 6]
The way memory is allocated is different from MapReduce 1, where tasktrackers have a fixed number of “slots,” set at cluster configuration time, and each task runs in a single slot. Slots have a maximum memory allowance, which again is fixed for a cluster, leading both to problems of underutilization when tasks use less memory (because other waiting tasks are not able to take advantage of the unused memory) and to problems of job failure when a task can’t complete because it can’t get enough memory to run correctly.
In YARN, resources are more fine-grained, so both of these problems can be avoided. In particular, applications may request a memory capability that is anywhere between the minimum allocation and a maximum allocation, and that must be a multiple of the minimum allocation. Default memory allocations are scheduler-specific, and for the capacity scheduler, the default minimum is 1024 MB (set by yarn.scheduler.capacity.minimum-allocation-mb) and the default maximum is 10240 MB (set by yarn.scheduler.capacity.maximum-allocation-mb). Thus, tasks can request any memory allocation between 1 and 10 GB (inclusive), in multiples of 1 GB (the scheduler will round up to the nearest multiple if needed), by setting mapreduce.map.memory.mb and mapreduce.reduce.memory.mb appropriately.
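The rounding rule in the quoted passage can be illustrated with a little arithmetic. The following is a minimal sketch, not Hadoop code: the class MemoryRequestSketch and its normalize method are hypothetical names made up for this note, and rejecting a request above the maximum with an exception is an assumption about how an oversized request might be handled.

```java
// Hypothetical helper for illustration only; this is not a Hadoop or YARN API.
final class MemoryRequestSketch {

    // Rounds a requested memory size (in MB) up to the nearest multiple of the
    // minimum allocation, as described in the quoted passage.
    static int normalize(int requestedMb, int minMb, int maxMb) {
        if (requestedMb <= 0 || minMb <= 0 || maxMb < minMb) {
            throw new IllegalArgumentException("invalid memory settings");
        }
        // Integer ceiling division, then scale back to MB.
        int rounded = ((requestedMb + minMb - 1) / minMb) * minMb;
        if (rounded > maxMb) {
            // Assumption: a request above the maximum is rejected, not clamped.
            throw new IllegalArgumentException("request exceeds maximum allocation");
        }
        return rounded;
    }

    public static void main(String[] args) {
        // With the defaults quoted above (1024 MB minimum, 10240 MB maximum),
        // a 1500 MB request is rounded up to 2048 MB.
        System.out.println(normalize(1500, 1024, 10240)); // prints 2048
    }
}
```

Under this sketch, a value such as mapreduce.map.memory.mb=1500 would effectively occupy 2048 MB, which is one reason to pick values that are already multiples of the minimum allocation.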
1. In YARN-based MapReduce, every task reports its execution status and its counters to the Application Master every 3 seconds. The Application Master therefore holds an aggregate view of the progress of all the tasks.
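One way to picture this aggregate view is a table keyed by task that keeps the latest report from each one. The sketch below is only an illustration under stated assumptions: ProgressAggregatorSketch, report, and overallProgress are hypothetical names, and averaging the per-task progress is a simplification chosen for the example, not a description of how the real Application Master computes job progress.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of an "aggregate view"; not the real Application Master.
final class ProgressAggregatorSketch {

    // Latest reported progress per task, each value in the range [0.0, 1.0].
    private final Map<String, Double> latest = new ConcurrentHashMap<>();

    // Called whenever a task sends its periodic status report.
    void report(String taskId, double progress) {
        if (progress < 0.0 || progress > 1.0) {
            throw new IllegalArgumentException("progress must be within [0, 1]");
        }
        latest.put(taskId, progress);
    }

    // Assumption: overall progress is the plain average over the tasks seen so far.
    double overallProgress() {
        if (latest.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (double p : latest.values()) {
            sum += p;
        }
        return sum / latest.size();
    }

    public static void main(String[] args) {
        ProgressAggregatorSketch am = new ProgressAggregatorSketch();
        am.report("task_m_000000", 0.5);
        am.report("task_m_000001", 1.0);
        System.out.println(am.overallProgress()); // prints 0.75
    }
}
```

A later report from the same task simply overwrites the earlier one, so the view always reflects the most recent 3-second update from each task.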
6. Job completion
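The client polls the Application Master every 5 seconds (configurable through mapreduce.client.completion.pollinterval) to find out whether the job has completed; this polling happens inside the Job's waitForCompletion() call.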
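The polling behaviour described here amounts to a check-then-sleep loop. The following is a minimal sketch of that pattern, not the Hadoop client code: CompletionPollerSketch and pollUntilComplete are hypothetical names, and the BooleanSupplier stands in for the real status request to the Application Master.

```java
import java.util.function.BooleanSupplier;

// Hypothetical illustration of client-side completion polling; not Hadoop code.
final class CompletionPollerSketch {

    // Checks isComplete repeatedly, sleeping pollIntervalMs between checks,
    // and returns how many checks were needed.
    static int pollUntilComplete(BooleanSupplier isComplete, long pollIntervalMs) {
        int polls = 0;
        while (true) {
            polls++;
            if (isComplete.getAsBoolean()) {
                return polls;
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for job", e);
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in for a job that finishes about 30 ms from now; a real client
        // would use an interval like the 5000 ms default mentioned above.
        int polls = pollUntilComplete(
                () -> System.currentTimeMillis() - start >= 30, 10L);
        System.out.println("job reported complete after " + polls + " polls");
    }
}
```

A shorter interval makes the client notice completion sooner at the cost of more status requests, which is the trade-off the configurable interval exposes.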
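After the job completes, the Application Master and the task containers clean up their working state. The Job History Server also receives the job's run information so that it can be looked up later as a historical job.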