Configuring Hadoop to use the Fair Scheduler

Posted by chriszg on 2016-12-8 07:36:10

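The Hadoop version is Cloudera Hadoop CDH3u3.
The configuration steps are:
1. Copy $HADOOP_HOME/contrib/fairscheduler/hadoop-fairscheduler-0.20.2-cdh3u3.jar into the $HADOOP_HOME/lib directory.
2. Edit the $HADOOP_HOME/conf/mapred-site.xml configuration file and add the following properties: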

    <!-- Use the Fair Scheduler as the JobTracker's task scheduler -->
    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    <!-- Path to the pool allocation file created in step 3 -->
    <property>
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/home/hadoop/hadoop-0.20.2-cdh3u3/conf/fair-scheduler.xml</value>
    </property>

    <property>
      <name>mapred.fairscheduler.preemption</name>
      <value>true</value>
    </property>

    <!-- Allow both a map task and a reduce task to be assigned per heartbeat -->
    <property>
      <name>mapred.fairscheduler.assignmultiple</name>
      <value>true</value>
    </property>

    <!-- The job property whose value selects the pool -->
    <property>
      <name>mapred.fairscheduler.poolnameproperty</name>
      <value>mapred.job.queue.name</value>
      <description>job.set("mapred.job.queue.name",pool);</description>
    </property>

    <!-- With this set to true, preemption decisions are only logged, not carried out -->
    <property>
      <name>mapred.fairscheduler.preemption.only.log</name>
      <value>true</value>
    </property>

    <!-- Preemption check interval, in milliseconds -->
    <property>
      <name>mapred.fairscheduler.preemption.interval</name>
      <value>15000</value>
    </property>

    <property>
      <name>mapred.queue.names</name>
      <value>default,hadoop,hive</value>
    </property>

3. Create a new configuration file named fair-scheduler.xml in $HADOOP_HOME/conf/:

    <?xml version="1.0"?>
    <allocations>
      <pool name="hive">
        <minMaps>90</minMaps>
        <minReduces>20</minReduces>
        <maxRunningJobs>20</maxRunningJobs>
        <weight>2.0</weight>
        <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
      </pool>

      <pool name="hadoop">
        <minMaps>9</minMaps>
        <minReduces>2</minReduces>
        <maxRunningJobs>20</maxRunningJobs>
        <weight>1.0</weight>
        <minSharePreemptionTimeout>30</minSharePreemptionTimeout>
      </pool>

      <user name="hadoop">
        <maxRunningJobs>6</maxRunningJobs>
      </user>
      <poolMaxJobsDefault>10</poolMaxJobsDefault>
      <userMaxJobsDefault>8</userMaxJobsDefault>
      <defaultMinSharePreemptionTimeout>600</defaultMinSharePreemptionTimeout>
      <fairSharePreemptionTimeout>600</fairSharePreemptionTimeout>
    </allocations>
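
Since a typo such as a missing space in a pool element makes the allocation file unparseable, it can help to check the file before distributing it. The following is only a minimal sketch using the JDK's built-in XML parser, not anything shipped with Hadoop; the class name AllocationFileCheck and the method poolMinMaps are made up for illustration. It fails on malformed XML and otherwise lists each pool with its minMaps value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: a well-formedness check for fair-scheduler.xml.
class AllocationFileCheck {

    /** Returns pool name -> minMaps for every <pool> element (0 when minMaps is absent). */
    static Map<String, Integer> poolMinMaps(String xml) throws Exception {
        // parse() throws an exception if the XML is not well-formed
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        NodeList pools = doc.getDocumentElement().getElementsByTagName("pool");
        for (int i = 0; i < pools.getLength(); i++) {
            Element pool = (Element) pools.item(i);
            NodeList minMaps = pool.getElementsByTagName("minMaps");
            int value = minMaps.getLength() == 0
                    ? 0
                    : Integer.parseInt(minMaps.item(0).getTextContent().trim());
            result.put(pool.getAttribute("name"), value);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // With a path argument, check that file; otherwise use a small inline sample.
        String xml = args.length > 0
                ? new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8)
                : "<?xml version=\"1.0\"?>"
                  + "<allocations>"
                  + "<pool name=\"hive\"><minMaps>90</minMaps></pool>"
                  + "<pool name=\"hadoop\"><minMaps>9</minMaps></pool>"
                  + "</allocations>";
        System.out.println(poolMinMaps(xml)); // inline sample prints {hive=90, hadoop=9}
    }
}
```

Passing the path of the real file as the first argument runs the same check against it. The sketch only verifies that the XML parses; it does not validate element names or value ranges.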

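4. Repeat the steps above on every node in the cluster, then restart the cluster. The scheduler status can then be viewed at http://namenode:50030/scheduler. To change the scheduler configuration later, just edit fair-scheduler.xml; the changes take effect without a restart.
5. When running Hive jobs, assign them to the hive queue with: set mapred.job.queue.name=hive;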
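
To make the link between step 5 and the poolnameproperty setting in step 2 concrete, here is a simplified model of how a job ends up in a pool. It is a sketch under stated assumptions, not the scheduler's actual code: a plain Map stands in for the job configuration, the class PoolNameSketch and method resolvePool are hypothetical, and falling back to a pool named "default" when the property is unset is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model, not Hadoop code: which pool a job lands in.
class PoolNameSketch {
    // The value of mapred.fairscheduler.poolnameproperty in mapred-site.xml
    static final String POOL_NAME_PROPERTY = "mapred.job.queue.name";

    /** The pool is the job's value for the configured property, else "default" (assumed). */
    static String resolvePool(Map<String, String> jobConf) {
        String pool = jobConf.get(POOL_NAME_PROPERTY);
        return (pool == null || pool.trim().isEmpty()) ? "default" : pool.trim();
    }

    public static void main(String[] args) {
        Map<String, String> hiveJob = new HashMap<String, String>();
        // Equivalent of "set mapred.job.queue.name=hive;" in the Hive session
        hiveJob.put(POOL_NAME_PROPERTY, "hive");
        System.out.println(resolvePool(hiveJob));                       // prints hive
        System.out.println(resolvePool(new HashMap<String, String>())); // prints default
    }
}
```

In a MapReduce program the same property would be set on the job configuration, as the description in mapred-site.xml hints with job.set("mapred.job.queue.name",pool).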
##########
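In addition, if an MR job fails because user XX is not allowed to access queue YY, configure the corresponding properties in mapred-queue-acls.xml to control access permissions. For example: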

    <property>
      <name>mapred.queue.default.acl-submit-job</name>
      <value>*</value>
      <description>Comma separated list of user and group names that are allowed
        to submit jobs to the 'default' queue. The user list and the group list
        are separated by a blank. For e.g. user1,user2 group1,group2.
        If set to the special value '*', it means all users are allowed to
        submit jobs. If set to ' ' (i.e. space), no user will be allowed to submit
        jobs.

        It is only used if authorization is enabled in Map/Reduce by setting the
        configuration property mapred.acls.enabled to true.

        Irrespective of this ACL configuration, the user who started the cluster and
        cluster administrators configured via
        mapreduce.cluster.administrators can submit jobs.
      </description>
    </property>

    <property>
      <name>mapred.queue.default.acl-administer-jobs</name>
      <value>*</value>
      <description>Comma separated list of user and group names that are allowed
        to view job details, kill jobs or modify job's priority for all the jobs
        in the 'default' queue. The user list and the group list
        are separated by a blank. For e.g. user1,user2 group1,group2.
        If set to the special value '*', it means all users are allowed to do
        this operation. If set to ' ' (i.e. space), no user will be allowed to do
        this operation.

        It is only used if authorization is enabled in Map/Reduce by setting the
        configuration property mapred.acls.enabled to true.

        Irrespective of this ACL configuration, the user who started the cluster and
        cluster administrators configured via
        mapreduce.cluster.administrators can do the above operations on all the jobs
        in all the queues. The job owner can do all the above operations on his/her
        job irrespective of this ACL configuration.
      </description>
    </property>
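
The ACL value format described above ("user1,user2 group1,group2", with '*' for everyone and a single space for no one) can be illustrated with a short sketch. This is an interpretation of the description text only, not Hadoop's implementation; the class QueueAclSketch and its methods are hypothetical, and the sketch ignores the mapred.acls.enabled switch and the cluster-administrator exception mentioned in the description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the ACL value format, not Hadoop code.
class QueueAclSketch {

    /** True if the user, or one of the user's groups, is listed in the ACL value. */
    static boolean isAllowed(String acl, String user, Set<String> userGroups) {
        if (acl.trim().equals("*")) {
            return true; // special value: everyone is allowed
        }
        // The user list and the group list are separated by the first blank
        String[] parts = acl.split(" ", 2);
        Set<String> users = toSet(parts[0]);
        Set<String> groups = parts.length > 1 ? toSet(parts[1]) : new HashSet<String>();
        if (users.contains(user)) {
            return true;
        }
        for (String group : userGroups) {
            if (groups.contains(group)) {
                return true;
            }
        }
        return false;
    }

    /** Splits a comma separated list into a set, dropping empty entries. */
    static Set<String> toSet(String csv) {
        Set<String> out = new HashSet<String>();
        for (String item : csv.split(",")) {
            String trimmed = item.trim();
            if (!trimmed.isEmpty()) {
                out.add(trimmed);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> groups = new HashSet<String>(Arrays.asList("group1"));
        System.out.println(isAllowed("*", "anyone", groups));                      // true
        System.out.println(isAllowed("user1,user2 group1,group2", "bob", groups)); // true (via group1)
        System.out.println(isAllowed(" ", "user1", groups));                       // false
    }
}
```

For a queue other than default, such as the hive queue defined earlier, the property names would presumably follow the same pattern with the queue name substituted, for example mapred.queue.hive.acl-submit-job.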
