Spring for Apache Hadoop
Introduction: Spring for Apache Hadoop provides integration with the Spring Framework to create and run Hadoop MapReduce, Hive, and Pig jobs as well as work with HDFS and HBase. If you have simple needs to work with Hadoop, including basic scheduling, you can add the Spring for Apache Hadoop namespace to your Spring-based project and get going quickly using Hadoop. As the complexity of your Hadoop application increases, you may want to use Spring Batch and Spring Integration to rein in the complexity of developing a large Hadoop application.

1. Using the Spring for Apache Hadoop Namespace
To use the SHDP namespace, one just needs to import it inside the configuration:
Spring for Apache Hadoop namespace prefix. Any name can do, but throughout the reference documentation, the hdp prefix will be used.
The namespace URI.
The namespace URI location. Note that even though the location points to an external address (which exists and is valid), Spring will resolve the schema locally as it is included in the Spring for Apache Hadoop library.
Declaration example for the Hadoop namespace. Notice the prefix usage.
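The declaration the notes above describe did not survive extraction; a sketch following the standard Spring schema-import pattern, with hdp as the prefix, would look like:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:hdp="http://www.springframework.org/schema/hadoop"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans.xsd
           http://www.springframework.org/schema/hadoop
           http://www.springframework.org/schema/hadoop/spring-hadoop.xsd">

    <!-- elements from the hdp namespace can now be used here,
         e.g. <hdp:configuration/> -->

</beans>
```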
2. Configuring Hadoop
In order to use Hadoop, one needs to first configure it, namely by creating a Configuration object. The configuration holds information about the job tracker, the input and output format, and the various other parameters of the map reduce job.
Using an XML configuration file:
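For instance, a sketch pointing the configuration at Hadoop site files through the resources attribute (custom-site.xml is a hypothetical file name; any Spring resource location works):

```xml
<hdp:configuration resources="classpath:/custom-site.xml"/>
```

This creates a Hadoop Configuration bean populated from the listed XML resources.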
Using a properties file:
fs.default.name=hdfs://localhost:9000
hadoop.tmp.dir=/tmp/hadoop
electric=sea
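Such properties can also be declared inline, nested inside the configuration element; a sketch using the same properties:

```xml
<hdp:configuration>
    fs.default.name=hdfs://localhost:9000
    hadoop.tmp.dir=/tmp/hadoop
    electric=sea
</hdp:configuration>
```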
Using expressions to avoid hard-coded values (properties file):
fs.default.name=${hd.fs}
hadoop.tmp.dir=file://${java.io.tmpdir}
hangar=${number:18}
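These placeholders are resolved by Spring's property-placeholder support; a sketch, assuming a hadoop.properties file on the classpath defines hd.fs and number, and that the Spring context namespace is declared alongside hdp:

```xml
<context:property-placeholder location="classpath:hadoop.properties"/>

<hdp:configuration>
    fs.default.name=${hd.fs}
    hadoop.tmp.dir=file://${java.io.tmpdir}
    hangar=${number:18}
</hdp:configuration>
```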
3. Creating a Hadoop Job
example:
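A minimal job declaration might look like the following sketch (the WordCount mapper and reducer classes and the paths are purely illustrative):

```xml
<hdp:job id="mr-job"
    input-path="/input/" output-path="/output/"
    mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
    reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>
```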
Notice that there is no reference to the Hadoop configuration above - that's because, if not specified, the default naming convention (hadoopConfiguration) will be used instead.
Specifying a configuration file for the job:
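One way to point the job at its own settings is job-level properties; a sketch, assuming a properties-location attribute loading a hypothetical special-job.properties file plus inline properties (the mapper, reducer, and jar-by-class values are placeholders):

```xml
<hdp:job id="mr-job"
    input-path="/input/" output-path="/output/"
    mapper="mapper class" reducer="reducer class"
    jar-by-class="class used for jar detection"
    properties-location="classpath:special-job.properties">
    electric=sea
</hdp:job>
```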
4. Creating a Hadoop Streaming Job
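A streaming job is declared much like a regular job, except the mapper and reducer point at executables rather than Java classes; a sketch, assuming ${path.cat} and ${path.wc} resolve to streaming executables such as /bin/cat and /bin/wc:

```xml
<hdp:streaming id="streaming"
    input-path="/input/" output-path="/output/"
    mapper="${path.cat}" reducer="${path.wc}"/>
```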
5. Running a Hadoop Job
For basic job submission, SHDP provides the job-runner element (backed by JobRunner):
Multiple jobs can be specified and even nested if they are not used outside the runner:
Do note that the runner will not run unless triggered manually or unless run-at-startup is set to true.
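A sketch of a runner wired to a job defined elsewhere (myjob is a hypothetical job id, and the WordCount classes are illustrative):

```xml
<hdp:job-runner id="myjob-runner" job-ref="myjob" run-at-startup="true"/>

<hdp:job id="myjob"
    input-path="/input/" output-path="/output/"
    mapper="org.apache.hadoop.examples.WordCount.TokenizerMapper"
    reducer="org.apache.hadoop.examples.WordCount.IntSumReducer"/>
```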
6. Using the Hadoop Job tasklet
By default, wait-for-job is true so that the tasklet will wait for the job to complete when it executes.
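A sketch of a tasklet declaration for use inside a Spring Batch step, assuming a job definition named mr-job as in the earlier example:

```xml
<hdp:job-tasklet id="hadoop-tasklet" job-ref="mr-job" wait-for-job="true"/>
```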
7. Running a Hadoop Tool
Running hadoop jar with arguments:
bin/hadoop jar -conf hadoop-site.xml -jt darwin:50020 -D property=value someJar.jar org.foo.SomeTool data/in.txt data/out.txt
Spring for Apache Hadoop uses the tool-runner element to run a MapReduce Tool with arguments:
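A tool-runner declaration equivalent to the command line above might look like this sketch (nested arg elements carry the tool arguments, and the inline property stands in for the -D setting):

```xml
<hdp:tool-runner id="someTool" tool-class="org.foo.SomeTool" run-at-startup="true">
    <hdp:arg value="data/in.txt"/>
    <hdp:arg value="data/out.txt"/>

    property=value
</hdp:tool-runner>
```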