skypaladin posted on 2017-12-16 23:48:14

Hadoop Windows IDEA

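  Java JDK 1.8 works fine.
  
Note: copy the JDK to a path that contains no spaces, then point the JAVA_HOME system environment variable at it.
  
etc/hadoop/hadoop-env.cmd already uses %JAVA_HOME%, so you can leave it alone, but it does not support paths with spaces. The Hadoop path must not contain spaces either.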
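
The no-spaces rule above is easy to check from Java before launching anything. This is only a sketch: the class name PathSpaceCheck and the method isUsable are made up for illustration, and it only inspects the strings, it does not validate the installation itself.

```java
class PathSpaceCheck {

    // Returns true when the value is set and contains no whitespace at all.
    static boolean isUsable(String path) {
        if (path == null || path.isEmpty()) {
            return false;
        }
        for (int i = 0; i < path.length(); i++) {
            if (Character.isWhitespace(path.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The two variables this post relies on.
        for (String name : new String[] {"JAVA_HOME", "HADOOP_HOME"}) {
            String value = System.getenv(name);
            String verdict = isUsable(value) ? "ok" : "unset or contains whitespace";
            System.out.println(name + " = " + value + " (" + verdict + ")");
        }
    }
}
```

A typical default such as C:\Program Files\Java\... would be reported as a problem, which is why the JDK gets copied elsewhere.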
  
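First:
  
Set up the input and output folders
  1) Add an input folder to the project, at the same level as the src directory
  Put one or more input files in the input folder
  
Create a file named test.segmented
  
with the following content: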
  

dfdfadgdgag  
aadads
  
fudflcl
  
cckcer
  
fadf
  
dfdfadgdgag
  
fudflcl
  
fuck
  
fuck
  
fuckfuck
  
haha
  
aaa
  

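  2) Configure the run arguments

  
In the IntelliJ menu bar choose Run->Edit Configurations, click + in the dialog that opens, and create a new Application configuration. Set Main class to WordCount and
Program arguments to input/ output/, i.e. the input path is the input folder created above and the output path is output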
  
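I also recommend pointing IDEA's Maven at a mirror, otherwise downloads are very slow
  
To do so: in the settings.xml file under ~/.m2 (if the file does not exist, copy one over from the maven/conf directory), find the <mirrors> element and add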
  

<mirror>
  <id>alimaven</id>
  <name>aliyun maven</name>
  <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
  <mirrorOf>central</mirrorOf>
</mirror>
  

  pom.xml:
  

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>dsf</groupId>
  <artifactId>dsff</artifactId>
  <version>1.0-SNAPSHOT</version>
  <repositories>
    <repository>
      <id>apache</id>
      <url>http://maven.apache.org</url>
    </repository>
  </repositories>

  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-core</artifactId>
      <version>2.7.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-common</artifactId>
      <version>2.7.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-yarn-common</artifactId>
      <version>2.7.3</version>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-mapreduce-client-common</artifactId>
      <version>2.7.3</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <artifactId>maven-dependency-plugin</artifactId>
        <configuration>
          <excludeTransitive>false</excludeTransitive>
          <stripVersion>true</stripVersion>
          <outputDirectory>./lib</outputDirectory>
        </configuration>
      </plugin>
    </plugins>
  </build>

</project>
  

  3) Create a WordCount.java file
  
with the following content:
  

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every whitespace-separated token of a line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sums the counts collected for one word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path, args[1] the output path (input/ output/)
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }

}
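
To see what the job computes without starting Hadoop at all, the same map/reduce logic can be sketched in plain Java. This is an illustration only and not part of the Hadoop API: LocalWordCount and count are hypothetical names, the map step is the same StringTokenizer split, and a TreeMap stands in for the shuffle/sort plus the summing reducer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalWordCount {

    // Tokenize every line on whitespace (the "map" step) and sum the
    // occurrences per word (the "reduce" step). TreeMap keeps the keys sorted.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few lines taken from the sample input file.
        List<String> lines = Arrays.asList("dfdfadgdgag", "aadads", "fudflcl", "dfdfadgdgag");
        // Print one "word<TAB>count" pair per line.
        count(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

Running something like this against the same input file is a cheap way to check that the Hadoop output is what you expect.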
  

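  4) To run on Windows, have a look at this tutorial:
  
It definitely works and is the most convenient way
  
https://www.cs.helsinki.fi/u/jilu/paper/hadoop_on_win.pdf
  
First download the Hadoop binary distribution
  
2.7.3 source:
  
http://hadoop.apache.org/releases.html
  
Replace the files in the bin directory of hadoop-2.7.3
  
Replacement files:
  
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
  
Set up the environment:
  
Note that it is best to set the HADOOP_HOME environment variable inside IDEA. If you set it as a system environment variable, you also need to modify the file below. I set both...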
  
hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\util\Shell.java
  
Source code:
  
2.7.3 source:
  
http://hadoop.apache.org/releases.html
  

private static String checkHadoopHome() {

  // first check the Dflag hadoop.home.dir with JVM scope
  //System.setProperty("hadoop.home.dir", "...");
  String home = System.getProperty("hadoop.home.dir");

  // fall back to the system/user-global env variable
  if (home == null) {
    home = System.getenv("HADOOP_HOME");
  }

  try {
    // couldn't find either setting for hadoop's home directory
    if (home == null) {
      throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
    }

    if (home.startsWith("\"") && home.endsWith("\"")) {
      home = home.substring(1, home.length()-1);
    }

    // check that the home setting is actually a directory that exists
    File homedir = new File(home);
    if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
      throw new IOException("Hadoop home directory " + homedir
        + " does not exist, is not a directory, or is not an absolute path.");
    }

    home = homedir.getCanonicalPath();

  } catch (IOException ioe) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("Failed to detect a valid hadoop home directory", ioe);
    }
    home = null;
  }
  // hard-code the Hadoop directory of this machine
  home="D:\\hadoop-2.7.3";
  return home;
}
  

home = System.getenv("HADOOP_HOME");  

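  If you set it as a system environment variable, the HADOOP_HOME string returned here has a '\u202A' character prepended. U+202A is the invisible Unicode LEFT-TO-RIGHT EMBEDDING control character (it probably sneaks in when the path is pasted into the environment variable dialog). With that character in front the path is no longer a valid absolute path, hence the hard-coded path above
  
Then copy the file into your project under the same package path (org/apache/hadoop/util); IDEA picks up the copy in the project first (you can copy it over first and edit it afterwards)
  
If you set the environment variable inside IDEA instead, you do not need to modify Shell.java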
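
If you would rather not patch Shell.java, one alternative is to clean the value yourself and pass it through the hadoop.home.dir system property, which checkHadoopHome() above reads before the environment variable. A minimal sketch, assuming the only problem is stray invisible formatting characters such as '\u202A'. HadoopHomeFix and clean are names I made up, and the property has to be set before the first Hadoop class is used.

```java
class HadoopHomeFix {

    // Drops Unicode "format" characters (general category Cf, which
    // includes U+202A) and surrounding whitespace from a path string.
    static String clean(String raw) {
        if (raw == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            if (Character.getType(c) != Character.FORMAT) {
                sb.append(c);
            }
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String home = clean(System.getenv("HADOOP_HOME"));
        if (home != null && !home.isEmpty()) {
            // checkHadoopHome() looks at this property before HADOOP_HOME.
            System.setProperty("hadoop.home.dir", home);
        }
        System.out.println("hadoop.home.dir = " + System.getProperty("hadoop.home.dir"));
    }
}
```

In a real project the two lines from main would go at the very top of WordCount.main, before the Configuration is created.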
  
Next, modify NativeIO.java as described in the tutorial PDF
  
hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java
  
Change it at around line 609
  

    return access0(path, desiredAccess.accessRight());  

  change it to
  

    return true;   

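  Copy this file into the project as well
  I also recommend running IDEA as administrator
  Hadoop refuses to start a job whose output directory already exists, so be sure to delete the output folder before the next run!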
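
Deleting the folder by hand gets old quickly. A small helper can do it before the job is submitted. This is a sketch using only java.nio (the class and method names are hypothetical), meant for the local output folder used here, so double-check the path before pointing it anywhere else.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class OutputCleaner {

    // Recursively deletes dir if it exists; does nothing otherwise.
    static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order so children come before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to "output", the second program argument configured earlier.
        deleteIfExists(Paths.get(args.length > 1 ? args[1] : "output"));
    }
}
```

Calling deleteIfExists(Paths.get(args[1])) at the start of WordCount.main would have the same effect on a local run.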
  OK, run the program. The result is as follows:
  aaa 1
  
aadads 1
  
cckcer 1
  
dfdfadgdgag 2
  
fadf 1
  
fuck 2
  
fuckfuck 1
  
fudflcl 2
  
haha 1