Implementing Secondary Sort in MapReduce

ywg posted on 2016-12-8 09:29:42

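I have recently been learning how to sort by value with native MapReduce. With Scalding the result used to come easily; doing it by hand turned out to be much harder. I followed the approach from Hadoop: The Definitive Guide, which uses a custom key.
The reason is that Hadoop only sorts on keys. So we can define a composite key and also define how that key is compared (by implementing the compareTo method). Here is one implementation of such a key: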

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

public class IntPair implements WritableComparable<IntPair> {
    public Text first;
    public IntWritable second;

    // Hadoop needs a no-arg constructor; the fields must be non-null for readFields.
    public IntPair() {
        this.set(new Text(), new IntWritable());
    }

    public IntPair(String key, int value) {
        this.set(key, value);
    }

    public void set(Text key, IntWritable value) {
        this.first = key;
        this.second = value;
    }

    public void set(String key, int value) {
        this.first = new Text(key);
        this.second = new IntWritable(value);
    }

    // Fields are written and read back in the same order.
    @Override
    public void write(DataOutput out) throws IOException {
        this.first.write(out);
        this.second.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.first.readFields(in);
        this.second.readFields(in);
    }

    @Override
    public int compareTo(IntPair o) {
        // Ascending by second; swap the arguments for descending order.
        // Integer.compare avoids the overflow that value1 - value2 can cause.
        int cmp = Integer.compare(this.second.get(), o.second.get());
        if (cmp != 0) {
            return cmp;
        }
        return this.first.compareTo(o.first);
    }

    @Override
    public boolean equals(Object o) {
        if (o instanceof IntPair) {
            IntPair other = (IntPair) o;
            return this.first.equals(other.first) && this.second.equals(other.second);
        }
        return false;
    }

    @Override
    public String toString() {
        return this.first + "\t" + this.second;
    }

    @Override
    public int hashCode() {
        return 31 * first.hashCode() + second.hashCode();
    }
}
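
To check the ordering that compareTo produces without running a job, here is a minimal sketch in plain Java. The Pair class and the PairOrderDemo wrapper are hypothetical stand-ins for IntPair, not part of the original post; they use String and int instead of Text and IntWritable so that nothing from Hadoop is needed. For ASCII strings, String.compareTo should order the same way as Text.compareTo, which is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for IntPair built on plain Java types (no Hadoop needed).
class PairOrderDemo {
    static final class Pair implements Comparable<Pair> {
        final String first;
        final int second;

        Pair(String first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Pair o) {
            // Same rule as IntPair: second ascending, ties broken by first.
            int c = Integer.compare(this.second, o.second);
            return c != 0 ? c : this.first.compareTo(o.first);
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }
    }

    // Returns a sorted copy, using the keys' natural order (compareTo).
    static List<Pair> sorted(List<Pair> input) {
        List<Pair> copy = new ArrayList<>(input);
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<Pair> keys = Arrays.asList(
                new Pair("b", 3), new Pair("a", 3), new Pair("c", 1));
        // Prints c/1 first, then a/3 and b/3 (tie on second broken by first).
        for (Pair p : sorted(keys)) {
            System.out.println(p);
        }
    }
}
```

The pair with the smallest second comes first, and the two pairs that share second = 3 are ordered by first.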

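As the compareTo method shows, IntPair sorts by the value of second in ascending order first, and falls back to comparing first when the values are equal. Note that toString is what gets called when the key is written to the output; without it the key is not printed correctly. When the map output is sorted, the key class's compareTo method is called by default.
If in some cases we want the key compared in a different way, we can additionally define a comparator and override its compare method.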
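Note:
In the constructor of the comparator (a WritableComparator subclass), call super(IntPair.class, true). If the second argument is false, or if you call super(IntPair.class), the result is a NullPointerException, because that argument tells WritableComparator to create the key instances it deserializes into before comparing.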
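
As an illustration of an alternative comparison, here is a sketch of a descending order on second, written with java.util.Comparator so that it runs without Hadoop. Pair and DescendingComparatorDemo are hypothetical names and not the author's code. In a real job, the same logic would presumably go into the compare(WritableComparable, WritableComparable) method of a WritableComparator subclass whose constructor calls super(IntPair.class, true), registered with Job.setSortComparatorClass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DescendingComparatorDemo {
    // Hypothetical stand-in for the IntPair key, using plain Java types.
    static final class Pair {
        final String first;
        final int second;

        Pair(String first, int second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }
    }

    // Alternative ordering: second descending, ties broken by first ascending.
    static final Comparator<Pair> SECOND_DESCENDING = new Comparator<Pair>() {
        @Override
        public int compare(Pair a, Pair b) {
            // Operands are swapped relative to ascending order.
            int c = Integer.compare(b.second, a.second);
            return c != 0 ? c : a.first.compareTo(b.first);
        }
    };

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Pair> sortDescending(List<Pair> input) {
        List<Pair> copy = new ArrayList<>(input);
        copy.sort(SECOND_DESCENDING);
        return copy;
    }

    public static void main(String[] args) {
        List<Pair> keys = Arrays.asList(
                new Pair("b", 3), new Pair("a", 3), new Pair("c", 1));
        for (Pair p : sortDescending(keys)) {
            System.out.println(p);
        }
    }
}
```

Keeping the alternative order in a separate comparator leaves the key's own compareTo unchanged, so different jobs can sort the same key type differently.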
