
A Custom Map/Reduce Example in Hive (Java)
Industry Applications · 2021-01-21
IDE: IntelliJ IDEA
JDK: 1.7
Hive: 2.3.0
Hadoop: 2.8.1

1. Developing the map and reduce classes
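
The GenericMR, Mapper, Reducer and Output helpers used below ship with Hive's hive-contrib module, so that jar has to be on the compile classpath. A quick way to locate it on a standard Hive 2.3.0 install (the exact path is an assumption about your layout):

# HIVE_HOME points at your Hive installation; the version in the file name is assumed
ls $HIVE_HOME/lib/hive-contrib-2.3.0.jar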

// Map class
import org.apache.hadoop.hive.contrib.mr.GenericMR;
import org.apache.hadoop.hive.contrib.mr.Mapper;
import org.apache.hadoop.hive.contrib.mr.Output;

public class WordCountMap {
    public static void main(String[] args) throws Exception {
        new GenericMR().map(System.in, System.out, new Mapper() {
            @Override
            public void map(String[] strings, Output output) throws Exception {
                for (String str : strings) {
                    // If the source file is tab-delimited there is no need to split again:
                    // the incoming strings array already holds the words of the current line.
                    String[] words = str.split("\\W+");
                    for (String word : words) {
                        output.collect(new String[]{word, "1"});
                    }
                }
            }
        });
    }
}

// Reduce class
import java.util.Iterator;

import org.apache.hadoop.hive.contrib.mr.GenericMR;
import org.apache.hadoop.hive.contrib.mr.Output;
import org.apache.hadoop.hive.contrib.mr.Reducer;

public class WordCountReducer {
    public static void main(String[] args) throws Exception {
        new GenericMR().reduce(System.in, System.out, new Reducer() {
            @Override
            public void reduce(String key, Iterator<String[]> iterator, Output output) throws Exception {
                // Sum the "1" counts emitted by the mapper for this word
                int sum = 0;
                while (iterator.hasNext()) {
                    sum += Integer.valueOf(iterator.next()[1]);
                }
                output.collect(new String[]{key, String.valueOf(sum)});
            }
        });
    }
}
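
Because GenericMR simply reads lines from stdin and writes tab-separated records to stdout, the two programs can be dry-run locally to mimic what Hive will do: map, sort (the role of cluster by), then reduce. A minimal check, assuming the wordcount.jar produced in the next section bundles its hive-contrib dependency (otherwise append $HIVE_HOME/lib/hive-contrib-2.3.0.jar to -cp):

printf 'hello world\nhello hive\n' \
  | java -cp wordcount.jar WordCountMap \
  | sort \
  | java -cp wordcount.jar WordCountReducer
# expected output (tab-separated):
# hello   2
# hive    1
# world   1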

2. Exporting the jar

In IntelliJ, package the two classes into a jar:

File -> Project Structure
Add Artifacts
Leave Main Class empty and click OK
Check the jar configuration
Build to generate the jar
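
If you prefer to skip the IDE clicks, a rough command-line equivalent is sketched below (paths and the hive-contrib version are assumptions matching the environment above). The unzip step copies the GenericMR classes into the jar so that 'java -cp wordcount.jar ...' can run without extra classpath entries:

mkdir -p build
javac -cp $HIVE_HOME/lib/hive-contrib-2.3.0.jar -d build WordCountMap.java WordCountReducer.java
# bundle the GenericMR/Mapper/Reducer/Output classes so the jar is self-contained
unzip -o -q $HIVE_HOME/lib/hive-contrib-2.3.0.jar 'org/apache/hadoop/hive/contrib/mr/*' -d build
jar cf wordcount.jar -C build .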
 

3. Writing the Hive SQL

drop table if exists raw_lines;

-- create table raw_lines and read all the lines under '/user/inputs'; this path is on your HDFS
create external table if not exists raw_lines(line string)
ROW FORMAT DELIMITED
stored as textfile
location '/user/inputs';

drop table if exists word_count;

-- create table word_count; this is the output table, which will be written to '/user/outputs' as a text file
create table if not exists word_count(word string, count int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
lines terminated by '\n'
STORED AS TEXTFILE LOCATION '/user/outputs/';
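
raw_lines only points at '/user/inputs', so some text has to be uploaded there for the job to count. A minimal sketch (the local file name is a placeholder):

hdfs dfs -mkdir -p /user/inputs /user/outputs
hdfs dfs -put some_text_file.txt /user/inputs/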


-- add the mapper & reducer jar as a resource; change your/local/path to the real path
-- must use "add file", not "add jar", otherwise Hive cannot find the map and reduce main classes
add file your/local/path/wordcount.jar;

from (
        from raw_lines
        map raw_lines.line
        -- call the mapper here
        using 'java -cp wordcount.jar WordCountMap'
        as word, count
        cluster by word) map_output
insert overwrite table word_count
reduce map_output.word, map_output.count
-- call the reducer here
using 'java -cp wordcount.jar WordCountReducer'
as word, count;

4. Running the Hive SQL

beeline -u [hiveserver] -n username -f wordcount.hql
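
When the script finishes, the counts can be spot-checked from the shell; the JDBC URL below is a placeholder for your HiveServer2 address:

beeline -u jdbc:hive2://localhost:10000 -n username -e 'select * from word_count limit 10;'
# the script itself can also be run with the legacy CLI: hive -f wordcount.hql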