八叔引擎之家

Custom Map/Reduce in Hive: An Example in Python
Industry Applications 2021-01-21
Development environment:
python:2.7.5
hive:2.3.0
hadoop:2.8.1

1. The map and reduce scripts

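#!/usr/bin/python
# mapper.py: split each input line into words and emit "word<TAB>1"
import sys
import re

# compile once: words are separated by runs of non-word characters
p = re.compile(r'\W+')

# iterate over stdin so that a blank input line does not end the loop early
for line in sys.stdin:
    words = p.split(line.strip())
    # write the tuples to stdout
    for word in words:
        if word:  # skip the empty strings split() yields at the line edges
            print('%s\t%s' % (word, "1"))
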
#!/usr/bin/python
# reducer.py: sum the counts emitted by mapper.py for each word
import sys

# maps words to their counts
word2count = {}

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    # parse the input we got from mapper.py
    try:
        word, count = line.split('\t', 1)
    except ValueError:
        continue

    # convert count (currently a string) to int; skip malformed records
    try:
        count = int(count)
    except ValueError:
        continue
    if word in word2count:
        word2count[word] = word2count[word] + count
    else:
        word2count[word] = count

# write the tuples to stdout
# Note: they are unsorted
for word in word2count.keys():
    print('%s\t%s' % (word, word2count[word]))
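
Before wiring the two scripts into Hive, it can help to check the map, sort, reduce flow locally. What follows is a minimal sketch rather than part of the original scripts: the function names map_line, reduce_records and word_count are hypothetical, and the in-memory sort only stands in for the shuffle that Hive performs for "cluster by word". It is written to run under both Python 2.7 and Python 3.

```python
import re

WORD_SPLIT = re.compile(r'\W+')


def map_line(line):
    """Yield one "word<TAB>1" record per word, mirroring mapper.py."""
    for word in WORD_SPLIT.split(line.strip()):
        if word:
            yield '%s\t%s' % (word, 1)


def reduce_records(records):
    """Sum the counts per word from "word<TAB>count" records, mirroring reducer.py."""
    totals = {}
    for record in records:
        try:
            word, count = record.split('\t', 1)
            n = int(count)
        except ValueError:
            continue  # skip malformed records
        totals[word] = totals.get(word, 0) + n
    return totals


def word_count(lines):
    """Run map, then sort, then reduce over an iterable of text lines."""
    mapped = [rec for line in lines for rec in map_line(line)]
    mapped.sort()  # assumption: a plain sort stands in for Hive's shuffle
    return reduce_records(mapped)


if __name__ == '__main__':
    sample = ['hello hive', '', 'hello hadoop, hello python']
    for word, count in sorted(word_count(sample).items()):
        print('%s\t%s' % (word, count))
```

The same check can be made against the real scripts from a shell by piping a text file through mapper.py, then sort, then reducer.py.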

2. Writing the Hive HQL

drop table if exists raw_lines;

-- create table raw_lines and read all the lines in '/user/inputs'; this is a path on your HDFS
create external table if not exists raw_lines(line string)
ROW FORMAT DELIMITED
stored as textfile
location '/user/inputs';

drop table if exists word_count;

-- create table word_count; this is the output table, which will be written to '/user/outputs' as a text file
create external table if not exists word_count(word string, count int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
lines terminated by '\n' STORED AS TEXTFILE LOCATION '/user/outputs/';

-- add the mapper & reducer scripts as resources; change the paths to match your local files
add file /home/yanggy/mapper.py;
add file reducer.py;

from (
        from raw_lines
        map raw_lines.line
        -- call the mapper here
        using 'mapper.py'
        as word, count
        cluster by word) map_output
insert overwrite table word_count
reduce map_output.word, map_output.count
-- call the reducer here
using 'reducer.py'
as word, count;
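
The "cluster by word" clause is what makes the reduce step correct. It is shorthand for distributing rows by word and sorting them by word, so every row for a given word reaches the same reducer and arrives next to the other rows for that word. A rough sketch of that idea in Python, under stated assumptions: cluster_by_key and streaming_reduce are hypothetical names, and Python's built-in hash() merely stands in for whatever partitioning Hive actually applies.

```python
from itertools import groupby


def cluster_by_key(records, num_reducers):
    """Split (key, value) pairs into one key-sorted list per reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in records:
        # assumption: hash() is only a stand-in for Hive's partitioner
        partitions[hash(key) % num_reducers].append((key, value))
    return [sorted(part) for part in partitions]


def streaming_reduce(sorted_records):
    """Sum values per key, relying on equal keys being adjacent."""
    for key, group in groupby(sorted_records, key=lambda kv: kv[0]):
        yield key, sum(value for _, value in group)


if __name__ == '__main__':
    mapped = [('hive', 1), ('hello', 1), ('hive', 1), ('python', 1), ('hello', 1)]
    for i, part in enumerate(cluster_by_key(mapped, 2)):
        for word, total in streaming_reduce(part):
            print('reducer %d: %s\t%d' % (i, word, total))
```

Because equal keys arrive grouped, a reducer could sum each word as it streams past instead of holding every word in memory. The dictionary used in reducer.py also gives correct totals; it simply does not depend on the ordering.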
Link to this article: http://www.viiis.cn/news/show_23578.html
