首页 > 其他 > 详细

Hive常用函数

时间:2020-01-08 16:35:04      阅读:141      评论:0      收藏:0      [点我收藏+]

 

Hive常用函数大全一览

 

hive中split、coalesce及collect_list函数的用法(可举例)
Split将字符串转化为数组。
split(‘a,b,c,d‘ , ‘,‘) ==> ["a","b","c","d"]
COALESCE(T v1, T v2, …) 返回参数中的第一个非空值;如果所有值都为 NULL,那么返回NULL。
collect_list列出该字段所有的值,不去重  select collect_list(id) from table;

 

常用时间函数:

将mongodb中时区转换过来: 由UTC时区 转换为 GMT时区差8个小时

date_format(from_utc_timestamp( CONCAT_WS( ,substring(updatetime,1,10),substring(updatetime,12,8) ) ,GMT+8),yyyy-MM-dd HH:mm:ss) updatetime

select date_format(from_utc_timestamp(create_time,"UTC"),yyyy-MM-dd HH:mm:ss) as local_time

select date_format(from_utc_timestamp(create_time,"GMT+8"),yyyy-MM-dd HH:mm:ss) as local_time

 

 

今天日期: date_format(current_timestamp(), ‘yyyy-MM-dd‘) 

 

 

时间戳 秒S是10位;
毫秒ms是13位;
1541562738 updatetime(string)
date_format(from_unixtime(cast(h.updatetime as int)),‘yyyy-MM-dd HH:mm:ss‘)

substring(h.id, 10, 24)
59409d1d2cdcc90b91c62be5 ObjectId(59409d1d2cdcc90b91c62be5)

select date_format(current_timestamp,‘yyyy-MM-dd‘)
2019-08-19
select date_sub(current_date,1);
2019-08-18

 

Hive中字段的合并

contact:简单合并功能;

CONCAT_WS("/",r.province,r.city,a.area) channel_address  => 北京/北京市/朝阳区 ,字段必须是string;

concat(payid, ,carlogid)  => 01a893092b914703b75941b713767ebf 408693

 

concat(substr(summary_time,9,2),‘‘,substr(summary_time,6,2),‘‘,substr(summary_time,1,4),‘_‘,concat(market_id),‘_‘ ,concat(mid))
09022019_108_0
12022019_108_0
21022019_108_0

concat_ws("_" ,substr(summary_time,9,2),substr(summary_time,6,2),substr(summary_time,1,4),concat(market_id),concat(mid))
09_03_2019_108_0
13_03_2019_108_0
21_03_2019_108_0

concat(sum(total_amount_pur),‘&&‘,sum(total_amount_sig),‘&&‘,sum(total_amount_jump))
0.0&&16665.0&&0.0

 

 

 

order by date desc, market_id asc;
date    market_id    total_fee
2019-10-08    110    23000
2019-10-08    110    13000
2019-10-08    141    1400
2019-10-08    141    2250
2019-10-08    218    4000
2019-10-08    218    1100
2019-10-08    218    2300
2019-10-08    218    4500
2019-10-08    234    0
2019-10-08    234    0

查询仅出现一次的数据

SELECT name,count(name) AS count_times  FROM tb_test  GROUP BY name  HAVING count_times = 1;

查询语句返回某字段出现超过1次的所有记录

select * from stu where sname in (select sname from stu group by sname having count(sname)>1);
有重复的sname的记录,并计算相同sname的数量
select *,count(sname)as count from stu group by sname having (count(sname)>1);

 

Hive取非Group by字段数据的方法,但可加聚合函数如count、sum、avg、max等

  方法①  

  输入数据为一个ID对应多个name,要求输出数据为ID是唯一的,name随便取一个就可以。

   HIVE有这么一个函数collect_set,类似于mysql的group_concat函数,把每个分组的其他字段,按照逗号进行拼接,得到一个最终字符串:

collect_set(col)
返回类型:array
解释:返回一个去重后的对象集合

    

select sid,collect_set(class_id) from table1 group by sid;
===>>
1 [11,12,13]
2 [11,14]
3 [12,15]
可以针对第二列做一些计数、求和操作,分别对应到Hive的聚合函数count、sum。
对应到本文的目的,直接从数组获取第一个元素就达到目的了,这样做:
select sid,collect_set(class_id)[0] from table1 group by sid;

方法②  select后需要几个字段就gruop by几个

select sid,class_id from table1 group by sid, class_id;

 

 相同的user_id,但是每个create_time不一样,现在的需求是根据create_time创建时间选取最早的,那么思路是现根据user_id进行分组,然后根据user_id,create_time进行排序,取row_number 为1的值

SELECT * FROM(SELECT *, row_number() OVER(PARTITION BY dt,id ORDER BY dt,id DESC) rk FROM test_table)t 
WHERE t.rk = 1 

 

 

Hive查询中数值累加

1. 需求分析

现有 hive 表 record, 内容如下:

其中字段意义: channel_type(string)   dt(string)   num(int); 分别代表: 渠道类型 日期  该天个数,原数据模拟如下:

技术分享图片
select * from record;
channel_type  dt     num
A    2015-01-01    8
A    2015-01-02    4
A    2015-01-02    5
C    2015-02-01    1
A    2015-01-04    5
A    2015-01-05    6
B    2015-01-03    2
B    2015-01-02    3
A    2015-01-03    2
C    2015-01-30    8
C    2015-01-30    7
B    2015-01-02    9
B    2015-01-01    1
C    2015-02-02    3
View Code

统计每个渠道截止到当天为止的最大单日人数和累计到该天的总人数:

# 先求出每个渠道每天总访问量:
create table record_nj as  
select
channel_type,dt,sum(num) as new_join
from record
group by channel_type,dt;
技术分享图片
channel_type    dt    new_join
A    2015-01-01    8
A    2015-01-02    9
A    2015-01-03    2
A    2015-01-04    5
A    2015-01-05    6
B    2015-01-01    1
B    2015-01-02    12
B    2015-01-03    2
C    2015-01-30    15
C    2015-02-01    1
C    2015-02-02    3
View Code
方法一: 使用Hive窗口函数over  max()、sum()

select 
    channel_type, dt, new_join, 
    sum(new_join) over(partition by channel_type order by dt) as sum_count,
    max(new_join) over(partition by channel_type order by dt) as max_count
from record_nj;

方法二:使用group by  join自连接
select 
    t1.channel_type,t1.dt,t1.new_join, 
    sum(t2.new_join) sum_count, max(t2.new_join) max_count 
from record_nj t1 join record_nj t2 on t1.channel_type = t2.channel_type 
where t1.dt >= t2.dt 
group by t1.channel_type,t1.dt,t1.new_join order by t1.channel_type,t1.dt;

数据结果如下:
技术分享图片
channel_type    dt    new_join    sum_count    max_count
A    2015-01-01    8    8    8
A    2015-01-02    9    17    9
A    2015-01-03    2    19    9
A    2015-01-04    5    24    9
A    2015-01-05    6    30    9
B    2015-01-01    1    1    1
B    2015-01-02    12    13    12
B    2015-01-03    2    15    12
C    2015-01-30    15    15    15
C    2015-02-01    1    16    15
C    2015-02-02    3    19    15
View Code

 累加、累乘、最大值:

 

select 
    channel_type,
    new_join, 
    sum(new_join) over(partition by channel_type order by dt) as sum_count,--累加
    sum(new_join) over(partition by channel_type order by dt rows between unbounded preceding and current row) sum_count, --累加
    round(power(10, sum(log(10, new_join))over(partition by channel_type order by dt rows between unbounded preceding and current row))) as tired,--累乘处理-
    max(new_join) over(partition by channel_type order by dt) as max_count --最大值
from record_nj;

 

 

行列转换

技术分享图片

原数据如下:

技术分享图片
select * from score;
name    subject    score
孙悟空    语文    87
孙悟空    数学    95
孙悟空    英语    68
大海    语文    94
大海    数学    56
大海    英语    84
kris    语文    64
kris    数学    86
kris    英语    84
婷婷    语文    65
婷婷    数学    85
婷婷    英语    78
View Code

 

求语文成绩比数学成绩好的学生:

 

方法一:join 
select
s1.name,s1.subject, s1.score from score s1 inner join score s2 on s1.name = s2.name where s1.score > s2.score and s1.subject = 语文 and s2.subject = 数学;

 

技术分享图片
s1.name    s1.subject    s1.score
大海    语文    94
View Code
方法二:行列转换create table t1 AS
select 
 name,
 case subject when 语文 then score else 0 end as chinese_score,
 case subject when 数学 then score else 0 end as math_score 
from score;
技术分享图片
name    chinese_score    math_score
孙悟空    87    0
孙悟空    0    95
孙悟空    0    0
大海            94    0
大海            0    56
大海            0    0
kris           64    0
kris           0    86
kris           0    0
婷婷          65    0
婷婷          0       85
婷婷          0        0
View Code
create table t2 AS
select
name,max(chinese_score) chinese_score,max(math_score) math_score
from t1 group by name;
技术分享图片
name    chinese_score    math_score
kris    64    86
大海    94    56
婷婷    65    85
孙悟空    87    95
View Code
select 
  name, chinese_score, math_score
from t2 where chinese_score > math_score;
或者三个hql合并为一个如下
====> select name,chinese_score,math_score from(
select name,max(chinese_score) chinese_score,max(math_score) math_score from( select name, case subject when 语文 then score else 0 end as chinese_score, case subject when 数学 then score else 0 end as math_score from score)t1
group by
t1.name
)t2
where chinese_score >= math_score ;
技术分享图片
name    chinese_score    math_score
大海    94    56
View Code

列转行的实现:

数据如下:

技术分享图片
id  sname   math    computer    english
1   Jed     34      58          58
2   Tony    45      87          45
3   Tom     76      34          89
View Code
select id, sname, math as course, math as score from score
union 
select id, sname, computer as course, computer as score from score
union 
select id, sname, english as course, english as score from score
order by id, sname, course;

结果如下:

技术分享图片
id  sname   course      score
1   Jed     computer    58
1   Jed     english     58
1   Jed     math        34
2   Tony    computer    87
2   Tony    english     45
2   Tony    math        45
3   Tom     computer    34
3   Tom     english     89
3   Tom     math        76
View Code
 
 
 
 
 
 
 
 
 
 
 
 

Hive常用函数

原文:https://www.cnblogs.com/shengyang17/p/11204631.html

(0)
(0)
   
举报
评论 一句话评论(0
关于我们 - 联系我们 - 留言反馈 - 联系我们:wmxa8@hotmail.com
© 2014 bubuko.com 版权所有
打开技术之扣,分享程序人生!