Common Hive Functions

Date: 2021-05-13 00:59:02

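1 Built-in system functions

1. List the built-in functions
hive> show functions;
2. Show the usage of a built-in function
hive> desc function upper;
3. Show the detailed usage of a built-in function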
hive> desc function extended upper;

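2 Numeric functions

1. Rounding function: round

Syntax: round(double a)
Returns: DOUBLE (a whole-number value)
Description: Returns a rounded to the nearest whole number (halves are rounded up).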

hive> select round(3.1415926) from tableName;
3
hive> select round(3.5) from tableName;
4
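-- round_result is a hypothetical target table name; a CTAS statement cannot create the table it reads from
hive> create table round_result as select round(9542.158) as rounded from tableName;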

2. Rounding to a given precision: round

Syntax: round(double a, int d)
Returns: DOUBLE
Description: Returns a rounded to d decimal places.

hive> select round(3.1415926, 4) from tableName;
3.1416

3. Round-down function: floor

Syntax: floor(double a)
Returns: BIGINT
Description: Returns the largest integer that is less than or equal to a.

hive> select floor(3.1415926) from tableName;
3
hive> select floor(25) from tableName;
25

4. Round-up function: ceil

Syntax: ceil(double a)
Returns: BIGINT
Description: Returns the smallest integer that is greater than or equal to a.

hive> select ceil(3.1415926) from tableName;
4
hive> select ceil(46) from tableName;
46

5. Round-up function: ceiling

Syntax: ceiling(double a)
Returns: BIGINT
Description: Same as ceil.

hive> select ceiling(3.1415926) from tableName;
4
hive> select ceiling(46) from tableName;
46
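
The difference between floor, ceil and round is easiest to see on a negative input. The following is a minimal sketch, assuming a Hive version that accepts SELECT without a FROM clause; the literal -3.5 is only an illustration.

```sql
-- floor moves toward negative infinity, ceil toward positive infinity.
select floor(-3.5);  -- -4
select ceil(-3.5);   -- -3
-- ceiling is an alias of ceil, so it gives the same value.
select ceiling(-3.5);  -- -3
```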

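6. Random number function: rand

Syntax: rand(), rand(int seed)
Returns: double
Description: Returns a random number between 0 and 1. If a seed is specified, the generated sequence of random numbers is reproducible.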

hive> select rand() from tableName;
0.5577432776034763
hive> select rand() from tableName;
0.6638336467363424
hive> select rand(100) from tableName;
0.7220096548596434
hive> select rand(100) from tableName;
0.7220096548596434

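3 Date functions

1. UNIX timestamp to date: from_unixtime

Syntax: from_unixtime(bigint unixtime[, string format])
Returns: string
Description: Converts a UNIX timestamp (the number of seconds since 1970-01-01 00:00:00 UTC) to a time string in the current time zone, using the given format.

hive> select from_unixtime(1323308943, 'yyyyMMdd');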
20111208

2. Current UNIX timestamp: unix_timestamp

Syntax: unix_timestamp()
Returns: bigint
Description: Returns the UNIX timestamp of the current time.

hive> select unix_timestamp();
1323309615

3. Date to UNIX timestamp: unix_timestamp

Syntax: unix_timestamp(string date)
Returns: bigint
Description: Converts a date string in the format "yyyy-MM-dd HH:mm:ss" to a UNIX timestamp. Returns 0 if the conversion fails.

hive> select unix_timestamp('2011-12-07 13:01:03');
1323234063

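4. Date in a given format to UNIX timestamp: unix_timestamp

Syntax: unix_timestamp(string date, string pattern)
Returns: bigint
Description: Converts a date string in the format given by pattern to a UNIX timestamp. Returns 0 if the conversion fails.

hive> select unix_timestamp('20111207 13:01:03','yyyyMMdd HH:mm:ss');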
1323234063
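
A common use of the two functions above is to reformat a date string by chaining them. The following is a minimal sketch that reuses the input value from the example above; because both calls use the same session time zone, the wall-clock value is preserved.

```sql
-- Parse a compact date string, then print it in the standard layout.
select from_unixtime(
         unix_timestamp('20111207 13:01:03', 'yyyyMMdd HH:mm:ss'),
         'yyyy-MM-dd HH:mm:ss');
-- 2011-12-07 13:01:03
```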

5. Datetime to date: to_date

Syntax: to_date(string datetime)
Returns: string
Description: Returns the date part of a datetime value.

hive> select to_date('2011-12-08 10:03:01');
2011-12-08

6. Date to year: year

Syntax: year(string date)
Returns: int
Description: Returns the year of the date.

hive> select year('2011-12-08 10:03:01');
2011
hive> select year('2012-12-08');
2012

7. Date to month: month

Syntax: month(string date)
Returns: int
Description: Returns the month of a date or datetime value.

hive> select month('2011-12-08 10:03:01');
12
hive> select month('2011-08-08');
8

8. Date to day: day

Syntax: day(string date)
Returns: int
Description: Returns the day of the month of the date.

hive> select day('2011-12-08 10:03:01');
8
hive> select day('2011-12-24');
24

9. Date to hour: hour

Syntax: hour(string date)
Returns: int
Description: Returns the hour of the datetime value.

hive> select hour('2011-12-08 10:03:01');
10

10. Date to minute: minute

Syntax: minute(string date)
Returns: int
Description: Returns the minute of the datetime value.

hive> select minute('2011-12-08 10:03:01');
3

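11. Date to second: second

Syntax: second(string date)
Returns: int
Description: Returns the second of the datetime value.

hive> select second('2011-12-08 10:03:01');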
1

12. Date to week: weekofyear

Syntax: weekofyear(string date)
Returns: int
Description: Returns the week number of the date within its year.

hive> select weekofyear('2011-12-08 10:03:01');
49

13. Date difference function: datediff

Syntax: datediff(string enddate, string startdate)
Returns: int
Description: Returns the number of days from startdate to enddate (enddate minus startdate).

hive> select datediff('2012-12-08','2012-05-09');
213

14. Date addition function: date_add

Syntax: date_add(string startdate, int days)
Returns: string
Description: Returns the date that is days days after startdate.

hive> select date_add('2012-12-08',10);
2012-12-18

15. Date subtraction function: date_sub

Syntax: date_sub(string startdate, int days)
Returns: string
Description: Returns the date that is days days before startdate.

hive> select date_sub('2012-12-08',10);
2012-11-28

4 Conditional functions

1. If function: if

Syntax: if(boolean testCondition, T valueTrue, T valueFalseOrNull)
Returns: T
Description: Returns valueTrue when testCondition is TRUE; otherwise returns valueFalseOrNull.

hive> select if(1=2,100,200);
200
hive> select if(1=1,100,200);
100

2. First non-NULL value function: COALESCE

Syntax: COALESCE(T v1, T v2, …)
Returns: T
Description: Returns the first non-NULL value among the arguments; returns NULL if all of them are NULL.

hive> select COALESCE(null,'100','50');
100

3. Conditional function: CASE

Syntax: CASE a WHEN b THEN c [WHEN d THEN e]* [ELSE f] END
Returns: T
Description: Returns c if a equals b; returns e if a equals d; otherwise returns f.

hive> select case 100 when 50 then 'tom' when 100 then 'mary' else 'tim' end;
mary
hive> select case 200 when 50 then 'tom' when 100 then 'mary' else 'tim' end;
tim

4. Conditional function: CASE

Syntax: CASE WHEN a THEN b [WHEN c THEN d]* [ELSE e] END
Returns: T
Description: Returns b if a is TRUE; returns d if c is TRUE; otherwise returns e. The first branch whose condition is TRUE wins.

hive> select case when 1=2 then 'tom' when 2=2 then 'mary' else 'tim' end;
mary
hive> select case when 1=1 then 'tom' when 2=2 then 'mary' else 'tim' end;
tom
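
The ELSE branch is optional in both CASE forms. The following sketch, using illustrative literals only, shows what happens when no branch matches and ELSE is omitted, and how COALESCE can supply a default in that situation.

```sql
-- No branch matches and there is no ELSE, so the result is NULL.
select case when 1=2 then 'tom' end;
-- Wrapping the expression in COALESCE supplies a default instead.
select coalesce(case when 1=2 then 'tom' end, 'unknown');
-- unknown
```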

5 String functions

1. String length function: length

Syntax: length(string A)
Returns: int
Description: Returns the length of string A.

hive> select length('abcedfg');
7

2. String reverse function: reverse

Syntax: reverse(string A)
Returns: string
Description: Returns string A reversed.

hive> select reverse('abcdefg');
gfedcba

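3. String concatenation function: concat

Syntax: concat(string A, string B…)
Returns: string
Description: Returns the concatenation of the input strings; any number of input strings is supported.

hive> select concat('abc','def','gh');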
abcdefgh

4. String concatenation with a separator: concat_ws

Syntax: concat_ws(string SEP, string A, string B…)
Returns: string
Description: Returns the concatenation of the input strings, with SEP inserted between them.

hive> select concat_ws(',','abc','def','gh');
abc,def,gh

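5. Substring function: substr, substring

Syntax: substr(string A, int start), substring(string A, int start)
Returns: string
Description: Returns the part of string A from position start to the end. Positions start at 1; a negative start counts from the end of the string.

hive> select substr('abcde',3);
cde
hive> select substring('abcde',3);
cde
hive> select substr('abcde',-2);
de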

6. Substring function: substr, substring

Syntax: substr(string A, int start, int len), substring(string A, int start, int len)
Returns: string
Description: Returns the part of string A that starts at position start and has length len.

hive> select substr('abcde',3,2);
cd
hive> select substring('abcde',3,2);
cd
hive> select substring('abcde',-3,2);
cd

7. Uppercase function: upper, ucase

Syntax: upper(string A), ucase(string A)
Returns: string
Description: Returns string A in upper case.

hive> select upper('abSEd');
ABSED
hive> select ucase('abSEd');
ABSED

8. Lowercase function: lower, lcase

Syntax: lower(string A), lcase(string A)
Returns: string
Description: Returns string A in lower case.

hive> select lower('abSEd');
absed
hive> select lcase('abSEd');
absed

9. Trim function: trim

Syntax: trim(string A)
Returns: string
Description: Removes the spaces at both ends of the string; spaces inside the string are kept.

hive> select trim(' ab c ');
ab c

10. URL parsing function: parse_url

Syntax: parse_url(string urlString, string partToExtract [, string keyToExtract])
Returns: string
Description: Returns the specified part of the URL. Valid values for partToExtract are HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and USERINFO.

hive> select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'HOST');
www.tableName.com
hive> select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY', 'k1');
v1
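
The other partToExtract values work the same way. As a brief sketch, the queries below reuse the URL from the examples above and extract a few more parts.

```sql
-- Scheme of the URL.
select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'PROTOCOL');
-- https
-- Path without the query string.
select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'PATH');
-- /path1/p.php
-- Whole query string (no key given).
select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'QUERY');
-- k1=v1&k2=v2
-- Fragment after the # sign.
select parse_url('https://www.tableName.com/path1/p.php?k1=v1&k2=v2#Ref1', 'REF');
-- Ref1
```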

11. JSON parsing function: get_json_object

Syntax: get_json_object(string json_string, string path)
Returns: string
Description: Parses the JSON string json_string and returns the content selected by path. Returns NULL if the input JSON string is invalid.

hive> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}], "bicycle":{"price":19.95,"color":"red"} },"email":"amy@only_for_json_udf_test.net","owner":"amy"}','$.owner');
amy
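
The path argument can also reach into nested objects and arrays. The following is a minimal sketch that uses a shortened copy of the JSON document from the example above (the email field is dropped for brevity); `.` selects a child key and `[n]` selects an array element, counting from 0.

```sql
-- Nested object: store -> bicycle -> price.
select get_json_object(
  '{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}},"owner":"amy"}',
  '$.store.bicycle.price');
-- 19.95

-- Array element: type of the first entry in store -> fruit.
select get_json_object(
  '{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"}},"owner":"amy"}',
  '$.store.fruit[0].type');
-- apple
```

Note that the return type is always string, so a numeric value such as the price usually needs a cast before arithmetic.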

12. String repeat function: repeat

Syntax: repeat(string str, int n)
Returns: string
Description: Returns str repeated n times.

hive> select repeat('abc', 5);
abcabcabcabcabc

13. String split function: split

Syntax: split(string str, string pat)
Returns: array
Description: Splits str around matches of pat and returns the resulting array of strings.

hive> select split('abtcdtef','t');
["ab","cd","ef"]
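
Two details of split are worth a short sketch: the result can be indexed like any array, and pat is treated as a regular expression, so characters such as a dot have to be escaped. The IP-style string below is only an illustrative value.

```sql
-- Index into the resulting array (indexes start at 0).
select split('abtcdtef', 't')[1];
-- cd

-- A literal dot must be escaped, otherwise it matches any character.
select split('192.168.0.1', '\\.')[0];
-- 192
```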

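6 Aggregate functions

1. Count function: count

Syntax: count(*), count(expr), count(DISTINCT expr[, expr_.])
Returns: BIGINT
Description: count(*) returns the number of retrieved rows, including rows that contain NULL values; count(expr) returns the number of rows in which expr is not NULL; count(DISTINCT expr[, expr_.]) returns the number of distinct non-NULL values of the given expressions.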

hive> select count(*) from tableName;
20
hive> select count(distinct t) from tableName;
10
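
The three forms differ only in how they treat NULLs and duplicates. The following is a small sketch under stated assumptions: the inline subquery x and its column t are hypothetical test data built with UNION ALL, and the Hive version is assumed to accept SELECT without a FROM clause.

```sql
-- Three rows: 1, 1 and NULL.
select count(*), count(t), count(distinct t)
from (
  select 1 as t
  union all
  select 1 as t
  union all
  select cast(null as int) as t
) x;
-- count(*) = 3, count(t) = 2, count(distinct t) = 1
```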

2. Sum function: sum

Syntax: sum(col), sum(DISTINCT col)
Returns: double
Description: sum(col) returns the sum of col over the result set; sum(DISTINCT col) returns the sum of the distinct values of col.

hive> select sum(t) from tableName;
100
hive> select sum(distinct t) from tableName;
70

3. Average function: avg

Syntax: avg(col), avg(DISTINCT col)
Returns: double
Description: avg(col) returns the average of col over the result set; avg(DISTINCT col) returns the average of the distinct values of col.

hive> select avg(t) from tableName;
50
hive> select avg (distinct t) from tableName;
30

4. Minimum function: min

Syntax: min(col)
Returns: double
Description: Returns the minimum value of col over the result set.

hive> select min(t) from tableName;
20

5. Maximum function: max

Syntax: max(col)
Returns: double
Description: Returns the maximum value of col over the result set.

hive> select max(t) from tableName;
120

7 Complex type constructors

1. Map constructor: map

Syntax: map(key1, value1, key2, value2, …)
Description: Builds a map from the given key/value pairs.

-- Create the table
create table score_map(name string, score map<string, int>)
row format delimited fields terminated by '\t'
collection items terminated by ','
map keys terminated by ':';

-- Create a data file with the following content
cd /bigdata/logs/hivedatas/
vi score_map.txt

zhangsan	sx:80,yw:89,zz:95
lisi	sx:60,yw:80,zz:99

-- Load the data into the Hive table
load data local inpath '/bigdata/logs/hivedatas/score_map.txt' overwrite into table score_map;

-- Accessing map data
-- Get all values:
select name,map_values(score) from score_map;

-- Get all keys:
select name,map_keys(score) from score_map;

-- Get the value for a given key
select name,score["sx"] from score_map;

-- Get the number of entries in the map
select name,size(score) from score_map;

-- Build a map
select map(1, 'zs', 2, 'lisi');

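2. Struct constructor: struct

Syntax: struct(val1, val2, val3, …)
Description: Builds a struct from the given arguments. A struct is similar to a struct in C, and its fields are accessed as X.X. Suppose our data looks like this: movie ABC has been rated by 1254 people and has a score of 7.4.

-- Create the struct table
hive> create table movie_score(name string, info struct<number:int,score:float>)
row format delimited fields terminated by "\t"
collection items terminated by ":";

-- Create the data file
cd /bigdata/logs/hivedatas/
vi struct.txt

-- Movie ABC has been rated by 1254 people and has a score of 7.4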
ABC	1254:7.4  
DEF	256:4.9  
XYZ	456:5.4

-- Load the data
load data local inpath '/bigdata/logs/hivedatas/struct.txt' overwrite into table movie_score;

-- Query the data in Hive
hive> select * from movie_score;
hive> select name, info.number, info.score from movie_score;
OK
ABC     1254    7.4
DEF     256     4.9
XYZ     456     5.4

-- Build a struct
select struct(1, 'anzhulababy', 'moon', 1.68);

3. Array constructor: array

Syntax: array(val1, val2, …)
Description: Builds an array from the given arguments.

hive> create table person(name string, work_locations array<string>)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
COLLECTION ITEMS TERMINATED BY ',';

-- Create the data file for the person table
cd /bigdata/logs/hivedatas/
vim person.txt

-- The data has the following format
biansutao	beijing,shanghai,tianjin,hangzhou
linan	changchun,chengdu,wuhan

-- Load the data
hive> load data local inpath '/bigdata/logs/hivedatas/person.txt' overwrite into table person;

-- Query all data
hive> select * from person;

-- Query by array index (indexes start at 0)
hive> select work_locations[0] from person;

-- Query the whole array
hive> select work_locations from person;

-- Query the number of elements
hive> select size(work_locations) from person;

-- Build arrays
select array(1, 2, 1);
select array(1, 'a', 1.0);
select array(1, 2, 1.0);

8 Size functions for complex types

1. Map size function: size(Map<K,V>)

Syntax: size(Map<K,V>)
Returns: int
Description: Returns the number of entries in the map.

hive> select size(map(1, 'zs', 2, 'anzhulababy'));
2

2. Array size function: size(Array)

Syntax: size(Array)
Returns: int
Description: Returns the number of elements in the array.

hive> select size(t) from arr_table2;
4

3. Type conversion function: cast

Syntax: cast(expr as <type>)
Returns: <type>
Description: Converts expr to the given type and returns the converted value.

hive> select cast('1' as bigint);
1
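
Two behaviours of cast come up often in practice. The sketch below uses illustrative literals only: a string that cannot be parsed as the target type yields NULL rather than an error, and a number usually has to be cast to string before it is concatenated with other strings.

```sql
-- A string that is not a number cannot be converted, so the result is NULL.
select cast('abc' as int);

-- Cast a number to string before concatenating it.
select concat('id_', cast(42 as string));
-- id_42
```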

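9 Rows to columns

1. Function overview

CONCAT(string A/col, string B/col…): returns the concatenation of the input strings; any number of input strings is supported.
CONCAT_WS(separator, str1, str2,...): a special form of CONCAT().
- The first argument is the separator placed between the remaining arguments. The separator can be any string, just like the remaining arguments. If the separator is NULL, the result is also NULL.
- The function skips any NULL values after the separator argument; empty strings are not skipped. The separator is inserted between the strings being joined.
COLLECT_SET(col): accepts only primitive data types. It deduplicates the values of a column and aggregates them into a field of type array.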
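
The NULL handling of CONCAT_WS described above can be checked with a few literals. This is a minimal sketch; the values are illustrative, and cast(null as string) is used so that the NULL has an explicit string type.

```sql
-- A NULL argument after the separator is skipped.
select concat_ws('|', 'a', cast(null as string), 'b');
-- a|b

-- A NULL separator makes the whole result NULL.
select concat_ws(cast(null as string), 'a', 'b');

-- An array of strings is also accepted, which is why the output of
-- collect_set can be passed to concat_ws directly.
select concat_ws('|', array('x', 'y', 'z'));
-- x|y|z
```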

2. Data preparation

Sample data:

name	constellation	blood_type
孙悟空	白羊座	A
老王	射手座	A
宋宋	白羊座	B
猪八戒	白羊座	A
按住啦baby	射手座	A

3. Requirement

Group together the people who have the same constellation and blood type. The expected result is:

射手座,A            老王|按住啦baby
白羊座,A            孙悟空|猪八戒
白羊座,B            宋宋

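4. Create the data file

On the node03 server, run the following commands to create the file. Note that the fields are separated by \t (tab).

cd /bigdata/logs/hivedatas
vim constellation.txt

孙悟空	白羊座	A
老王	射手座	A
宋宋	白羊座	B
猪八戒	白羊座	A
按住啦baby	射手座	A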

5. Create the Hive table and import the data

Create the Hive table:

hive (hive_explode)> create table person_info(name string, constellation string,  blood_type string) row format delimited fields terminated by "\t";

Load the data:

hive (hive_explode)> load data local inpath '/bigdata/logs/hivedatas/constellation.txt' into table person_info;

6. Query the data as required

hive (hive_explode)> select t1.base, concat_ws('|', collect_set(t1.name)) name
from    
(select name, concat(constellation, "," , blood_type) base from person_info) t1 
group by t1.base;

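10 Columns to rows

1. Function overview

EXPLODE(col): splits a complex array or map in a Hive column into multiple rows.
LATERAL VIEW
- Usage: LATERAL VIEW udtf(expression) tableAlias AS columnAlias
- Explanation: used together with UDTFs such as explode, often applied to the output of split. It expands one column into multiple rows, and the expanded rows can then be aggregated.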

2. Data preparation

The data is shown below; the fields are separated by \t (tab).

cd /bigdata/logs/hivedatas

vim movie.txt

《疑犯追踪》	悬疑,动作,科幻,剧情
《Lie to me》	悬疑,警匪,动作,心理,剧情
《战狼2》	战争,动作,灾难

3. Requirement

Expand the array of categories for each movie into one row per category. The expected result is:

《疑犯追踪》	悬疑
《疑犯追踪》	动作
《疑犯追踪》	科幻
《疑犯追踪》	剧情
《Lie to me》	悬疑
《Lie to me》	警匪
《Lie to me》	动作
《Lie to me》	心理
《Lie to me》	剧情
《战狼2》	战争
《战狼2》	动作
《战狼2》	灾难

4. Create the Hive table and import the data

Create the Hive table:

hive (hive_explode)> create table movie_info(movie string, category array<string>) 
row format delimited fields terminated by "\t" 
collection items terminated by ",";

Load the data:

load data local inpath "/bigdata/logs/hivedatas/movie.txt" into table movie_info;

5. Query the data as required

hive (hive_explode)> select movie, category_name from movie_info 
lateral view explode(category) table_tmp as category_name;
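
Once the array has been expanded, the resulting rows can be aggregated like any other rows. The following sketch builds on the movie_info table above and counts how many movies fall into each category; the alias movie_count is an illustrative name.

```sql
-- One output row per category, with the number of movies that list it.
select category_name, count(*) as movie_count
from movie_info
lateral view explode(category) table_tmp as category_name
group by category_name;
```

With the three sample rows above, 动作 appears in every movie and therefore gets a count of 3, while 悬疑 and 剧情 each get 2.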

Source: https://www.cnblogs.com/tenic/p/14762526.html
