这篇文章继续介绍4个窗口函数。
lag
lag(column_name,n,default):用于统计窗口内往上第n行的值,第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上n行为null时,取默认值,若不指定,则为null)
lead
lead与lag想法,lead(column_name,n,default)用于统计窗口内向下取n行的值
first_value
first_value() 取分组排序后,截止到当前行的第一个值
last_value
last_value() 取分组排序后,截止到当前行的最后一个值
下面通过具体的实例介绍它们的用法
create table if not exists buy_info ( name string, buy_date string, buy_num int ) row format delimited fields terminated by ‘|‘; select * from buy_info;
| name | buy_date | buy_num | 
| zhangsan | 2020-02-23 | 21 | 
| zhangsan | 2020-03-12 | 34 | 
| zhangsan | 2020-04-15 | 12 | 
| zhangsan | 2020-05-12 | 51 | 
| lisi | 2020-03-16 | 12 | 
| lisi | 2020-03-21 | 24 | 
| lisi | 2020-07-12 | 41 | 
| lisi | 2020-07-27 | 32 | 
select   name , buy_date,buy_num,
lag(buy_date,1,‘1970-01-01‘)  over(partition  by  name  order  by  buy_date)   as   last_date,
lead(buy_date,1,‘2020-12-31‘)   over(partition  by  name  order  by buy_date)  as  next_date,
first_value ()  over(partition  by  name  order  by   buy_date)  as   first_date,
last_value()  over(partition  by   name   order by  buy_date)  as last_date    from   buy_info; 
 
查询结果如下
| name | buy_date | buy_num | last_date | next_date | first_date | last_date | 
| zhangsan | 2020-02-23 | 21 | 1970-01-01 | 2020-03-12 | 2020-02-23 | 2020-05-12 | 
| zhangsan | 2020-03-12 | 34 | 2020-02-23 | 2020-04-15 | 2020-02-23 | 2020-05-12 | 
| zhangsan | 2020-04-15 | 12 | 2020-03-12 | 2020-05-12 | 2020-02-23 | 2020-05-12 | 
| zhangsan | 2020-05-12 | 51 | 2020-04-15 | 2020-12-31 | 2020-02-23 | 2020-05-12 | 
| lisi | 2020-03-16 | 12 | 1970-01-01 | 2020-03-21 | 2020-03-16 | 2020-07-27 | 
| lisi | 2020-03-21 | 24 | 2020-03-16 | 2020-07-12 | 2020-03-16 | 2020-07-27 | 
| lisi | 2020-07-12 | 41 | 2020-03-21 | 2020-07-27 | 2020-03-16 | 2020-07-27 | 
| lisi | 2020-07-27 | 32 | 2020-07-12 | 2020-12-31 | 2020-03-16 | 2020-07-27 | 
Hive 窗口函数详细介绍3 lag,lead,first_value,last_value
原文:https://www.cnblogs.com/itachilearner/p/13392469.html