在对数据进行汇总和分析的时候,经常需要用到排名相关的操作,下面是hive中经常用到的3个排名函数:
rank()
dense_rank()
row_number()
函数说明:
rank():在一组数据内按顺序显示排名顺序,值相同的情况下,排序数会重复,下个排序数会根据记录数接着排名。
dense_rank():在一组数据内按顺序显示排名顺序,值相同的情况下,排序数会重复,下个排序数会根据排名数接着排名。
row_number():在一组数据内按顺序显示排名顺序,值相同的情况下,排序数不会重复。
实际操作:
booksales表中有分类(catalog)字段,价格(price)字段,现在想查询一下每个分类(catalog)下价格(price)有高到低的排名情况
排名前数据查询:
hive> select * from booksales;
OK
bookid  catalog price   amount  dt
B00000001       novel   54.0    5       20200718
B00000002       novel   33.0    9       20200718
B00000003       novel   54.0    11      20200718
B00000004       novel   59.0    10      20200718
B00000005       novel   53.0    4       20200718
B00000006       literature      24.0    10      20200718
B00000007       literature      35.0    8       20200718
B00000008       literature      35.0    4       20200718
B00000009       literature      35.0    7       20200718
B00000010       literature      35.0    9       20200718
B00000011       management      84.0    6       20200718
B00000012       management      73.0    7       20200718
B00000013       management      84.0    5       20200718
B00000014       management      52.0    11      20200718
B00000015       management      87.0    11      20200718
B00000016       management      45.0    7       20200718
B00000017       social  39.0    1       20200718
B00000018       social  16.0    9       20200718
B00000019       social  85.0    9       20200718
B00000020       social  92.0    4       20200718
排名后数据查询:
hive> select
> *
> ,rank() over (partition by catalog order by price desc) price_rank
> ,dense_rank() over (partition by catalog order by price desc) price_dense_rank
> ,row_number() over (partition by catalog order by price desc) price_row_number
> from booksales;
OK
bookid  catalog price   amount  dt      price_rank      price_dense_rank        price_row_number
B00000008       literature      35.0    4       20200718        1       1       1
B00000007       literature      35.0    8       20200718        1       1       2
B00000010       literature      35.0    9       20200718        1       1       3
B00000009       literature      35.0    7       20200718        1       1       4
B00000006       literature      24.0    10      20200718        5       2       5
B00000015       management      87.0    11      20200718        1       1       1
B00000013       management      84.0    5       20200718        2       2       2
B00000011       management      84.0    6       20200718        2       2       3
B00000012       management      73.0    7       20200718        4       3       4
B00000014       management      52.0    11      20200718        5       4       5
B00000016       management      45.0    7       20200718        6       5       6
B00000004       novel   59.0    10      20200718        1       1       1
B00000001       novel   54.0    5       20200718        2       2       2
B00000003       novel   54.0    11      20200718        2       2       3
B00000005       novel   53.0    4       20200718        4       3       4
B00000002       novel   33.0    9       20200718        5       4       5
B00000020       social  92.0    4       20200718        1       1       1
B00000019       social  85.0    9       20200718        2       2       2
B00000017       social  39.0    1       20200718        3       3       3
B00000018       social  16.0    9       20200718        4       4       4
[HIVE] rank() dense_rank() row_number()的学习
原文:https://www.cnblogs.com/larry66/p/13337935.html