[Tutorial] Presto Optimization

Date: 2021-06-03 23:52:24

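1. Select only the columns you need

When the data is stored in a columnar format, selecting only the columns you need speeds up reads and reduces the amount of data scanned. Avoid using * to read every column.
-- Recommended
SELECT id,name FROM users
-- Not recommended
SELECT * FROM users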

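2. Always filter on the partition column

On a partitioned table, a filter on the partition column lets Presto read only the matching partitions instead of scanning the whole table. The examples treat thedate as the partition column, as the filter implies.
-- Recommended
select
    count(1) as "record_count"
from src.log_event_flow_whole
where thedate = '2021-01-01'

-- Not recommended: no filter on the partition column, so every partition is scanned
select
    count(1) as "record_count"
from src.log_event_flow_whole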

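3. Optimize GROUP BY

Ordering the columns in a GROUP BY clause sensibly can give a modest performance gain. List the columns in descending order of their number of distinct values.
-- Recommended (assuming property_id has more distinct values than city)
select
    property_id as "category"
    ,city as "city"
    ,count(1) as "merchant_count"
from biz.hlj_db__wedding__merchants
group by property_id,city

-- Not recommended
select
    city as "city"
    ,property_id as "category"
    ,count(1) as "merchant_count"
from biz.hlj_db__wedding__merchants
group by city,property_id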
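
One way to decide the order is to measure each column's cardinality first. The query below is a minimal sketch that reuses the table from the example above together with approx_distinct(); the aliases are illustrative only. The column with the larger count would then go first in the GROUP BY clause.
-- Estimate the number of distinct values in each grouping column
SELECT
    approx_distinct(property_id) AS property_id_cardinality
    ,approx_distinct(city) AS city_cardinality
FROM biz.hlj_db__wedding__merchants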

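4. Use LIMIT with ORDER BY

ORDER BY has to gather the data onto a single worker node for sorting, so that worker can need a large amount of memory. For Top N or Bottom N queries, adding LIMIT reduces both the sorting work and the memory pressure.
-- Recommended
SELECT id,name,city,property_id
FROM biz.hlj_db__wedding__merchants
ORDER BY city
LIMIT 100

-- Not recommended
SELECT id,name,city,property_id
FROM biz.hlj_db__wedding__merchants
ORDER BY city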

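5. Use approximate aggregate functions

Presto provides several approximate aggregate functions. For queries that can tolerate a small error, they can improve performance substantially. For example, approx_distinct() returns a result with a standard error of about 2.3% compared with COUNT(DISTINCT x).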
 
SELECT approx_distinct(city)  
FROM biz.hlj_db__wedding__merchants
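
To gauge the error on your own data, the exact and approximate counts can be computed side by side. This is a sketch only: the aliases are illustrative, and the two-argument form approx_distinct(x, e), which sets an upper bound on the standard error, should be checked against the documentation for your Presto release before you rely on it.
-- Compare the exact distinct count with two approximations
SELECT
    count(DISTINCT city) AS exact_cities
    ,approx_distinct(city) AS approx_cities
    ,approx_distinct(city, 0.01) AS approx_cities_tighter
FROM biz.hlj_db__wedding__merchants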

6. Use regexp_like instead of multiple LIKE clauses

Presto's query optimizer does not optimize multiple LIKE clauses, so replacing them with a single regexp_like call can improve performance considerably.
-- Recommended
SELECT
id,name,kind
FROM biz.hlj_db__wedding__cities
WHERE regexp_like(kind, 'A|B|C')

-- Not recommended
SELECT
id,name,kind
FROM biz.hlj_db__wedding__cities
WHERE kind like '%A%'
OR kind like '%B%'
OR kind like '%C%'

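7. Put the large table on the left side of a JOIN

According to this tutorial, Presto's default join algorithm is a broadcast join: the table on the left side of the join is split across the workers, and a full copy of the table on the right side is sent to every worker for the computation. If the right-hand table is too large, the query may fail with an out-of-memory error.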
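As an illustration, the sketch below joins the two example tables used elsewhere in this tutorial. It assumes, purely for the example, that the merchants table is the larger one and that merchants.city matches cities.name; the source confirms neither.
-- Larger table on the left, smaller table on the right (assumed sizes)
SELECT m.id, m.name, c.kind
FROM biz.hlj_db__wedding__merchants m
JOIN biz.hlj_db__wedding__cities c
    ON m.city = c.name  -- hypothetical join key
If both tables are too large to broadcast, some Presto releases provide a session property that switches to a partitioned join, for example SET SESSION join_distribution_type = 'PARTITIONED'. The property name differs between versions, so consult the documentation for your release.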

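8. Use rank instead of row_number

Use rank() instead of row_number() to fetch the Top N rows. In some grouped ranking scenarios, rank() performs better. Note that rank() gives tied rows the same rank, so it can return more than N rows where row_number() would return exactly N.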
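A minimal sketch of a Top N per group query with rank(), using the merchants table from the earlier examples. Ordering by id and keeping the top 3 per city are arbitrary choices made for illustration.
-- Top 3 merchants per city, ranked by id (illustrative ordering)
SELECT id, name, city
FROM (
    SELECT
        id, name, city
        ,rank() OVER (PARTITION BY city ORDER BY id DESC) AS rnk
    FROM biz.hlj_db__wedding__merchants
) t
WHERE rnk <= 3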

Source: https://www.cnblogs.com/blog-for-me/p/14846362.html
