t
thatdaniel
V1
2022/11/27阅读:25主题:默认主题
HIVE-窗口字句
Hive - 窗口函数
Hive的窗口函数用法基本和SQL一致,不过为了加强印象,还是在再归纳一下hive的窗口函数
1. 窗口函数语法结构
Group by 普通聚合函数每组只有一条记录,简单来说返回的行数以group by后面的分类为准,一般搭配的聚合函数sum,avg,max,min等;
而窗口函数则可以为窗口中的每行都返回一个值。 普通聚合函数聚合的行集是组,开窗函数聚合的行集是窗口。
窗口函数语法结构:
聚合函数+ over (partition by [] order by [] asc/desc rows between []),用rows between 来限制窗口函数的计算范围
rows between 的相关窗口字句如下:
窗口子句 | 备注 |
---|---|
PRECEDING | 往前 n preceding 从当前行向前n行 |
FOLLOWING | 往后 n following 从当前行向后n行 |
CURRENT ROW | 当前行 |
UNBOUNDED | 起点 |
UNBOUNDED PRECEDING | 表示该窗口最前面的行(起点) |
UNBOUNDED FOLLOWING | 表示该窗口最后面的行(终点) |
rows between 10 preceding and current row:当前行的前十行;
rows between current row and 10 following:当前行的后10行;
row between unbounded preceding and current row:当前行到第一行(最前行)。一般求累计求和的时候使用
rows between 必须和order by 使用
2. 示例
现有表格结构如下:
create table emp(
userid bigint,
emp_name array<string>, -- 数组类型:外国人的名字包含 名 和 姓
emp_date map<string,date>, -- 将入职日期和出生日期组合,入职日期和出生日期字段类型(key_value)为string,
-- 其值本身(value_type)类型为date
other_info struct<deparment:string,gender:string>
row format delimited -- 指定分隔符
fields terminated by '\t' -- 字段分隔为\t,表示一个字段为一个制表位
collection terminated by ',' -- 元素分隔符:将array/struct中各个字段用,隔开;将map中的key_value用,隔开
map keys terminated by ':' -- 将map类型的key和value用:隔开
-- 三类分隔符指令都存在时,顺序不能错
需求:部门人数从多到少排序,计算累计人数及累计占比
-
部门人数从多到少排序
select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc
-
计算累计人数
select *, sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) as add_count
from
(select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc)a
-
计算占比和累计占比
select *, sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) as add_count,
emp_count/sum(emp_count) over() as sum_p ,-- 人数占比
sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) /sum(emp_count) over() as add_p -- 累计占比
from
(select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc)a
-
格式调整,显示为百分比,保留小数两位
select *, sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) as add_count,
concat(round((emp_count/sum(emp_count) over() as sum_p)*100,2),'%'),
concat(round((sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) /sum(emp_count) over())*100,2),'%') as add_p
from
(select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc)a
order by emp_count desc -- 人数降序
作者介绍
t
thatdaniel
V1