t

thatdaniel

V1

2022/11/27阅读:25主题:默认主题

HIVE-窗口字句

Hive - 窗口函数

Hive的窗口函数用法基本和SQL一致,不过为了加强印象,还是在再归纳一下hive的窗口函数

1. 窗口函数语法结构

Group by 普通聚合函数每组只有一条记录,简单来说返回的行数以group by后面的分类为准,一般搭配的聚合函数sum,avg,max,min等;

而窗口函数则可以为窗口中的每行都返回一个值。 普通聚合函数聚合的行集是组,开窗函数聚合的行集是窗口。

窗口函数语法结构:

聚合函数+ over (partition by [] order by [] asc/desc rows between []),用rows between 来限制窗口函数的计算范围

rows between 的相关窗口字句如下:

窗口子句 备注
PRECEDING 往前 n preceding 从当前行向前n行
FOLLOWING 往后 n following 从当前行向后n行
CURRENT ROW 当前行
UNBOUNDED 起点
UNBOUNDED PRECEDING 表示该窗口最前面的行(起点)
UNBOUNDED FOLLOWING 表示该窗口最后面的行(终点)

rows between 10 preceding and current row:当前行的前十行;

rows between current row and 10 following:当前行的后10行;

row between unbounded preceding and current row:当前行到第一行(最前行)。一般求累计求和的时候使用

rows between 必须和order by 使用

2. 示例

现有表格结构如下:

create table emp(
        userid bigint,
        emp_name array<string>, -- 数组类型:外国人的名字包含 名 和 姓
        emp_date map<string,date>, -- 将入职日期和出生日期组合,入职日期和出生日期字段类型(key_value)为string,
        -- 其值本身(value_type)类型为date
        other_info struct<deparment:string,gender:string>
 row format delimited -- 指定分隔符
    fields terminated by '\t' -- 字段分隔为\t,表示一个字段为一个制表位
    collection terminated by ',' -- 元素分隔符:将array/struct中各个字段用,隔开;将map中的key_value用,隔开
    map keys terminated by ':'   -- 将map类型的key和value用:隔开
     
    -- 三类分隔符指令都存在时,顺序不能错

需求:部门人数从多到少排序,计算累计人数及累计占比

  • 部门人数从多到少排序
select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc
  • 计算累计人数
select *, sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) as add_count
from 
(select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc)a 
  • 计算占比和累计占比
select *, sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) as add_count,
emp_count/sum(emp_count) over() as sum_p ,-- 人数占比
sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) /sum(emp_count) over() as add_p -- 累计占比
from 
(select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc)a 
  • 格式调整,显示为百分比,保留小数两位
select *, sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) as add_count,
concat(round((emp_count/sum(emp_count) over() as sum_p)*100,2),'%'),
concat(round((sum(emp_count) over(order by emp_count desc rows between unbounded preceding and current row) /sum(emp_count) over())*100,2),'%') as add_p
from 
(select other_info.department,count(emp) as emp_count from emp
group by other_info.department
order by other_info.department desc)a 
order by emp_count desc -- 人数降序

分类:

后端

标签:

后端

作者介绍

t
thatdaniel
V1