codeye
2022/10/17阅读:22主题:默认主题
3个最新的谷歌SQL面试问题
3个最新的谷歌SQL面试问题
我们在Big Tech Interviews (BTI)分析了从上市前到财富500强公司的50,000多次面试,策划了一个最新的SQL面试问题和解决方案的独家列表,这样你就可以在下一次面试中取得胜利 你即将读到的是谷歌公司针对数据科学、数据分析师、商业智能工程师、数据工程师和商业分析师职位提出的3个最新的谷歌SQL面试问题和解决方案。
想通过按需支持和动态SQL操场来练习更多类似的SQL面试问题吗?
请在这里了解更多!
谷歌SQL面试题78:MX和美国第三高的山脉 提示。写一个查询,找出每个国家的第三高的山名。请确保按ASC顺序排列国家。
表:山脉
+---------------------+------+-------------+
|名称|高度|国家 |
+---------------------+------+-------------+
|德纳利 |20310 |美国|
|圣埃利亚斯 |18008 |美国|
|福雷克 |17402 |美国|
|Pico de Orizab |18491 |墨西哥 |
|Popocatépetl |17820 |Mexico |
|Iztaccihuatl |17160 |Mexico |
+---------------------+------+-------------+
解决办法
SELECT "country", "name"
FROM (
SELECT "country", "name", RANK() OVER (PARTITION BY "country" ORDER BY "height" DESC) as "rank"
FROM mountains
) as m
WHERE "rank" = 3
ORDER BY country ASC
谷歌SQL面试题20:用最新的_事件查找当前的页面数量
提示。给出下表,其中有关于一个页面的状态何时被改变的信息。写一个查询,找出当前有最新事件的页面的数量。
提示:page_flag
列将被用来识别一个页面是 "OFF "还是 "ON"。
表:pages_info
+-------+--------------------------------------+----------+
|page_id|event_time |page_flag |
+-------+--------------------------------------+----------+
|1 |current_timestamp - interval '6 hours'|ON|
|1 |current_timestamp - interval '3 hours'|OFF |1
|1 |current_timestamp - interval '1 hours'|ON |1 |current_timestamp - interval '1 hours'|OFF
|2 |current_timestamp - interval '3 hours'|ON |2 |current_timestamp - interval '3 hours'|ON
|2 |current_timestamp - interval '1 hours'|OFF |3 |current_timestamp - interval '3 hours'|ON
|3 |current_timestamp |ON |
+-------+--------------------------------------+----------+
解决办法
首先,对于每个页面ID,让我们选择最新的记录(基于事件时间列)。
select
page_id,
max(event_time) as latest_event
from pages_info
group by page_id
现在,我们需要将前面的查询与原表连接起来,并检查其中有多少人的标记页面等于ON。这很简单。
with latest_event as (
select
page_id,
max(event_time) as latest_event
from pages_info
group by page_id
)
select
sum(
case
when page_flag = 'ON' then 1
else 0
end
) as result
from pages_info pi
join latest_event le on pi.page_id = le.page_id and pi.event_time = le.latest_event;
谷歌SQL面试问题19:返回用户
提示。给出一个表,上面有用户访问一个网页的信息。写一个查询来返回连续访问该网页时间最长的3个用户。将这3个用户从最长到最短的时间内排序。
表:访问量
+--------+----------------------------+
|user_id |date |
+--------+----------------------------+
|1 |current_timestamp::DATE - 0 |
|1 |current_timestamp::DATE - 1 |1 |current_timestamp::DATE - 1
|1 |current_timestamp::DATE - 2 |1 |current_timestamp::DATE - 2
|1 |current_timestamp::DATE - 3 |1 |current_timestamp::DATE - 2
|1 |current_timestamp::DATE - 4 |2 |current_timestamp::DATE - 3
|2 |current_timestamp::DATE - 1 |4 |current_timestamp::DATE - 4
|4 |current_timestamp::DATE - 0 |4 |current_timestamp::DATE - 0
|4 |current_timestamp::DATE - 1 |4 |current_timestamp::DATE - 1
|4 |current_timestamp::DATE - 3 |4 |current_timestamp::DATE - 3
|4 |current_timestamp::DATE - 4 |current_timestamp::DATE - 3
|4|current_timestamp::DATE - 62|current_timestamp::DATE - 3
+--------+----------------------------+
解决方案
首先,让我们添加一个新的列,其值是每个用户的下一次访问(与当前日期不同)。我们将使用lead函数来做到这一点。
select distinct
user_id,
date,
lead(date) over (partition by user_id order by date) as next_date
from (select distinct * from visits) as t;
一旦我们有了这个,让我们创建另一个列,其目的是让我们知道连胜的停止。这基本上包括检查下一个日期是否与当前日期+1不同。像这样
with next_dates as (
select distinct
user_id,
date,
lead(date) over (partition by user_id order by date) as next_date
from (select distinct * from visits) as t -- remove duplicates
)
select
user_id,
date,
next_date,
case
when next_date is null or next_date = date + 1 then 1
else null
end as streak
from next_dates;
一旦我们有了这个,我们将为每个用户创建一个分区,每个分区代表一个连续的条纹。从概念上讲,我们要做的是,对于每个用户,取最近的记录(基于日期)并赋值为0,然后寻找下面的记录,如果连胜没有停止就赋值为0,如果连胜停止就赋值为1(如果连胜列为空),然后继续这样做,直到每个连胜由一个不同的分区代表。完成这一逻辑的代码如下。
with next_dates as (
select distinct
user_id,
date,
lead(date) over (partition by user_id order by date) as next_date
from (select distinct * from visits)
),
streaks as (
select
user_id,
date,
next_date,
case
when next_date is null or next_date = date + 1 then 1
else null
end as streak
from next_dates
)
select
*,
sum(
case
when streak is null then 1
else 0
end
) over (partition by user_id order by date) as partition
from streaks;
一旦我们有了这个分区,问题就简单了,现在我们只需要计算每个用户和分区的记录数,并找到计数最多的用户。完整的查询将看起来像这样。
with next_dates as (
select distinct
user_id,
date,
lead(date) over (partition by user_id order by date) as next_date
from visits
),
streaks as (
select
user_id,
date,
next_date,
case
when next_date is null or next_date = date + 1 then 1
else null
end as streak
from next_dates
),
partitions as (
select
*,
sum(
case
when streak is null then 1
else 0
end
) over (partition by user_id order by date) as partition
from streaks
),
count_partitions as (
select user_id, partition, count(1) as streak_days
from partitions
group by user_id, partition
)
select
user_id,
max(streak_days) as longest_streak
from count_partitions
group by user_id
order by 2 desc
limit 3;
想通过按需支持和动态SQL操场来练习更多类似的谷歌问题吗?在这里了解更多!
我们在Big Tech Interviews (BTI)分析了超过50,000次面试,这些面试来自于最新的数据科学、数据分析师、数据工程师、商业分析师和商业智能工程师等职位,这些职位由IPO前到财富500强的公司担任,以策划一个最新的SQL面试问题和解决方案的独家列表,这样你就可以在下一次面试中取得好成绩!。
作者介绍