下面是一张不同年份不同口味冰淇淋的桌子(名称:ice_cream_table):

Year      Flavor
2008         Mint
2008         Mint
2008 Cookie Dough
2008 Cookie Dough
2008    Pistachio
2013    Chocolate
2013 Cookie Dough
2013    Pistachio
2013    Chocolate
2013    Pistachio
2017    Chocolate
2017      Vanilla
2017    Chocolate
2017 Cookie Dough
2017   Strawberry
2019        Mango
2019       Lemon
2019      Vanilla
2019        Mango
2019        Mango
2022    Chocolate
2022    Chocolate
2022         Mint
2022   Strawberry
2022       Cherry

CREATE TABLE ice_cream_table  (
  "Year" INT,
  "Flavor" VARCHAR(100)
);

INSERT INTO ice_cream_table 
  ("Year", "Flavor")
VALUES
  ('2008', 'Mint'),
  ('2008', 'Mint'),
  ('2008', 'Cookie Dough'),
  ('2008', 'Cookie Dough'),
  ('2008', 'Pistachio'),
  ('2013', 'Chocolate'),
  ('2013', 'Cookie Dough'),
  ('2013', 'Pistachio'),
  ('2013', 'Chocolate'),
  ('2013', 'Pistachio'),
  ('2017', 'Chocolate'),
  ('2017', 'Vanilla'),
  ('2017', 'Chocolate'),
  ('2017', 'Cookie Dough'),
  ('2017', 'Strawberry'),
  ('2019', 'Mango'),
  ('2019', 'Lemon'),
  ('2019', 'Vanilla'),
  ('2019', 'Mango'),
  ('2019', 'Mango'),
  ('2022', 'Chocolate'),
  ('2022', 'Chocolate'),
  ('2022', 'Mint'),
  ('2022', 'Strawberry'),
  ('2022', 'Cherry');

在每一年,我都想弄清楚:有多少口味是第一次出现的,有多少口味是往年出现的,有多少种不同的口味(第一次+重复)?

以下是我的代码:

with yearly_flavor as (
    select 
        year,
        flavor,
        row_number() over (partition by flavor order by year) as rn 
    from 
        ice_cream_table 
    group by 
        flavor, year
),

new_flavor as (
    select
        year, 
        count(flavor) as new_flavor 
    from 
        yearly_flavor
    where 
        rn = 1
    group by 
        year
),

repeated_flavor as (
    select
        year, 
        case 
            when count(flavor) is null then 0 
            else count(flavor) 
        end as repeated_flavor,
        count(flavor) as new_flavor 
    from 
        yearly_flavor 
    where 
        rn > 1
    group by 
        year
),

total_flavor as (
    select
        year, 
        count(distinct flavor) as total_flavor 
    from 
        ice_cream_table
    group by 
        year
)

select 
    n.year,
    n.new_flavor,
    r.repeated_flavor,
    t.total_flavor
from 
    new_flavor n
left join 
    repeated_flavor r on n.year = r.year
join 
    total_flavor t on n.year = t.year
order by
    n.year;

我的输出:

+------+-----------+----------------+--------------+
| Year | new_flavor | repeated_flavor| total_flavor |
+------+-----------+----------------+--------------+
| 2008 | 3         | NULL           | 3            |
| 2013 | 1         | 2              | 3            |
| 2017 | 2         | 2              | 4            |
| 2019 | 2         | 1              | 3            |
| 2022 | 1         | 3              | 4            |
+------+-----------+----------------+--------------+

我做得对吗?我使用Python的经验更丰富,而使用SQL的经验更少.代码运行,但我不确定我的逻辑是否捕捉到了本质.

推荐答案

您可以使用ROW_NUMBERLAG来查找每flavor的第一行,然后按year分组并向上计数.

与其他答案相比,这具有优势,因为它只需要扫描一次基表.

SELECT
  Year,
  COUNT(IsFirst) AS NewFlavors,
  COUNT(DISTINCT Flavor) - COUNT(IsFirst) AS RepeatedFlavors,
  COUNT(DISTINCT Flavor) AS TotalFlavors
FROM (
    SELECT *,
      CASE WHEN ROW_NUMBER() OVER (PARTITION BY Flavor ORDER BY Year) = 1 THEN 1 END AS IsFirst
    FROM ice_cream_table icf
) icf
GROUP BY
  Year;

db<>fiddle

Sql相关问答推荐

如何在幂函数中正确使用Power()和Exp()

如何优化我的功能以减少花费的时间?

如何在不更改S代码的情况下,判断存储过程调用了多少次clr函数?

替换上一个或下一个值中的空值并添加其价格日期

SQL Select 最小优先级

了解多个分组集

在Power Bi中将SQL代码转换为DAX

STContains、STIntersections和STWithin返回错误的地理结果

删除行而不使数据库超载

将 json 列键映射到第二个表中的匹配列值

Select 一个非零值减少重复

试图找到两个身份列表的交集(列表的长度不同),但获取列 id 不明确?

SQL SUM Filter逻辑解释

验证某个日期前后的连续代码

PostgreSQL如何将Unix纪元时间戳转换为日期时间并进行拼接

所有列分组的简写?

PostgreSQL:通过数组的元素从另一个表中 Select 数据,然后按顺序显示

如果当前日期是一周中的某一天,则从另一天提取结果

从多个连接返回 1 行到同一个表 - SQL Server

Snowflake SQL group-by 的行为不同,具体取决于列是按位置引用还是按别名引用