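I want to replace NULL values with a specified string. However, I only want to do this for NULL values that come after the first non-NULL value within each user_id, with rows ordered by some_date. In other words, if a NULL value comes before the first non-NULL value, it should stay NULL.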
For example, consider the following data:
# | user_id | some_date | animal |
# |---------|------------|---------|
# | 1 | 2022-01-01 | NULL | <~~ keep as NULL
# | 1 | 2022-01-02 | zebra | <~~ 'zebra' is the first non-NULL value for user_id = 1
# | 1 | 2022-01-03 | lion |
# | 1 | 2022-01-04 | NULL | <~~ replace NULL with 'no_animal'
# | 1 | 2022-01-05 | cat |
# | 2 | 2023-10-05 | NULL | <~~ keep as NULL
# | 2 | 2023-10-06 | NULL | <~~ keep as NULL
# | 2 | 2023-10-07 | dog | <~~ 'dog' is the first non-NULL value for user_id = 2
# | 2 | 2023-10-08 | frog |
# | 2 | 2023-10-09 | NULL | <~~ replace NULL with 'no_animal'
# | 3 | 2024-02-03 | hamster | <~~ 'hamster' is the first non-NULL value for user_id = 3
# | 3 | 2024-02-04 | rabbit |
# | 3 | 2024-02-05 | NULL | <~~ replace NULL with 'no_animal'
# | 3 | 2024-02-06 | NULL | <~~ replace NULL with 'no_animal'
The desired output should be:
# | user_id | some_date | animal | replaced_null |
# |---------|------------|---------|---------------|
# | 1 | 2022-01-01 | NULL | NULL |
# | 1 | 2022-01-02 | zebra | zebra |
# | 1 | 2022-01-03 | lion | lion |
# | 1 | 2022-01-04 | NULL | no_animal |
# | 1 | 2022-01-05 | cat | cat |
# | 2 | 2023-10-05 | NULL | NULL |
# | 2 | 2023-10-06 | NULL | NULL |
# | 2 | 2023-10-07 | dog | dog |
# | 2 | 2023-10-08 | frog | frog |
# | 2 | 2023-10-09 | NULL | no_animal |
# | 3 | 2024-02-03 | hamster | hamster |
# | 3 | 2024-02-04 | rabbit | rabbit |
# | 3 | 2024-02-05 | NULL | no_animal |
# | 3 | 2024-02-06 | NULL | no_animal |
SQL dialect
I'm using AWS Athena, which runs on the Trino SQL engine.
Reproducible data
WITH my_tbl AS (
    SELECT *
    FROM (VALUES
        (1, DATE '2022-01-01', NULL),
        (1, DATE '2022-01-02', 'zebra'),
        (1, DATE '2022-01-03', 'lion'),
        (1, DATE '2022-01-04', NULL),
        (1, DATE '2022-01-05', 'cat'),
        (2, DATE '2023-10-05', NULL),
        (2, DATE '2023-10-06', NULL),
        (2, DATE '2023-10-07', 'dog'),
        (2, DATE '2023-10-08', 'frog'),
        (2, DATE '2023-10-09', NULL),
        (3, DATE '2024-02-03', 'hamster'),
        (3, DATE '2024-02-04', 'rabbit'),
        (3, DATE '2024-02-05', NULL),
        (3, DATE '2024-02-06', NULL)
    ) AS t(user_id, some_date, animal)
)
-- A CTE cannot run on its own; this final SELECT just displays the sample rows.
SELECT *
FROM my_tbl
ORDER BY user_id, some_date
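
One possible approach, offered as a minimal sketch rather than a definitive answer: a running COUNT(animal) window per user_id counts the non-NULL animals seen so far, so a NULL is replaced only when that count is already above zero. The sketch assumes rows are ordered by some_date within each user_id and that each (user_id, some_date) pair is unique. The column alias replaced_null is taken from the desired output above, and the query has not been run against Athena here.

```sql
WITH my_tbl AS (
    SELECT *
    FROM (VALUES
        (1, DATE '2022-01-01', NULL),
        (1, DATE '2022-01-02', 'zebra'),
        (1, DATE '2022-01-03', 'lion'),
        (1, DATE '2022-01-04', NULL),
        (1, DATE '2022-01-05', 'cat'),
        (2, DATE '2023-10-05', NULL),
        (2, DATE '2023-10-06', NULL),
        (2, DATE '2023-10-07', 'dog'),
        (2, DATE '2023-10-08', 'frog'),
        (2, DATE '2023-10-09', NULL),
        (3, DATE '2024-02-03', 'hamster'),
        (3, DATE '2024-02-04', 'rabbit'),
        (3, DATE '2024-02-05', NULL),
        (3, DATE '2024-02-06', NULL)
    ) AS t(user_id, some_date, animal)
)
SELECT
    user_id,
    some_date,
    animal,
    CASE
        -- COUNT(animal) ignores NULLs, so the running count stays at 0
        -- until the first non-NULL animal for this user_id has appeared.
        WHEN animal IS NULL
             AND COUNT(animal) OVER (
                     PARTITION BY user_id
                     ORDER BY some_date
                     ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                 ) > 0
        THEN 'no_animal'
        -- Non-NULL values and leading NULLs pass through unchanged.
        ELSE animal
    END AS replaced_null
FROM my_tbl
ORDER BY user_id, some_date
```

The explicit ROWS frame makes the count strictly row by row. If some_date can repeat within a user_id, the order among tied rows is not deterministic, so a tie-breaking column would be needed in the ORDER BY.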