I have an EventLog table with a userid column and a ds column. Each row represents an event that the given user performed on the "website".
Sample data:
userid | ds |
---|---|
user1 | 2022-01-01 |
user2 | 2022-02-11 |
user2 | 2022-03-21 |
user3 | 2022-01-11 |
user3 | 2022-02-27 |
user3 | 2022-04-06 |
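I need to calculate the monthly retention curve, starting from the earliest month in the table. A user who performed no action in a given month (no eventLog record for that month) is considered churned.
This is what I tried: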
SELECT DATE_FORMAT(ds, '%Y-%m-01') as ds_month
,COUNT(DISTINCT userid) * 1.0 / COUNT(*) as retention_rate
FROM eventLog
GROUP BY DATE_FORMAT(ds, '%Y-%m-01')
Here is the fiddle: http://sqlfiddle.com/#!9/f6bdefc/4
I got the following output:
The expected result is:
month | retention_rate | Reasoning |
---|---|---|
2022-01-01 | 100% | This is 100% by definition - user1 and user3 did the first action during this month |
2022-02-01 | 66% | 2 / 3 users retained (user2 did the first action, user3 retained, user1 churned) |
2022-03-01 | 33% | 1 / 3 users retained (user2 retained, user1 and user3 churned) |
2022-04-01 | 33% | 1 / 3 users retained (user3 retained, user1 and user2 churned) |
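
To pin down the definition behind the expected table, here is a minimal Python sketch of the calculation as I understand it: for each month, the number of distinct users with an event in that month, divided by the number of distinct users seen in that month or any earlier one. This is only a reference computation over the sample data, not the SQL I am after. The function name `monthly_retention` and the in-memory `events` list are made up for illustration, and the sketch assumes every month in the range has at least one event (a month with no rows would simply not appear).

```python
from datetime import date

# Sample rows from the EventLog table above: (userid, ds).
events = [
    ("user1", date(2022, 1, 1)),
    ("user2", date(2022, 2, 11)),
    ("user2", date(2022, 3, 21)),
    ("user3", date(2022, 1, 11)),
    ("user3", date(2022, 2, 27)),
    ("user3", date(2022, 4, 6)),
]


def monthly_retention(rows):
    """Map each month start to (active users that month) / (users seen so far).

    Hypothetical helper for illustration only.
    """
    # Group the distinct users by the first day of the month of each event.
    active = {}
    for userid, ds in rows:
        active.setdefault(ds.replace(day=1), set()).add(userid)

    seen = set()   # every user with an event in this month or earlier
    result = {}
    for month in sorted(active):
        seen |= active[month]
        result[month] = len(active[month]) / len(seen)
    return result


for month, rate in monthly_retention(events).items():
    print(month, f"{rate:.0%}")
```

The February value is exactly 2/3, so it prints as 67% after rounding, where the table above truncates it to 66%.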