As a minimal example, suppose we have the following polars DataFrame:
import polars as pl
df = pl.DataFrame({"sub_id": [1, 2, 3], "engagement": ["one:one,two:two", "one:two,two:one", "one:one"], "total_duration": [123, 456, 789]})
sub_id | engagement | total_duration |
---|---|---|
1 | one:one,two:two | 123 |
2 | one:two,two:one | 456 |
3 | one:one | 789 |
Then we explode the "engagement" column:
df = df.with_columns(pl.col("engagement").str.split(",")).explode("engagement")
and get:
sub_id | engagement | total_duration |
---|---|---|
1 | one:one | 123 |
1 | two:two | 123 |
2 | one:two | 456 |
2 | two:one | 456 |
3 | one:one | 789 |
For the visualization I use Plotly, with the following code:
import plotly.express as px
fig = px.bar(df, x="sub_id", y="total_duration", color="engagement")
fig.show()
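In the resulting plot, the bars are stacked by engagement, which basically means that the total_duration (total watch time) of users 1 and 2 is doubled. How can I keep the total_duration per sub, while still showing the engagement groups in the plot legend?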