Python: using Polars to create new columns from the groups in one column

Posted on June 28

I have some data structured as shown in the sample below, and I would like to restructure the DataFrame. A short piece of the initial data:

id	time	value
2050	02-01	20
2051	02-01	25
2050	02-02	21
2051	02-02	22
2051	02-03	23

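I would like the restructured DataFrame to have a timestamp column followed by one column for each id. I have done this with pandas, but because the file is quite large and has to be processed many times, I would like to do it with Polars for its speed.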

Expected output:

time	2050	2051
02-01	20	25
02-02	21	22
02-03	nan	23

I have tried using the group-by function together with join/hstack/concat, but I seem to run into problems when trying to use a LazyFrame.

Thanks

The data can be generated as follows:

import polars as pl

lf = pl.DataFrame({'id': [2050, 2051, 2050, 2051, 2051],
                   'time': ['2023-05-01',
                            '2023-05-01',
                            '2023-05-02',
                            '2023-05-02',
                            '2023-05-03'],
                   'value': [20, 25, 21, 22, 23]})
lf = lf.with_columns(pl.col("time").str.to_datetime("%Y-%m-%d"))

Recommended answer

You should use a pivot for this:

In [29]: lf.pivot(columns='id', values='value', index='time', aggregate_function=None)
Out[29]:
shape: (3, 3)
┌─────────────────────┬──────┬──────┐
│ time                ┆ 2050 ┆ 2051 │
│ ---                 ┆ ---  ┆ ---  │
│ datetime[μs]        ┆ i64  ┆ i64  │
╞═════════════════════╪══════╪══════╡
│ 2023-05-01 00:00:00 ┆ 20   ┆ 25   │
│ 2023-05-02 00:00:00 ┆ 21   ┆ 22   │
│ 2023-05-03 00:00:00 ┆ null ┆ 23   │
└─────────────────────┴──────┴──────┘