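Before Polars version 0.20.7, if multiple values were given for the columns parameter, the pivot() method applied the aggregation logic to each column listed in columns individually, grouped by the index column, rather than to the combination of those columns taken together.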
Before:
import polars as pl

df = pl.DataFrame(
    {
        "foo": ["one", "one", "two", "two", "one", "two"],
        "bar": ["y", "y", "y", "x", "x", "x"],
        "biz": ["m", "f", "m", "f", "m", "f"],
        "baz": [1, 2, 3, 4, 5, 6],
    }
)
df.pivot(index='foo', values='baz', columns=('bar', 'biz'), aggregate_function='sum')
Returns:
shape: (2, 5)
┌─────┬─────┬─────┬─────┬─────┐
│ foo ┆ y   ┆ x   ┆ m   ┆ f   │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╪═════╡
│ one ┆ 3   ┆ 5   ┆ 6   ┆ 2   │
│ two ┆ 3   ┆ 10  ┆ 3   ┆ 10  │
└─────┴─────┴─────┴─────┴─────┘
After (in 0.20.7):
shape: (2, 5)
┌─────┬───────────┬───────────┬───────────┬───────────┐
│ foo ┆ {"y","m"} ┆ {"y","f"} ┆ {"x","f"} ┆ {"x","m"} │
│ --- ┆ ---       ┆ ---       ┆ ---       ┆ ---       │
│ str ┆ i64       ┆ i64       ┆ i64       ┆ i64       │
╞═════╪═══════════╪═══════════╪═══════════╪═══════════╡
│ one ┆ 1         ┆ 2         ┆ null      ┆ 5         │
│ two ┆ 3         ┆ null      ┆ 10        ┆ null      │
└─────┴───────────┴───────────┴───────────┴───────────┘
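I prefer the previous functionality; the new pivot table is very awkward to work with, especially given its column names. The Polars developers listed this change under "Bug fixes", but it actually broke my code.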