I want to encode the non-zero binary events with integers. Here is a demo table:
import polars as pl
df = pl.DataFrame(
    {
        "event": [0, 1, 1, 0],
        "foo": [1, 2, 3, 4],
        "boo": [2, 3, 4, 5],
    }
)
The expected output can be produced as follows:
df = df.with_row_index()
events = df.select(pl.col(["index", "event"])).filter(pl.col("event") == 1).with_row_index("event_id").drop("event")
df = df.join(events, on="index", how="left")
out:
shape: (4, 5)
┌───────┬───────┬─────┬─────┬──────────┐
│ index ┆ event ┆ foo ┆ boo ┆ event_id │
│ ---   ┆ ---   ┆ --- ┆ --- ┆ ---      │
│ u32   ┆ i64   ┆ i64 ┆ i64 ┆ u32      │
╞═══════╪═══════╪═════╪═════╪══════════╡
│ 0     ┆ 0     ┆ 1   ┆ 2   ┆ null     │
│ 1     ┆ 1     ┆ 2   ┆ 3   ┆ 0        │
│ 2     ┆ 1     ┆ 3   ┆ 4   ┆ 1        │
│ 3     ┆ 0     ┆ 4   ┆ 5   ┆ null     │
└───────┴───────┴─────┴─────┴──────────┘
I would like to get the same expected output by chaining the expressions:
(
    df
    .with_row_index()
    .join(
        df
        .select(pl.col(["index", "event"]))
        .filter(pl.col("event") == 1)
        .with_row_index("event_id")
        .drop("event"),
        on="index",
        how="left",
    )
)
However, the expression inside .join() does not seem to see the index column added by the df.with_row_index() operation:
ColumnNotFoundError: index
Error originated just after this operation:
DF ["event", "foo", "boo"]; PROJECT */3 COLUMNS; SELECTION: "None"