Python Polars: chained expression cannot access a column created by an intermediate step

Posted on March 15

I want to encode the non-zero binary events with integers. Here is a demo table:

import polars as pl
df = pl.DataFrame(
    {
        "event": [0, 1, 1, 0],
        "foo": [1, 2, 3, 4],
        "boo": [2, 3, 4, 5],
    }
)

The expected result can be achieved as follows:

df = df.with_row_index()
events = df.select(pl.col(["index", "event"])).filter(pl.col("event") == 1).with_row_index("event_id").drop("event")
df = df.join(events, on="index", how="left")

out:
shape: (4, 5)
┌───────┬───────┬─────┬─────┬──────────┐
│ index ┆ event ┆ foo ┆ boo ┆ event_id │
│ ---   ┆ ---   ┆ --- ┆ --- ┆ ---      │
│ u32   ┆ i64   ┆ i64 ┆ i64 ┆ u32      │
╞═══════╪═══════╪═════╪═════╪══════════╡
│ 0     ┆ 0     ┆ 1   ┆ 2   ┆ null     │
│ 1     ┆ 1     ┆ 2   ┆ 3   ┆ 0        │
│ 2     ┆ 1     ┆ 3   ┆ 4   ┆ 1        │
│ 3     ┆ 0     ┆ 4   ┆ 5   ┆ null     │
└───────┴───────┴─────┴─────┴──────────┘

I would like to get the same expected output by chaining the expressions:

(
    df
    .with_row_index()
    .join(
        df
        .select(pl.col(["index", "event"]))
        .filter(pl.col("event") == 1)
        .with_row_index("event_id")
        .drop("event"),
        on="index",
        how="left",
        )
    )

However, the frame passed to .join() does not seem to contain the index column added by df.with_row_index():

ColumnNotFoundError: index

Error originated just after this operation:
DF ["event", "foo", "boo"]; PROJECT */3 COLUMNS; SELECTION: "None"

shape: (4, 4)
┌───────┬─────┬─────┬──────────┐
│ event ┆ foo ┆ boo ┆ event_id │
│ ---   ┆ --- ┆ --- ┆ ---      │
│ i64   ┆ i64 ┆ i64 ┆ i64      │
╞═══════╪═════╪═════╪══════════╡
│ 0     ┆ 1   ┆ 2   ┆ null     │
│ 1     ┆ 2   ┆ 3   ┆ 0        │
│ 1     ┆ 3   ┆ 4   ┆ 1        │
│ 0     ┆ 4   ┆ 5   ┆ null     │
└───────┴─────┴─────┴──────────┘

Recommended answer
