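I'm trying to learn a bit about using polars, so I have a simple problem to test it on, where I ultimately want to run a group_by operation on the data.
While analyzing the data, I create several additional series from the initial data by summing and accumulating.
I know that when you want to use a newly created column in an expression, it needs to be in another chained with_columns, but I can't seem to get it working.
I have the following sample code, which I think should be correct, but it fails. The code is as follows: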
import numpy as np
import polars as pl

data = np.random.random((50,5))
df = pl.from_numpy(data, schema=["id", "sampling_time", "area", "val1", "area_corr"])

(df
    .with_columns([
        pl.col("id").cast(pl.Int32),
        pl.Series(name="total_area", values=df.select(pl.col("area") + pl.col("area_corr"))),
    ])
    .with_columns([
        pl.Series(name="cumulative_area", values=df.select(pl.cum_sum("total_area")) / 0.15),
    ])
    .with_columns([
        pl.Series(name="parcel_id", values=df.select(pl.col("cumulative_area").cast(pl.Int32))),
    ])
)
However, the snippet fails with the following stack trace:
Traceback (most recent call last):
  File "C:\Users\xxx\anaconda3\envs\py38\lib\site-packages\IPython\core\interactiveshell.py", line 3508, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-8-8e61e84b3c85>", line 7, in <module>
    pl.Series(name="cumulative_area", values=df.select(pl.cum_sum("total_area")) / 0.15),
  File "C:\Users\xxx\anaconda3\envs\py38\lib\site-packages\polars\dataframe\frame.py", line 8142, in select
    return self.lazy().select(*exprs, **named_exprs).collect(_eager=True)
  File "C:\Users\xxx\anaconda3\envs\py38\lib\site-packages\polars\lazyframe\frame.py", line 1940, in collect
    return wrap_df(ldf.collect())
polars.exceptions.ColumnNotFoundError: total_area
Error originated just after this operation:
DF ["id", "sampling_time", "area", "val1"]; PROJECT */5 COLUMNS; SELECTION: "None"
I don't understand why the newly created total_area series cannot be found.
I am on polars 0.20.7 and Python 3.8.18.