Python: How to create a conditional increment column in Polars

Posted on March 12

I'd like to create a conditional incremented column in polars.
It should start from 1 and increment only if a certain condition (pl.col('code') == 'L') is met.

import polars as pl

df = pl.DataFrame({
    'file': ['a.txt', 'a.txt', 'a.txt', 'a.txt', 'b.txt', 'b.txt',
             'c.txt', 'c.txt', 'c.txt', 'c.txt', 'c.txt'],
    'code': ['X', 'Y', 'Z', 'L', 'A', 'A', 'B', 'L', 'C', 'L', 'X'],
})

# Row counter that restarts at 1 for each file
df.with_columns(
    pl.int_range(start=1, end=pl.len() + 1).over('file').alias('rrr')
)

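This produces a simple unconditional increment. But how can I add the condition?

Recommended answer

I'm not sure exactly which output you want, but here is an example that increments a counter only on rows that meet the condition, using cum_sum():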

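# Step 1: flag rows where code == 'L' with 1 and all other rows with 0
df.with_columns(
    pl.when(pl.col('code') == 'L').then(pl.lit(1)).otherwise(pl.lit(0)).alias('rrr')
).with_columns(
    # Step 2: running sum of the flags within each file, plus 1 so the count starts at 1
    pl.col('rrr').cum_sum().over('file') + 1
)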

┌───────┬──────┬─────┐
│ file  ┆ code ┆ rrr │
│ ---   ┆ ---  ┆ --- │
│ str   ┆ str  ┆ i32 │
╞═══════╪══════╪═════╡
│ a.txt ┆ X    ┆ 1   │
│ a.txt ┆ Y    ┆ 1   │
│ a.txt ┆ Z    ┆ 1   │
│ a.txt ┆ L    ┆ 2   │
│ b.txt ┆ A    ┆ 1   │
│ b.txt ┆ A    ┆ 1   │
│ c.txt ┆ B    ┆ 1   │
│ c.txt ┆ L    ┆ 2   │
│ c.txt ┆ C    ┆ 2   │
│ c.txt ┆ L    ┆ 3   │
│ c.txt ┆ X    ┆ 3   │
└───────┴──────┴─────┘