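I have a DataFrame with a column that contains lists of strings. I want to filter the DataFrame to drop rows whose value in the list column is a duplicate of an earlier row.
For example,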
import polars as pl
# Create a DataFrame with a list[str] type column
data = pl.DataFrame({
    "id": [1, 2, 3, 4],
    "values": [
        ["a", "a", "a"],  # first two rows are duplicated
        ["a", "a", "a"],
        ["b", "b", "b"],
        ["c", "d", "e"],
    ],
})
print(data)
shape: (4, 2)
┌─────┬─────────────────┐
│ id  ┆ values          │
│ --- ┆ ---             │
│ i64 ┆ list[str]       │
╞═════╪═════════════════╡
│ 1   ┆ ["a", "a", "a"] │
│ 2   ┆ ["a", "a", "a"] │
│ 3   ┆ ["b", "b", "b"] │
│ 4   ┆ ["c", "d", "e"] │
└─────┴─────────────────┘
Expected result:
shape: (3, 2)
┌─────┬─────────────────┐
│ id  ┆ values          │
│ --- ┆ ---             │
│ i64 ┆ list[str]       │
╞═════╪═════════════════╡
│ 1   ┆ ["a", "a", "a"] │
│ 3   ┆ ["b", "b", "b"] │
│ 4   ┆ ["c", "d", "e"] │
└─────┴─────────────────┘
Using the unique method does not work for a column of type list[str] (it does work, however, when the list contains a numeric type).
data.unique(subset="values")
ComputeError: grouping on list type is only allowed if the inner type is numeric