Python: Removing duplicate rows on a `list[str]` column in Polars

Posted on April 15

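I have a DataFrame with a column that contains lists of strings. I want to filter the DataFrame so that rows whose list value duplicates an earlier row are dropped.

For example: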

import polars as pl

# Create a DataFrame with a list[str] type column
data = pl.DataFrame({
    "id": [1, 2, 3, 4],
    "values": [
        ["a", "a", "a"], # first two rows are duplicated
        ["a", "a", "a"],
        ["b", "b", "b"],
        ["c", "d", "e"]
    ]
})

print(data)

shape: (4, 2)
┌─────┬─────────────────┐
│ id  ┆ values          │
│ --- ┆ ---             │
│ i64 ┆ list[str]       │
╞═════╪═════════════════╡
│ 1   ┆ ["a", "a", "a"] │
│ 2   ┆ ["a", "a", "a"] │
│ 3   ┆ ["b", "b", "b"] │
│ 4   ┆ ["c", "d", "e"] │
└─────┴─────────────────┘

Expected result:

shape: (3, 2)
┌─────┬─────────────────┐
│ id  ┆ values          │
│ --- ┆ ---             │
│ i64 ┆ list[str]       │
╞═════╪═════════════════╡
│ 1   ┆ ["a", "a", "a"] │
│ 3   ┆ ["b", "b", "b"] │
│ 4   ┆ ["c", "d", "e"] │
└─────┴─────────────────┘

Using the `unique` method does not work on a `list[str]` column (although it does work when the list holds a numeric type):

data.unique(subset="values")

ComputeError: grouping on list type is only allowed if the inner type is numeric

shape: (3, 2)
┌─────┬─────────────────┐
│ id  ┆ values          │
│ --- ┆ ---             │
│ i64 ┆ list[cat]       │
╞═════╪═════════════════╡
│ 1   ┆ ["a", "a", "a"] │
│ 3   ┆ ["b", "b", "b"] │
│ 4   ┆ ["c", "d", "e"] │
└─────┴─────────────────┘
