Python: Aggregating a column of string lists by intersecting their elements with Polars

Posted on November 25

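I'm trying to aggregate some rows in a dataframe that has a list[str] column. For each idx, I need the intersection of all the lists in that group. I may be overthinking this, but I can't come up with a solution right now. Any help?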

import polars as pl    
input_df = pl.DataFrame(
   {"idx": [1,1,2,2,3,3], 
    "values": [["A", "B"], ["B", "C"], ["A", "B"], ["B", "C"], ["A", "B"], ["B", "C"]]
   }
)

output_df = input_df.agg(...)

>>> input_df
shape: (6, 2)
┌─────┬────────────┐
│ idx ┆ values     │
│ --- ┆ ---        │
│ i64 ┆ list[str]  │
╞═════╪════════════╡
│ 1   ┆ ["A", "B"] │
│ 1   ┆ ["B", "C"] │
│ 2   ┆ ["A", "B"] │
│ 2   ┆ ["B", "C"] │
│ 3   ┆ ["A", "B"] │
│ 3   ┆ ["B", "C"] │
└─────┴────────────┘
>>> output_df # Expected output
shape: (3, 2)
┌─────┬───────────┐
│ idx ┆ values    │
│ --- ┆ ---       │
│ i64 ┆ list[str] │
╞═════╪═══════════╡
│ 1   ┆ ["B"]     │
│ 2   ┆ ["B"]     │
│ 3   ┆ ["B"]     │
└─────┴───────────┘
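
As a point of reference, the per-group intersection described above can be written in plain Python. This is only a sketch of the intended semantics, not a Polars solution; the helper name intersect_by_group is hypothetical, and sorting the result is an assumption made to get a stable order.

```python
from functools import reduce

# Same data as input_df, written as plain (idx, values) pairs.
rows = [
    (1, ["A", "B"]), (1, ["B", "C"]),
    (2, ["A", "B"]), (2, ["B", "C"]),
    (3, ["A", "B"]), (3, ["B", "C"]),
]


def intersect_by_group(rows):
    """Return {idx: sorted intersection of every list seen for that idx}."""
    groups = {}
    for idx, values in rows:
        # Collect one set per row, keyed by the group index.
        groups.setdefault(idx, []).append(set(values))
    # Fold each group's sets together with set intersection.
    return {
        idx: sorted(reduce(set.intersection, sets))
        for idx, sets in groups.items()
    }


expected = intersect_by_group(rows)
print(expected)  # {1: ['B'], 2: ['B'], 3: ['B']}
```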

I've tried a few things, without success:

>>> input_df.group_by("idx").agg(
  pl.reduce(function=lambda acc, x: acc.list.set_intersection(x), 
     exprs=pl.col("values"))
)
shape: (3, 2)
┌─────┬──────────────────────────┐
│ idx ┆ values                   │
│ --- ┆ ---                      │
│ i64 ┆ list[list[str]]          │
╞═════╪══════════════════════════╡
│ 1   ┆ [["A", "B"], ["B", "C"]] │
│ 2   ┆ [["A", "B"], ["B", "C"]] │
│ 3   ┆ [["A", "B"], ["B", "C"]] │
└─────┴──────────────────────────┘

Another attempt:

>>> input_df.group_by("idx").agg(
   pl.reduce(function=lambda acc, x: acc.list.set_intersection(x), 
   exprs=pl.col("values").explode())
)
shape: (3, 2)
┌─────┬───────────────────┐
│ idx ┆ values            │
│ --- ┆ ---               │
│ i64 ┆ list[str]         │
╞═════╪═══════════════════╡
│ 3   ┆ ["A", "B", … "C"] │
│ 2   ┆ ["A", "B", … "C"] │
│ 1   ┆ ["A", "B", … "C"] │
└─────┴───────────────────┘

Recommended answer

One approach is to explode the lists and keep only the values that occur in every row of their idx group. After exploding, the frame looks like this. Here num_rows appears to be the number of distinct row_nr values per (idx, values) pair, and the comments list the rows that contain "B":

shape: (12, 5)
┌────────┬─────┬────────┬───────────┬──────────┐
│ row_nr ┆ idx ┆ values ┆ group_len ┆ num_rows │
│ ---    ┆ --- ┆ ---    ┆ ---       ┆ ---      │
│ u32    ┆ i64 ┆ str    ┆ u32       ┆ u32      │
╞════════╪═════╪════════╪═══════════╪══════════╡
│ 0      ┆ 1   ┆ A      ┆ 2         ┆ 1        │
│ 0      ┆ 1   ┆ B      ┆ 2         ┆ 2        │ # row_nr = [0, 1]
│ 1      ┆ 1   ┆ B      ┆ 2         ┆ 2        │
│ 1      ┆ 1   ┆ C      ┆ 2         ┆ 1        │
│ 2      ┆ 2   ┆ A      ┆ 2         ┆ 1        │
│ 2      ┆ 2   ┆ B      ┆ 2         ┆ 2        │ # row_nr = [2, 3]
│ 3      ┆ 2   ┆ B      ┆ 2         ┆ 2        │
│ 3      ┆ 2   ┆ C      ┆ 2         ┆ 1        │
│ 4      ┆ 3   ┆ A      ┆ 2         ┆ 1        │
│ 4      ┆ 3   ┆ B      ┆ 2         ┆ 2        │ # row_nr = [4, 5]
│ 5      ┆ 3   ┆ B      ┆ 2         ┆ 2        │
│ 5      ┆ 3   ┆ C      ┆ 2         ┆ 1        │
└────────┴─────┴────────┴───────────┴──────────┘

Keeping only the rows where that count equals group_len, then re-aggregating:

(input_df.with_columns(group_len = pl.count().over("idx"))  # rows per idx group
   .with_row_count()                                        # adds the row_nr column
   .explode("values")                                       # one row per list element
   .filter(
      # keep values that occur in every row of their idx group
      pl.n_unique("row_nr").over("idx", "values") == pl.col("group_len")
   )
   .group_by("idx", maintain_order=True)
   .agg(pl.col("values").unique())
)

shape: (3, 2)
┌─────┬───────────┐
│ idx ┆ values    │
│ --- ┆ ---       │
│ i64 ┆ list[str] │
╞═════╪═══════════╡
│ 1   ┆ ["B"]     │
│ 2   ┆ ["B"]     │
│ 3   ┆ ["B"]     │
└─────┴───────────┘
