Python 3.x: Using pandas to group the string values in one list by the values in two other lists

Posted on February 22

I have three lists:

a = ['AFM_123_H2O_56', '345_FM_CO2', 'H6C6_AFM_test', 'dio_CO2_FM', 'check_H2O_FM', 'sample_FM_H6C6', 'AFM_67_H2O']
condition1 = ['H2O', 'CO2', 'H6C6']
condition2 = ['FM', 'AFM']

The output should look like this:

c = [['AFM_123_H2O_56', 'AFM_67_H2O'], #all strings containing H2O and AFM
      ['check_H2O_FM'],                #all strings containing H2O and FM
      ['345_FM_CO2', 'dio_CO2_FM'],    #all strings containing CO2 and FM
      ['H6C6_AFM_test'],               #all strings containing H6C6 and AFM
      ['sample_FM_H6C6']]              #all strings containing H6C6 and FM

Or, if a pandas DataFrame is used, the output should look like this:

H2O_AFM       ['AFM_123_H2O_56', 'AFM_67_H2O']
H2O_FM        ['check_H2O_FM']
CO2_FM        ['345_FM_CO2', 'dio_CO2_FM']
H6C6_AFM      ['H6C6_AFM_test']
H6C6_FM       ['sample_FM_H6C6']

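I need to group the elements of list "a" by the values in the "condition1" and "condition2" lists and save the result to a third list. I know how to do this with a for loop, but I think pandas might be a better solution.

I only know how to do it for a single condition: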

import re
import pandas as pd

pattern = '(%s)' % '|'.join(map(re.escape, condition1))
series_files = pd.Series(a)
df_grouped_files = series_files.groupby(series_files.str.extract(pattern, expand=False), sort=False).agg(list)

But I don't know how to take the cross product of the two condition lists.

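Recommended answer

import re
import itertools

a = ['AFM_123_H2O_56', '345_FM_CO2', 'H6C6_AFM_test', 'dio_CO2_FM', 'check_H2O_FM', 'sample_FM_H6C6', 'AFM_67_H2O']
condition1 = ['H2O', 'CO2', 'H6C6']
condition2 = ['FM', 'AFM']

# One entry per (condition1, condition2) pair. A string is kept only when both
# values appear as whole "_"-delimited tokens, so 'FM' does not match inside 'AFM'.
c = {
    f'{c1}_{c2}': [
        s for s in a
        if re.search(fr'(^|_){c1}(_|$)', s) and re.search(fr'(^|_){c2}(_|$)', s)
    ]
    for c1, c2 in itertools.product(condition1, condition2)
}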

Output:

{
    'H2O_FM': ['check_H2O_FM'],
    'H2O_AFM': ['AFM_123_H2O_56', 'AFM_67_H2O'],
    'CO2_FM': ['345_FM_CO2', 'dio_CO2_FM'],
    'CO2_AFM': [],
    'H6C6_FM': ['sample_FM_H6C6'],
    'H6C6_AFM': ['H6C6_AFM_test']
}
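The "(^|_)...(_|$)" anchors in the regular expressions matter because 'FM' is a substring of 'AFM', so a plain substring test would put every AFM string into the FM groups as well. The sketch below illustrates the difference; the helper name has_token is hypothetical and not part of the answer, and it additionally escapes the token with re.escape, which the answer's code does not do.

```python
import re

def has_token(token, s):
    """Return True if `token` occurs in `s` as a whole '_'-delimited field.

    Hypothetical helper for illustration only.
    """
    return re.search(rf'(^|_){re.escape(token)}(_|$)', s) is not None

s = 'AFM_123_H2O_56'

print('FM' in s)            # True: plain substring test also hits the 'FM' inside 'AFM'
print(has_token('FM', s))   # False: 'FM' is not a standalone field
print(has_token('AFM', s))  # True: 'AFM' is the first field
```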

The dictionary c displayed as a two-column pandas DataFrame:

          0                             1
0    H2O_FM                [check_H2O_FM]
1   H2O_AFM  [AFM_123_H2O_56, AFM_67_H2O]
2    CO2_FM      [345_FM_CO2, dio_CO2_FM]
3   CO2_AFM                            []
4   H6C6_FM              [sample_FM_H6C6]
5  H6C6_AFM               [H6C6_AFM_test]
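Since the question asked for a pandas-based approach, here is one possible sketch that extends the asker's single-condition groupby to two conditions. It is not the answer's method: it extracts one key per condition list with Series.str.extract and groups on the combined key. The helper token_pattern and the variable names key1, key2 and grouped are made up for this illustration. Unlike the dictionary above, this variant only produces groups that actually occur, so there is no empty 'CO2_AFM' entry, and strings matching neither list are dropped because their key is NaN.

```python
import re
import pandas as pd

a = ['AFM_123_H2O_56', '345_FM_CO2', 'H6C6_AFM_test', 'dio_CO2_FM',
     'check_H2O_FM', 'sample_FM_H6C6', 'AFM_67_H2O']
condition1 = ['H2O', 'CO2', 'H6C6']
condition2 = ['FM', 'AFM']

def token_pattern(tokens):
    """Regex capturing one of `tokens` only as a whole '_'-delimited field.

    Hypothetical helper; the single capture group is what str.extract returns.
    """
    alternatives = '|'.join(map(re.escape, tokens))
    return rf'(?:^|_)({alternatives})(?=_|$)'

s = pd.Series(a)

# One key per condition list; NaN where no value from that list is present.
key1 = s.str.extract(token_pattern(condition1), expand=False)
key2 = s.str.extract(token_pattern(condition2), expand=False)

# Group on the combined key, keeping groups in order of first appearance.
grouped = s.groupby(key1 + '_' + key2, sort=False).agg(list)

print(grouped)
```

The result is a Series indexed by labels such as 'H2O_AFM'; calling grouped.tolist() gives the list-of-lists form requested in the question, in order of first appearance rather than in the order of the condition lists.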
