I have a list of lists, where each sublist contains the keywords used to filter text in a DataFrame.
keywords = [[('tarifa',), ('mantenimiento',), ('mensual',)],
            [('tasa',), ('anual',)],
            [('seguro',), ('bancaria',)],
            [('seguro',), ('generales',)],
            [('mi salud',), ('unific',)]]
Previously I filtered by typing the keywords in manually, like this:
# for sublist 1:
kw_s = kw_df[kw_df['transaction_description'].str.contains('tarifa')
             & kw_df['transaction_description'].str.contains('mantenimiento')
             & kw_df['transaction_description'].str.contains('mensual')]
# for sublist 4:
kw_s = kw_df[kw_df['transaction_description'].str.contains('seguro')
             & kw_df['transaction_description'].str.contains('generales')]
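Now I have to filter based on keywords configured in a MySQL table. I load those keywords into a list of lists, but I don't know how to pull the keywords out of each sublist and use them to filter the DataFrame.
Do you know how I can do this?
Here is a sample of the DataFrame: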
# columns: user_id, reg_id, date, transaction_description, value
kw_df = [[5, 56, Timestamp('2022-01-29 00:00:00'), 'pac c.misalud conv. unificado', 12320.0],
[5, 57, Timestamp('2021-12-19 00:00:00'), 'cargo seguro proteccion bancaria', 31222.0],
[5, 60, Timestamp('2021-04-06 00:00:00'), 'pac sura cia seguros generales', 8657.0],
[5, 178, Timestamp('2022-03-21 00:00:00'), 'cargo seguro proteccion bancaria', 31222.0],
[5, 179, Timestamp('2022-03-01 00:00:00'), 'pac c.misalud conv. unificado', 12320.0],
[5, 182, Timestamp('2022-03-15 00:00:00'), 'pac sura cia seguros generales', 8657.0],
[5, 189, Timestamp('2022-04-21 00:00:00'), 'cargo seguro proteccion bancaria', 31222.0],
[5, 190, Timestamp('2022-04-01 00:00:00'), 'pac c.misalud conv. unificado', 12320.0],
[5, 193, Timestamp('2022-04-15 00:00:00'), 'pac sura cia seguros generales', 8657.0],
[5, 206, Timestamp('2022-05-21 00:00:00'), 'cargo seguro proteccion bancaria', 31222.0],
[5, 256, Timestamp('2022-06-17 00:00:00'), 'cargo seguro proteccion bancaria', 40222.0]]
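For reference, here is a rough sketch of the kind of thing I am after, not a definitive solution. It assumes each keyword arrives as a one-element tuple (as in the list above) and that all keywords in a sublist must appear in the description. The helper name filter_by_keywords is made up for illustration, and the small DataFrame only reuses three of the sample rows.

```python
from functools import reduce
import operator

import pandas as pd

keywords = [[('tarifa',), ('mantenimiento',), ('mensual',)],
            [('tasa',), ('anual',)],
            [('seguro',), ('bancaria',)],
            [('seguro',), ('generales',)],
            [('mi salud',), ('unific',)]]

# Three rows taken from the sample above, just to keep the sketch short.
kw_df = pd.DataFrame({
    'reg_id': [56, 57, 60],
    'transaction_description': [
        'pac c.misalud conv. unificado',
        'cargo seguro proteccion bancaria',
        'pac sura cia seguros generales',
    ],
})


def filter_by_keywords(df, sublist, column='transaction_description'):
    """Hypothetical helper: keep rows whose text contains every keyword."""
    # Assumption: each entry is a 1-tuple such as ('seguro',).
    words = [kw for (kw,) in sublist]
    # One boolean mask per keyword; regex=False treats keywords literally.
    masks = [df[column].str.contains(w, regex=False, na=False) for w in words]
    # AND all masks together, starting from an all-True mask.
    all_true = pd.Series(True, index=df.index)
    return df[reduce(operator.and_, masks, all_true)]


# One filtered DataFrame per sublist, in the same order as `keywords`.
matches = [filter_by_keywords(kw_df, sublist) for sublist in keywords]
```

With these three rows, only the third and fourth sublists match anything. The last sublist returns nothing because the keyword 'mi salud' contains a space while the sample text has 'misalud'.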