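In the example code below, I want any row whose "description" column contains any of the strings in `drop_transactions` to be `True` in my resulting mask. As far as I can tell, both rows in my DataFrame should come back as `True`, but they don't.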
```python
import pandas as pd

drop_transactions = ['CRCARDPMT', 'ONLINE PMT SMART',
                     '$TRANSFER DUMB BANK']
d = pd.DataFrame(
    data={'description':
          ['ONLINE PMT SMART ID94991 Internet Initiated Transaction-',
           '$TRANSFER DUMB BANK ID321 Internet Initiated Transaction-']})
drop_mask = d['description'].str.contains('|'.join(drop_transactions))
drop_mask
```

```
0     True
1    False   # I want this string to also be True
Name: description, dtype: bool
```
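The same result can be reproduced with Python's standard `re` module alone. This is a minimal sketch, assuming `str.contains` treats the joined string as a regular expression (its default, `regex=True`); the names `pattern`, `rows`, `results` and `escaped` are illustrative only. It shows that an unescaped `$` is an end-of-string anchor, not a literal dollar sign.

```python
import re

# The same pattern the question builds by joining with "|".
pattern = "|".join(["CRCARDPMT", "ONLINE PMT SMART", "$TRANSFER DUMB BANK"])
rows = [
    "ONLINE PMT SMART ID94991 Internet Initiated Transaction-",
    "$TRANSFER DUMB BANK ID321 Internet Initiated Transaction-",
]

# An unescaped "$" anchors to the end of the string, so the third
# alternative means "end of string, then TRANSFER DUMB BANK",
# which can never match anything.
results = [re.search(pattern, row) is not None for row in rows]
print(results)  # [True, False]

# Escaping the "$" makes it match a literal dollar-sign character.
escaped = re.search(r"\$TRANSFER DUMB BANK", rows[1])
print(escaped is not None)  # True
```

This also explains the second experiment: once `$ONLINE PMT SMART` starts with `$`, that alternative can no longer match either.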
I suspect the dollar sign is the culprit: if I add a dollar sign in the corresponding place, the first row also returns `False`:
```python
drop_transactions = ['CRCARDPMT', '$ONLINE PMT SMART',  # Note added dollar
                     '$TRANSFER DUMB BANK']
d = pd.DataFrame(
    data={'description':
          ['$ONLINE PMT SMART ID94991 Internet Initiated Transaction-',  # Note added dollar
           '$TRANSFER DUMB BANK ID321 Internet Initiated Transaction-']})
drop_mask = d['description'].str.contains('|'.join(drop_transactions))
drop_mask
```

```
0    False
1    False
Name: description, dtype: bool
```
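I'm not very proficient with regular expressions. Can anyone help me understand what's happening here? I know I could change the match strings so they don't include the dollar sign, but I'd like to understand why this happens so I don't run into similar bugs in the future.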
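A minimal sketch of one common fix, assuming the goal is to treat every entry in `drop_transactions` as a literal substring: escape each entry with `re.escape` before joining, so characters such as `$` lose their special meaning while the `|` separators still act as alternation. The variable `pattern` is illustrative, not part of the original code.

```python
import re

import pandas as pd

drop_transactions = ['CRCARDPMT', '$ONLINE PMT SMART', '$TRANSFER DUMB BANK']
d = pd.DataFrame(data={'description': [
    '$ONLINE PMT SMART ID94991 Internet Initiated Transaction-',
    '$TRANSFER DUMB BANK ID321 Internet Initiated Transaction-']})

# re.escape backslash-escapes regex metacharacters (including "$"),
# so each entry is matched literally; only the joining "|" is regex syntax.
pattern = '|'.join(re.escape(s) for s in drop_transactions)
drop_mask = d['description'].str.contains(pattern)
print(drop_mask.tolist())  # [True, True]
```

Passing `regex=False` to `str.contains` is not a drop-in alternative here: it would also treat the `|` separators literally and search for the entire joined string.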