I have a very large DataFrame (named `complete`) with only two columns. I want to filter it by whole words only, not by substrings. Example:
The `complete` DataFrame:
comment | sentiment |
---|---|
fast running | 0.9 |
heavily raining | 0.5 |
in the house | 0.1 |
coming in | 0.0 |
rubbing it | -0.5 |
If I set a substring to filter my table:
substring = 'in'
# Keep every row where any column, cast to str, contains the substring (case-insensitive).
comp = complete[complete.apply(lambda row: row.astype(str).str.contains(substring, case=False).any(), axis=1)]
Output `comp`:
comment | sentiment |
---|---|
fast running | 0.9 |
heavily raining | 0.5 |
in the house | 0.1 |
coming in | 0.0 |
rubbing it | -0.5 |
It returns the same DataFrame, because every comment contains "in" as a substring ("running", "raining", "rubbing", and so on).
My desired output:
comment | sentiment |
---|---|
in the house | 0.1 |
coming in | 0.0 |
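A row should be kept only when the search term appears as a standalone word, not when it is part of a longer word.
How can I achieve this?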
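
For reference, here is a minimal sketch of one possible approach, not necessarily the only or best one: wrap the search term in regex word boundaries (`\b`) before passing it to `str.contains`. It assumes the text lives in the `comment` column and that only that column needs to be searched (the original filter scans every column); the variable names `word`, `pattern`, and `mask` are introduced here for illustration.

```python
import re

import pandas as pd

# Sample data copied from the tables in the question.
complete = pd.DataFrame(
    {
        "comment": [
            "fast running",
            "heavily raining",
            "in the house",
            "coming in",
            "rubbing it",
        ],
        "sentiment": [0.9, 0.5, 0.1, 0.0, -0.5],
    }
)

word = "in"  # hypothetical name for the search term

# \b matches a word boundary, so "in" inside "running" no longer counts.
# re.escape keeps regex metacharacters in the search term literal.
pattern = rf"\b{re.escape(word)}\b"

# Assumption: only the "comment" column is searched.
# case=False mirrors the case-insensitive match in the original filter.
mask = complete["comment"].str.contains(pattern, case=False, regex=True)
comp = complete[mask]

print(comp)
```

With the sample data this keeps only "in the house" and "coming in". Note that `\b` treats letters, digits, and underscores as word characters, so a term next to punctuation (for example "in," or "in.") would still match, while "in_house" would not.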