I have a Pandas DataFrame that contains null values, and I want to filter it with query:
import pandas as pd

data = {'Title': ['Title1', 'Title2', 'Title3', 'Title4'],
        'Subjects': ['Math; Science', 'English; Math', pd.NA, 'English']}
df_test = pd.DataFrame(data)
print(df_test)
#     Title       Subjects
# 0  Title1  Math; Science
# 1  Title2  English; Math
# 2  Title3           <NA>
# 3  Title4        English
This query raises an error:
df_test.query('Title.str.startswith("T") and Subjects.str.contains("Math")')
KeyError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/computation/scope.py in resolve(self, key, is_local)
197 if self.has_resolvers:
--> 198 return self.resolvers[key]
199
36 frames
KeyError: 'Series_2_0xe00x4a0x2f0xf50x420x7a0x00x0'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
KeyError: 'Series_2_0xe00x4a0x2f0xf50x420x7a0x00x0'
The above exception was the direct cause of the following exception:
UndefinedVariableError Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/pandas/core/computation/scope.py in resolve(self, key, is_local)
209 return self.temps[key]
210 except KeyError as err:
--> 211 raise UndefinedVariableError(key, is_local) from err
212
213 def swapkey(self, old_key: str, new_key: str, new_value=None) -> None:
UndefinedVariableError: name 'Series_2_0xe00x4a0x2f0xf50x420x7a0x00x0' is not defined
The same error occurs with this query:
df_test.query('Title.str.startswith("T") and Subjects.notna() and Subjects.str.contains("Math")')
This gives me the result I want:
df_test[df_test['Subjects'].notna()].query('Title.str.startswith("T") and Subjects.str.contains("Math")')
    Title       Subjects
0  Title1  Math; Science
1  Title2  English; Math
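For comparison, here is a minimal sketch of another workaround that avoids query entirely. It assumes plain boolean indexing is acceptable, and it relies on the `na` parameter of `Series.str.contains`, which sets the value returned for missing entries. The names `mask` and `result` are only illustrative. It should select the same two rows as the workaround above, but it does not explain why query fails.

```python
import pandas as pd

data = {'Title': ['Title1', 'Title2', 'Title3', 'Title4'],
        'Subjects': ['Math; Science', 'English; Math', pd.NA, 'English']}
df_test = pd.DataFrame(data)

# na=False makes str.contains return False for the null row instead of a
# missing value, so the combined mask contains only True/False.
mask = (df_test['Title'].str.startswith('T')
        & df_test['Subjects'].str.contains('Math', na=False))

# Plain boolean indexing; query is not involved here.
result = df_test[mask]
print(result)
```

The same `na=False` argument might also work inside the query string, but I have not verified that on 1.5.3.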
I'm wondering whether this is a limitation of query itself, or whether I'm doing something wrong.
pd.__version__
# '1.5.3'