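I suggest building a reference dictionary that maps each neighbourhood to its neighbourhood group. Suppose this is the original DataFrame: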
import pandas as pd
data = {'resident': {0: 'John', 1: 'Mae', 2: 'Richard', 3: 'Clark', 4: 'Claire', 5: 'Susan'}, 'neighbourhoodgroup': {0: 'Brooklyn', 1: 'Brooklyn', 2: 'Manhattan', 3: 'Manhattan', 4: None, 5: None}, 'neighbourhood': {0: 'ClintonHill', 1: 'ClintonHill', 2: 'EastHarlem', 3: 'UpperWestSide', 4: 'ClintonHill', 5: 'EastHarlem'}}
df = pd.DataFrame(data)
'''
  resident neighbourhoodgroup  neighbourhood
0     John           Brooklyn    ClintonHill
1      Mae           Brooklyn    ClintonHill
2  Richard          Manhattan     EastHarlem
3    Clark          Manhattan  UpperWestSide
4   Claire               None    ClintonHill
5    Susan               None     EastHarlem
'''
First, create the reference dictionary, reference, whose keys come from the 'neighbourhood' column and whose values come from the 'neighbourhoodgroup' column.
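# Keep only rows with a known group, one row per (group, neighbourhood) pair
df_ref = df.dropna().drop_duplicates(['neighbourhoodgroup', 'neighbourhood'])
# Map each neighbourhood (key) to its neighbourhood group (value)
reference = dict(zip(df_ref.neighbourhood, df_ref.neighbourhoodgroup))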
'''
{'ClintonHill': 'Brooklyn',
'EastHarlem': 'Manhattan',
'UpperWestSide': 'Manhattan'}
'''
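One caveat with this lookup: if a neighbourhood is ever listed under two different groups, the dictionary keeps only the last pair it sees. A quick sanity check, sketched here on the same sample columns (the names groups_per_neighbourhood and ambiguous are just illustrative, not part of the answer above), could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    'neighbourhoodgroup': ['Brooklyn', 'Brooklyn', 'Manhattan', 'Manhattan', None, None],
    'neighbourhood': ['ClintonHill', 'ClintonHill', 'EastHarlem', 'UpperWestSide',
                      'ClintonHill', 'EastHarlem'],
})

# Count the distinct known groups per neighbourhood (nunique ignores missing values).
groups_per_neighbourhood = df.groupby('neighbourhood')['neighbourhoodgroup'].nunique()

# Neighbourhoods listed under more than one group would make the lookup ambiguous.
ambiguous = groups_per_neighbourhood[groups_per_neighbourhood > 1].index.tolist()
print(ambiguous)
```

An empty list means every neighbourhood maps to a single group, so the dictionary is safe to use as a lookup.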
Next, apply the reference dictionary to the DataFrame:
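# Series.map looks each neighbourhood up in the dictionary; unlike reference[x],
# it yields NaN instead of raising KeyError for a neighbourhood that is not a key
df['result'] = df.neighbourhood.map(reference)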
print(df)
'''
  resident neighbourhoodgroup  neighbourhood     result
0     John           Brooklyn    ClintonHill   Brooklyn
1      Mae           Brooklyn    ClintonHill   Brooklyn
2  Richard          Manhattan     EastHarlem  Manhattan
3    Clark          Manhattan  UpperWestSide  Manhattan
4   Claire               None    ClintonHill   Brooklyn
5    Susan               None     EastHarlem  Manhattan
'''
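If you would rather fill the gaps in the original column than add a separate result column, one possible variant is to combine the lookup with fillna. This is only a sketch on the same sample data, assuming each neighbourhood belongs to exactly one group; the variable names known and filled_groups are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'resident': ['John', 'Mae', 'Richard', 'Clark', 'Claire', 'Susan'],
    'neighbourhoodgroup': ['Brooklyn', 'Brooklyn', 'Manhattan', 'Manhattan', None, None],
    'neighbourhood': ['ClintonHill', 'ClintonHill', 'EastHarlem', 'UpperWestSide',
                      'ClintonHill', 'EastHarlem'],
})

# Build the neighbourhood -> group lookup from rows where the group is known.
known = df.dropna(subset=['neighbourhoodgroup'])
reference = dict(zip(known['neighbourhood'], known['neighbourhoodgroup']))

# Keep existing groups and fill only the missing ones from the lookup.
# A neighbourhood that is not in the dictionary stays missing (NaN).
df['neighbourhoodgroup'] = df['neighbourhoodgroup'].fillna(
    df['neighbourhood'].map(reference)
)

filled_groups = df['neighbourhoodgroup'].tolist()
print(df)
```

Existing values are left untouched, so a row whose group is already set is never overwritten by the lookup.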