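I'm trying to work out how to look up and fill in an extra column in a pandas DataFrame, to make it more readable.
The following (truncated) data is available, where manager_id holds the user_id of that person's manager: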
last_name | first_name | user_id | manager_id |
---|---|---|---|
scorsese | martin | 1 | 2 |
wenders | wim | 2 | 2 |
kurosawa | akira | 3 | 3 |
sabu | sabu | 4 | 3 |
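
To make this reproducible, the sample rows above can be built as a DataFrame roughly like this. This is only a sketch: the integer dtypes for the two id columns are an assumption, and the variable name `df_kantoku` is the one used in my code further down.

```python
import pandas as pd

# Sample rows copied from the table above (the real data is truncated).
# Assumption: user_id and manager_id are plain integers.
df_kantoku = pd.DataFrame({
    "last_name": ["scorsese", "wenders", "kurosawa", "sabu"],
    "first_name": ["martin", "wim", "akira", "sabu"],
    "user_id": [1, 2, 3, 4],
    "manager_id": [2, 2, 3, 3],
})
```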
The result should be:
last_name | first_name | user_id | manager_id | manager_name |
---|---|---|---|---|
scorsese | martin | 1 | 2 | wim wenders |
wenders | wim | 2 | 2 | wim wenders |
kurosawa | akira | 3 | 3 | akira kurosawa |
sabu | sabu | 4 | 3 | akira kurosawa |
So far I have struggled to find a clean, concise solution that uses only pandas methods. I do have something that works, but it is a dirty hack: it iterates over a list of dicts built from the same DataFrame and looks up row indexes by name. Very ugly.

    # Assumption: resultset starts out as a copy of df_kantoku
    resultset = df_kantoku.copy()
    dictionary_of_kantoku = df_kantoku.to_dict(orient="records")
    for kantoku in dictionary_of_kantoku:
        # Find the row for this person by matching both name columns
        row_index = df_kantoku.loc[
            (df_kantoku['last_name'].str.contains(kantoku['last_name'])
             & df_kantoku['first_name'].str.contains(kantoku['first_name']))].index[0]
        manager_id = df_kantoku[(df_kantoku['last_name'].str.contains(kantoku['last_name'])
                                 & df_kantoku['first_name'].str.contains(kantoku['first_name']))]['manager_id'].values[0]
        # Look up the manager's row by user_id and build "first_name last_name"
        manager_name = df_kantoku[df_kantoku['user_id'] == manager_id]['first_name'].values[0] + ' ' + df_kantoku[df_kantoku['user_id'] == manager_id]['last_name'].values[0]
        if row_index != 0:
            resultset.loc[row_index, 'manager_name'] = manager_name
Can someone explain how to do this efficiently, without the dict hack and the iteration?
Thanks a lot.
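
For reference, one possible pandas-only direction is sketched below. It is not a definitive answer: it assumes user_id is unique and that the sample rows above are representative. It builds a lookup Series from user_id to full name and passes manager_id through it with `Series.map`. The names `full_name` and `name_by_id` are made up for this illustration.

```python
import pandas as pd

# Sample rows from the question (assumed integer ids).
df_kantoku = pd.DataFrame({
    "last_name": ["scorsese", "wenders", "kurosawa", "sabu"],
    "first_name": ["martin", "wim", "akira", "sabu"],
    "user_id": [1, 2, 3, 4],
    "manager_id": [2, 2, 3, 3],
})

# Lookup Series: index is user_id, value is "first_name last_name".
# Assumption: user_id values are unique, otherwise the lookup is ambiguous.
full_name = df_kantoku["first_name"] + " " + df_kantoku["last_name"]
name_by_id = pd.Series(full_name.to_numpy(), index=df_kantoku["user_id"])

# Translate each manager_id into a name; ids with no matching user_id become NaN.
df_kantoku["manager_name"] = df_kantoku["manager_id"].map(name_by_id)

print(df_kantoku)
```

A self-merge of the frame on manager_id against user_id would probably work as well; the map-based version just avoids renaming the duplicated columns afterwards.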