import pandas as pd
df_all = pd.DataFrame(columns=["uid", "a", "b"], data=[["uid1", 12, 15],
["uid2", 13, 16],
["uid3", 14, 17],
["uid4", 15, 18]])
df_additional_info1 = pd.DataFrame(columns=["uid", "c", "d"], data=[["uid1", 12, 15],
["uid3", 14, 17]])
df_additional_info2 = pd.DataFrame(columns=["uid", "c", "d"], data=[["uid2", 12, 15]])
我需要合并df_all两次与额外的信息.首先使用df_additional_info1,然后使用df_additional_info2,依此类推.They will always contain additional info for already existing rows and only for those rows that were not yet updated.
当我执行以下操作时:
df_all = df_all.merge(df_additional_info1, how="left", on="uid")
df_all = df_all.merge(df_additional_info2, how="outer", on="uid")
我得到重复的列(*_x,*_y):
有什么建议吗?