I have two DataFrames and need to compare them column by column, appending each comparison result next to the columns it compares.
DF 1:
Claim_number | Claim_Status |
---|---|
1001 | Closed |
1002 | In Progress |
DF 2:
Claim_number | Claim_Status |
---|---|
1001 | Closed |
1002 | Open |
Expected output:
DF 3:
Claim_number_DF1 | Claim_number_DF2 | Comparison_of_Claim_number | Claim_status_DF1 | Claim_status_DF2 | Comparison_of_Claim_Status |
---|---|---|---|---|---|
1001 | 1001 | TRUE | Closed | Closed | TRUE |
1002 | 1002 | TRUE | In Progress | Open | FALSE |
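The code below works, but it raises a performance warning: "PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`"
Code: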
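import numpy as np
import pandas as pd

# col_list holds the names used to label the output columns
# (assumed here to be the column names of Df1)
col_list = list(Df1.columns)

i = 0
df_mismatch = pd.DataFrame()
while i < len(Df1.columns):
    # Each assignment below inserts one new column, which is what triggers the warning
    df_mismatch[f'{col_list[i]}_dev'] = Df1[Df1.columns[i]]
    df_mismatch[f'{col_list[i]}_test'] = Df2[Df2.columns[i]]
    df_mismatch[f'comparison_of_{col_list[i]}'] = np.where(
        (Df1[Df1.columns[i]] == Df2[Df2.columns[i]]), True, False)
    i = i+1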
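
For reference, here is a minimal sketch of what the warning itself suggests: collect all the columns first, then join them with a single `pd.concat(axis=1)` call instead of inserting them one at a time. The helper name `compare_frames` and the `left`/`right` suffix parameters are hypothetical, and the sketch assumes both frames have the same row index and the same column order, as in the sample data above.

```python
import pandas as pd


def compare_frames(df1, df2, left="DF1", right="DF2"):
    """Pair up columns by position and build the result in one concat.

    Hypothetical helper: assumes df1 and df2 share the same index
    and list their columns in the same order.
    """
    pieces = {}
    for col1, col2 in zip(df1.columns, df2.columns):
        pieces[f"{col1}_{left}"] = df1[col1]
        pieces[f"{col1}_{right}"] = df2[col2]
        # Element-wise equality already yields booleans, so np.where is not needed
        pieces[f"Comparison_of_{col1}"] = df1[col1].eq(df2[col2])
    # One concat call: the dict keys become the column names
    return pd.concat(pieces, axis=1)


# Sample data taken from the tables in the question
df1 = pd.DataFrame({"Claim_number": [1001, 1002],
                    "Claim_Status": ["Closed", "In Progress"]})
df2 = pd.DataFrame({"Claim_number": [1001, 1002],
                    "Claim_Status": ["Closed", "Open"]})

result = compare_frames(df1, df2)
print(result)
```

Because the frame is assembled once rather than grown column by column, this pattern should avoid the fragmentation that the warning describes, though I have not verified it on a wide frame.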