I have a DataFrame `df1`:
City | Territory | Region | Area | Target |
---|---|---|---|---|
Chicopee | Springfield MA | Northeast | National | 58761 |
Feeding Hills | Springfield MA | Northeast | National | 65204 |
Feeding Hills | Springfield MA | Northeast | National | 79862 |
Feeding Hills | Springfield MA | Northeast | National | 67247 |
Holyoke | Springfield MA | Northeast | East | 64347 |
Holyoke | Worcester MA | Northeast | East | 73473 |
Using the code below (where `df` is `df1`), I replace each category with the mean of `Target` at that level, and I get:
    columns = ['City', 'Territory', 'Region', 'Area']
    for col in columns:
        # mean Target for each category of this column
        avg_tar = df.groupby(col).agg(**{'avg_tar_by_' + col: ('Target', 'mean')})
        df = df.merge(avg_tar, on=col)
    # drop the original categorical columns and give the means their names
    df = df.drop(columns=columns)
    df = df.rename(columns={'avg_tar_by_' + col: col for col in columns})
City | Territory | Region | Area | Target |
---|---|---|---|---|
58761 | 67084.2 | 68149 | 67768.5 | 58761 |
70771 | 67084.2 | 68149 | 67768.5 | 65204 |
70771 | 67084.2 | 68149 | 67768.5 | 79862 |
70771 | 67084.2 | 68149 | 67768.5 | 67247 |
68910 | 67084.2 | 68149 | 68910 | 64347 |
68910 | 73473 | 68149 | 68910 | 73473 |
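For reference, the encoding above can probably be reproduced more compactly with `groupby(...).transform('mean')`, which keeps the original row order and skips the merge/drop/rename steps. This is only a sketch: `df1` is rebuilt by hand from the table above, and `encoded` is a name introduced here for illustration.

```python
import pandas as pd

# df1 rebuilt from the table in the question
df1 = pd.DataFrame({
    'City': ['Chicopee', 'Feeding Hills', 'Feeding Hills',
             'Feeding Hills', 'Holyoke', 'Holyoke'],
    'Territory': ['Springfield MA'] * 5 + ['Worcester MA'],
    'Region': ['Northeast'] * 6,
    'Area': ['National'] * 4 + ['East'] * 2,
    'Target': [58761, 65204, 79862, 67247, 64347, 73473],
})
columns = ['City', 'Territory', 'Region', 'Area']

encoded = df1.copy()  # hypothetical name, not from the question
for col in columns:
    # Replace each category with the mean Target of its group in df1
    encoded[col] = df1.groupby(col)['Target'].transform('mean')

print(encoded)
```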
I have another DataFrame `df2`. I want to map the categories in all of its columns using the mapping values obtained from `df1`:
City | Territory | Region | Area | Target |
---|---|---|---|---|
Chicopee | Springfield MA | Northeast | National | 58761 |
Chicopee | Springfield MA | Northeast | East | 65204 |
Feeding Hills | Springfield MA | Northeast | East | 79862 |
Feeding Hills | Worcester MA | Northeast | East | 67247 |
Feeding Hills | Worcester MA | Northeast | East | 64347 |
Holyoke | Worcester MA | Northeast | East | 73473 |
Expected output:
City | Territory | Region | Area | Target |
---|---|---|---|---|
58761 | 67084.2 | 68149 | 67768.5 | 58761 |
58761 | 67084.2 | 68149 | 67768.5 | 65204 |
70771 | 67084.2 | 68149 | 67768.5 | 79862 |
70771 | 73473 | 68149 | 68910 | 67247 |
70771 | 73473 | 68149 | 68910 | 64347 |
68910 | 73473 | 68149 | 68910 | 73473 |
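One way this mapping might be done is to build a lookup per column from `df1` (category to mean `Target`) and apply it to `df2` with `Series.map`. The sketch below assumes both frames are rebuilt by hand from the tables above; `mappings` and `df2_encoded` are names introduced here for illustration.

```python
import pandas as pd

# df1 and df2 rebuilt from the tables in the question
df1 = pd.DataFrame({
    'City': ['Chicopee', 'Feeding Hills', 'Feeding Hills',
             'Feeding Hills', 'Holyoke', 'Holyoke'],
    'Territory': ['Springfield MA'] * 5 + ['Worcester MA'],
    'Region': ['Northeast'] * 6,
    'Area': ['National'] * 4 + ['East'] * 2,
    'Target': [58761, 65204, 79862, 67247, 64347, 73473],
})
df2 = pd.DataFrame({
    'City': ['Chicopee', 'Chicopee', 'Feeding Hills',
             'Feeding Hills', 'Feeding Hills', 'Holyoke'],
    'Territory': ['Springfield MA'] * 3 + ['Worcester MA'] * 3,
    'Region': ['Northeast'] * 6,
    'Area': ['National'] + ['East'] * 5,
    'Target': [58761, 65204, 79862, 67247, 64347, 73473],
})
columns = ['City', 'Territory', 'Region', 'Area']

# One lookup Series per column: category -> mean Target in df1
mappings = {col: df1.groupby(col)['Target'].mean() for col in columns}

df2_encoded = df2.copy()
for col in columns:
    # Categories that never appear in df1 come out as NaN
    df2_encoded[col] = df2[col].map(mappings[col])

print(df2_encoded)
```

The means are taken from `df1` only, so `df2`'s own `Target` values do not influence the encoding. Note that with the `df2` shown above, rows 2 and 3 have Area = East, so this mapping gives 68910 there rather than the 67768.5 listed in the expected output.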