Assume this is what you're starting with:
df1
  label
0     A
1     B
2     C
3     A
4     B
df2
  label  value
0     C      5
1     B      8
2     C      1
3     B      2
4     A      3
5     A      4
6     B      5
7     A      2
8     B      7
Option 1: Merge on cumcounted key
One easy way to do this is to shuffle df2, add an incremental key to both DataFrames, and then merge:
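# number each occurrence of a label in df1: 0, 1, 2, ...
df3 = df1.assign(key=df1.groupby('label').cumcount())
# shuffle df2, then number its label occurrences the same way
df4 = (df2.sample(frac=1)
          .reset_index(drop=True)
          .assign(key=lambda d: d.groupby('label').cumcount()))
# drop(columns=...) replaces the positional axis argument, which pandas 2.0 removed
df3.merge(df4, how='left', on=['label', 'key']).drop(columns='key')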
  label  value
0     A      2
1     B      5
2     C      1
3     A      3
4     B      8
Note: for a deterministic shuffle, set the seed with np.random.seed before sampling, or pass random_state to sample.
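The shuffle-and-merge steps of Option 1 can be sketched as a single reproducible function. The function name sample_by_label and its seed argument are illustrative assumptions, not part of the original answer; the sample data is rebuilt from the frames shown at the top.

```python
import pandas as pd

df1 = pd.DataFrame({"label": list("ABCAB")})
df2 = pd.DataFrame({
    "label": list("CBCBAABAB"),
    "value": [5, 8, 1, 2, 3, 4, 5, 2, 7],
})

def sample_by_label(left, right, seed=None):
    """Hypothetical helper: give each row of `left` a random row of `right` with the same label."""
    # Number the occurrences of each label in `left`: 0, 1, 2, ...
    left_keyed = left.assign(key=left.groupby("label").cumcount())
    # Shuffle `right`, then number its label occurrences the same way.
    right_keyed = (
        right.sample(frac=1, random_state=seed)
        .reset_index(drop=True)
        .assign(key=lambda d: d.groupby("label").cumcount())
    )
    # A left merge keeps the row order of `left`; the helper key is then dropped.
    merged = left_keyed.merge(right_keyed, how="left", on=["label", "key"])
    return merged.drop(columns="key")

result = sample_by_label(df1, df2, seed=0)
print(result)
```

Calling it twice with the same seed returns the same assignment, while seed=None gives a fresh shuffle each time. If df1 has more rows for a label than df2 does, the left merge leaves the extra rows with NaN in value.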
Option 2: Sample groups and concat
Another option is to group df2 by label, sample from each group, and concatenate the results:
counts = df1['label'].value_counts()
pd.concat([g.sample(n=counts[k]) for k, g in df2.groupby('label')])
  label  value
7     A      2
5     A      4
3     B      2
6     B      5
2     C      1
Note that the row order of df1 is not preserved here.
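For reference, a minimal self-contained sketch of Option 2 follows. The fixed random_state and the guard that skips labels missing from df1 are assumptions added for illustration; the original answer uses neither.

```python
import pandas as pd

df1 = pd.DataFrame({"label": list("ABCAB")})
df2 = pd.DataFrame({
    "label": list("CBCBAABAB"),
    "value": [5, 8, 1, 2, 3, 4, 5, 2, 7],
})

# How many rows df1 needs for each label.
counts = df1["label"].value_counts()

# Draw that many rows (without replacement) from each label group of df2.
sampled = pd.concat([
    g.sample(n=counts[k], random_state=0)
    for k, g in df2.groupby("label")
    if k in counts.index  # assumption: ignore labels that df1 never uses
])
print(sampled)
```

Because groupby iterates over the groups in sorted key order, the rows come back grouped by label (A, A, B, B, C) and keep their original df2 index, which is why the order of df1 is lost.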