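For a dataset df, I want to group by the two categories foo and bar in column B and flag the rows whose value in column A appears in both groups. How can I do this?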
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 1],
                   'B': ['foo', 'bar', 'foo', 'bar', 'foo', 'foo']})
df = df.sort_values('B')
df
Out[15]:
   A    B
1  2  bar
3  3  bar
0  1  foo
2  2  foo
4  3  foo
5  1  foo
Expected result:
   A    B  Indicator
1  2  bar       True  # value 2 is also present in foo, so returns True
3  3  bar       True  # value 3 is also present in foo, so returns True
0  1  foo      False  # value 1 is only present in foo, so returns False
2  2  foo       True  # value 2 is also present in bar, so returns True
4  3  foo       True  # value 3 is also present in bar, so returns True
5  1  foo      False  # value 1 is only present in foo, so returns False
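To make the rule in the comments above concrete, here is a minimal sketch of one way it could be computed for the two-category case. It assumes the indicator depends only on whether a value of A occurs in both foo and bar; the names in_foo, in_bar and common are just illustrative helpers, and this is not necessarily the best approach.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 1],
                   'B': ['foo', 'bar', 'foo', 'bar', 'foo', 'foo']})
df = df.sort_values('B')

# Collect the distinct values of A seen in each of the two groups
# (illustrative helper names, not part of the original data).
in_foo = set(df.loc[df['B'] == 'foo', 'A'])
in_bar = set(df.loc[df['B'] == 'bar', 'A'])

# Values of A that occur in both groups.
common = in_foo & in_bar

# A row is flagged when its value of A is shared by both groups.
df['Indicator'] = df['A'].isin(list(common))
print(df)
```

This hard-codes the two category names, so it would not carry over directly to a column with more categories.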
Update:
Suppose column B has more than two categories. The sample data df is then as follows:
df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 2, 1], 'B': ['foo', 'bar', 'foo', 'bar', 'foo', 'baz', 'baz']})
df = df.sort_values('B')
df
Out[30]:
   A    B
1  2  bar
3  3  bar
5  2  baz
6  1  baz
0  1  foo
2  2  foo
4  3  foo
In this case, the expected result is as follows:
   A    B  Indicator
1  2  bar       True  # The value 2 occurs in categories baz, bar, and foo, so returns True.
3  3  bar      False  # The value 3 only occurs in categories bar and foo, so returns False.
5  2  baz       True  # The value 2 occurs in categories baz, bar, and foo, so returns True.
6  1  baz      False  # The value 1 only occurs in categories baz and foo, so returns False.
0  1  foo      False  # The value 1 only occurs in categories baz and foo, so returns False.
2  2  foo       True  # The value 2 occurs in categories baz, bar, and foo, so returns True.
4  3  foo      False  # The value 3 only occurs in categories bar and foo, so returns False.
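Read this way, the rule is that a row is True only when its value of A occurs in every category of B. A rough sketch of that interpretation, assuming it is the intended one, counts the distinct categories per value of A and compares the count with the total number of categories. The names n_categories and per_value are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 3, 3, 2, 1],
                   'B': ['foo', 'bar', 'foo', 'bar', 'foo', 'baz', 'baz']})
df = df.sort_values('B')

# Total number of distinct categories in B (3 here: bar, baz, foo).
n_categories = df['B'].nunique()

# For every row, the number of distinct categories in which its value of A occurs.
per_value = df.groupby('A')['B'].transform('nunique')

# Flag rows whose value of A is present in all categories.
df['Indicator'] = per_value == n_categories
print(df)
```

The same comparison should also reproduce the two-category result, since "present in both foo and bar" is the special case where the number of categories is 2.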