使用DataFrame.transform
,其中2个掩码乘以Series.eq
,按位OR
—|
链接:
g = df.groupby('decile')['score']
out = df[df['score'].eq(g.transform('min')) | df['score'].eq(g.transform('max'))]
print (out)
id score decile
1 2 0.715189 6
2 3 0.602763 5
3 4 0.544883 4
5 6 0.645894 5
6 7 0.437587 2
9 10 0.383442 1
10 11 0.791725 7
11 12 0.528895 3
12 13 0.568045 4
13 14 0.925597 8
15 16 0.087129 0
16 17 0.020218 0
17 18 0.832620 7
19 20 0.870012 8
20 21 0.978618 9
22 23 0.461479 3
23 24 0.780529 6
24 25 0.118274 1
27 28 0.944669 9
29 30 0.414662 2
Details:
print (df.assign(min=df['score'].eq(g.transform('min')),
max=df['score'].eq(g.transform('max')),
both = lambda x: x['min'] | x['max']))
id score decile min max both
0 1 0.548814 4 False False False
1 2 0.715189 6 True False True
2 3 0.602763 5 True False True
3 4 0.544883 4 True False True
4 5 0.423655 2 False False False
5 6 0.645894 5 False True True
6 7 0.437587 2 False True True
7 8 0.891773 8 False False False
8 9 0.963663 9 False False False
9 10 0.383442 1 False True True
10 11 0.791725 7 True False True
11 12 0.528895 3 False True True
12 13 0.568045 4 False True True
13 14 0.925597 8 False True True
14 15 0.071036 0 False False False
15 16 0.087129 0 False True True
16 17 0.020218 0 True False True
17 18 0.832620 7 False True True
18 19 0.778157 6 False False False
19 20 0.870012 8 True False True
20 21 0.978618 9 False True True
21 22 0.799159 7 False False False
22 23 0.461479 3 True False True
23 24 0.780529 6 False True True
24 25 0.118274 1 True False True
25 26 0.639921 5 False False False
26 27 0.143353 1 False False False
27 28 0.944669 9 True False True
28 29 0.521848 3 False False False
29 30 0.414662 2 True False True
lambda函数的另一个 idea ,但解决方案很慢:
out = df[df.groupby('decile')['score'].transform(lambda x: x.eq(x.min()) | x.eq(x.max()))]
np.random.seed(0)
df = pd.DataFrame({'id' : range(1,30001),
'score' : np.random.uniform(size = 30000)})
df['decile'] = pd.qcut(df['score'], 5000, labels=False)
In [44]: %%timeit
...: g = df.groupby('decile')['score']
...: out = df[df['score'].eq(g.transform('min')) | df['score'].eq(g.transform('max'))]
...:
3.27 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [45]: %%timeit
...: df[df.groupby('decile')['score'].transform(lambda x: x.eq(x.min()) | x.eq(x.max()))]
...:
1.45 s ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)