Python: how to group a DataFrame by one column and by the last row of each group

Posted on February 16

This is my DataFrame:

import pandas as pd 

df = pd.DataFrame(
    {
        'x': ['a', 'b', 'c', 'c', 'e', 'f', 'd', 'a', 'b', 'c', 'c', 'e', 'f', 'd'],
        'y': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'g', 'g', 'g'],
    }
)

This is the output I want:

    x  y
0   a  a
1   b  a
2   c  a
3   c  a
7   a  f
8   b  f
9   c  f
10  c  f

    x  y
4   e  b
5   f  b
6   d  b
11  e  g
12  f  g
13  d  g

These are the steps that need to be taken:

a) Group by y

b) Group by the last row of x in each of those groups

Basically, the groups are:

df1 = df.groupby('y').filter(lambda g: g.x.iloc[-1] == 'c')
df2 = df.groupby('y').filter(lambda g: g.x.iloc[-1] == 'd')

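In this example I know that the last rows hold two different values, c and d, which is why I can filter on them, but in my real data I don't know these values in advance.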

Recommended answer

IIUC, you can use groupby.transform('last') to build a new grouper:

g = df.groupby('y')
last_x = g['x'].transform('last')

for k, group in df.groupby(last_x):
    print(f'group for last x: "{k}"')
    print(group)

A faster variant of the first step that avoids groupby, if the y values form unique groups:

mapper = df.drop_duplicates('y', keep='last').set_index('y')['x']
last_x = df['y'].map(mapper)

for k, group in df.groupby(last_x):
    print(f'group for last x: "{k}"')
    print(group)

Or:

last_x = df['x'].mask(df['y'].duplicated(keep='last')).bfill()

for k, group in df.groupby(last_x):
    print(f'group for last x: "{k}"')
    print(group)

Output:

group for last x: "c"
    x  y
0   a  a
1   b  a
2   c  a
3   c  a
7   a  f
8   b  f
9   c  f
10  c  f
group for last x: "d"
    x  y
4   e  b
5   f  b
6   d  b
11  e  g
12  f  g
13  d  g

Intermediate last_x:

0     c
1     c
2     c
3     c
4     d
5     d
6     d
7     c
8     c
9     c
10    c
11    d
12    d
13    d
Name: x, dtype: object
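
As a sanity check, the three ways of building last_x can be compared side by side. The sketch below wraps each one in a small helper (the function names and the `scattered` frame are made up for illustration, they are not part of the answer). On the question's data all three agree. On a frame where the same y value appears in two separate blocks, the mask/bfill variant appears to give a different result, so it seems to be the one that relies on each y value sitting in one consecutive run.

```python
import pandas as pd

def last_by_transform(df):
    # Reference approach: broadcast each group's last x to every row.
    return df.groupby('y')['x'].transform('last')

def last_by_map(df):
    # Look up the last x per y value, then map it back onto the y column.
    mapper = df.drop_duplicates('y', keep='last').set_index('y')['x']
    return df['y'].map(mapper)

def last_by_bfill(df):
    # Keep x only on the last row of each y value, then back-fill upwards.
    return df['x'].mask(df['y'].duplicated(keep='last')).bfill()

# Same data as in the question: every y value forms one contiguous block.
df = pd.DataFrame(
    {
        'x': ['a', 'b', 'c', 'c', 'e', 'f', 'd', 'a', 'b', 'c', 'c', 'e', 'f', 'd'],
        'y': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'g', 'g', 'g'],
    }
)
same_on_example = (
    last_by_transform(df).tolist()
    == last_by_map(df).tolist()
    == last_by_bfill(df).tolist()
)

# Hypothetical frame where the y value 'a' is split across two blocks.
scattered = pd.DataFrame({'x': ['p', 'q', 'r'], 'y': ['a', 'b', 'a']})
transform_result = last_by_transform(scattered).tolist()
map_result = last_by_map(scattered).tolist()
bfill_result = last_by_bfill(scattered).tolist()

print(same_on_example)   # True
print(transform_result)  # ['r', 'q', 'r']
print(map_result)        # ['r', 'q', 'r']
print(bfill_result)      # ['q', 'q', 'r']
```

If the y values in your data may be interleaved, the transform or map variants look like the safer choice.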

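Generalization

If you don't necessarily want the last value but the result of an arbitrary function, you can pass a lambda to transform, as you did with filter in your example: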

group_x = g['x'].transform(lambda s: s.iloc[-1])
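
To illustrate with a function other than "last", here is a minimal sketch that groups by the first x value of each y group instead. The choice of function and the names `first_x`, `groups` and `group_keys` are assumptions for the example only. Any callable that reduces a group's x values to a single scalar should work the same way, since transform broadcasts that scalar back to every row of the group.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'x': ['a', 'b', 'c', 'c', 'e', 'f', 'd', 'a', 'b', 'c', 'c', 'e', 'f', 'd'],
        'y': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'g', 'g', 'g'],
    }
)

g = df.groupby('y')
# Illustrative grouper: the first x of each y group, repeated on every row.
first_x = g['x'].transform(lambda s: s.iloc[0])

# Collect the resulting sub-frames, keyed by the grouper value.
groups = {k: group for k, group in df.groupby(first_x)}
group_keys = sorted(groups)

print(group_keys)                   # ['a', 'e']
print(groups['a'].index.tolist())   # [0, 1, 2, 3, 7, 8, 9, 10]
print(groups['e'].index.tolist())   # [4, 5, 6, 11, 12, 13]
```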

Output as a dictionary:

out = dict(list(df.groupby(last_x)))

Output:

{'c':     x  y
      0   a  a
      1   b  a
      2   c  a
      3   c  a
      7   a  f
      8   b  f
      9   c  f
      10  c  f,
 'd':     x  y
      4   e  b
      5   f  b
      6   d  b
      11  e  g
      12  f  g
      13  d  g}
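
If you need this in several places, the two steps could be wrapped in a small helper. The following is only a sketch: `split_by_group_last` and its parameters `by` and `col` are hypothetical names, not a pandas API. It returns the same kind of dictionary as above, so you can loop over the sub-frames without knowing the last values beforehand.

```python
import pandas as pd

def split_by_group_last(df, by, col):
    """Split df into sub-frames keyed by the last `col` value of each `by` group."""
    # Step 1: broadcast the last `col` value of each `by` group to all its rows.
    key = df.groupby(by)[col].transform('last')
    # Step 2: group the original frame by that broadcast key.
    return {k: part for k, part in df.groupby(key)}

df = pd.DataFrame(
    {
        'x': ['a', 'b', 'c', 'c', 'e', 'f', 'd', 'a', 'b', 'c', 'c', 'e', 'f', 'd'],
        'y': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'g', 'g', 'g'],
    }
)

out = split_by_group_last(df, by='y', col='x')
sizes = {k: len(part) for k, part in out.items()}
d_groups = out['d']['y'].unique().tolist()

print(sizes)     # {'c': 8, 'd': 6}
print(d_groups)  # ['b', 'g']
```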