I have the following DataFrame, df, loaded from a dataset:

    Rk  Player  Pos Age Tm  G   GS  MP  FG  FGA ... FT% ORB DRB TRB AST STL BLK TOV PF  PS/G
0   1   Stephen Curry   PG  27  GSW 79  79  34.2    10.2    20.2    ... 0.908   0.9 4.6 5.4 6.7 2.1 0.2 3.3 2.0 30.1
1   2   James Harden    SG  26  HOU 82  82  38.1    8.7 19.7    ... 0.860   0.8 5.3 6.1 7.5 1.7 0.6 4.6 2.8 29.0
2   3   Kevin Durant    SF  27  OKC 72  72  35.8    9.7 19.2    ... 0.898   0.6 7.6 8.2 5.0 1.0 1.2 3.5 1.9 28.2
3   4   DeMarcus Cousins    C   25  SAC 65  65  34.6    9.2 20.5    ... 0.718   2.4 9.1 11.5    3.3 1.6 1.4 3.8 3.6 26.9
4   5   LeBron James    SF  31  CLE 76  76  35.6    9.7 18.6    ... 0.731   1.5 6.0 7.4 6.8 1.4 0.6 3.3 1.9 25.3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
471 472 Joe Harris  SG  24  CLE 5   0   3.0 0.2 0.8 ... NaN 0.0 0.6 0.6 0.4 0.0 0.0 0.2 0.2 0.6
472 473 Bruno Caboclo   SF  20  TOR 6   1   7.2 0.2 2.0 ... NaN 0.2 0.2 0.3 0.2 0.3 0.2 0.7 0.3 0.5
473 474 Sam Dekker  SF  21  HOU 3   0   2.0 0.0 0.0 ... NaN 0.0 0.3 0.3 0.0 0.3 0.0 0.0 0.0 0.0
474 475 J.J. O'Brien    SF  23  UTA 2   0   3.0 0.0 0.5 ... NaN 0.0 0.5 0.5 0.0 0.5 0.0 0.0 0.5 0.0
475 476 Nate Robinson   PG  31  NOP 2   1   11.5    0.0 0.5 ... NaN 0.0 0.0 0.0 2.0 0.5 0.0 0.0 2.5 0.0

I need to group df by team (Tm) and find the best average scorer(s) per team (PS/G), ignoring the rows where Tm is TOT. The result should be sorted by points per game in descending order, with ties broken by team name (Tm). If a team has multiple top scorers, list all of them, sorted by player name in ascending order.

What I have done is the following:

    grouped = df[df['Tm'] != "TOT"].groupby('Tm')['PS/G'].max().sort_values(ascending=False)

And I am getting:

    Tm
    GSW    30.1
    HOU    29.0
    OKC    28.2
    SAC    26.9
    CLE    25.3
    POR    25.1
    NOP    24.3
    TOR    23.5
    IND    23.1
    BOS    22.2
    NYK    21.8
    LAC    21.4
    SAS    21.2
    CHI    20.9
    CHO    20.9
    MIN    20.7
    BRK    20.6
    PHO    20.4
    WAS    19.9
    UTA    19.7
    DEN    19.5
    MIA    19.1
    DET    18.8
    DAL    18.3
    ORL    18.2
    MIL    18.2
    LAL    17.6
    PHI    17.5
    ATL    17.1
    MEM    16.6
    Name: PS/G, dtype: float64

However, I also need to include the Player column in the result. So my first question is: how can I achieve that?
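One way to keep Player (a sketch, not necessarily the only approach): instead of reducing each team to a single number with `.max()`, broadcast each team's maximum back onto its rows with `groupby(...).transform("max")` and keep the rows that reach it. Every column survives, and tied top scorers within a team are all kept. The sample rows and player names below are hypothetical; only the columns used here are included.

```python
import pandas as pd

# Hypothetical sample rows; only the columns needed here are included.
df = pd.DataFrame({
    "Player": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "Tm":     ["GSW",      "GSW",      "TOR",      "TOR",      "TOT"],
    "PS/G":   [30.1,       20.0,       23.5,       23.5,       31.0],
})

# Drop the TOT rows before grouping.
teams = df[df["Tm"] != "TOT"]

# transform("max") returns a Series aligned with `teams`, holding each
# row's team maximum, so the comparison below is row-by-row.
team_max = teams.groupby("Tm")["PS/G"].transform("max")

# Keep every row that reaches its team's maximum, with Player included.
top = teams.loc[teams["PS/G"] == team_max, ["Tm", "Player", "PS/G"]]
print(top)
```

Here TOR keeps both tied players (Player C and Player D), and the TOT row is excluded even though it has the highest PS/G.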

My second question is how to satisfy these two requirements:

  1. Ties in PS/G are broken by team name (Tm).
  2. If a team has multiple top scorers, list all of them, sorted by player name ascending.
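Both requirements can be expressed as a single multi-key sort: `sort_values` accepts a list of columns plus a matching list for `ascending`, and applies the keys left to right. The rows below are hypothetical (they echo the CHI/CHO tie at 20.9 in the output above, but the player names are placeholders).

```python
import pandas as pd

# Hypothetical top-scorer rows, e.g. the output of a "max per team" filter.
top = pd.DataFrame({
    "Tm":     ["CHO",      "CHI",      "TOR",      "TOR",      "GSW"],
    "Player": ["Player Q", "Player P", "Player Z", "Player Y", "Player X"],
    "PS/G":   [20.9,       20.9,       23.5,       23.5,       30.1],
})

# Keys are applied left to right:
#   1. PS/G descending,
#   2. Tm ascending (breaks ties between teams with equal PS/G),
#   3. Player ascending (orders tied scorers within the same team).
result = top.sort_values(
    by=["PS/G", "Tm", "Player"],
    ascending=[False, True, True],
).reset_index(drop=True)
print(result)
```

The expected order is GSW, TOR (Player Y), TOR (Player Z), CHI, CHO.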

Recommended answer

I finally managed to figure it out with the following:

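    # idxmax returns, per team, the row label of the highest PS/G,
    # so df.loc[...] recovers the full rows (including Player).
    # Sort by PS/G descending, then Tm and Player ascending to break ties.
    grouped1 = (
        df.loc[df[df['Tm'] != "TOT"].groupby('Tm')['PS/G'].idxmax()]
          .sort_values(by=['PS/G', 'Tm', 'Player'], ascending=[False, True, True])
          .reset_index(drop=True)
    )

    grouped_final = grouped1[['Tm', 'Player', 'PS/G']]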
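One caveat about the `idxmax` approach above: `idxmax` returns a single label per group (the first row that reaches the maximum), so if two players on the same team are tied for the top PS/G, only one of them is returned, and the "list all top scorers" requirement is not met. The small check below, on hypothetical data, contrasts it with filtering rows against the per-team maximum via `transform("max")`, which keeps every tied row.

```python
import pandas as pd

# Hypothetical data: two TOR players tied for the team's best PS/G.
df = pd.DataFrame({
    "Player": ["Player Y", "Player Z", "Player X"],
    "Tm":     ["TOR",      "TOR",      "GSW"],
    "PS/G":   [23.5,       23.5,       30.1],
})

# idxmax: one row label per team (the first occurrence of the maximum).
first_only = df.loc[df.groupby("Tm")["PS/G"].idxmax()]

# Comparing each row with its team's maximum keeps all tied rows.
all_ties = df[df["PS/G"] == df.groupby("Tm")["PS/G"].transform("max")]

print(len(first_only), len(all_ties))  # 2 3
```

If ties matter, the `transform("max")` filter can replace the `df.loc[... idxmax()]` step while the same `sort_values` call is kept afterwards.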
