I have the following dataframe df, which comes from a dataset:

    Rk  Player  Pos Age Tm  G   GS  MP  FG  FGA ... FT% ORB DRB TRB AST STL BLK TOV PF  PS/G
0   1   Stephen Curry   PG  27  GSW 79  79  34.2    10.2    20.2    ... 0.908   0.9 4.6 5.4 6.7 2.1 0.2 3.3 2.0 30.1
1   2   James Harden    SG  26  HOU 82  82  38.1    8.7 19.7    ... 0.860   0.8 5.3 6.1 7.5 1.7 0.6 4.6 2.8 29.0
2   3   Kevin Durant    SF  27  OKC 72  72  35.8    9.7 19.2    ... 0.898   0.6 7.6 8.2 5.0 1.0 1.2 3.5 1.9 28.2
3   4   DeMarcus Cousins    C   25  SAC 65  65  34.6    9.2 20.5    ... 0.718   2.4 9.1 11.5    3.3 1.6 1.4 3.8 3.6 26.9
4   5   LeBron James    SF  31  CLE 76  76  35.6    9.7 18.6    ... 0.731   1.5 6.0 7.4 6.8 1.4 0.6 3.3 1.9 25.3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
471 472 Joe Harris  SG  24  CLE 5   0   3.0 0.2 0.8 ... NaN 0.0 0.6 0.6 0.4 0.0 0.0 0.2 0.2 0.6
472 473 Bruno Caboclo   SF  20  TOR 6   1   7.2 0.2 2.0 ... NaN 0.2 0.2 0.3 0.2 0.3 0.2 0.7 0.3 0.5
473 474 Sam Dekker  SF  21  HOU 3   0   2.0 0.0 0.0 ... NaN 0.0 0.3 0.3 0.0 0.3 0.0 0.0 0.0 0.0
474 475 J.J. O'Brien    SF  23  UTA 2   0   3.0 0.0 0.5 ... NaN 0.0 0.5 0.5 0.0 0.5 0.0 0.0 0.5 0.0
475 476 Nate Robinson   PG  31  NOP 2   1   11.5    0.0 0.5 ... NaN 0.0 0.0 0.0 2.0 0.5 0.0 0.0 2.5 0.0

I need to group df by team (Tm) and find the top scorer(s) per team by points per game (PS/G), ignoring rows where Tm is TOT. The result should be sorted by points per game in descending order, with ties broken by team name (Tm). If a team has multiple top scorers, list all of them, sorted by player name in ascending order.

What I have done is the following:

grouped = df[df['Tm']!="TOT"].groupby('Tm')['PS/G'].max().sort_values(ascending=False)

And I am getting:

Tm
GSW    30.1
HOU    29.0
OKC    28.2
SAC    26.9
CLE    25.3
POR    25.1
NOP    24.3
TOR    23.5
IND    23.1
BOS    22.2
NYK    21.8
LAC    21.4
SAS    21.2
CHI    20.9
CHO    20.9
MIN    20.7
BRK    20.6
PHO    20.4
WAS    19.9
UTA    19.7
DEN    19.5
MIA    19.1
DET    18.8
DAL    18.3
ORL    18.2
MIL    18.2
LAL    17.6
PHI    17.5
ATL    17.1
MEM    16.6
Name: PS/G, dtype: float64

However, I also need to include the Player column in the result. So my first question is: how can I achieve that?

My second question is how to satisfy these two requirements:

  1. Ties in points per game are broken by team name (Tm).
  2. If a team has multiple top scorers, list all of them, sorted by player name ascending.

Recommended answer

I finally managed to figure it out with the following:

grouped1 = df.loc[df[df['Tm']!="TOT"].groupby(['Tm'])['PS/G'].idxmax()].sort_values(by=['PS/G', 'Player'], ascending=[False, True]).reset_index()

grouped_final = grouped1[['Tm', 'Player', 'PS/G']]
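
One caveat about the above: idxmax returns only the first row holding each team's maximum, so a team with two players tied for the lead shows just one of them, and the sort uses Player rather than Tm to break ties between teams. A minimal sketch of one way to cover both requirements is below. It uses groupby(...).transform("max") instead of idxmax, which is a swapped-in technique and not part of the original answer. The small df and its player and team names are invented purely for illustration.

```python
import pandas as pd

# Hypothetical miniature of df; these rows are made up for illustration only.
df = pd.DataFrame({
    "Player": ["A Guard", "B Wing", "C Center", "D Guard", "E Wing", "F Total", "G Bench"],
    "Tm":     ["BBB", "BBB", "AAA", "CCC", "CCC", "TOT", "AAA"],
    "PS/G":   [20.0, 20.0, 20.0, 25.0, 10.0, 30.0, 5.0],
})

# Drop the aggregate TOT rows first.
teams = df[df["Tm"] != "TOT"]

# transform("max") broadcasts each team's maximum back onto every row,
# so the comparison keeps every player tied for the team lead.
is_top = teams["PS/G"] == teams.groupby("Tm")["PS/G"].transform("max")

# Sort by points descending, then team name, then player name.
top_scorers = (
    teams.loc[is_top, ["Tm", "Player", "PS/G"]]
    .sort_values(by=["PS/G", "Tm", "Player"], ascending=[False, True, True])
    .reset_index(drop=True)
)
print(top_scorers)
```

In this toy data, both BBB players are kept because they share the team maximum, and AAA sorts ahead of BBB at the same PS/G because of the Tm tie-break.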
