I have a dataset like this:
Index | Role | Name | Grade |
---|---|---|---|
1 | Provider | Alex | 7 |
2 | Provider | William | 7.5 |
7 | Provider | Juan | 5.5 |
15 | Provider | Pedro | 4.5 |
25 | Client | George | 8 |
26 | Provider | Mark | 9.4 |
37 | Client | James | 8.1 |
39 | Transporter | Anthony | 9.5 |
50 | Transporter | Jason | 7 |
I am trying to group consecutive rows that share the same Role value. I can achieve this with the following statement:
df = df.groupby((df.Role != df.Role.shift()).cumsum()).agg(
    Role=('Role', 'first'),
    Name=('Name', ' '.join),
    Grade=('Grade', 'mean')
).reset_index(drop=True)
This makes the DataFrame look like this:
Index | Role | Name | Grade |
---|---|---|---|
1 | Provider | Alex William Juan Pedro | 6.125 |
2 | Client | George | 8 |
3 | Provider | Mark | 9.4 |
4 | Client | James | 8.1 |
5 | Transporter | Anthony Jason | 8.25 |
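To make the grouping key easier to follow, here is a minimal sketch that builds only the Role column from the sample data and prints the intermediate labels. The variable names `roles`, `starts_new_run` and `run_id` are made up for illustration and are not part of the original statement.

```python
import pandas as pd

# Role column copied from the sample data above.
roles = pd.Series(
    ["Provider", "Provider", "Provider", "Provider", "Client",
     "Provider", "Client", "Transporter", "Transporter"],
    name="Role",
)

# True wherever the Role differs from the row above.
# The first row is compared with NaN, so it is always True.
starts_new_run = roles != roles.shift()

# Running count of run starts: rows in the same consecutive run share a label.
run_id = starts_new_run.cumsum()

print(run_id.tolist())  # [1, 1, 1, 1, 2, 3, 4, 5, 5]
```

Because the labels only increase when the Role changes, the second Provider run (Mark) gets its own label instead of being merged with the first one.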
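Now I want to add a new rule: a row should only be grouped with the previous one when their Index values differ by at most 5 units. The expected result is: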
Index | Role | Name | Grade |
---|---|---|---|
1 | Provider | Alex William Juan | 6.67 |
2 | Provider | Pedro | 4.5 |
3 | Client | George | 8 |
4 | Provider | Mark | 9.4 |
5 | Client | James | 8.1 |
6 | Transporter | Anthony | 9.5 |
7 | Transporter | Jason | 7 |
How can I do this? Also, if there is a more efficient way to do the grouping, suggestions are welcome.
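One possible approach, sketched under assumptions: start a new group whenever the Role changes or the Index jumps by more than 5, and take the cumulative sum of that combined condition as the grouping key. The sketch assumes `Index` is an ordinary column; if it is the actual DataFrame index, `df.index.to_series().diff()` could be used in its place. The names `max_gap`, `role_changed`, `gap_too_large` and `group_key` are hypothetical.

```python
import pandas as pd

# Sample data from the question; "Index" is assumed to be a regular column.
df = pd.DataFrame({
    "Index": [1, 2, 7, 15, 25, 26, 37, 39, 50],
    "Role": ["Provider", "Provider", "Provider", "Provider", "Client",
             "Provider", "Client", "Transporter", "Transporter"],
    "Name": ["Alex", "William", "Juan", "Pedro", "George",
             "Mark", "James", "Anthony", "Jason"],
    "Grade": [7, 7.5, 5.5, 4.5, 8, 9.4, 8.1, 9.5, 7],
})

max_gap = 5  # largest Index difference allowed between neighbours in one group

# Condition 1: the Role differs from the previous row.
role_changed = df["Role"] != df["Role"].shift()

# Condition 2: the Index jumped by more than max_gap.
# diff() is NaN on the first row, and NaN > max_gap evaluates to False.
gap_too_large = df["Index"].diff() > max_gap

# A new group starts when either condition holds.
group_key = (role_changed | gap_too_large).cumsum()

result = df.groupby(group_key).agg(
    Role=("Role", "first"),
    Name=("Name", " ".join),
    Grade=("Grade", "mean"),
).reset_index(drop=True)

print(result)
```

On the sample data this yields seven groups: Alex, William and Juan stay together (gaps of 1 and 5), while Pedro (gap of 8) and Jason (gap of 11) are split off. Both conditions are vectorised, so the key is computed without a Python-level loop over rows.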