I have this dataframe:

import pandas as pd
df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

print(df)

  subject   level
0       a    hard
1       a    None
2       b    None
3       b    easy
4       c    None
5       d  medium

When I use this code:

df.groupby('subject').transform(lambda group: print(group))

I got four printed groups. That's expected, because there are four subjects: a, b, c and d.
But I don't understand group 2: it looks like transform has accumulated the values of the first two groups. There is also a weird indentation that seems to separate the first group from the second one:

# ------------------------ group1
0    hard
1    None
Name: level, dtype: object
# ------------------------ group2
  level
0  hard
1  None
2    None
3    easy
Name: level, dtype: object
# ------------------------ group3
4    None
Name: level, dtype: object
# ------------------------ group4
5    medium
Name: level, dtype: object

Can someone explain the logic behind this?

Recommended answer

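It hasn't accumulated anything. On the first group, transform runs some checks to work out what kind of output your function produces, and those checks call your function more than once on that group. In general you shouldn't use transform for side effects (use apply for that, as shown below); transform is meant to return something with the same shape as its input.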
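For contrast, here is a minimal sketch of the kind of job transform is designed for: the result has one value per original row, so it can be assigned straight back to the dataframe. It fills each subject's missing level with the first non-null level of that subject, using the built-in 'first' aggregation by name. The column name level_filled is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

# transform returns a result aligned with df's index (same shape as the input
# column), so it can be stored as a new column.
# 'first' picks the first non-null value within each subject.
# level_filled is an illustrative column name.
df['level_filled'] = df.groupby('subject')['level'].transform('first')
print(df)
```

Subject c has no non-null level at all, so its row stays missing.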

Using a custom function makes it clearer what is actually happening:

def f(group):
    print('---')
    print(group.name)  # with transform this is usually the column name, not the group name
    print(group)
    print('===')

df.groupby('subject').transform(f)

Output:

---                           # first group
level
0    hard
1    None
Name: level, dtype: object
===
---                           # internal pandas check (not a real group)
a
  level
0  hard
1  None
===
---                           # second group
level
2    None
3    easy
Name: level, dtype: object
===
---                           # third group
level
4    None
Name: level, dtype: object
===
---                           # fourth group
level
5    medium
Name: level, dtype: object
===
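If you want to inspect those calls instead of reading interleaved print output, one option is to record each call in a list. This is a minimal sketch: record and calls are hypothetical names, and like f above, record returns nothing, so it should go through the same internal check. The exact call sequence is a pandas implementation detail and may differ between versions, so treat this as a way to look, not as a guarantee of what you will see.

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

calls = []  # hypothetical log: one entry per time transform invokes the function


def record(obj):
    # Hypothetical helper: store what pandas passed in instead of printing it.
    # A DataFrame only has .name when pandas attaches it, hence getattr.
    calls.append((type(obj).__name__, getattr(obj, 'name', None), list(obj.index)))
    # Returns None, like f above (only for diagnosis, not real transform usage).


df.groupby('subject').transform(record)

for kind, name, index in calls:
    print(kind, name, index)
```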

By contrast, apply gives you the group name, and it is what you can use for this kind of operation:

df.groupby('subject').apply(f)

---
a
  subject level
0       a  hard
1       a  None
===
---
b
  subject level
2       b  None
3       b  easy
===
---
c
  subject level
4       c  None
===
---
d
  subject   level
5       d  medium
===

Don't use transform to work on groups manually.
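If all you want is to look at, or do something with, each group, a plain loop over the groupby object is often the most direct option. This is an alternative to the apply approach above, not something the answer itself uses: iterating yields (group key, sub-DataFrame) pairs, once per group.

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})

seen = []
# Iterating a GroupBy yields (group key, sub-DataFrame) pairs, one per group,
# without any extra internal checks on the first group.
for name, group in df.groupby('subject'):
    print('---', name)
    print(group)
    seen.append((name, list(group.index)))
```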

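Here is another example. Inside transform, group.name returns the name of the current Series, i.e. the column being processed. See what happens with several columns: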

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium'],
                   'level2': ['hard', None, None, 'easy', None, 'medium']
                  })
df.groupby('subject').transform(lambda g: print(g.name))

Printed output:

level    # first group, column "level"
level2   # first group, column "level2"
a        # some internal check run only once
level    # second group, column "level"
level2   # second group, column "level2"
level    # etc.
level2
level
level2
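If you do need the group key inside a transform, one option (as far as I know, in recent pandas versions) is to select a single column first. SeriesGroupBy.transform calls the function once per group with that group's Series and sets the group key as the Series name. The helper show_key and the keys list are hypothetical names for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium'],
                   'level2': ['hard', None, None, 'easy', None, 'medium']})

keys = []  # hypothetical log of the names seen by the function


def show_key(s):
    # Hypothetical helper. On a SeriesGroupBy, s holds one group's values for
    # the selected column, and s.name is the group key ('a', 'b', ...).
    keys.append(s.name)
    return s  # same shape as the input, which is what transform expects


out = df.groupby('subject')['level'].transform(show_key)
print(keys)
```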

By contrast, apply passes each group to the function as a whole DataFrame, so g.name is the group key:

df.groupby('subject').apply(lambda g: print(g.name))

a
b
c
d
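The shape difference between the two methods is easier to see with a function that actually returns something. This sketch runs the same function (a non-null count, chosen only for illustration; count_levels is a hypothetical name) through apply, which gives one result per group, and through transform, which broadcasts each group's result back onto that group's rows.

```python
import pandas as pd

df = pd.DataFrame({'subject': ['a', 'a', 'b', 'b', 'c', 'd'],
                   'level': ['hard', None, None, 'easy', None, 'medium']})


def count_levels(s):
    # Hypothetical helper: number of non-missing levels in one subject's group.
    return s.notna().sum()


per_group = df.groupby('subject')['level'].apply(count_levels)    # indexed by subject
per_row = df.groupby('subject')['level'].transform(count_levels)  # indexed like df

print(per_group)
print(per_row)
```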
