Python: apply vs transform on a group object

Posted on November 7

Consider the following dataframe:

import pandas as pd

columns = ['A', 'B', 'C', 'D']
records = [
    ['foo', 'one', 0.162003, 0.087469],
    ['bar', 'one', -1.156319, -1.5262719999999999],
    ['foo', 'two', 0.833892, -1.666304],
    ['bar', 'three', -2.026673, -0.32205700000000004],
    ['foo', 'two', 0.41145200000000004, -0.9543709999999999],
    ['bar', 'two', 0.765878, -0.095968],
    ['foo', 'one', -0.65489, 0.678091],
    ['foo', 'three', -1.789842, -1.130922]
]
df = pd.DataFrame.from_records(records, columns=columns)

"""
     A      B         C         D
0  foo    one  0.162003  0.087469
1  bar    one -1.156319 -1.526272
2  foo    two  0.833892 -1.666304
3  bar  three -2.026673 -0.322057
4  foo    two  0.411452 -0.954371
5  bar    two  0.765878 -0.095968
6  foo    one -0.654890  0.678091
7  foo  three -1.789842 -1.130922
"""

The following commands work:

df.groupby('A').apply(lambda x: (x['C'] - x['D']))
df.groupby('A').apply(lambda x: (x['C'] - x['D']).mean())

But none of the following work:

df.groupby('A').transform(lambda x: (x['C'] - x['D']))
# KeyError or ValueError: could not broadcast input array from shape (5) into shape (5,3)

df.groupby('A').transform(lambda x: (x['C'] - x['D']).mean())
# KeyError or TypeError: cannot concatenate a non-NDFrame object

Why? The example in the documentation seems to suggest that calling transform on a group allows row-wise processing:

# Note that the following suggests row-wise operation (x.mean is the column mean)
zscore = lambda x: (x - x.mean()) / x.std()
transformed = ts.groupby(key).transform(zscore)

In other words, I thought that transform is essentially a specific type of apply (one that does not aggregate). Where am I wrong?

For reference, below is the construction of the original dataframe above:

import pandas as pd
from numpy.random import randn

df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                          'two', 'two', 'one', 'three'],
                   'C' : randn(8), 'D' : randn(8)})

import pandas as pd
import numpy as np

df = pd.DataFrame({'State': ['Texas', 'Texas', 'Florida', 'Florida'],
                   'a': [4, 5, 1, 3],
                   'b': [6, 10, 3, 11]})

"""
     State  a   b
0    Texas  4   6
1    Texas  5  10
2  Florida  1   3
3  Florida  3  11
"""

def rand_group_len(x):
    return np.random.rand(len(x))

df.groupby('State').transform(rand_group_len)

"""
          a         b
0  0.962070  0.151440
1  0.440956  0.782176
2  0.642218  0.483257
3  0.056047  0.238208
"""

Recommended answer

Two major differences between `apply` and `transform`
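
In short, and assuming current pandas behaviour: a function given to `apply` receives each group as a whole DataFrame and may return a scalar, a Series, or a DataFrame, whereas a function given to `transform` is evaluated one column at a time and its result has to line up with the rows of the group. That is why `x['C'] - x['D']` works inside `apply` but raises inside `transform`. The sketch below uses a small hypothetical frame with fixed values (not the question's random data) and shows one possible way around the problem: compute the difference first, then transform the resulting single column.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame (fixed values, not randn).
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar'],
    'C': [1.0, 2.0, 3.0, 4.0],
    'D': [0.5, 1.0, 1.5, 2.0],
})

# apply: the function receives each group as a DataFrame, so it can
# combine columns C and D and return one scalar per group.
mean_diff_per_group = df.groupby('A')[['C', 'D']].apply(
    lambda g: (g['C'] - g['D']).mean()
)

# transform: the function works on one column at a time, so compute the
# difference first and then broadcast the group mean back to every row.
diff = df['C'] - df['D']
mean_diff_per_row = diff.groupby(df['A']).transform('mean')

print(mean_diff_per_group)  # one value per group
print(mean_diff_per_row)    # one value per original row
```

The first result has one entry per group, the second has one entry per row of `df`, which is usually what is wanted when the value is to be stored as a new column.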

Inspecting the custom function

Examples

Displaying the passed object
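
A quick way to see what each method hands to the custom function is to record the type of the argument. The sketch below is only an illustration: the helper names `record_apply` and `record_transform` are made up here, and the frame is the small Texas/Florida example. Note that, depending on the pandas version, `transform` may additionally try the function on the whole group as an internal optimisation, so the recorded list can contain `DataFrame` entries after the initial `Series` ones.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Texas', 'Texas', 'Florida', 'Florida'],
    'a': [4, 5, 1, 3],
    'b': [6, 10, 3, 11],
})

seen_by_apply = []       # type names passed to the apply function
seen_by_transform = []   # type names passed to the transform function

def record_apply(x):
    # Hypothetical helper: remember what apply passed in.
    seen_by_apply.append(type(x).__name__)
    return x.sum()

def record_transform(x):
    # Hypothetical helper: remember what transform passed in.
    seen_by_transform.append(type(x).__name__)
    return x  # identity, so the output has the same shape as the input

grouped = df.groupby('State')[['a', 'b']]
grouped.apply(record_apply)
grouped.transform(record_transform)

print(set(seen_by_apply))     # apply sees whole groups
print(seen_by_transform[0])   # transform starts with a single column
```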

Transform must return a one-dimensional sequence the same size as the group
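
As an illustration of a function that satisfies this rule, the z-score lambda quoted in the question can be applied to the small Texas/Florida frame. This is a sketch under the assumption that only the numeric columns `a` and `b` are transformed; because `x - x.mean()` keeps the index of the group it was given, the output has exactly one value per input row.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Texas', 'Texas', 'Florida', 'Florida'],
    'a': [4, 5, 1, 3],
    'b': [6, 10, 3, 11],
})

def zscore(x):
    # Subtracting the group mean and dividing by the group standard
    # deviation returns an object of the same length as the input.
    return (x - x.mean()) / x.std()

standardized = df.groupby('State')[['a', 'b']].transform(zscore)

print(standardized)
```

The result has the same index as `df`, so it can be assigned back to the original frame column by column.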

Returning a single scalar object also works for `transform`
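
A minimal sketch of the scalar case, again on the Texas/Florida frame: when the function reduces each column of a group to a single value, pandas broadcasts that value to every row of the group. The lambda below is an illustrative choice, not necessarily the function used in the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Texas', 'Texas', 'Florida', 'Florida'],
    'a': [4, 5, 1, 3],
    'b': [6, 10, 3, 11],
})

# Each column of each group is reduced to one number (its sum), and
# that number is repeated for every row belonging to the group.
group_sums = df.groupby('State')[['a', 'b']].transform(lambda x: x.sum())

print(group_sums)
```

Here the two Texas rows both receive the Texas totals and the two Florida rows both receive the Florida totals, in the original row order.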
