Python FunctionTransformer & creating new columns in a pipeline

Posted on June 13

I have some sample data:

df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
                                               [1,16,9],
                                               [4,36,16],
                                               [1,16,9],
                                               [2,9,8],
                                               [3,36,15],
                                               [2,49,16],
                                               [4,25,14],
                                               [5,36,17]])

I want to create two complementary columns in df, based on X2 and X3, and include that step in a pipeline.

I am trying the following code:

def feat_comp(x):
 x1 = 100-x
 return x1

pipe_text = Pipeline([('col_test', FunctionTransformer(feat_comp, 'X2',validate=False))])
X = pipe_text.fit_transform(df)

This gives me an error:

TypeError: 'str' object is not callable

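How can I apply a FunctionTransformer to selected columns, and how do I use it in a pipeline?

Recommended answer

If I understand correctly, you want to add a new column derived from a given column, for example X2. You need to pass the column name to the function as an additional argument via kw_args: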

import pandas as pd
from sklearn.preprocessing import FunctionTransformer
from sklearn.pipeline import Pipeline

df = pd.DataFrame(columns=['X1', 'X2', 'X3'], data=[
                                               [1,16,9],
                                               [4,36,16],
                                               [1,16,9],
                                               [2,9,8],
                                               [3,36,15],
                                               [2,49,16],
                                               [4,25,14],
                                               [5,36,17]])

def feat_comp(x, column):
   # Work on a copy so the caller's DataFrame is not modified in place
   x = x.copy()
   x[f'100-{column}'] = 100 - x[column]
   return x

pipe_text = Pipeline([('col_test', FunctionTransformer(feat_comp, validate=False, kw_args={'column': 'X2'}))])
pipe_text.fit_transform(df)

Result:

   X1  X2  X3  100-X2
0   1  16   9      84
1   4  36  16      64
2   1  16   9      84
3   2   9   8      91
4   3  36  15      64
5   2  49  16      51
6   4  25  14      75
7   5  36  17      64
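
Since the question asks for complements of both X2 and X3, the same idea can probably be extended by passing a list of column names through kw_args. The following is only a sketch: the helper name add_complements and the keyword columns are made up for illustration and are not part of scikit-learn.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer

df = pd.DataFrame(
    columns=['X1', 'X2', 'X3'],
    data=[[1, 16, 9], [4, 36, 16], [1, 16, 9], [2, 9, 8],
          [3, 36, 15], [2, 49, 16], [4, 25, 14], [5, 36, 17]],
)

def add_complements(x, columns):
    # Hypothetical helper: copy first so the input DataFrame stays untouched,
    # then add one '100-<name>' column per requested source column.
    x = x.copy()
    for column in columns:
        x[f'100-{column}'] = 100 - x[column]
    return x

pipe = Pipeline([
    ('complements',
     FunctionTransformer(add_complements, kw_args={'columns': ['X2', 'X3']})),
])
result = pipe.fit_transform(df)
print(result.columns.tolist())
```

Because the helper returns a DataFrame, later pipeline steps still see named columns.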

(In your example, FunctionTransformer(feat_comp, 'X2', validate=False) passes 'X2' as the second positional argument, which is inverse_func. The string 'X2' is not callable, hence the error.)
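
To see where the string ends up, one can inspect the transformer right after constructing it. This small check assumes the first two constructor parameters are func and inverse_func, in that order, as described above:

```python
from sklearn.preprocessing import FunctionTransformer

def feat_comp(x):
    return 100 - x

# The second positional argument is stored as inverse_func,
# not interpreted as a column name.
transformer = FunctionTransformer(feat_comp, 'X2')
print(transformer.inverse_func)  # prints X2
```

Passing extra arguments by keyword (kw_args=...) avoids this kind of mix-up.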