Python: ML model pipeline using LabelEncoding inside a transformer

Posted on May 6

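I am trying to combine several transformations with a LightGBM model in a scikit-learn pipeline. The model is meant to predict the price of used cars. Once it is trained, I plan to integrate it into an HTML page for practical use.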

from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
import joblib

# Feature lists (as printed from my data)
numeric_features = ['car_year', 'km', 'horse_power', 'cyl_capacity']
categorical_features = ['make', 'model', 'trimlevel', 'fueltype', 'transmission', 'bodytype', 'color']

# Define transformers for numeric and categorical features
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])
categorical_transformer = Pipeline(steps=[('labelencoder', LabelEncoder())])

# Combine transformers using ColumnTransformer
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)
    ]
)

# Append the LightGBM model to the preprocessing pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', best_lgb_model)
])

# Fit the pipeline to training data
pipeline.fit(X_train, y_train)

When fitting, I get this error:

LabelEncoder.fit_transform() takes 2 positional arguments but 3 were given
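The error comes from a signature mismatch: `LabelEncoder.fit_transform` is defined as `fit_transform(y)` because it is meant for encoding a 1-D target, while `Pipeline` and `ColumnTransformer` call `fit_transform(X, y)` on every transformer step, which passes one argument too many. A minimal sketch reproducing this on a made-up two-column DataFrame (the data is hypothetical), and showing that `OrdinalEncoder`, which encodes a 2-D `X` column by column, fits the transformer API:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy data: two categorical columns and a price target
X = pd.DataFrame({"make": ["audi", "bmw", "audi"], "color": ["red", "blue", "red"]})
y = [10000.0, 15000.0, 12000.0]

# Pipeline/ColumnTransformer call transformer.fit_transform(X, y).
# LabelEncoder.fit_transform only accepts y, so the extra argument raises TypeError.
try:
    LabelEncoder().fit_transform(X, y)
    label_encoder_failed = False
except TypeError as exc:
    label_encoder_failed = True
    print("LabelEncoder:", exc)

# OrdinalEncoder encodes each column of a 2-D X (categories sorted per column)
# and ignores y, so it can replace LabelEncoder inside ColumnTransformer.
codes = OrdinalEncoder().fit_transform(X, y)
print(codes)
```

In the pipeline from the question, this amounts to using `OrdinalEncoder()` in place of `LabelEncoder()` in `categorical_transformer`.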

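A version that drops LabelEncoder and lets LightGBM handle the categorical columns natively:

import lightgbm as lgb
import numpy as np
from sklearn.compose import ColumnTransformer, make_column_selector
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Convert all non-numeric columns in the DataFrame to the pandas 'category' dtype
# (new data must be converted the same way before calling predict)
non_numeric_cols = X_train.select_dtypes(exclude=[np.number]).columns
X_train[non_numeric_cols] = X_train[non_numeric_cols].astype('category')

# Define numeric transformer
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])

# Select numeric features automatically with make_column_selector.
# remainder='passthrough' keeps the categorical columns (the default, 'drop', discards them),
# and pandas output (scikit-learn >= 1.2) preserves their 'category' dtype for LightGBM.
preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, make_column_selector(dtype_include=np.number))
    ],
    remainder='passthrough',
).set_output(transform='pandas')

# LightGBM's fit() defaults to categorical_feature='auto', which treats
# pandas 'category' columns as categorical features
best_lgb_model = lgb.LGBMRegressor()

# Create the full pipeline
pipeline = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('model', best_lgb_model)
])

# Fit the model
pipeline.fit(X_train, y_train)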
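Alternatively, the original `ColumnTransformer` layout can be kept with `OrdinalEncoder` as the categorical step, and the whole fitted pipeline saved with `joblib` (already imported in the question) for use behind the HTML page. This is a minimal sketch under stated assumptions: the four-row dataset, the reduced column lists, the file name `car_price_pipeline.joblib`, and `DecisionTreeRegressor` as a stand-in for `best_lgb_model` are all hypothetical; `handle_unknown="use_encoded_value"` needs scikit-learn 0.24 or later.

```python
import joblib
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder, StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Hypothetical reduced feature lists and toy training data
numeric_features = ["car_year", "km"]
categorical_features = ["make", "fueltype"]
X_train = pd.DataFrame({
    "car_year": [2015, 2018, 2012, 2020],
    "km": [90000, 40000, 150000, 10000],
    "make": ["audi", "bmw", "audi", "vw"],
    "fueltype": ["diesel", "petrol", "diesel", "petrol"],
})
y_train = np.array([15000.0, 25000.0, 8000.0, 30000.0])

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), numeric_features),
    # Unseen categories at prediction time are encoded as -1 instead of raising
    ("cat", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1),
     categorical_features),
])

pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("model", DecisionTreeRegressor(random_state=0)),  # stand-in for best_lgb_model
])
pipeline.fit(X_train, y_train)

# Persist preprocessing and model together in one file (hypothetical file name)
MODEL_PATH = "car_price_pipeline.joblib"
joblib.dump(pipeline, MODEL_PATH)

# Later, e.g. in the web backend: reload and predict one row built from form input
loaded = joblib.load(MODEL_PATH)
new_car = pd.DataFrame([{"car_year": 2016, "km": 70000, "make": "skoda", "fueltype": "diesel"}])
prediction = loaded.predict(new_car)
print(prediction)
```

Saving the whole `Pipeline` rather than only the model means the serving code only has to build a DataFrame with the training column names and call `predict`; the scaling and encoding fitted on the training data are applied automatically.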
