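I want to use SHAP with a scikit-learn pipeline (one-hot encoding (OHE) as preprocessing and a simple linear regression as the model), but I'm running into an error.
Here is how I load the data (a modified version of the bike-sharing dataset):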
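import pandas as pd
import shap
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
bike_data = pd.read_csv("bike_outlier_clean.csv")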
bike_data['season']=bike_data.season.astype('category')
bike_data['year']=bike_data.year.astype('category')
bike_data['holiday']=bike_data.holiday.astype('category')
bike_data['workingday']=bike_data.workingday.astype('category')
bike_data['weather_condition']=bike_data.weather_condition.astype('category')
bike_data['season'] = bike_data['season'].map({1:'Spring', 2:'Summer', 3:'Fall', 4: 'Winter'})
bike_data['year'] = bike_data['year'].map({0: 2011, 1: 2012})
bike_data['holiday'] = bike_data['holiday'].map({0: False, 1: True})
bike_data['workingday'] = bike_data['workingday'].map({0: False, 1: True})
bike_data['weather_condition'] = bike_data['weather_condition'].map({1:'Clear', 2:'Mist', 3:'Light Snow/Rain', 4: 'Heavy Snow/Rain'})
bike_data = bike_data[['total_count','season','month','year','weekday','holiday','workingday','weather_condition','humidity','temp','windspeed']]
x = bike_data.drop('total_count', axis=1)
y = bike_data['total_count']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)
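And here is my pipeline:
# Non-numeric (categorical) feature columns, in a deterministic order
category_columns = x.select_dtypes(exclude='number').columns.tolist()
preprocessor = ColumnTransformer(
    transformers=[
        # handle_unknown='ignore' avoids errors on categories missing from the training split
        ('cat', OneHotEncoder(handle_unknown='ignore'), category_columns)
    ],
    remainder='passthrough'  # numeric columns are passed through unchanged
)
model = LinearRegression()
pipeline = Pipeline(steps=[('preprocessor', preprocessor), ('model', model)])
pipeline.fit(x_train, y_train)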
Finally, I create a KernelSHAP explainer:
explainer = shap.KernelExplainer(pipeline.predict, shap.sample(x, 5))
However, this is where the error occurs:
123 # Make a copy so that the feature names are not removed from the original model
124 out = copy.deepcopy(out)
--> 125 out.f.__self__.feature_names_in_ = None
126
127 return out
AttributeError: can't set attribute 'feature_names_in_'
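The traceback shows shap copying the model wrapper and then assigning `None` to `feature_names_in_` on `out.f.__self__`, i.e. the object that the bound method `pipeline.predict` belongs to: the `Pipeline` itself. As far as I can tell, `Pipeline.feature_names_in_` is a read-only property that delegates to the first step, so the assignment raises `AttributeError`. The stdlib-only sketch below mimics that mechanism; `FakePipeline` and `try_clear_feature_names` are hypothetical stand-ins, and the `hasattr` guard is my assumption about the shap code just above line 123, not something visible in the traceback.

```python
class FakePipeline:
    """Hypothetical stand-in for sklearn's Pipeline: feature_names_in_ has a getter only."""

    def __init__(self, names):
        self._names = names

    @property
    def feature_names_in_(self):
        # Read-only property: no setter is defined, so assignment fails
        return self._names

    def predict(self, X):
        return [0.0 for _ in X]


def try_clear_feature_names(f):
    """Simplified mimic of the traceback: clear feature names on the object f is bound to."""
    if hasattr(f, "__self__") and hasattr(f.__self__, "feature_names_in_"):
        try:
            f.__self__.feature_names_in_ = None
        except AttributeError:
            return "AttributeError"
        return "cleared"
    return "skipped"


pipe = FakePipeline(["season", "temp"])
print(try_clear_feature_names(pipe.predict))               # bound method -> AttributeError
print(try_clear_feature_names(lambda X: pipe.predict(X)))  # a lambda has no __self__ -> skipped
```

A bound method carries `__self__`, so the assignment is attempted and fails on the property; a plain function or lambda has no `__self__`, so this code path would be skipped entirely.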
I have no idea how to fix this.
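One possible workaround, sketched under assumptions and not checked against every shap version: pass a plain function instead of the bound method `pipeline.predict`, so the `__self__` branch is never entered. As far as I know, `KernelExplainer` calls the model on NumPy arrays rather than DataFrames, so the wrapper rebuilds a DataFrame with the training column names; otherwise the `ColumnTransformer` could not select columns by name. The helper `make_predict_fn` and the tiny synthetic frame below are hypothetical illustrations, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def make_predict_fn(pipeline, columns):
    """Hypothetical helper: wrap pipeline.predict in a plain function (no __self__)."""
    def predict_fn(X):
        # Assumed input: a NumPy (object) array; restore column names and numeric dtypes
        df = pd.DataFrame(np.asarray(X), columns=columns).infer_objects()
        return pipeline.predict(df)
    return predict_fn


# Tiny synthetic stand-in for the bike data (made-up values)
x_train = pd.DataFrame({
    "season": pd.Categorical(["Spring", "Summer", "Fall", "Winter"] * 3),
    "temp": np.linspace(0.1, 0.9, 12),
})
y_train = pd.Series(np.arange(12, dtype=float))

pipeline = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(
        transformers=[("cat", OneHotEncoder(handle_unknown="ignore"), ["season"])],
        remainder="passthrough",
    )),
    ("model", LinearRegression()),
])
pipeline.fit(x_train, y_train)

predict_fn = make_predict_fn(pipeline, x_train.columns)
# Same predictions whether fed the DataFrame or the raw object array
print(np.allclose(predict_fn(x_train.values), pipeline.predict(x_train)))
```

With the real data, the explainer would then be built as `shap.KernelExplainer(make_predict_fn(pipeline, x_train.columns), shap.sample(x, 5))`. Calling `infer_objects()` turns the numeric columns back into numeric dtypes after the round trip through an object array, so the passthrough columns reach `LinearRegression` as numbers.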