Python: class_weight parameter does not affect RandomForestClassifier results on an imbalanced dataset

Posted on April 26

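I'm fairly new to ML, and I'm currently predicting employee attrition on a medium-sized dataset. I've been able to run everything smoothly. However, because the dataset is imbalanced, I've been trying to add class weights to the model so that, by giving up some precision, I get more recall on the positive class. The problem comes when I try to do this with scikit-learn's RandomForestClassifier. I've tried different approaches, both creating a separate dict for the weights and passing a dict directly to the parameter, but it doesn't affect the model at all. The results always stay the same, and the majority class always scores better than the minority class.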

With other models I have no problem at all.

Am I doing something wrong here?

(This is the dataset I'm using, in case it helps anyone: https://www.kaggle.com/datasets/bhanupratapbiswas/hr-analytics-case-study)

Model code:

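#Imports the snippet relies on
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, roc_auc_score

#Running the model with the best hyperparameters
weight_dict = {0: 0.59, 1: 3.12}

model = RandomForestClassifier(bootstrap=False, criterion='gini', max_depth=24, max_features='log2', min_samples_leaf=1, min_samples_split=2, n_estimators=200, class_weight=weight_dict)
model.fit(X_train_smote, y_train_smote)
y_pred = model.predict(X_test_outliers)

#Printing the results
print('Accuracy:', accuracy_score(y_test, y_pred))
#ROC AUC needs positive-class probabilities; hard 0/1 predictions collapse the ROC curve to a single point
print('AUC-ROC Score:', roc_auc_score(y_test, model.predict_proba(X_test_outliers)[:, 1]))
print('Classification Report:\n', classification_report(y_test, y_pred))

#Plotting the confusion matrix
plt.figure()
sns.heatmap(confusion_matrix(y_test, y_pred), annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.xticks(rotation=45)
plt.show()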
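One possible explanation, offered as a hypothesis rather than a confirmed diagnosis: with min_samples_leaf=1, min_samples_split=2 and a deep max_depth, each tree tends to keep splitting until its leaves are pure. A pure leaf predicts its single class no matter how much weight that class carries, so class_weight can then only influence which splits are chosen, not the label a leaf votes for. The sketch below uses a single DecisionTreeClassifier on synthetic make_classification data (both assumptions, not the question's setup) and counts pure leaves with and without the question's weights, at min_samples_leaf=1 and at an arbitrary min_samples_leaf=50.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def pure_leaf_fraction(tree):
    """Share of leaves whose class distribution contains exactly one class."""
    t = tree.tree_
    # Leaves are the nodes without children (children_left == -1).
    leaves = np.where(t.children_left == -1)[0]
    # t.value has shape (node_count, n_outputs, n_classes); one non-zero entry means a pure leaf.
    pure = [np.count_nonzero(t.value[i, 0]) == 1 for i in leaves]
    return float(np.mean(pure))


# Synthetic, imbalanced stand-in data (about 84% / 16%); NOT the HR dataset.
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.84, 0.16], random_state=0)

fractions = {}
for weights in (None, {0: 0.59, 1: 3.12}):
    for leaf_size in (1, 50):
        tree = DecisionTreeClassifier(
            min_samples_leaf=leaf_size, class_weight=weights, random_state=0
        ).fit(X, y)
        key = (weights is not None, leaf_size)
        fractions[key] = pure_leaf_fraction(tree)
        print(f"class_weight={weights}, min_samples_leaf={leaf_size}: "
              f"pure-leaf fraction={fractions[key]:.2f}")
```

With fully grown trees every leaf comes out pure under both weightings, while the larger minimum leaf size is expected to leave mixed leaves in which the class weights can tip the vote. Raising min_samples_leaf or lowering max_depth in the forest is therefore one thing worth trying; whether it changes minority recall on the HR data is untested here.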

I expected more recall for the minority class, with the majority class losing some precision and recall in return.
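Separately from class_weight, the precision-for-recall trade described here can also be made after training by lowering the cut-off applied to predict_proba. This is threshold moving, a different technique from class weighting. The sketch below is only an illustration: the data come from make_classification, and the thresholds 0.3 and 0.2 are arbitrary choices; a 0.5 cut-off approximately reproduces what predict() returns for a binary problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the HR data: roughly 84% negatives, 16% positives.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.84, 0.16], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Probability of the positive (minority) class for each test row.
proba = model.predict_proba(X_test)[:, 1]

results = {}
for threshold in (0.5, 0.3, 0.2):
    # Flag a row as positive when its probability reaches the threshold.
    y_pred_t = (proba >= threshold).astype(int)
    precision = precision_score(y_test, y_pred_t, zero_division=0)
    recall = recall_score(y_test, y_pred_t)
    results[threshold] = (precision, recall)
    print(f"threshold={threshold}: precision={precision:.3f}, recall={recall:.3f}")
```

Lowering the threshold can only keep or raise recall, because every row flagged at a higher threshold is still flagged at a lower one; precision usually drops. In practice the threshold would be chosen on a validation split rather than on the test set.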

I have checked past questions and answers and applied the solutions from several of them, but without success.

Thanks!

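import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.utils.class_weight import compute_class_weight

#Compute 'balanced' weights from the training labels and pass them as a dict
weights = compute_class_weight('balanced', classes=np.unique(y_train), y=y_train)
weight_dict = dict(zip(np.unique(y_train), weights))
clf = RandomForestClassifier(class_weight=weight_dict)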
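The 'balanced' option sets each class weight to n_samples / (n_classes * count of that class). The sketch below implements that formula directly and cross-checks it against compute_class_weight. The label vectors are made up for illustration: an 84/16 split, which happens to give weights close to the question's {0: 0.59, 1: 3.12} (an inference, not something the question states), and a 1:1 split, which is what SMOTE produces when run with its default sampling strategy (assumed here). If the weights were derived from the pre-resampling labels but the model is fit on SMOTE-balanced data such as X_train_smote/y_train_smote, they add a second correction on top of the oversampling; 'balanced' weights computed on the resampled labels themselves would simply be 1:1.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight


def balanced_weights(y):
    """Per-class weights using the 'balanced' heuristic: n_samples / (n_classes * class_count)."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Hypothetical label vectors: 84 negatives / 16 positives before resampling,
# and an 84 / 84 split such as default SMOTE oversampling would give afterwards.
y_original = np.array([0] * 84 + [1] * 16)
y_resampled = np.array([0] * 84 + [1] * 84)

w_orig = balanced_weights(y_original)
w_smote = balanced_weights(y_resampled)

# Cross-check against scikit-learn's own implementation.
sk_orig = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_original)
assert np.allclose([w_orig[0], w_orig[1]], sk_orig)

print(w_orig)   # about {0: 0.595, 1: 3.125}
print(w_smote)  # {0: 1.0, 1: 1.0}
```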

from sklearn.metrics import precision_recall_curve, auc

precision, recall, _ = precision_recall_curve(y_test, model.predict_proba(X_test)[:, 1])
auc_pr = auc(recall, precision)
print('AUC-PR Score:', auc_pr)
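The PR-AUC snippet passes model.predict_proba(...)[:, 1] rather than predicted labels, which is what ranking metrics such as ROC AUC and PR AUC expect. The question's roc_auc_score(y_test, y_pred) receives hard 0/1 labels instead, which reduces the ROC curve to a single operating point. The tiny hand-made example below (the numbers are invented for illustration) shows how much that can change the score.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Two negatives followed by two positives.
y_true = np.array([0, 0, 1, 1])
# Hypothetical positive-class probabilities, e.g. what model.predict_proba(X)[:, 1] might return.
scores = np.array([0.1, 0.6, 0.7, 0.9])
# The same scores turned into hard labels at a 0.5 cut-off: [0, 1, 1, 1].
hard_labels = (scores >= 0.5).astype(int)

# Every positive outranks every negative, so the score-based AUC is perfect.
auc_from_scores = roc_auc_score(y_true, scores)
# With hard labels the second negative ties with both positives; ties count as half.
auc_from_labels = roc_auc_score(y_true, hard_labels)
print(auc_from_scores, auc_from_labels)  # 1.0 0.75
```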
