I want to extract the standard deviation and/or 95% CI of the results obtained in a benchmark of several learners on one task, so that I can report the results completely. I have read this: mlr3 standard deviation for k-fold cross-validation resampling

but I don't know whether anything new has come out since February.

Here are my code and the plots of the BMR:

resampling_outer = rsmp("cv", folds = 5)
resampling_inner = rsmp("cv", folds = 3)

set.seed(372)
resampling_outer$instantiate(task_wilcox)
resampling_inner$instantiate(task_wilcox)

at_xgboost = auto_tuner(tuner = tnr("mbo"), learner = xgboost, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_ranger = auto_tuner(tuner = tnr("mbo"), learner = ranger, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_svm = auto_tuner(tuner = tnr("mbo"), learner = svm, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_knn = auto_tuner(tuner = tnr("mbo"), learner = knn, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)


learners <- c(at_xgboost, at_svm, at_ranger, at_knn)

measures = msrs(c("classif.auc", "classif.bacc", "classif.bbrier"))

#Benchmarking
set.seed(372)
design = benchmark_grid(tasks = task_wilcox, learners = learners, resamplings = resampling_outer)
bmr = benchmark(design, store_models = TRUE)
results <- bmr$aggregate(measures)
print(results)
autoplot(bmr, measure = msr("classif.auc"))
autoplot(bmr, type = "roc")

[Boxplot of AUC per learner]

[ROC curves per learner, with colored semi-transparent margins]

results
   nr     task_id                learner_id resampling_id iters classif.auc classif.bacc classif.bbrier
1:  1 data_wilcox       scale.xgboost.tuned            cv     5   0.6112939    0.5767294      0.2326787
2:  2 data_wilcox           scale.svm.tuned            cv     5   0.5226407    0.5010260      0.1893202
3:  3 data_wilcox scale.random_forest.tuned            cv     5   0.6200084    0.5614843      0.2229120
4:  4 data_wilcox           scale.knn.tuned            cv     5   0.5731675    0.5002955      0.1917721
extract_inner_tuning_results(bmr)[,list(learner_id, classif.auc)]
                   learner_id classif.auc
 1:       scale.xgboost.tuned   0.6231350
 2:       scale.xgboost.tuned   0.6207103
 3:       scale.xgboost.tuned   0.6175323
 4:       scale.xgboost.tuned   0.6195693
 5:       scale.xgboost.tuned   0.6222398
 6:           scale.svm.tuned   0.5891432
 7:           scale.svm.tuned   0.5837583
 8:           scale.svm.tuned   0.5767444
 9:           scale.svm.tuned   0.6027165
10:           scale.svm.tuned   0.6082825
11: scale.random_forest.tuned   0.6287649
12: scale.random_forest.tuned   0.6165179
13: scale.random_forest.tuned   0.6288599
14: scale.random_forest.tuned   0.6259322
15: scale.random_forest.tuned   0.6234295
16:           scale.knn.tuned   0.5931790
17:           scale.knn.tuned   0.5926835
18:           scale.knn.tuned   0.5931790
19:           scale.knn.tuned   0.5929156
20:           scale.knn.tuned   0.5929156

As you can see on the ROC curves, there is a standard deviation or CI shown as colored, semi-transparent margins, but how do I extract it? I suppose that for the standard deviation I have to extract the results of all the outer resampling folds of the nested CV, but I found no direct way to do so (the values show up as the margins on the ROC curves, so I assume they exist somewhere...).

Second question, about the boxplots of AUC per learner that I plotted: I don't really understand how the boxplots are built, because they do not match the results on the test sets (outer loop = resampling_outer)...

Last question: do you know how to customize the ROC curves in mlr3? For example, if I want to add the AUC to the plot, or remove the margins around the curves...

Thanks!

Answer

  1. You can get the standard deviation along with the predicted values by setting predict_type = "se", see https://mlr3book.mlr-org.com/chapters/chapter2/data_and_basic_modeling.html#sec-predicting. This is computed internally by the learner, not as a spread across cross-validation folds, but it is most likely what you want.
  2. The numbers you get as results should correspond to the plot; without a fully reproducible example I cannot tell you more.
  3. All plotting functions in mlr3 return ggplot2 objects; you can customize them the same way as any other plot.
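To get the spread across the outer folds directly (rather than the learner-internal `se`), the per-iteration scores can be pulled with `$score()` and then summarized with data.table. A minimal sketch, assuming the fitted `bmr` object from the question; the t-based 95% CI is one common choice, not something mlr3 computes for you:

```r
library(mlr3)
library(data.table)

# One row per outer resampling iteration and learner
scores <- bmr$score(msr("classif.auc"))

# Mean, standard deviation, and a t-based 95% CI per learner
scores[, .(mean_auc = mean(classif.auc),
           sd_auc   = sd(classif.auc),
           ci_lo    = mean(classif.auc) - qt(0.975, .N - 1) * sd(classif.auc) / sqrt(.N),
           ci_hi    = mean(classif.auc) + qt(0.975, .N - 1) * sd(classif.auc) / sqrt(.N)),
  by = learner_id]
```

These per-fold scores are also the values the AUC boxplots are drawn from, which is why they can differ from the single aggregated number per learner.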
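For point 3, since `autoplot()` returns a ggplot object, the ROC plot can be modified with ordinary ggplot2 layers and themes. A sketch, again assuming the `bmr` object from the question (the title text is illustrative):

```r
library(mlr3viz)
library(ggplot2)

p <- autoplot(bmr, type = "roc")

# Standard ggplot2 customization applies: titles, themes, legend placement, ...
p +
  labs(title = "ROC curves of the tuned learners") +
  theme_minimal() +
  theme(legend.position = "bottom")
```

Annotating the AUC values on the plot would take an extra `annotate()` or `geom_text()` layer fed with the numbers from `bmr$aggregate()`.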
