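I want to extract the standard deviation and/or the 95% confidence interval of the results obtained in a benchmark of several learners on a single task, so that the reported results are complete. I read this: mlr3 standard deviation for k-fold cross-validation resampling
but I don't know whether anything new has been added since February.
Here is my code and the plots of the bmr: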
resampling_outer = rsmp("cv", folds = 5)
resampling_inner = rsmp("cv", folds = 3)
set.seed(372)
resampling_outer$instantiate(task_wilcox)
resampling_inner$instantiate(task_wilcox)
at_xgboost = auto_tuner(tuner = tnr("mbo"), learner = xgboost, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_ranger = auto_tuner(tuner = tnr("mbo"), learner = ranger, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_svm = auto_tuner(tuner = tnr("mbo"), learner = svm, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
at_knn = auto_tuner(tuner = tnr("mbo"), learner = knn, resampling = resampling_inner, measure = msr("classif.auc"), term_evals = 20, store_tuning_instance = TRUE, store_models = TRUE)
learners <- c(at_xgboost, at_svm, at_ranger, at_knn)
measures = msrs(c("classif.auc", "classif.bacc", "classif.bbrier"))
#Benchmarking
set.seed(372)
design = benchmark_grid(tasks = task_wilcox, learners = learners, resamplings = resampling_outer)
bmr = benchmark(design, store_models = TRUE)
results <- bmr$aggregate(measures)
print(results)
autoplot(bmr, measure = msr("classif.auc"))
autoplot(bmr, type = "roc")
results
nr task_id learner_id resampling_id iters classif.auc classif.bacc classif.bbrier
1: 1 data_wilcox scale.xgboost.tuned cv 5 0.6112939 0.5767294 0.2326787
2: 2 data_wilcox scale.svm.tuned cv 5 0.5226407 0.5010260 0.1893202
3: 3 data_wilcox scale.random_forest.tuned cv 5 0.6200084 0.5614843 0.2229120
4: 4 data_wilcox scale.knn.tuned cv 5 0.5731675 0.5002955 0.1917721
extract_inner_tuning_results(bmr)[,list(learner_id, classif.auc)]
learner_id classif.auc
1: scale.xgboost.tuned 0.6231350
2: scale.xgboost.tuned 0.6207103
3: scale.xgboost.tuned 0.6175323
4: scale.xgboost.tuned 0.6195693
5: scale.xgboost.tuned 0.6222398
6: scale.svm.tuned 0.5891432
7: scale.svm.tuned 0.5837583
8: scale.svm.tuned 0.5767444
9: scale.svm.tuned 0.6027165
10: scale.svm.tuned 0.6082825
11: scale.random_forest.tuned 0.6287649
12: scale.random_forest.tuned 0.6165179
13: scale.random_forest.tuned 0.6288599
14: scale.random_forest.tuned 0.6259322
15: scale.random_forest.tuned 0.6234295
16: scale.knn.tuned 0.5931790
17: scale.knn.tuned 0.5926835
18: scale.knn.tuned 0.5931790
19: scale.knn.tuned 0.5929156
20: scale.knn.tuned 0.5929156
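As you can see on the ROC curves, a standard deviation or confidence interval is drawn as a colored, semi-transparent band around each curve, but how can I extract it? I think that for the standard deviation I have to extract the results of all the outer resampling iterations of the nested CV, but I have not found a direct way to do that (since it shows up as the band on the ROC curves, I assume it is stored somewhere...).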
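To make concrete what I mean by "extracting the results of all the outer resamplings", here is a minimal sketch of the kind of computation I have in mind. It is only an illustration under assumptions: the built-in `sonar` task and two untuned learners stand in for my `task_wilcox` and the auto-tuners, the `summary_auc` table and its column names are made up for the example, and the t-based interval is a rough heuristic because CV folds are not independent. It relies on `bmr$score()`, which returns one row per learner and outer fold.

```r
library(mlr3)
library(data.table)

set.seed(372)

# Assumption: stand-ins for task_wilcox and the tuned learners from my code
task = tsk("sonar")
learners = lrns(c("classif.rpart", "classif.featureless"), predict_type = "prob")
design = benchmark_grid(task, learners, rsmp("cv", folds = 5))
bmr = benchmark(design)

# One row per learner and outer fold, with the AUC on that fold's test set
scores = bmr$score(msr("classif.auc"))

# Mean, standard deviation and number of folds per learner
summary_auc = scores[, .(
  mean_auc = mean(classif.auc),
  sd_auc   = sd(classif.auc),
  n        = .N
), by = learner_id]

# Rough t-based 95% interval around the mean (heuristic: folds are correlated)
summary_auc[, half_width := qt(0.975, df = n - 1) * sd_auc / sqrt(n)]
summary_auc[, `:=`(ci_lower = mean_auc - half_width, ci_upper = mean_auc + half_width)]

print(summary_auc)
```

With my real benchmark I would expect the same calls to work on `bmr` directly, e.g. `bmr$score(measures)`, but I am not sure whether this is the same quantity as the band drawn on the ROC curves.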
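Second question, about the boxplot of the AUC that I plotted for each learner: I really don't understand how the boxplot is built, because it does not seem to match the results on the test sets (outer loop = resampling_outer)...
Last question: do you know how to customize the ROC curves in mlr3? For example, if I want to add the AUC to the plot, or remove the band around the curves...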
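For the customization part, this is roughly what I am trying to achieve. It is a sketch that assumes `autoplot(bmr, type = "roc")` returns a regular ggplot object that accepts extra ggplot2 layers (it also needs the precrec package installed); again the `sonar` task and an untuned rpart learner are stand-ins for my setup. It only covers adding the AUC as text; I have not found how to remove the band.

```r
library(mlr3)
library(mlr3viz)
library(ggplot2)

set.seed(372)

# Assumption: stand-ins for task_wilcox and one of the tuned learners
task = tsk("sonar")
learner = lrn("classif.rpart", predict_type = "prob")
bmr = benchmark(benchmark_grid(task, learner, rsmp("cv", folds = 5)))

# Mean AUC over the outer folds, as reported by bmr$aggregate()
auc = bmr$aggregate(msr("classif.auc"))$classif.auc

# Add the AUC to the ROC plot as a subtitle using ordinary ggplot2 syntax
p = autoplot(bmr, type = "roc") +
  labs(subtitle = sprintf("Mean AUC over outer folds: %.3f", auc))

print(p)
```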
Thanks!