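I am using the caret package to tune a random forest (RF) model fitted with ranger. Because I cannot tune the number of trees in the ranger package itself, I am using caret. The metric for choosing the optimal number of trees is R-squared. I am testing tree counts from 500 to 3000 in steps of 500 (500, 1000, 1500, ..., 3000).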

The issue is that the R-squared is the same for every number of trees (the original post illustrated this with an attached image).

I don't think this is correct, so I believe something is wrong with my code. Why am I getting the same R-squared every time?

Here is the code:

library(caret)
library(ranger)

# Load the data
block.data <- read.csv("path/block.data.csv")

eq1 = ntl ~ .

# Define the cross-validation method for hyperparameter tuning
control <- trainControl(method = "cv", number = 10, savePredictions = FALSE, 
                        search = 'grid', allowParallel = TRUE)

# default model
rf_default = train(eq1, 
                   data = block.data, 
                   method = "ranger", 
                   metric = "Rsquared", 
                   trControl = control)

print(rf_default)

# Define the grid of hyperparameters to be tuned
tuneGrid <- expand.grid(mtry = c(2, 3, 4, 5, 6, 7), # number of predictor variables to sample at each split
                        splitrule = c("variance", "extratrees"), # splitting rule
                        min.node.size = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)) # minimum size of terminal nodes
                       
# Train the model with hyperparameter tuning using caret
set.seed(234)
rf_model <- train(eq1, # formula for the response and predictors
                  data = block.data, 
                  method = "ranger", 
                  trControl = control, 
                  tuneGrid = tuneGrid) 

rf_model$bestTune

tuneGrid <- expand.grid(mtry = rf_model$bestTune$mtry,
                        splitrule = rf_model$bestTune$splitrule,
                        min.node.size = rf_model$bestTune$min.node.size)

store_maxtrees <- list()
for (ntree in c(500, 1000, 1500, 2000, 2500, 3000)) {
  set.seed(345)
  rf_maxtrees <- train(eq1,
                       data = block.data,
                       method = "ranger",
                       metric = "Rsquared",
                       tuneGrid = tuneGrid,
                       trControl = control,
                       ntree = ntree)
  key <- toString(ntree)
  store_maxtrees[[key]] <- rf_maxtrees
}
results_tree <- resamples(store_maxtrees)
summary(results_tree)

Recommended answer

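In the for loop, I had to change ntree to num.trees, which is the name ranger uses for the number of trees. (The shortened num.tree that I first tried also appears to work, but only because R partially matches it to num.trees; the full name is safer.) The loop becomes:

for (num.trees in c(500, 1000, 1500, 2000, 2500, 3000)) {
  set.seed(345)
  rf_maxtrees <- caret::train(eq1,
                              data = block.data,
                              method = "ranger",
                              metric = "Rsquared",
                              tuneGrid = tuneGrid,
                              trControl = control,
                              num.trees = num.trees) # passed through to ranger
  key <- toString(num.trees)
  store_maxtrees[[key]] <- rf_maxtrees
}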
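
Why the original loop ran without complaint yet changed nothing can be illustrated with a small base-R sketch. The two functions below, toy_ranger and toy_train, are hypothetical stand-ins written for this illustration, not caret's or ranger's actual code. They assume the fitting function has a num.trees argument followed by ..., so a misnamed argument such as ntree is quietly absorbed rather than rejected. A version of the fitting function without ... would raise an "unused argument" error instead.

```r
# Toy stand-ins (hypothetical, not caret's or ranger's real code) that mimic
# how a wrapper forwards extra arguments to the underlying fitting function.
toy_ranger <- function(data, num.trees = 500, ...) {
  # Anything that does not match a named formal lands in `...` and is ignored.
  num.trees
}

toy_train <- function(data, ...) {
  # Forward all extra arguments untouched, as a wrapper typically would.
  toy_ranger(data, ...)
}

trees_used <- c(
  ntree     = toy_train(NULL, ntree = 1000),     # no match: default is kept
  num.tree  = toy_train(NULL, num.tree = 1000),  # partial match to num.trees
  num.trees = toy_train(NULL, num.trees = 1000)  # exact match
)
print(trees_used)
```

In this sketch only the ntree call falls back to the default of 500, which would explain why every model in the original loop produced the same R-squared. On a real fit, the number of trees actually used can be checked by inspecting rf_maxtrees$finalModel$num.trees after each call to train.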
