The goal

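I want to run a dominance analysis on a Dirichlet regression to approximate the relative importance of a set of predictors (proportional continuous predictors, continuous predictors with splines, and factors). Dirichlet regression is an extension of beta regression for modelling proportions that are not derived from counts and that are split into more than two categories; see Douma & Weedon (2019).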

The modelling approach: the syntax is potentially important

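I am using the DirichletReg package to fit a Dirichlet regression with the "alternative" parametrization, which allows the parameters and the precision to be estimated at the same time. The syntax is: response ~ parameters | precision. The parameters can be estimated with predictors that differ from those used to estimate the precision: response ~ predictor1 + predictor2 | predictor3. If the precision part is left undeclared, the model assumes fixed precision: response ~ predictors, which can be stated explicitly as: response ~ predictors | 1.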
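
As a small illustration of the two-part syntax, the sketch below uses base R only to show how such a formula is parsed; it does not call DirichletReg, and the variable names are just the placeholders from the paragraph above.

```r
# Base R only: inspect how a two-part formula is parsed.
f <- response ~ predictor1 + predictor2 | predictor3

rhs <- f[[3]]  # right-hand side of the formula

# The vertical bar is an ordinary operator at the top of the right-hand side
is_two_part <- is.call(rhs) && identical(rhs[[1]], as.name("|"))

mean_part      <- deparse(rhs[[2]])  # predictors for the parameters
precision_part <- deparse(rhs[[3]])  # predictors for the precision

print(is_two_part)
print(mean_part)
print(precision_part)
```

Because the bar is part of the formula object itself, any tool that rebuilds or subsets the formula has to decide what to do with the precision part.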

I think that the error is related to the vertical bar in the formula, which separates the predictors used to estimate parameters from the predictors used to estimate precision.

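I am relying on performance::r2() to compute a measure of model quality: Nagelkerke's pseudo-R2. For the actual analysis, however, I am considering either McFadden's or Estrella's pseudo-R2, because they appear to be suitable for dominance analysis with multinomial responses; see Luchman (2014).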
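
For orientation, here is a minimal sketch of the textbook definitions of two of these measures, computed from a pair of log-likelihoods. The function name pseudo_r2 and the input values are made up for illustration, Estrella's measure is left out, and whether performance::r2() uses exactly these expressions for DirichletReg models is not checked here.

```r
# Likelihood-based pseudo-R2 measures (standard textbook definitions).
# ll_model: log-likelihood of the fitted model
# ll_null:  log-likelihood of the intercept-only model
# n:        number of observations
pseudo_r2 <- function(ll_model, ll_null, n) {
  cox_snell  <- 1 - exp(-2 / n * (ll_model - ll_null))
  nagelkerke <- cox_snell / (1 - exp(2 / n * ll_null))
  mcfadden   <- 1 - ll_model / ll_null
  c(mcfadden = mcfadden, nagelkerke = nagelkerke)
}

# Made-up log-likelihoods, for illustration only
res <- pseudo_r2(ll_model = -50, ll_null = -100, n = 100)
print(res)
```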

The obstacle

I get the error message: "fitstat requires at least two elements."

A reproducible example

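This uses data available in the DirichletReg package. The response has only two categories, but it produces the same error message as in the actual analysis.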

library(DirichletReg)
#> Warning: package 'DirichletReg' was built under R version 4.1.3
#> Loading required package: Formula
#> Warning: package 'Formula' was built under R version 4.1.1
library(domir)
library(performance)
#> Warning: package 'performance' was built under R version 4.1.3

# Assemble data
RS <- ReadingSkills
RS$acc <- DR_data(RS$accuracy)
#> only one variable in [0, 1] supplied - beta-distribution assumed.
#> check this assumption.
RS$dyslexia <- C(RS$dyslexia, treatment)

# Fit Dirichlet regression
rs2 <- DirichReg(acc ~ dyslexia + iq | dyslexia + iq, data = RS, model = "alternative")

summary(rs2)
#> Call:
#> DirichReg(formula = acc ~ dyslexia + iq | dyslexia + iq, data = RS, model =
#> "alternative")
#> 
#> Standardized Residuals:
#>                   Min       1Q  Median      3Q     Max
#> 1 - accuracy  -1.5279  -0.7798  -0.343  0.6992  2.4213
#> accuracy      -2.4213  -0.6992   0.343  0.7798  1.5279
#> 
#> MEAN MODELS:
#> ------------------------------------------------------------------
#> Coefficients for variable no. 1: 1 - accuracy
#> - variable omitted (reference category) -
#> ------------------------------------------------------------------
#> Coefficients for variable no. 2: accuracy
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  2.22386    0.28087   7.918 2.42e-15 ***
#> dyslexiayes -1.81261    0.29696  -6.104 1.04e-09 ***
#> iq          -0.02676    0.06900  -0.388    0.698    
#> ------------------------------------------------------------------
#> 
#> PRECISION MODEL:
#> ------------------------------------------------------------------
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  1.71017    0.32697   5.230 1.69e-07 ***
#> dyslexiayes  2.47521    0.55055   4.496 6.93e-06 ***
#> iq           0.04097    0.27537   0.149    0.882    
#> ------------------------------------------------------------------
#> Significance codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Log-likelihood: 61.26 on 6 df (33 BFGS + 1 NR Iterations)
#> AIC: -110.5, BIC: -99.81
#> Number of Observations: 44
#> Links: Logit (Means) and Log (Precision)
#> Parametrization: alternative
as.numeric(performance::r2(rs2))
#> [1] 0.4590758

# Run dominance analysis: error

# If left undeclared, the model assumes fixed precision: parameters |  1
domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
)
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | 1,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
             )
#> Error in domir::domin(acc ~ dyslexia + iq | 1, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq | dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
             )
#> Error in domir::domin(acc ~ dyslexia + iq | dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

domir::domin(acc ~ dyslexia + iq,
             reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
             fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")),
             consmodel = "| dyslexia + iq"
             )
#> Error in domir::domin(acc ~ dyslexia + iq, reg = function(y) DirichletReg::DirichReg(y, : fitstat requires at least two elements.

sessionInfo()
#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252   
#> [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C                  
#> [5] LC_TIME=Spanish_Spain.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] performance_0.10.0 domir_1.0.1        DirichletReg_0.7-1 Formula_1.2-4     
#> 
#> loaded via a namespace (and not attached):
#>  [1] rstudioapi_0.13  knitr_1.38       magrittr_2.0.3   insight_0.19.1  
#>  [5] lattice_0.20-44  rlang_1.1.0      fastmap_1.1.0    stringr_1.5.0   
#>  [9] highr_0.9        tools_4.1.0      grid_4.1.0       xfun_0.30       
#> [13] cli_3.6.0        withr_2.5.0      htmltools_0.5.2  maxLik_1.5-2    
#> [17] miscTools_0.6-28 yaml_2.3.5       digest_0.6.29    lifecycle_1.0.3 
#> [21] vctrs_0.6.1      fs_1.5.2         glue_1.6.2       evaluate_0.15   
#> [25] rmarkdown_2.13   sandwich_3.0-1   reprex_2.0.1     stringi_1.7.6   
#> [29] compiler_4.1.0   generics_0.1.2   zoo_1.8-9

Created on 2023-07-27 by the reprex package (v2.0.1)

References

Luchman (2014). Relative Importance Analysis With Multicategory Dependent Variables: An Extension and Review of Best Practices. Organizational Research Methods.

Douma & Weedon (2019). Analysing continuous proportions in ecology and evolution: A practical introduction to beta and Dirichlet regression. Methods in Ecology and Evolution.

EDIT (31 July 2023)

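Many thanks to Joseph Luchman for the solution! I modified the function to avoid the pipe, and added the precision formula in the paste0() call. Unfortunately, reprex::reprex() does not render the result, so I have pasted a screenshot below.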

domir::domir(acc ~ dyslexia + iq, function(y)  {
  iv <- attr(terms(y), "term.labels")
  fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), "| dyslexia + iq", collapse = "")
  print(fml)
  performance::r2( DirichReg(as.formula(fml), data = RS, model = "alternative") )[[1]]})

[Screenshot: result of the Dirichlet dominance analysis]

Recommended answer

The problem asked about here is flagged by domin because the list submitted to fitstat has length 1.

> list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke"))
[[1]]
\(x) list(r2.nagelkerke = as.numeric(performance::r2(x)), "r2.nagelkerke")
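
The length can be checked directly without fitting any model. In this sketch, fake_r2 is a made-up stand-in for performance::r2(), so only the placement of the closing parenthesis matters.

```r
# Stand-in for performance::r2(); the returned value is made up.
fake_r2 <- function(x) 0.46

# As written in the question: the name sits inside the inner list(),
# so the outer list has a single element (the function).
wrong <- list(function(x) list(r2.nagelkerke = fake_r2(x), "r2.nagelkerke"))

# Closing the inner list() earlier gives two elements:
# the function and the name of the element it returns.
right <- list(function(x) list(r2.nagelkerke = fake_r2(x)), "r2.nagelkerke")

print(length(wrong))
print(length(right))
```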

Moving the parenthesis fixes that, but reveals another problem that, I believe, is related to the design of DirichletReg::DirichReg.

> domir::domin(acc ~ dyslexia + iq,
+              reg =  function(y)  DirichletReg::DirichReg(y, data = RS, model = "alternative"),
+              fitstat = list(\(x) list(r2.nagelkerke = as.numeric(performance::r2(x))), "r2.nagelkerke")
+ )
Error in x$formula : object of type 'symbol' is not subsettable

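Basically, DirichletReg::DirichReg does not seem to accept a formula under the lazy evaluation that domin relies on.

For example, most modelling functions that take a formula allow you to do the following: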

> lapply(list(mpg ~ am, mpg ~ vs), lm, data = datasets::mtcars)
[[1]]

Call:
FUN(formula = X[[i]], data = ..1)

Coefficients:
(Intercept)           am  
     17.147        7.245  


[[2]]

Call:
FUN(formula = X[[i]], data = ..1)

Coefficients:
(Intercept)           vs  
      16.62         7.94  

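As you can see in the Call part of the output, lm accepts its arguments flexibly and evaluates the formula when needed, as applied to the data.

Trying a similar operation with DirichReg, using parts of the focal model, results in: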

> lapply(list(acc ~ dyslexia, acc ~ iq), DirichReg, data = RS, model = "alternative")
Error in eval(x) : object 'X' not found

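DirichReg effectively needs to "see" the formula as literal text (because it uses match.call to parse the arguments for processing; at least I think that is the issue).

The solution to this is a little more involved. I have to take the formula that domin submits to each function call and rebuild it on the fly as a string formula, which is then interpreted with as.formula when it is submitted to DirichReg. In the example below I use the newer domir::domir instead of domin; note also that I am using R v4.3 to allow element selection with the base R pipe. The formulas produced are also printed.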
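
The difference can be reproduced with two toy functions. This is only a sketch of the suspected mechanism, not DirichReg's source code; the names fragile and robust are invented, and evaluating in baseenv() is an assumption chosen to keep the example self-contained.

```r
# 'fragile' re-evaluates the expression captured by match.call() in a fixed
# environment, so it only works when the formula was typed literally.
fragile <- function(formula) {
  cl <- match.call()
  eval(cl$formula, envir = baseenv())
}

# 'robust' uses the argument itself, so R evaluates it where it was supplied.
robust <- function(formula) {
  formula
}

fmls <- list(y ~ a, y ~ b)

ok_literal  <- fragile(y ~ a)                             # literal formula works
res_fragile <- try(lapply(fmls, fragile), silent = TRUE)  # X[[i]] cannot be resolved
res_robust  <- lapply(fmls, robust)                       # works

print(inherits(res_fragile, "try-error"))
print(res_robust)
```

Inside lapply the captured expression is X[[i]] rather than the formula, which is why re-evaluating it elsewhere fails.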

> domir(acc ~ dyslexia + iq, function(y)  {
+     iv <- terms(y) |> attr("term.labels")
+     fml <- paste0("acc ~ ", paste0(iv, collapse = "+"), collapse = "")
+     print(fml)
+     DirichReg(as.formula(fml), data = RS, model = "alternative") |> performance::r2() |> _[[1]]})
[1] "acc ~ dyslexia+iq"
[1] "acc ~ dyslexia"
[1] "acc ~ iq"
Overall Value:      0.6568343 

General Dominance Values:
         General Dominance Standardized Ranks
dyslexia         0.4983012    0.7586406     1
iq               0.1585332    0.2413594     2

Conditional Dominance Values:
         Subset Size: 1 Subset Size: 2
dyslexia      0.6498178    0.346784532
iq            0.3100498    0.007016514

Complete Dominance Designations:
                 Dmnated?dyslexia Dmnated?iq
Dmnates?dyslexia               NA       TRUE
Dmnates?iq                  FALSE         NA
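
The string-rebuilding step can also be pulled out into a small helper that works without the pipe and optionally appends a fixed precision part. This is a sketch only: build_fml and its arguments are hypothetical names, and the response defaults to acc purely to match the example data.

```r
# Hypothetical helper: rebuild a formula as a string from the formula that
# is passed in, optionally appending a fixed precision part.
build_fml <- function(f, response = "acc", precision = NULL) {
  iv  <- attr(terms(f), "term.labels")
  rhs <- if (length(iv) > 0) paste(iv, collapse = " + ") else "1"
  fml <- paste(response, "~", rhs)
  if (!is.null(precision)) fml <- paste(fml, "|", precision)
  fml
}

print(build_fml(acc ~ dyslexia + iq))
print(build_fml(acc ~ iq, precision = "dyslexia + iq"))
```

Inside the function given to domir, the resulting string would still go through as.formula() before being handed to DirichReg, as in the example above.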
