Calculating the internal consistency of items by grouping variables using dplyr/tidyverse

Posted on July 4

I’d like to calculate the internal consistency (alpha and omega) of items by grouping variables (e.g., age and raterType). Ideally I’d be able to do this using dplyr/tidyverse. My question is similar to another question (Using dplyr to nest or group two variables, then perform the Cronbach's alpha function or other statistics to the data), however I can’t get the solution to work in my case.

Here is a small example:

library("tidyverse")
library("psych")
library("MBESS")

mydata <- expand.grid(ID = 1:100,
                      age = 1:5,
                      raterType = c("self",
                                    "friend",
                                    "parent"))

set.seed(12345)

mydata$item1 <- sample(1:7, nrow(mydata), replace = TRUE)
mydata$item2 <- sample(1:7, nrow(mydata), replace = TRUE)
mydata$item3 <- sample(1:7, nrow(mydata), replace = TRUE)
mydata$item4 <- sample(1:7, nrow(mydata), replace = TRUE)
mydata$item5 <- sample(1:7, nrow(mydata), replace = TRUE)
mydata$item6 <- sample(1:7, nrow(mydata), replace = TRUE)

mydata$item1[sample(nrow(mydata), 100)] <- NA
mydata$item2[sample(nrow(mydata), 100)] <- NA
mydata$item3[sample(nrow(mydata), 100)] <- NA
mydata$item4[sample(nrow(mydata), 100)] <- NA
mydata$item5[sample(nrow(mydata), 100)] <- NA
mydata$item6[sample(nrow(mydata), 100)] <- NA

itemNames <- paste("item", 1:6, sep = "")

To calculate internal consistency for the whole dataset, I compute alpha and omega separately with the following code:

alpha(mydata[,itemNames])$total$raw_alpha
ci.reliability(mydata[,itemNames], type = "omega", interval.type = "none")$est

However, I would like to calculate alpha and omega for each combination of age and raterType.

Here is my attempt:

mydata %>%
  pivot_longer(cols = c(-age, -raterType, -ID)) %>%
  select(-ID) %>% 
  nest_by(age, raterType) %>%
  mutate(alpha = alpha(data)$total$raw_alpha,
         omega = ci.reliability(data, type = "omega", interval.type = "none")$est)

This does not work. For some reason, the code gives incorrect estimates of omega and throws an error for alpha:

> # This provides the wrong estimates:
> mydata %>%
+     pivot_longer(cols = c(-age, -raterType, -ID)) %>%
+     select(-ID) %>% 
+     nest_by(age, raterType) %>%
+     mutate(omega = ci.reliability(data, type = "omega", interval.type = "none")$est)
# A tibble: 15 × 4
# Rowwise:  age, raterType
     age raterType               data omega
   <int> <fct>     <list<tibble[,2]>> <dbl>
 1     1 self               [600 × 2] 0.218
 2     1 friend             [600 × 2] 0.257
 3     1 parent             [600 × 2] 0.261
 4     2 self               [600 × 2] 0.196
 5     2 friend             [600 × 2] 0.257
 6     2 parent             [600 × 2] 0.209
 7     3 self               [600 × 2] 0.179
 8     3 friend             [600 × 2] 0.225
 9     3 parent             [600 × 2] 0.247
10     4 self               [600 × 2] 0.224
11     4 friend             [600 × 2] 0.252
12     4 parent             [600 × 2] 0.218
13     5 self               [600 × 2] 0.248
14     5 friend             [600 × 2] 0.218
15     5 parent             [600 × 2] 0.202
> 
> # This throws an error:
> mydata %>%
+     pivot_longer(cols = c(-age, -raterType, -ID)) %>%
+     select(-ID) %>% 
+     nest_by(age, raterType) %>%
+     mutate(alpha = alpha(data)$total$raw_alpha)
Number of categories should be increased  in order to count frequencies. 
Error in `mutate()`:
! Problem while computing `alpha = alpha(data)$total$raw_alpha`.
ℹ The error occurred in row 1.
Caused by error in `FUN()`:
! only defined on a data frame with all numeric-alike variables
Run `rlang::last_error()` to see where the error occurred.
Warning messages:
1: Problem while computing `alpha = alpha(data)$total$raw_alpha`.
ℹ NAs introduced by coercion
ℹ The warning occurred in row 1. 
2: Problem while computing `alpha = alpha(data)$total$raw_alpha`.
ℹ Item = name had no variance and was deleted but still is counted in the score
ℹ The warning occurred in row 1.

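The omega values above do not match the values obtained by running alpha() and ci.reliability() directly on the corresponding subset of the data: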

> alpha(mydata[which(mydata$age == 3 & mydata$raterType == "self"), itemNames])$total$raw_alpha
[1] -0.3018416
> ci.reliability(mydata[which(mydata$age == 3 & mydata$raterType == "self"), itemNames], type = "omega", interval.type = "none")$est
[1] 0.00836356
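
One likely cause, offered as a sketch rather than a verified answer: pivot_longer() stacks the six item columns into a name column and a value column, so each nested data tibble is 600 × 2 (as the output above shows) and no longer has one column per item. alpha() then meets a character column, and ci.reliability() is fitted to something that is not an item matrix. Assuming that diagnosis is right, keeping the items in wide format before nest_by() should hand each group a 100 × 6 item table. The name reliabilityByGroup and the as.data.frame() conversion are illustrative choices, not part of the original post, and with random data like this some groups may still produce warnings.

```r
library("tidyverse")
library("psych")
library("MBESS")

# Rebuild the example data from the question (same seed, same order of sample() calls)
mydata <- expand.grid(ID = 1:100,
                      age = 1:5,
                      raterType = c("self", "friend", "parent"))
itemNames <- paste0("item", 1:6)

set.seed(12345)
for (item in itemNames) {
  mydata[[item]] <- sample(1:7, nrow(mydata), replace = TRUE)
}
for (item in itemNames) {
  mydata[[item]][sample(nrow(mydata), 100)] <- NA
}

# Keep the items wide: one column per item, one row per ID within each group.
# nest_by() returns a rowwise data frame, so `data` inside mutate() refers to
# the item table of the current age x raterType combination.
reliabilityByGroup <- mydata %>%
  select(age, raterType, all_of(itemNames)) %>%
  nest_by(age, raterType) %>%
  mutate(
    # psych:: is spelled out because ggplot2 also exports a function named alpha()
    alpha = psych::alpha(as.data.frame(data))$total$raw_alpha,
    omega = MBESS::ci.reliability(as.data.frame(data),
                                  type = "omega",
                                  interval.type = "none")$est
  ) %>%
  ungroup()

reliabilityByGroup
```

The value for any one group can then be checked against the direct subset calls shown above, for example age == 3 and raterType == "self".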
