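I have a data frame in R with 5 columns. The first column identifies which of two groups each ID was split into. The next 4 columns are answers to a questionnaire: 2 of them are continuous variables and the other 2 are categorical variables.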
library(dplyr)
# set random seed for reproducibility
set.seed(123)
# create first column with categorical variable with levels yes and no
col1 <- sample(c("yes", "no"), 20, replace = TRUE)
# create second and third columns with continuous variables
col2 <- rnorm(20, mean = 10, sd = 2)
col3 <- rnorm(20, mean = 5, sd = 1)
# create fourth and fifth columns with categorical variables
col4 <- sample(c("high", "low"), 20, replace = TRUE)
col5 <- sample(c("small", "big"), 20, replace = TRUE)
# combine all columns into a data frame
df <- tibble(col1, col2, col3, col4, col5)
# print the data frame
df
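I want to run two statistical tests. If a variable (column) is continuous, perform a Mann-Whitney U test; if it is categorical, build a contingency table and perform a chi-squared test of independence.
For example, for the continuous case: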
# perform Mann-Whitney U test on column 2, according to the two groups in column 1
test_result <- wilcox.test(df$col2 ~ df$col1)
# print the test result
print(test_result)
For the categorical case:
# create contingency table for columns 1 and 5
# use $ to extract vectors: on a tibble, df[, 1] returns a one-column tibble, not a vector
cont_table <- table(df$col1, df$col5)
# print the contingency table
print(cont_table)
# perform chi-squared test on the first and fifth columns
test_result2 <- chisq.test(cont_table)
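But I would like all of this to run in a single pipeline that checks whether each column is categorical or continuous (numeric) and then performs the corresponding test. I want to do this with the dplyr package and summarise only the p-values.
How can I do this in R?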
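One possible approach, as a minimal sketch rather than a definitive solution: use `summarise()` with `across()` and dispatch on the column type. It assumes every numeric column is continuous and every other column is categorical, and that the grouping column is `col1`. The helper name `p_value_by_type` is hypothetical (it is not part of dplyr).

```r
library(dplyr)

# Hypothetical helper: pick the test from the column type and return only the p-value.
# Assumption: numeric columns are continuous, everything else is categorical.
p_value_by_type <- function(x, group) {
  if (is.numeric(x)) {
    # continuous: Mann-Whitney U (Wilcoxon rank-sum) test between the two groups
    wilcox.test(x ~ group)$p.value
  } else {
    # categorical: chi-squared test of independence on the contingency table
    chisq.test(table(group, x))$p.value
  }
}

# same example data as in the question
set.seed(123)
df <- tibble(
  col1 = sample(c("yes", "no"), 20, replace = TRUE),
  col2 = rnorm(20, mean = 10, sd = 2),
  col3 = rnorm(20, mean = 5, sd = 1),
  col4 = sample(c("high", "low"), 20, replace = TRUE),
  col5 = sample(c("small", "big"), 20, replace = TRUE)
)

# one p-value per column, for every column except the grouping column
p_values <- df %>%
  summarise(across(-col1, ~ p_value_by_type(.x, col1)))

p_values
```

The result is a one-row tibble with one p-value for each of `col2` to `col5`. With only 20 rows, `chisq.test()` may warn that the chi-squared approximation could be inaccurate; the warning does not stop the pipeline.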