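I am trying to create a new data frame that contains correlations computed by group (three grouping variables). The problem is that the result contains the same values in every row instead of different values for each group.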

We tried the following code:


library(dplyr)
calc_cor <- function(df) {
  list(
    Cor1x2 = cor(df$var1, df$var2, method = "spearman", use = "pairwise.complete.obs"),
    Cor1x3 = cor(df$var1, df$var3, method = "spearman", use = "pairwise.complete.obs"),
    Cor1x4 = cor(df$var1, df$var4, method = "spearman", use = "pairwise.complete.obs"),
    Cor1x5 = cor(df$var1, df$var5, method = "spearman", use = "pairwise.complete.obs"),
    Cor2x3 = cor(df$var2, df$var3, method = "spearman", use = "pairwise.complete.obs"),
    Cor2x4 = cor(df$var2, df$var4, method = "spearman", use = "pairwise.complete.obs"),
    Cor2x5 = cor(df$var2, df$var5, method = "spearman", use = "pairwise.complete.obs"),
    Cor3x4 = cor(df$var3, df$var4, method = "spearman", use = "pairwise.complete.obs"),
    Cor3x5 = cor(df$var3, df$var5, method = "spearman", use = "pairwise.complete.obs"),
    Cor4x5 = cor(df$var4, df$var5, method = "spearman", use = "pairwise.complete.obs")
  )
}

# "group" variable contains values from 1 to 55
# "meeting" variable contains values from 1 to 6
# "session_phase" variable contains values from 1 to 2

result <- df %>%
  group_by(group, meeting, session_phase) %>%
  summarise(correlations = list(calc_cor(.))) %>%
  unnest_wider(correlations) %>%
  as.data.frame()

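Every column, especially var4 and var5, contains multiple NAs. These should be ignored so that each correlation is computed from whatever values are available.

Input data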

group   meeting session_phase   min  var1   var2    var3    var4    var5
1       1       1               0    0,3    0,19    0,26        
1       1       1               1    0,46   0,28    0,15        
1       1       1               2    0,42   0,39    0,22        
2       1       1               0    0,65   0,52    1,26        
2       1       1               1    0,94   0,36    1,22        
2       1       1               2    0,64   0,43    1,31        
2       1       1               3    0,55   0,32    1       0,95
…                               
1       1       2               0    0,55           0,82    0,79    0,95
1       1       2               1    1,02           1,02    1,09    0,7
1       1       2               2    0,69           0,71    0,95    0,54
2       1       2               0    0,59           0,31    0,7     0,37
2       1       2               1    0,34           0,2     0,54    0,59
2       1       2               2    0              0,55    0,2     0,37    
…                               
55      6       2               0    0,81           0       1,2     0,58
55      6       2               1    0,18           1,2     0,58    
55      6       2               2    0,27           1,14    0,39    

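var1 to var5 should now be correlated with each other (var1 with var2, var1 with var3, var1 with var4, and so on) within each combination of the grouping variables group, meeting and session_phase, so that we end up with one var1-var2 correlation for group 1, meeting 1, session_phase 1, another for the next combination, and so on.

Incorrect output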

group   meeting session_phase   Cor1x2      Cor1x3      Cor1x4      Cor1x5      Cor2x3      Cor2x4      Cor2x5      Cor3x4      Cor3x5      Cor4x5
1       1       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       1       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       2       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       2       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       3       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       3       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       4       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       4       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       5       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       5       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       6       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
1       6       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
2       1       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
2       1       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
2       2       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
2       2       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
2       3       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
2       3       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
…                                               
55      6       1               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882
55      6       2               0,333523752 0,32578531  0,399857124 0,494264624 0,331031329 0,315864637 0,697428913 0,407285621 0,834623448 0,622817882

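The expected output values for Cor1x2 through Cor4x5 should differ across rows and columns. I think I have missed a step needed to tie everything together correctly. How can I solve this?

Recommended answer

Pass cur_data() to your function. Inside summarise(), the `.` from the pipe refers to the whole data frame that was piped in, not to the current group, which is why every row shows the same values. Also, corrr::correlate() cuts down on the number of repeated calls to cor(). It takes some wrangling to get the format you want, but something like this could work: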

library(tidyverse)
library(corrr)

flatten_corr <- function(grp) {
  df <- correlate(grp, quiet = TRUE) |> # grp is the current group's data
    shave() |> # take lower diagonal
    pivot_longer(-term) |> 
    filter(!is.na(value)) |> 
    unite(corr_pair, name, term, sep = "_x_") |> 
    t() |> 
    as_tibble() 
  
  df |> 
    set_names(df[1, ]) |> # use corr_pair for colnames
    slice_tail(n = 1) |> # drop corr_pair from the tibble
    mutate(across(everything(), as.numeric)) # t() converts to str, revert to numeric
}

data |> 
  group_by(group, meeting, session_phase) |> 
  summarise(correlations = flatten_corr(cur_data())) |> 
  unnest_wider(correlations)

Output:

# A tibble: 150 × 6
# Groups:   group, meeting [50]
   group meeting session_phase var1_x_var2 var1_x_var3 var2_x_var3
   <chr>   <int>         <int>       <dbl>       <dbl>       <dbl>
 1 a           1             1      0.333       -0.145    -0.00481
 2 a           1             2     -0.822        0.958    -0.674  
 3 a           1             3     -0.0976       0.434    -0.138  
 4 a           2             1      0.878       -0.139     0.0916 
 5 a           2             2      0.698        0.185     0.0922 
 6 a           2             3     -0.284       -0.259    -0.214  
 7 a           3             1     -0.708       -0.402     0.0499 
 8 a           3             2     -0.233        0.144     0.713  
 9 a           3             3     -0.538        0.606    -0.475  
10 a           4             1      0.747       -0.550    -0.433  
# … with 140 more rows
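
If you would rather keep the original calc_cor() function, the smallest change is probably to replace `.` with the current group's data. The sketch below uses a made-up toy data frame (`toy` is not from the question) and a one-pair version of calc_cor() to show the difference. It assumes dplyr 1.1.0 or later, where pick(everything()) is the replacement for the deprecated cur_data(); on older versions cur_data() should behave the same way here.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: var1 and var2 are perfectly positively related
# in group 1 and perfectly negatively related in group 2
toy <- tibble(
  group = rep(c(1, 2), each = 5),
  var1  = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  var2  = c(1, 2, 3, 4, 5, 5, 4, 3, 2, 1)
)

# Shortened version of the question's function (one pair only)
calc_cor <- function(df) {
  list(
    Cor1x2 = cor(df$var1, df$var2, method = "spearman", use = "pairwise.complete.obs")
  )
}

# `.` is the whole piped-in data frame, so every group gets the same value
wrong <- toy %>%
  group_by(group) %>%
  summarise(correlations = list(calc_cor(.))) %>%
  unnest_wider(correlations)

# pick(everything()) holds only the rows of the current group
right <- toy %>%
  group_by(group) %>%
  summarise(correlations = list(calc_cor(pick(everything())))) %>%
  unnest_wider(correlations)
```

Here `wrong` repeats one overall correlation for both groups, while `right` has a separate value per group.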

Example dataset:

set.seed(1L)
n <- 1000
data <- tibble(
  group = sample(letters[1:5], size = n, replace = TRUE),
  meeting = sample(1:10, size = n, replace = TRUE),
  session_phase = sample(1:3, size = n, replace = TRUE),
  var1 = rnorm(n = n),
  var2 = rnorm(n = n),
  var3 = rnorm(n = n)
)
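
Note that correlate() defaults to Pearson correlations, while the question asks for Spearman. You could pass method = "spearman" to correlate(), or skip corrr entirely. The following is a rough sketch of the second option on the example dataset above, not a definitive implementation: pairwise_cor() is a hypothetical helper name, and it assumes R 4.1 or later (for `|>`) and dplyr 1.1.0 or later (for pick()).

```r
library(dplyr)

set.seed(1L)
n <- 1000
data <- tibble(
  group = sample(letters[1:5], size = n, replace = TRUE),
  meeting = sample(1:10, size = n, replace = TRUE),
  session_phase = sample(1:3, size = n, replace = TRUE),
  var1 = rnorm(n = n),
  var2 = rnorm(n = n),
  var3 = rnorm(n = n)
)

# Hypothetical helper: returns a one-row tibble with one column per variable pair
pairwise_cor <- function(df, method = "spearman") {
  # Full correlation matrix; NAs are dropped pair by pair
  m <- cor(df, method = method, use = "pairwise.complete.obs")
  # Row/column positions of the upper triangle, i.e. each pair exactly once
  idx <- which(upper.tri(m), arr.ind = TRUE)
  vals <- m[idx]
  names(vals) <- paste0(rownames(m)[idx[, "row"]], "_x_", colnames(m)[idx[, "col"]])
  as_tibble(as.list(vals))
}

result <- data |>
  group_by(group, meeting, session_phase) |>
  # An unnamed data frame result is spread into separate columns
  summarise(pairwise_cor(pick(var1:var3)), .groups = "drop")
```

For the data in the question, pick(var1:var5) would give the ten pairs from var1_x_var2 to var4_x_var5. Groups where a pair has too few complete observations would get NA for that pair.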
