Tidyverse solution for row-wise sums of products across multiple columns

Posted on July 8

Problem

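I am trying to find an elegant tidyverse solution to create the sum of m products of n columns each. I do not want to use positional matching, and it should be generalizable.

I fiddled around with purrr::pmap_dbl(select(., ends_with(i)), prod) but did not get far.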

Example for m = 3 and n = 2

library(tidyverse)

df <- tibble(
  x_0 = c(5,6),
  x_1 = c(9,1),
  x_2 = c(2,1),
  y_0 = c(3,2),
  y_1 = c(3,2),
  y_2 = c(1,3)
)
df
# A tibble: 2 × 6
# x_0   x_1   x_2   y_0   y_1   y_2
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#   5     9     2     3     3     1
#   6     1     1     2     2     3

I want to calculate the sum of the products row-wise:
sum_of_products = x_0 * y_0 + x_1 * y_1 + x_2 * y_2

First row: 5*3 + 9*3 + 2*1 = 44; second row: 6*2 + 1*2 + 1*3 = 17

Desired output

df_with_sum_of_products
# x_0   x_1   x_2   y_0   y_1   y_2  sum_of_products
#<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>           <dbl>
#   5     9     2     3     3     1               44
#   6     1     1     2     2     3               17
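
As a sanity check on the target values, the formula can be evaluated directly with hard-coded column names. This is only a sketch for verifying the arithmetic on the example data. It relies on naming every column explicitly, so it is not the generalizable solution being asked for; the variable name expected is just illustrative.

```r
library(tibble)

# Example data from the question (m = 3 indices, n = 2 variable types).
df <- tibble(
  x_0 = c(5, 6), x_1 = c(9, 1), x_2 = c(2, 1),
  y_0 = c(3, 2), y_1 = c(3, 2), y_2 = c(1, 3)
)

# Hard-coded column names: fine for a check, not generalizable.
expected <- with(df, x_0 * y_0 + x_1 * y_1 + x_2 * y_2)
expected
```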

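Recommended answer

For a completely general and robust solution, I think it is best to convert the data frame into a form better suited to the task at hand.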

df %>% 
  mutate(row=row_number()) %>% 
  pivot_longer(
    -row, 
    names_sep="_", 
    names_to=c("name", "index")
  ) %>%  
  group_by(row, index) %>% 
  pivot_wider(names_from=name, values_from=value)
# A tibble: 6 x 4
# Groups:   row, index [6]
    row index     x     y
  <int> <chr> <dbl> <dbl>
1     1 0         5     3
2     1 1         9     3
3     1 2         2     1
4     2 0         6     2
5     2 1         1     2
6     2 2         1     3

Then calculate the sum of the products...

df %>% 
  mutate(row=row_number()) %>% 
  pivot_longer(
    -row, 
    names_sep="_", 
    names_to=c("name", "index")
  ) %>%  
  group_by(row, index) %>% 
  pivot_wider(names_from=name, values_from=value) %>% 
  mutate(product=x * y) %>% 
  group_by(row) %>% 
  summarise(sum_product=sum(product))
# A tibble: 2 x 2
    row sum_product
  <int>       <dbl>
1     1          44
2     2          17

This is robust to the number of rows, the number of variable types (e.g. x, y and z) and the number of indices (e.g. 1, 2 and 3).
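
As a rough illustration of the row and index part of that claim, the same pipeline can be wrapped in a function and run on a wider input. This is a sketch under assumptions: the helper name sum_xy_products and the dataset df_wide (a fourth index and a third row) are made up for the example and do not appear in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper wrapping the pipeline above; the name is illustrative.
sum_xy_products <- function(data) {
  data %>%
    mutate(row = row_number()) %>%
    pivot_longer(-row, names_sep = "_", names_to = c("name", "index")) %>%
    group_by(row, index) %>%
    pivot_wider(names_from = name, values_from = value) %>%
    mutate(product = x * y) %>%
    group_by(row) %>%
    summarise(sum_product = sum(product))
}

# Made-up data: same naming scheme, but four indices (0 to 3) and three rows.
df_wide <- tibble(
  x_0 = c(5, 6, 1), x_1 = c(9, 1, 1), x_2 = c(2, 1, 1), x_3 = c(1, 2, 0),
  y_0 = c(3, 2, 1), y_1 = c(3, 2, 1), y_2 = c(1, 3, 1), y_3 = c(4, 5, 9)
)

res <- sum_xy_products(df_wide)
res
```

No column is referenced by position or by index, only the prefixes x and y, so extra indices and extra rows need no code changes.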

Edit

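I was wrong to claim that the solution above is robust to the number of variable types. (Because the line in the pipe reads mutate(product=x * y).) Here is a solution that is, together with a modified input dataset to demonstrate it.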

df1 <- tibble(
  x_0 = c(5,6,1,-1), x_1 = c(9,1,1,3), x_2 = c(2,1,3,4),
  y_0 = c(3,2,1, 2), y_1 = c(3,2,2,2), y_2 = c(1,3,2,2),
  z_0 = c(4,5,1, 3), z_1 = c(3,1,2,1), z_2 = c(2,2,1,3)
)

df1 %>% 
  mutate(row=row_number()) %>% 
  pivot_longer(
    -row, 
    names_sep="_", 
    names_to=c("name", "index")
  ) %>%  
  group_by(row, index) %>% 
  pivot_wider(names_from=name, values_from=value) %>% 
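  group_map(
    # .x holds the value columns (x, y, z, ...) for one (row, index) group;
    # .y holds that group's keys. The product spans every value column.
    function(.x, .y) {
      .y %>% bind_cols(.x %>% mutate(product = apply(.x, 1, prod)))
    }
  ) %>% bind_rows() %>% 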
  group_by(row) %>% 
  summarise(sum_product=sum(product))
# A tibble: 4 x 2
    row sum_product
  <int>       <dbl>
1     1         145
2     2          68
3     3          11
4     4          24
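
One possible simplification, not part of the answer above and offered only as a sketch: since the long format already has one value per (row, index, name), the product can be taken with prod() inside summarise(), which avoids both pivot_wider() and group_map(). The names sums and df1_with_sum are illustrative.

```r
library(dplyr)
library(tidyr)

# Modified dataset from the edit above: three variable types (x, y, z).
df1 <- tibble(
  x_0 = c(5, 6, 1, -1), x_1 = c(9, 1, 1, 3), x_2 = c(2, 1, 3, 4),
  y_0 = c(3, 2, 1, 2),  y_1 = c(3, 2, 2, 2), y_2 = c(1, 3, 2, 2),
  z_0 = c(4, 5, 1, 3),  z_1 = c(3, 1, 2, 1), z_2 = c(2, 2, 1, 3)
)

sums <- df1 %>%
  mutate(row = row_number()) %>%
  pivot_longer(-row, names_sep = "_", names_to = c("name", "index")) %>%
  # One product per (row, index), across however many variable types exist.
  group_by(row, index) %>%
  summarise(product = prod(value), .groups = "drop") %>%
  # Then one sum per original row.
  group_by(row) %>%
  summarise(sum_product = sum(product))

# Attach the result to the original columns, as in the desired output.
df1_with_sum <- df1 %>%
  mutate(row = row_number()) %>%
  left_join(sums, by = "row") %>%
  select(-row)
```

On df1 this should give the same sum_product values as the group_map() version. The join on row is there because the question asks for the sum as an extra column rather than as a separate table.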