library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))
> dat1
  id         class                   hobby
1  1        sports knitting, music, sports
2  2                                      
3 34 music, sports                        
4 99                                 music

I have the dataset dat1 above, where each row corresponds to a unique id. For each id, multiple entries in class and hobby are separated by commas.

I want to swap the rows and columns of this dataset so that I get the following:

     input class hobby
1   sports 1, 34     1
2 knitting           1
3    music    34 1, 99

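In this dataset, each row corresponds to a unique input from dat1. The class and hobby columns now store the corresponding ids from dat1, with multiple ids separated by commas.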

Is there a fast way to swap rows and columns like this in R?

Recommended answer

Here is a data.table solution:

Input

library(data.table)
dat1 <- data.table(id = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))
dat1
#>    id         class                   hobby
#> 1:  1        sports knitting, music, sports
#> 2:  2                                      
#> 3: 34 music, sports                        
#> 4: 99                                 music

Dataprep

# in long format
dt_melted <- melt.data.table(dat1, id.vars = "id", variable.name = "type", value.name = "value")
dt_melted
#>    id  type                   value
#> 1:  1 class                  sports
#> 2:  2 class                        
#> 3: 34 class           music, sports
#> 4: 99 class                        
#> 5:  1 hobby knitting, music, sports
#> 6:  2 hobby                        
#> 7: 34 hobby                        
#> 8: 99 hobby                   music
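Assuming the empty strings carry no information, one option is to drop them right after melting, so the splitting step only ever sees non-empty values. A minimal sketch using the generic melt() (which dispatches to melt.data.table for a data.table); dt_long is just an illustrative name:

```r
library(data.table)

dat1 <- data.table(id    = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))

# melt to long format, then keep only the cells that actually contain something
dt_long <- melt(dat1, id.vars = "id",
                variable.name = "type", value.name = "value")[value != ""]
print(dt_long)
```

This leaves four rows (ids 1 and 34 for class, 1 and 99 for hobby) instead of eight.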

# drop empty cells, split values by comma and trim the space that follows each comma
dt_splitted <- dt_melted[value != "", .(input = trimws(unlist(strsplit(value, ",")))), by = .(id, type)]
dt_splitted
#>    id  type    input
#> 1:  1 class   sports
#> 2: 34 class    music
#> 3: 34 class   sports
#> 4:  1 hobby knitting
#> 5:  1 hobby    music
#> 6:  1 hobby   sports
#> 7: 99 hobby    music
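Trimming the split pieces matters because splitting on "," alone keeps the space that follows each comma, so "music" and " music" would end up as two different inputs. A small base R check using only strsplit() and trimws():

```r
values <- c("knitting, music, sports", "music")

# splitting on "," alone leaves a leading space on every piece after the first
pieces <- unlist(strsplit(values, ","))
print(pieces)

# trimws() removes that space, so both spellings of "music" become the same key
clean <- trimws(pieces)
print(unique(clean))
```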

Last Step 1

# bring back to desired wide format
dt_casted <- dcast.data.table(dt_splitted, 
                              formula = input ~ type,
                              value.var = "id",
                              fun.aggregate = paste, 
                              collapse = ", ")
dt_casted
#>       input class hobby
#> 1: knitting           1
#> 2:    music    34 1, 99
#> 3:   sports 1, 34     1
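Because dcast() creates one column per level of type, the melt / split / cast steps are not tied to the names class and hobby. A sketch of wrapping them into one function, assuming every non-id column holds comma-separated inputs and that the table has no columns already named type, value or input. The function name invert_inputs is hypothetical, and it uses strsplit() with fixed = TRUE instead of tstrsplit():

```r
library(data.table)

# Hypothetical helper: for every column except `id_col`, split the
# comma-separated entries and list, per distinct input, the ids that contain it.
invert_inputs <- function(dt, id_col = "id", sep = ",") {
  # long format, without empty cells
  long <- melt(dt, id.vars = id_col, variable.name = "type",
               value.name = "value")[value != ""]
  # one row per (id, type, input), with surrounding spaces removed
  split_long <- long[, .(input = trimws(unlist(strsplit(value, sep, fixed = TRUE)))),
                     by = c(id_col, "type")]
  # back to wide: one row per input, ids collapsed into a string per type
  dcast(split_long, input ~ type, value.var = id_col,
        fun.aggregate = paste, collapse = ", ")
}

dat1 <- data.table(id    = c(1, 2, 34, 99),
                   class = c("sports", "", "music, sports", ""),
                   hobby = c("knitting, music, sports", "", "", "music"))
res <- invert_inputs(dat1)
print(res)
```

Extra comma-separated columns in dat1 would simply show up as extra columns in the result.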

Last Step 2 (more verbose)

# combine ids by class/hobby
dt_splitted[, .(class = paste(id[type == "class"], collapse = ", "),
                hobby = paste(id[type == "hobby"], collapse = ", ")),
            by = .(input = trimws(input))]
#>       input class hobby
#> 1:   sports 1, 34     1
#> 2:    music    34 1, 99
#> 3: knitting           1
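Step 2 returns the inputs in order of first appearance, while dcast() sorts them. If the same sorted order is wanted, keyby can replace by. A sketch on a hand-built copy of dt_splitted (already trimmed, with type stored as character rather than factor for brevity):

```r
library(data.table)

# long, split table as produced by the Dataprep step
dt_splitted <- data.table(
  id    = c(1, 34, 34, 1, 1, 1, 99),
  type  = c("class", "class", "class", "hobby", "hobby", "hobby", "hobby"),
  input = c("sports", "music", "sports", "knitting", "music", "sports", "music")
)

# keyby sorts the groups by input, matching the row order of dcast()
res2 <- dt_splitted[, .(class = paste(id[type == "class"], collapse = ", "),
                        hobby = paste(id[type == "hobby"], collapse = ", ")),
                    keyby = .(input)]
print(res2)
```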
