I have a data frame like this:

wide_df <- data.frame(
  kingdom = c("Animalia", "Animalia", "Plantae", "Plantae"),
  phylum = c("Chordata", "Chordata", "Angiosperms", "Angiosperms"),
  class = c("Mammalia", "Mammalia", "Dicotyledons", "Dicotyledons"),
  order = c("Carnivora", "Carnivora", "Rosales", "Solanales"),
  family = c("Felidae", "Canidae", "Rosaceae", "Solanaceae"),
  count = c(2, 3, 1, 4)
)

> wide_df
   kingdom      phylum        class     order     family count
1 Animalia    Chordata     Mammalia Carnivora    Felidae    2
2 Animalia    Chordata     Mammalia Carnivora    Canidae    3
3  Plantae Angiosperms Dicotyledons   Rosales   Rosaceae    1
4  Plantae Angiosperms Dicotyledons Solanales Solanaceae    4

I want to restructure the data so that it looks like this:

hierarchical_df <- data.frame(
  name = c("Animalia",
           "Animalia",
           "Animalia",
           "Animalia",
           "Animalia",
           "Chordata",
           "Chordata",
           "Chordata",
           "Chordata",
           "Chordata",
           "Mammalia",
           "Mammalia",
           "Mammalia",
           "Mammalia",
           "Mammalia",
           "Carnivora",
           "Carnivora",
           "Carnivora",
           "Carnivora",
           "Carnivora",
           "Felidae",
           "Felidae",
           "Canidae",
           "Canidae",
           "Canidae",
           "Plantae",
           "Plantae",
           "Plantae",
           "Plantae",
           "Plantae",
           "Angiosperms",
           "Angiosperms",
           "Angiosperms",
           "Angiosperms",
           "Angiosperms",
           "Dicotyledons",
           "Dicotyledons",
           "Dicotyledons",
           "Dicotyledons",
           "Dicotyledons",
           "Rosales",
           "Solanales",
           "Solanales",
           "Solanales",
           "Solanales",
           "Rosaceae",
           "Solanaceae",
           "Solanaceae",
           "Solanaceae",
           "Solanaceae"),
  parent = c(NA,
             NA,
             NA,
             NA,
             NA,
             "Animalia",
             "Animalia",
             "Animalia",
             "Animalia",
             "Animalia",
             "Chordata",
             "Chordata",
             "Chordata",
             "Chordata",
             "Chordata",
             "Mammalia",
             "Mammalia",
             "Mammalia",
             "Mammalia",
             "Mammalia",
             "Carnivora",
             "Carnivora",
             "Carnivora",
             "Carnivora",
             "Carnivora",
             NA,
             NA,
             NA,
             NA,
             NA,
             "Plantae",
             "Plantae",
             "Plantae",
             "Plantae",
             "Plantae",
             "Angiosperms",
             "Angiosperms",
             "Angiosperms",
             "Angiosperms",
             "Angiosperms",
             "Dicotyledons",
             "Dicotyledons",
             "Dicotyledons",
             "Dicotyledons",
             "Dicotyledons",
             "Rosales",
             "Solanales",
             "Solanales",
             "Solanales",
             "Solanales"))


hierarchical_df
           name       parent
1      Animalia         <NA>
2      Animalia         <NA>
3      Animalia         <NA>
4      Animalia         <NA>
5      Animalia         <NA>
6      Chordata     Animalia
7      Chordata     Animalia
8      Chordata     Animalia
9      Chordata     Animalia
10     Chordata     Animalia
11     Mammalia     Chordata
12     Mammalia     Chordata
13     Mammalia     Chordata
14     Mammalia     Chordata
15     Mammalia     Chordata
16    Carnivora     Mammalia
17    Carnivora     Mammalia
18    Carnivora     Mammalia
19    Carnivora     Mammalia
20    Carnivora     Mammalia
21      Felidae    Carnivora
22      Felidae    Carnivora
23      Canidae    Carnivora
24      Canidae    Carnivora
25      Canidae    Carnivora
26      Plantae         <NA>
27      Plantae         <NA>
28      Plantae         <NA>
29      Plantae         <NA>
30      Plantae         <NA>
31  Angiosperms      Plantae
32  Angiosperms      Plantae
33  Angiosperms      Plantae
34  Angiosperms      Plantae
35  Angiosperms      Plantae
36 Dicotyledons  Angiosperms
37 Dicotyledons  Angiosperms
38 Dicotyledons  Angiosperms
39 Dicotyledons  Angiosperms
40 Dicotyledons  Angiosperms
41      Rosales Dicotyledons
42    Solanales Dicotyledons
43    Solanales Dicotyledons
44    Solanales Dicotyledons
45    Solanales Dicotyledons
46     Rosaceae      Rosales
47   Solanaceae    Solanales
48   Solanaceae    Solanales
49   Solanaceae    Solanales
50   Solanaceae    Solanales

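Basically, I am trying to convert my data into a form that I can use with this package (https://github.com/fbreitwieser/hiervis) to make a Sankey diagram. I am trying to visualize the number of individual organisms of different taxa seen in a particular region. The dataset has more than 40,000 observations.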

Recommended answer

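Here is one way.
What you want is the original wide-format data frame, with each row repeated count times, stacked into a single column. The second column is then that same column lagged by one taxonomic rank, that is, by the number of rows in one rank's block.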

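# Repeat each row of wide_df according to its count
tmp <- wide_df[rep(row.names(wide_df), wide_df$count), ]
# Stack the taxonomy columns (everything except count) into one long column;
# stack() keeps only vector columns, so these should be character, not factor
long_df <- stack(tmp[-6])
# Each rank fills one block of sum(count) rows, so lagging by one block gives the parent
long_df$parent <- dplyr::lag(long_df$values, sum(long_df$ind == "family"))
rm(tmp)
# Rename values to name and drop the ind column
names(long_df)[1L] <- "name"
long_df <- long_df[-2L]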
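
The same idea can be sketched in base R only, without dplyr. This is a minimal sketch and not part of the original answer: the helper name wide_to_edges and its arguments levels and count are hypothetical, chosen for illustration. It assumes the rank columns are listed from highest to lowest and that the count column holds non-negative whole numbers.

```r
# Hypothetical helper (illustration only): turn a wide taxonomy table into
# name/parent pairs. `levels` lists the rank columns from highest to lowest;
# `count` names the column saying how often each row is repeated.
wide_to_edges <- function(df, levels, count = "count") {
  # Repeat each row according to its count, keeping only the rank columns.
  expanded <- df[rep(seq_len(nrow(df)), df[[count]]), levels, drop = FALSE]
  n <- nrow(expanded)
  # Stack the rank columns on top of each other, highest rank first.
  name <- unlist(lapply(expanded, as.character), use.names = FALSE)
  # The parent of an entry is the entry one rank up, i.e. n positions earlier;
  # the highest rank has no parent.
  parent <- c(rep(NA_character_, n), head(name, -n))
  data.frame(name = name, parent = parent, stringsAsFactors = FALSE)
}

wide_df <- data.frame(
  kingdom = c("Animalia", "Animalia", "Plantae", "Plantae"),
  phylum = c("Chordata", "Chordata", "Angiosperms", "Angiosperms"),
  class = c("Mammalia", "Mammalia", "Dicotyledons", "Dicotyledons"),
  order = c("Carnivora", "Carnivora", "Rosales", "Solanales"),
  family = c("Felidae", "Canidae", "Rosaceae", "Solanaceae"),
  count = c(2, 3, 1, 4)
)

edges <- wide_to_edges(wide_df, c("kingdom", "phylum", "class", "order", "family"))
```

Passing the rank columns by name rather than dropping column 6 by position means the helper does not depend on where count sits in the data frame.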

This is identical to the desired result posted in the question, just sorted differently:

# check the result
i <- order(hierarchical_df$name)
j <- order(long_df$name)
tmp1 <- hierarchical_df[i, ]
tmp2 <- long_df[j, ]
row.names(tmp1) <- NULL
row.names(tmp2) <- NULL

identical(tmp1, tmp2)
#> [1] TRUE
