I collected a deeply nested list from a JSON file using fromJSON(). Below is a minimal example that shows the nesting, with only 2 entries:

conditions <- list(
  list(
    PMID = 00001,
    Phrases = list(
      list(
        PhraseText = "Hodgkin Lymphoma",
        Mappings = list(
          list(
            MappingScore = 1000,
            MappingCandidates = list(
              list(CandidateScore = 1000,
                   CandidateCUI = "C075655",
                   CandidateMatched = "Hodgkins Lymphoma",
                   CandidatePreferred = "Hodgkins Lymphoma",
                   MatchedWords = list(c("hodgkin", "lymphoma"))),
              list(CandidateScore = 850,
                   CandidateCUI = "C095659",
                   CandidateMatched = "Lymphoma",
                   CandidatePreferred = "Lymphoma",
                   MatchedWords = list(c("lymphoma"))))
          )
        )
      )
    )
  ),
  list(
    PMID = 00002,
    Phrases = list(
      list(
        PhraseText = "Plaque Psoriasis",
        Mappings = list(
          list(MappingScore = 1000,
               MappingCandidates = list(
                 list(CandidateScore = 1000,
                      CandidateCUI = "C0125609",
                      CandidateMatched = "Plaque Psoriasis",
                      CandidatePreferred = "Plaque Psoriasis",
                      MatchedWords = list(c("plaque", "psoriasis"))),
                 list(CandidateScore = 750,
                      CandidateCUI = "C0320011",
                      CandidateMatched = "Psoriasis",
                      CandidatePreferred = "Psoriasis",
                      MatchedWords = list(c("psoriasis")))))
        )
      )
    )
  )
)
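To see how deep the candidate fields sit, it can help to spell out one full path by hand. The sketch below is not part of the original question: it rebuilds the same two-entry list with two hypothetical helper functions, cand() and entry(), and then indexes down to the second candidate of the first entry. Every unnamed level needs a positional [[i]], which is why a single lapply() over one level cannot reach CandidateCUI on its own.

```r
# Hypothetical helpers that rebuild the same nested list as `conditions` above
cand <- function(score, cui, matched, preferred, words) {
  list(CandidateScore = score, CandidateCUI = cui, CandidateMatched = matched,
       CandidatePreferred = preferred, MatchedWords = list(words))
}
entry <- function(pmid, phrase, mapping_score, candidates) {
  list(PMID = pmid,
       Phrases = list(list(
         PhraseText = phrase,
         Mappings = list(list(MappingScore = mapping_score,
                              MappingCandidates = candidates)))))
}

conditions <- list(
  entry(1, "Hodgkin Lymphoma", 1000, list(
    cand(1000, "C075655", "Hodgkins Lymphoma", "Hodgkins Lymphoma", c("hodgkin", "lymphoma")),
    cand(850, "C095659", "Lymphoma", "Lymphoma", "lymphoma"))),
  entry(2, "Plaque Psoriasis", 1000, list(
    cand(1000, "C0125609", "Plaque Psoriasis", "Plaque Psoriasis", c("plaque", "psoriasis")),
    cand(750, "C0320011", "Psoriasis", "Psoriasis", "psoriasis")))
)

# Named levels use $, unnamed levels need [[i]]:
# entry -> Phrases -> phrase -> Mappings -> mapping -> MappingCandidates -> candidate
conditions[[1]]$Phrases[[1]]$Mappings[[1]]$MappingCandidates[[2]]$CandidateCUI
#> [1] "C095659"
```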

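Some of these levels are actually data frames, but I can't seem to recreate that in code without destroying the structure. I'm trying to extract specific elements from multiple levels of the nested list, ideally ending up with output like the following (or similar):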

output <- data.frame(
  PhraseText = c("Hodgkin Lymphoma", "Hodgkin Lymphoma", "Plaque Psoriasis", "Plaque Psoriasis"),
  MappingScore = c(1000, 1000, 1000, 1000),
  CandidateScore = c(1000, 850, 1000, 750),
  CandidateCUI = c("C075655", "C095659", "C0125609", "C0320011"),
  CandidatePreferred = c("Hodgkins Lymphoma", "Lymphoma", "Plaque Psoriasis", "Psoriasis")
)
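On the data frames: assuming the fromJSON() used here is jsonlite::fromJSON(), its default simplifyVector = TRUE turns JSON arrays of objects into data frames. Passing simplifyVector = FALSE keeps every level as a plain list, which yields the kind of nested list shown above. A small made-up JSON string (not the real file) illustrates the difference:

```r
library(jsonlite)

# Made-up JSON with the same shape as the top levels of the real file
txt <- '[{"PMID": 1, "Phrases": [{"PhraseText": "Hodgkin Lymphoma"}]},
         {"PMID": 2, "Phrases": [{"PhraseText": "Plaque Psoriasis"}]}]'

# Default: the array of objects is simplified into a data frame
simplified <- fromJSON(txt)
is.data.frame(simplified)
#> [1] TRUE

# simplifyVector = FALSE: every JSON array stays an unnamed list
as_list <- fromJSON(txt, simplifyVector = FALSE)
is.data.frame(as_list)
#> [1] FALSE
as_list[[2]]$Phrases[[1]]$PhraseText
#> [1] "Plaque Psoriasis"
```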

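I've tried several iterations of lapply(), map() and hoist(), but looping over the unnamed parts of the list (i.e. MappingCandidates[[1]] and MappingCandidates[[2]]) confuses me, and I can't seem to reach the deepest elements in the chain (i.e. CandidateCUI) while keeping them associated with the top-level element (PhraseText).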

x <- lapply(conditions, function(i) {
  lapply(i[["Phrases"]][[1]][["Mappings"]], function(j) {
    lapply(j[["MappingCandidates"]], function(k) {
      k[c("CandidateScore", "CandidateCUI", "CandidatePreferred")]
    })
  })
})
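The attempt above stops one step short: the innermost function only sees the candidate, so it cannot attach PhraseText or MappingScore, and the result stays a nested list. A minimal base R sketch (a swapped-in technique, not the approach from the answer below) passes the parent fields down by closure, builds a one-row data frame per candidate, and row-binds each level with do.call(rbind, ...). It loops over every phrase and mapping instead of hard-coding [[1]]. The compact rebuild of conditions uses hypothetical helpers cand() and entry().

```r
# Hypothetical helpers that rebuild the same nested list as `conditions` above
cand <- function(score, cui, matched, preferred, words) {
  list(CandidateScore = score, CandidateCUI = cui, CandidateMatched = matched,
       CandidatePreferred = preferred, MatchedWords = list(words))
}
entry <- function(pmid, phrase, mapping_score, candidates) {
  list(PMID = pmid,
       Phrases = list(list(
         PhraseText = phrase,
         Mappings = list(list(MappingScore = mapping_score,
                              MappingCandidates = candidates)))))
}
conditions <- list(
  entry(1, "Hodgkin Lymphoma", 1000, list(
    cand(1000, "C075655", "Hodgkins Lymphoma", "Hodgkins Lymphoma", c("hodgkin", "lymphoma")),
    cand(850, "C095659", "Lymphoma", "Lymphoma", "lymphoma"))),
  entry(2, "Plaque Psoriasis", 1000, list(
    cand(1000, "C0125609", "Plaque Psoriasis", "Plaque Psoriasis", c("plaque", "psoriasis")),
    cand(750, "C0320011", "Psoriasis", "Psoriasis", "psoriasis")))
)

# Row-bind a list of data frames
rbind_all <- function(dfs) do.call(rbind, dfs)

output <- rbind_all(lapply(conditions, function(i) {
  rbind_all(lapply(i[["Phrases"]], function(p) {
    rbind_all(lapply(p[["Mappings"]], function(m) {
      rbind_all(lapply(m[["MappingCandidates"]], function(k) {
        # One row per candidate; p and m are visible here, so the
        # parent-level fields can be carried along
        data.frame(
          PhraseText         = p[["PhraseText"]],
          MappingScore       = m[["MappingScore"]],
          CandidateScore     = k[["CandidateScore"]],
          CandidateCUI       = k[["CandidateCUI"]],
          CandidatePreferred = k[["CandidatePreferred"]],
          stringsAsFactors   = FALSE
        )
      }))
    }))
  }))
}))
output
```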

Recommended answer

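With tidyr, we can unnest the list by chaining alternating calls to unnest_wider() and unnest_longer(): unnest_longer() turns the elements of each unnamed list level into rows, and unnest_wider() spreads the named fields of each element into columns: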

library(tidyr)

tibble(conditions) |>
  unnest_wider(conditions) |>
  unnest_longer(Phrases) |>
  unnest_wider(Phrases) |>
  unnest_longer(Mappings) |>
  unnest_wider(Mappings) |>
  unnest_longer(MappingCandidates) |>
  unnest_wider(MappingCandidates) |>
  unnest_longer(MatchedWords)
#> # A tibble: 4 × 8
#>    PMID PhraseText       MappingScore CandidateScore CandidateCUI CandidateMatched  CandidatePreferred MatchedWords
#>   <dbl> <chr>                   <dbl>          <dbl> <chr>        <chr>             <chr>              <list>      
#> 1     1 Hodgkin Lymphoma         1000           1000 C075655      Hodgkins Lymphoma Hodgkins Lymphoma  <chr [2]>   
#> 2     1 Hodgkin Lymphoma         1000            850 C095659      Lymphoma          Lymphoma           <chr [1]>   
#> 3     2 Plaque Psoriasis         1000           1000 C0125609     Plaque Psoriasis  Plaque Psoriasis   <chr [2]>   
#> 4     2 Plaque Psoriasis         1000            750 C0320011     Psoriasis         Psoriasis          <chr [1]>
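To end up with only the columns from the desired output, the same chain can stop before the MatchedWords step and then keep the relevant columns. This sketch reuses the answer's unnest_wider()/unnest_longer() pipeline on a compact rebuild of conditions (helpers cand() and entry() are hypothetical) and subsets with base [ ; dplyr::select() would work equally well. The native pipe |> assumes R 4.1 or later.

```r
library(tidyr)

# Hypothetical helpers that rebuild the same nested list as `conditions`
cand <- function(score, cui, matched, preferred, words) {
  list(CandidateScore = score, CandidateCUI = cui, CandidateMatched = matched,
       CandidatePreferred = preferred, MatchedWords = list(words))
}
entry <- function(pmid, phrase, mapping_score, candidates) {
  list(PMID = pmid,
       Phrases = list(list(
         PhraseText = phrase,
         Mappings = list(list(MappingScore = mapping_score,
                              MappingCandidates = candidates)))))
}
conditions <- list(
  entry(1, "Hodgkin Lymphoma", 1000, list(
    cand(1000, "C075655", "Hodgkins Lymphoma", "Hodgkins Lymphoma", c("hodgkin", "lymphoma")),
    cand(850, "C095659", "Lymphoma", "Lymphoma", "lymphoma"))),
  entry(2, "Plaque Psoriasis", 1000, list(
    cand(1000, "C0125609", "Plaque Psoriasis", "Plaque Psoriasis", c("plaque", "psoriasis")),
    cand(750, "C0320011", "Psoriasis", "Psoriasis", "psoriasis")))
)

result <- tibble(conditions) |>
  unnest_wider(conditions) |>
  unnest_longer(Phrases) |>
  unnest_wider(Phrases) |>
  unnest_longer(Mappings) |>
  unnest_wider(Mappings) |>
  unnest_longer(MappingCandidates) |>
  unnest_wider(MappingCandidates)

# Keep only the columns requested in the question
output <- result[, c("PhraseText", "MappingScore", "CandidateScore",
                     "CandidateCUI", "CandidatePreferred")]
output
```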

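Another approach (possibly easier to generalize) is rrapply() from the rrapply package. Here rrapply() is called twice with how = "bind": once to bind all repeated MappingCandidates together (coldepth = 8 is the nesting depth of the candidate fields such as CandidateScore, so the seven levels above them become the path columns L1 to L7), and once to bind the other nodes (PMID, Phrases, PhraseText, MappingScore). The two results are then merged on L1, the index of the top-level entry: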

library(rrapply)

## bind MappingCandidates
candidateNodes <- rrapply(
  conditions, 
  how = "bind", 
  options = list(namecols = TRUE, coldepth = 8)
)
candidateNodes 
#>   L1      L2 L3       L4 L5                L6 L7 CandidateScore CandidateCUI  CandidateMatched CandidatePreferred    MatchedWords.1
#> 1  1 Phrases  1 Mappings  1 MappingCandidates  1           1000      C075655 Hodgkins Lymphoma  Hodgkins Lymphoma hodgkin, lymphoma
#> 2  1 Phrases  1 Mappings  1 MappingCandidates  2            850      C095659          Lymphoma           Lymphoma          lymphoma
#> 3  2 Phrases  1 Mappings  1 MappingCandidates  1           1000     C0125609  Plaque Psoriasis   Plaque Psoriasis plaque, psoriasis
#> 4  2 Phrases  1 Mappings  1 MappingCandidates  2            750     C0320011         Psoriasis          Psoriasis         psoriasis

## bind other nodes
otherNodes <- rrapply(
  conditions, 
  condition = \(x, .xparents) !"MappingCandidates" %in% .xparents, 
  how = "bind", 
  options = list(namecols = TRUE)
)
otherNodes
#>   L1 PMID Phrases.1.PhraseText Phrases.1.Mappings.1.MappingScore
#> 1  1    1     Hodgkin Lymphoma                              1000
#> 2  2    2     Plaque Psoriasis                              1000

## merge into single data.frame
allNodes <- merge(candidateNodes, otherNodes, by = "L1")
allNodes
#>   L1      L2 L3       L4 L5                L6 L7 CandidateScore CandidateCUI  CandidateMatched CandidatePreferred    MatchedWords.1 PMID Phrases.1.PhraseText Phrases.1.Mappings.1.MappingScore
#> 1  1 Phrases  1 Mappings  1 MappingCandidates  1           1000      C075655 Hodgkins Lymphoma  Hodgkins Lymphoma hodgkin, lymphoma    1     Hodgkin Lymphoma                              1000
#> 2  1 Phrases  1 Mappings  1 MappingCandidates  2            850      C095659          Lymphoma           Lymphoma          lymphoma    1     Hodgkin Lymphoma                              1000
#> 3  2 Phrases  1 Mappings  1 MappingCandidates  1           1000     C0125609  Plaque Psoriasis   Plaque Psoriasis plaque, psoriasis    2     Plaque Psoriasis                              1000
#> 4  2 Phrases  1 Mappings  1 MappingCandidates  2            750     C0320011         Psoriasis          Psoriasis         psoriasis    2     Plaque Psoriasis                              1000
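The merged result keeps the path columns (L1 to L7) and the long flattened names. If only the columns from the question's desired output are needed, they can be picked out and renamed afterwards. This sketch repeats the two rrapply() calls from above on a compact rebuild of conditions (helpers cand() and entry() are hypothetical); the column names Phrases.1.PhraseText and Phrases.1.Mappings.1.MappingScore are taken from the printed output above, and the \(x) lambda assumes R 4.1 or later.

```r
library(rrapply)

# Hypothetical helpers that rebuild the same nested list as `conditions`
cand <- function(score, cui, matched, preferred, words) {
  list(CandidateScore = score, CandidateCUI = cui, CandidateMatched = matched,
       CandidatePreferred = preferred, MatchedWords = list(words))
}
entry <- function(pmid, phrase, mapping_score, candidates) {
  list(PMID = pmid,
       Phrases = list(list(
         PhraseText = phrase,
         Mappings = list(list(MappingScore = mapping_score,
                              MappingCandidates = candidates)))))
}
conditions <- list(
  entry(1, "Hodgkin Lymphoma", 1000, list(
    cand(1000, "C075655", "Hodgkins Lymphoma", "Hodgkins Lymphoma", c("hodgkin", "lymphoma")),
    cand(850, "C095659", "Lymphoma", "Lymphoma", "lymphoma"))),
  entry(2, "Plaque Psoriasis", 1000, list(
    cand(1000, "C0125609", "Plaque Psoriasis", "Plaque Psoriasis", c("plaque", "psoriasis")),
    cand(750, "C0320011", "Psoriasis", "Psoriasis", "psoriasis")))
)

# Same two calls and merge as in the answer
candidateNodes <- rrapply(conditions, how = "bind",
                          options = list(namecols = TRUE, coldepth = 8))
otherNodes <- rrapply(conditions,
                      condition = \(x, .xparents) !"MappingCandidates" %in% .xparents,
                      how = "bind", options = list(namecols = TRUE))
allNodes <- merge(candidateNodes, otherNodes, by = "L1")

# New name = existing column in allNodes
keep <- c(PhraseText         = "Phrases.1.PhraseText",
          MappingScore       = "Phrases.1.Mappings.1.MappingScore",
          CandidateScore     = "CandidateScore",
          CandidateCUI       = "CandidateCUI",
          CandidatePreferred = "CandidatePreferred")
output <- allNodes[, keep]
names(output) <- names(keep)
output
```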