I received an Excel file that looks like this:

structure(list(`Variable Name` = c("form.hhConsent", NA, NA, 
"form.hhReasonRefused", NA, NA, NA, "form.childrenRepeat.childrenGroup.childSex", 
NA, NA, "form.childrenRepeat.childrenGroup.dobSourceWho", NA, 
NA, NA, NA, "form.hohSex", NA, NA, "form.mothersRepeat.mothersGroup.spouseSex", 
NA, NA, "form.childrenRepeat.childrenGroup.treatmentConsent", 
NA, NA, NA, NA, "form.childrenRepeat.childrenGroup.noTreatReason", 
NA, NA, NA), Type = c(NA, "1", "0", NA, "1", "2", "99", NA, "1", 
"2", NA, "1", "2", "3", "99", NA, "1", "2", NA, "1", "2", NA, 
"1", "0", "2", "3", NA, "1", "2", "99"), Meaning = c(NA, "Yes", 
"No", NA, "No responsible could be found", "They don't want to", 
"Other reason", NA, "Male", "Female", NA, "Mother", "Father", 
"Neighbor", "Others", NA, "Male", "Female", NA, "Male", "Female", 
NA, "Yes", "No", "AMR grappe", "M54 grappe", NA, "allergy", "parent refusal", 
"other"), Missing_values = c(NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA), Format_values = c(NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA)), row.names = c(NA, -30L), class = "data.frame")

The file is currently laid out like this:

variable name   type      meaning
x1
                xxx         xxx
                xxx         xxx
x2
                xxx         xxx
                xxx         xxx
                xxx         xxx
x3
                xxx         xxx

I would like to convert it into a regular dataset, like this:

variable name   type   meaning
x1               xxx    xxx
x1               xxx    xxx
x2               xxx    xxx
x2               xxx    xxx
x2               xxx    xxx
x3               xxx    xxx

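My idea was to first group by `Variable Name`, then fill in the missing variable names, and finally filter out the header rows that have no value in the Type and Meaning columns. But this does not work, because there is no variable I can pass to group_by.

Could anyone help me figure out how to do this? Many thanks!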

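Recommended answer

We can use fill (from tidyr) to replace the NA elements in `Variable Name` with the previous non-NA value, then select the first three columns, and use filter with if_all to keep only the rows where both the `Type` and `Meaning` columns are non-NA.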

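library(dplyr)
library(tidyr)
# df1 is the data frame from the question (the structure(...) output above)
df1 %>% 
  # carry each variable name down over the NA rows below it
  fill(`Variable Name`) %>%
  # keep only the first three columns: Variable Name, Type, Meaning
  select(1:3) %>% 
  # drop the header rows, which have NA in Type and Meaning
  filter(if_all(Type:Meaning, complete.cases))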

-output

# A tibble: 22 × 3
   `Variable Name`                                Type  Meaning                      
   <chr>                                          <chr> <chr>                        
 1 form.hhConsent                                 1     Yes                          
 2 form.hhConsent                                 0     No                           
 3 form.hhReasonRefused                           1     No responsible could be found
 4 form.hhReasonRefused                           2     They don't want to           
 5 form.hhReasonRefused                           99    Other reason                 
 6 form.childrenRepeat.childrenGroup.childSex     1     Male                         
 7 form.childrenRepeat.childrenGroup.childSex     2     Female                       
 8 form.childrenRepeat.childrenGroup.dobSourceWho 1     Mother                       
 9 form.childrenRepeat.childrenGroup.dobSourceWho 2     Father                       
10 form.childrenRepeat.childrenGroup.dobSourceWho 3     Neighbor                     
# … with 12 more rows

Edit: based on @Leon Samson's comments
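
The grouping idea from the question can also be made to work, because a group id can be derived from the data itself: every non-NA `Variable Name` starts a new block, so a running count of non-NA names labels the blocks. The following is only a minimal sketch of that alternative, not part of the original answer. `df_small` is a hypothetical six-row stand-in with the same layout as the question's data, and `grp` is a helper column introduced here for illustration.

```r
library(dplyr)

# Hypothetical small stand-in with the same layout as the question's data
df_small <- data.frame(
  `Variable Name` = c("form.hhConsent", NA, NA, "form.hohSex", NA, NA),
  Type = c(NA, "1", "0", NA, "1", "2"),
  Meaning = c(NA, "Yes", "No", NA, "Male", "Female"),
  check.names = FALSE
)

result <- df_small %>%
  # each non-NA name opens a new block, so the running count is a group id
  mutate(grp = cumsum(!is.na(`Variable Name`))) %>%
  group_by(grp) %>%
  # the header row is the first row of its block; copy its name down
  mutate(`Variable Name` = first(`Variable Name`)) %>%
  ungroup() %>%
  # drop the header rows, which carry no Type or Meaning
  filter(!is.na(Type), !is.na(Meaning)) %>%
  select(-grp)

result
```

This assumes the layout shown in the question, where each variable name sits on its own header row directly above its codes. The fill() approach above is shorter and gives the same rows; the grouped version is mainly useful when further per-variable summaries are needed.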
