我目前正在处理一项调查的原始数据导出.我目前正处于清理阶段,一些人口统计变量的 struct 方式不利于分析.具体地说,受访者可以 Select 多个种族选项,数据导出中的外观如下所示:
以下是复制数据集的代码:
ID <- c(rep(c(1:8), 1))
White <- c("White",NA,NA,NA,NA,NA,"White","White")
Asian <- c(NA,NA,NA,NA,NA,"Asian",NA,"Asian")
SouthAfrican <- c(NA,"SouthAfrican",NA, NA,NA, NA, NA, "SouthAfrican")
Hispanic <- c(NA, NA, NA, NA, "Hispanic", "Hispanic", NA, NA)
WestAsian <- c(NA, NA, NA, NA, NA, NA, "WestAsian", NA)
PreferNotToAnswer <- c(NA, NA,"PreferNotToAnswer", "PreferNotToAnswer", NA, NA, NA, NA)
df <- data.frame(ID, White, Asian, SouthAfrican, Hispanic, WestAsian, PreferNotToAnswer)
我想要做的是找到一种方法,创建一个"Race"列,并将单独列中的值合并到那个Race变量中.如果受访者 Select 了多个选项作为他们的种族(即,ID 8),我想要找到一种方法来将该受访者的种族变量编码为"混合".
所以我想要的结果是:
ID <- c(rep(c(1:8), 1))
White <- c("White",NA,NA,NA,NA,NA,"White","White")
Asian <- c(NA,NA,NA,NA,NA,"Asian",NA,"Asian")
SouthAfrican <- c(NA,"SouthAfrican",NA, NA,NA, NA, NA, "SouthAfrican")
Hispanic <- c(NA, NA, NA, NA, "Hispanic", "Hispanic", NA, NA)
WestAsian <- c(NA, NA, NA, NA, NA, NA, "WestAsian", NA)
PreferNotToAnswer <- c(NA, NA,"PreferNotToAnswer", "PreferNotToAnswer",NA, NA, NA, NA)
Race <- c("White","SouthAfrican","PreferNotToAnswer", "PreferNotToAnswer","Hispanic", "Mixed", "Mixed", "Mixed")
df <- data.frame(ID, White, Asian, SouthAfrican, Hispanic,WestAsian,PreferNotToAnswer,Race)