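I'm working with a large survey data frame in which every response is stored as a number. For numeric questions such as age, the number is simply the value. For multiple-choice questions, however, the number is a code that corresponds to a text label stored in a separate lookup data frame.
How can I replace all the numbers for each variable with their corresponding label from the lookup data frame?
Example data: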
df_numeric <-
  tibble::tribble(
    ~gender, ~age, ~city, ~yearly_income, ~fav_colour, ~over_100_more_vars,
    1, 22, 1, 55000, 1, "...",
    2, 31, 2, 122000, 2, "...",
    1, 41, 1, 101000, 2, "...",
    2, 19, 5, 76000, 1, "...",
    1, 64, 7, 32000, 6, "...")
df_lookup <-
  tibble::tribble(
    ~variable, ~number, ~label,
    "gender", 1, "Male",
    "gender", 2, "Female",
    "city", 1, "New York",
    "city", 2, "Sydney",
    "city", 5, "London",
    "city", 7, "Paris",
    "fav_colour", 1, "Red",
    "fav_colour", 2, "Blue",
    "fav_colour", 6, "Purple",
    "one_of_100_more", 1, "Label",
    "one_of_100_more", 2, "Label",
    "two_of_100_more", 1, "Label",
    "etc", 1, "etc")
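Ideally, I'd like to do something like this: take each variable name in df_numeric, look that variable up in df_lookup, and replace each 'number' for that variable with its corresponding 'label'. Then move on to the next variable, replace its numbers with their labels, and so on. The result should look like this: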
df_output <-
  tibble::tribble(
    ~gender, ~age, ~city, ~yearly_income, ~fav_colour, ~over_100_more_vars,
    "Male", 22, "New York", 55000, "Red", "...",
    "Female", 31, "Sydney", 122000, "Blue", "...",
    "Male", 41, "New York", 101000, "Blue", "...",
    "Female", 19, "London", 76000, "Red", "...",
    "Male", 64, "Paris", 32000, "Purple", "...")
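Important caveats:
- There are hundreds of variables, so writing out each variable's name in the code is not feasible (e.g. as in this answer).
- Only the categorical variables such as gender and city need to be replaced. The values of numeric variables such as age and income do not need replacing, because they are already in the correct format. These numeric variables do not appear in df_lookup.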
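
For reference, the procedure described above could be sketched roughly as follows. This is only one possible approach, not a definitive solution. It assumes that every code in a labelled column has a matching row in df_lookup (unmatched codes would become NA), and it rebuilds a cut-down copy of the example data with base data.frame() so the block runs without extra packages. The names vars_to_label and lk are hypothetical helpers introduced for illustration.

```r
# Cut-down copy of the example data (assumption: same shape as the real data)
df_numeric <- data.frame(
  gender        = c(1, 2, 1, 2, 1),
  age           = c(22, 31, 41, 19, 64),
  city          = c(1, 2, 1, 5, 7),
  yearly_income = c(55000, 122000, 101000, 76000, 32000),
  fav_colour    = c(1, 2, 2, 1, 6),
  stringsAsFactors = FALSE
)

df_lookup <- data.frame(
  variable = c("gender", "gender",
               "city", "city", "city", "city",
               "fav_colour", "fav_colour", "fav_colour",
               "one_of_100_more"),
  number   = c(1, 2, 1, 2, 5, 7, 1, 2, 6, 1),
  label    = c("Male", "Female",
               "New York", "Sydney", "London", "Paris",
               "Red", "Blue", "Purple",
               "Label"),
  stringsAsFactors = FALSE
)

# Only touch columns that exist in the data AND have entries in the lookup,
# so numeric variables such as age and yearly_income are left alone and
# no variable name has to be typed out by hand.
vars_to_label <- intersect(names(df_numeric), df_lookup$variable)

df_output <- df_numeric
for (v in vars_to_label) {
  # Lookup rows that belong to the current variable
  lk <- df_lookup[df_lookup$variable == v, ]
  # match() gives, for each code, the row position of that code in lk;
  # indexing lk$label by those positions yields the labels (NA if no match)
  df_output[[v]] <- lk$label[match(df_numeric[[v]], lk$number)]
}

print(df_output)
```

The loop runs over the intersection of the column names and the lookup's variable names, so lookup entries for columns that are absent from the data (here one_of_100_more) are simply ignored. The same code should also work unchanged on the tibbles defined earlier.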