Renaming columns with str_replace_all for more than two string types in R

Posted on November 10

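I have a dataset (dataraw) with column labels such as

condition1_men, condition1_women, condition2_men, condition3_women (etc.)

I want to replace the strings 'condition1', 'condition2', and so on with their names:

condition1_women = related_women;

condition2_men = unrelated_men;

condition3_men = filler_men;

Current code: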

data <- dataraw %>%
 rename_all(~ str_replace_all(str_replace(., 'condition1', "related"), 'condition2', "unrelated"))

This works for up to two strings, but every time I try to add a third one I get an "unexpected symbol" error.

 data <- dataraw %>%
rename_all(~ str_replace_all(str_replace((., 'condition1', "related"), 'condition2', "unrelated"), 'condition3', "filler")))

I'm sure this must be simple, but whatever combination I try, I get an error. Can anyone point out the simple mistake I'm making? Thanks.

Recommended answer

rename_all was superseded by rename_with six years ago, so I'll use the latter:

library(dplyr)
dataraw <- data.frame(condition1_men=1, condition1_women=2, condition2_men=3, condition2_women=4, condition3_men=5)
dataraw
#   condition1_men condition1_women condition2_men condition2_women condition3_men
# 1              1                2              3                4              5
dataraw |>
  rename_with(.fn = ~ sub("^condition1_", "related_", sub("^condition2_", "unrelated_", .)))
#   related_men related_women unrelated_men unrelated_women condition3_men
# 1           1             2             3               4              5
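
As a side note on the code in the question: the three-string attempt has an extra opening parenthesis before the `.`, so the call never parses. Rather than nesting a third call, one option is to pass a named vector to str_replace_all, which applies each pattern = replacement pair in turn. The sketch below assumes stringr is installed and takes the "filler" label for condition3 from the question; the object name `renamed` is only illustrative.

```r
library(stringr)

dataraw <- data.frame(condition1_men = 1, condition1_women = 2,
                      condition2_men = 3, condition2_women = 4,
                      condition3_men = 5)

# Named vector: each name is a regex pattern, each value its replacement.
conds <- c(condition1 = "related", condition2 = "unrelated", condition3 = "filler")

# Apply all three replacements to the column names in a single call.
renamed <- dataraw
names(renamed) <- str_replace_all(names(dataraw), conds)
names(renamed)
# [1] "related_men"     "related_women"   "unrelated_men"   "unrelated_women" "filler_men"
```

The same call should also work inside the pipeline, for example `rename_with(dataraw, ~ str_replace_all(.x, conds))`.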

If you have a (named) vector of "from = to" assignments, we can also do this a bit more generally:

conds <- c(condition1="related", condition2="unrelated")
dataraw |>
  rename_with(.fn = ~ Reduce(function(st, i) sub(names(conds)[i], conds[i], st), seq_along(conds), init = .x))
#   related_men related_women unrelated_men unrelated_women condition3_men
# 1           1             2             3               4              5

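We need Reduce because each replacement has to be applied to the result of the previous one, so that the changes from all earlier mappings are kept.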
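
To see how the changes carry over from one mapping to the next, here is a small base-R sketch that calls Reduce with `accumulate = TRUE`, which returns every intermediate result instead of only the last one. The vector `cols` is a made-up stand-in for the column names.

```r
conds <- c(condition1 = "related", condition2 = "unrelated")
cols <- c("condition1_men", "condition2_women", "condition3_men")  # illustrative names

# accumulate = TRUE keeps the starting value and the result after each mapping
steps <- Reduce(function(st, i) sub(names(conds)[i], conds[i], st),
                seq_along(conds), init = cols, accumulate = TRUE)

steps[[1]]  # the untouched input
steps[[2]]  # after the condition1 mapping only
steps[[3]]  # after both mappings; condition3 has no mapping and stays as-is
```

Each step receives the output of the previous one in `st`, which is why the first rename is still present after the second has run.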

I often find that data like this behaves better (in later data processing/analysis) in a longer format, as Limey suggested. For that, we can also do:

dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  mutate(cond2 = conds[match(sub("_.*", "", cond), names(conds))])
# # A tibble: 3 × 4
#   cond         men women cond2    
#   <chr>      <dbl> <dbl> <chr>    
# 1 condition1     1     2 related  
# 2 condition2     3     4 unrelated
# 3 condition3     5    NA NA

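However, things can be simpler (for data management, visualization, updating, etc.) if your mapping lives in a separate frame, which we can then merge/join onto the original data: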

cond_df <- tribble(
  ~ cond, ~ cond2
  , "condition1", "related"
  , "condition2", "unrelated"
)
dataraw |>
  tidyr::pivot_longer(cols = everything(), names_pattern = "(.*)_(.*)",
                      names_to = c("cond", ".value")) |>
  left_join(cond_df, by = "cond")
# # A tibble: 3 × 4
#   cond         men women cond2    
#   <chr>      <dbl> <dbl> <chr>    
# 1 condition1     1     2 related  
# 2 condition2     3     4 unrelated
# 3 condition3     5    NA NA
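
With the mapping in its own frame, covering condition3 comes down to adding one more row. The sketch below uses base R's merge as a stand-in for left_join so that it runs without extra packages. The `long` frame is typed in by hand to mirror the pivoted output above, and the "filler" label comes from the question.

```r
# Lookup table with a third row for condition3
cond_df <- data.frame(cond  = c("condition1", "condition2", "condition3"),
                      cond2 = c("related", "unrelated", "filler"))

# Hand-built copy of the long-format data shown above
long <- data.frame(cond  = c("condition1", "condition2", "condition3"),
                   men   = c(1, 3, 5),
                   women = c(2, 4, NA))

# all.x = TRUE keeps every row of `long`, like a left join
merged <- merge(long, cond_df, by = "cond", all.x = TRUE)
merged$cond2
# [1] "related"   "unrelated" "filler"
```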