I have data like this:
df <- data.frame(
Partcpt = c("B","A","B","C"),
aoi = c("ACA","CB","AA","AABC" )
)
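I want to replace each letter in aoi with consecutive numbers. If a letter repeats, it should get the number already assigned to it. Is there a regex solution to this?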
The desired output is:
  Partcpt aoi
1 B       121
2 A       12
3 B       11
4 C       1123
Here is a tidyverse solution:
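The line that does the trick is mutate(ID = match(paste(aoi), unique(paste(aoi)))). Each string is split into one row per letter and grouped by its original row id. That line then numbers every letter by the position where it first appears within its row, so repeated letters get the same number: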
library(dplyr)
library(tidyr)
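df %>%
  mutate(id = row_number()) %>%                  # remember which row each letter came from
  separate_rows(aoi, sep = "(?<!^)(?!$)") %>%    # one row per letter; thanks to Chris Ruehlemann
  # separate_rows(aoi, sep = "") %>%             # alternative split, paired with the filter below
  # filter(aoi != "") %>%                        # alternative: drop empty strings
  group_by(id) %>%
  mutate(ID = match(paste(aoi), unique(paste(aoi)))) %>%  # number letters by first appearance
  mutate(ID = paste0(ID, collapse = "")) %>%     # glue the numbers back into one string
  slice(1) %>%                                   # keep one row per original row
  ungroup() %>%
  select(Partcpt, aoi = ID)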
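To see what the key match() line does inside one group, here is a minimal sketch on a single string, the fourth row's "AABC". The variable name x is only for illustration.

```r
# Letters of a single aoi string (the fourth row, "AABC")
x <- strsplit("AABC", split = "")[[1]]
x
#> [1] "A" "A" "B" "C"

# unique() keeps each letter once, in order of first appearance
unique(x)
#> [1] "A" "B" "C"

# match() gives each letter's position in that list,
# so a repeated letter reuses the number it got the first time
match(x, unique(x))
#> [1] 1 1 2 3

paste(match(x, unique(x)), collapse = "")
#> [1] "1123"
```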
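Output of the tidyverse pipeline:
  Partcpt aoi
  <chr>   <chr>
1 B       121
2 A       12
3 B       11
4 C       1123
Or, thanks to @Henrik, a base R one-liner. The \(x) shorthand for function(x) needs R >= 4.1.0. sapply() returns a character vector, so assign it back to the aoi column:
df$aoi <- sapply(strsplit(df$aoi, split = ""), \(x) paste(match(x, unique(x)), collapse = ""))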
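Both answers rely on the same match(x, unique(x)) idea. Below is a minimal sketch that packages it as a reusable function, assuming you want to overwrite the aoi column in place. The name encode_aoi() is hypothetical and comes from neither answer. vapply() replaces sapply() so the result is always a character vector, even when there are zero rows.

```r
# Hypothetical helper (not from either answer): renumber the letters of
# each string by order of first appearance, reusing numbers for repeats.
encode_aoi <- function(s) {
  vapply(
    strsplit(s, split = ""),  # one character vector of letters per string
    function(x) paste(match(x, unique(x)), collapse = ""),
    character(1)              # each string yields exactly one result string
  )
}

df <- data.frame(
  Partcpt = c("B", "A", "B", "C"),
  aoi = c("ACA", "CB", "AA", "AABC"),
  stringsAsFactors = FALSE  # strsplit() needs character, not factor, input
)

df$aoi <- encode_aoi(df$aoi)
# df$aoi is now c("121", "12", "11", "1123")
```

Setting stringsAsFactors = FALSE only matters on R versions before 4.0.0, where data.frame() turned strings into factors by default.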