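I have a data frame that looks like the one below. I need to create a new column of values: for each string in the PROJ_ID column, build a string from the corresponding values in the PROJ_NAME column.
The solution provided here (Accumulate values from one column based on keys of another column in r) takes a very long time to run (and does not produce the correct output for the case shown below), so I used the approach recommended by @r2evans.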
  PROJ_ID                  PROJ_NAME
1 KA0034                   A JST#3
2 KA0034.10                A JST#3-Dares
3 KA0034.10.110201         A JST#3-Dares
4 KA0034.10.110201.LOV     VOM
5 KA0034.10.110201.LOV.MAX A JST#3-Dares
6 KA0034.FN                Some Invent
7 KA0034.FN.010XYZ         Some Invent
8 KA0034.FN.010XYZ.LEX     Some Invent
9 KA0034.FN.010XYZ.LEX.NAT A JST#3
library(dplyr)  # provides %>%, mutate(), left_join(), select() and if_else() used below

input <- data.frame(
  PROJ_ID = c("KA0034",
              "KA0034.10",
              "KA0034.10.110201",
              "KA0034.10.110201.LOV",
              "KA0034.10.110201.LOV.MAX",
              "KA0034.FN",
              "KA0034.FN.010XYZ",
              "KA0034.FN.010XYZ.LEX",
              "KA0034.FN.010XYZ.LEX.NAT"),
  PROJ_NAME = c("A JST#3",
                "A JST#3-Dares",
                "A JST#3-Dares",
                "VOM",
                "A JST#3-Dares",
                "Some Invent",
                "Some Invent",
                "Some Invent",
                "A JST#3")
)
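# Return the parent id (everything before the last period), or NA if there is no period
fun <- function(st) strcapture("(.*)[.][^.]+$", st, list(L=""))$L

input <- input %>%
  mutate(K = fun(PROJ_ID))

# On each pass, look up the name stored for the ancestor K, prepend it to PROJ_NAME,
# then move K one level up; stop once every K is NA
while (TRUE) {
  input <- left_join(input, select(input, PROJ_ID, iss = PROJ_NAME), by = c("K" = "PROJ_ID")) %>%
    mutate(
      PROJ_NAME = if_else(is.na(iss), PROJ_NAME,
                          if_else(PROJ_ID == K, PROJ_NAME, paste(iss, PROJ_NAME, sep = "."))),
      K = fun(K)) %>%
    select(-iss)
  if (all(is.na(input$K))) break
}
input$K <- NULL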
# Split each PROJ_NAME on periods and keep only the first occurrence of each part
# (note: unique() drops the repeated parts)
input$PROJ_NAME <- sapply(strsplit(as.character(input$PROJ_NAME), "\\."), function(x) {
  unique_parts <- unique(x)
  paste(unique_parts, collapse = ".")
})

# Print the updated data frame
print(input)
Output:
  PROJ_ID                  PROJ_NAME
1 KA0034                   A JST#3
2 KA0034.10                A JST#3.A JST#3-Dares
3 KA0034.10.110201         A JST#3.A JST#3-Dares
4 KA0034.10.110201.LOV     A JST#3.A JST#3-Dares.VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.VOM
6 KA0034.FN                A JST#3.Some Invent
7 KA0034.FN.010XYZ         A JST#3.Some Invent
8 KA0034.FN.010XYZ.LEX     A JST#3.Some Invent
9 KA0034.FN.010XYZ.LEX.NAT A JST#3.Some Invent
Desired output:
  PROJ_ID                  PROJ_NAME
1 KA0034                   A JST#3
2 KA0034.10                A JST#3.A JST#3-Dares
3 KA0034.10.110201         A JST#3.A JST#3-Dares.A JST#3-Dares
4 KA0034.10.110201.LOV     A JST#3.A JST#3-Dares.A JST#3-Dares.VOM
5 KA0034.10.110201.LOV.MAX A JST#3.A JST#3-Dares.A JST#3-Dares.VOM.A JST#3-Dares
6 KA0034.FN                A JST#3.Some Invent
7 KA0034.FN.010XYZ         A JST#3.Some Invent.Some Invent
8 KA0034.FN.010XYZ.LEX     A JST#3.Some Invent.Some Invent.Some Invent
9 KA0034.FN.010XYZ.LEX.NAT A JST#3.Some Invent.Some Invent.Some Invent.A JST#3
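Here, the value for each suffix (the part after the last period) is appended to the string of values built for its prefix.
For example:
- KA0034.10.110201.LOV is VOM (LOV is the suffix)
- but KA0034.10.110201 is A JST#3-Dares (110201 is the suffix)
- similarly, KA0034.10 is A JST#3-Dares (10 is the suffix)
- KA0034 is A JST#3
So the resulting string is A JST#3.A JST#3-Dares.A JST#3-Dares.VOM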
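
To make the rule concrete, here is a minimal base R sketch of one way to read it: for each PROJ_ID, list every dot-separated prefix from shortest to longest, look up the PROJ_NAME of each prefix, and join those names with periods. The helpers `id_prefixes` and `full_name` and the lookup vector `name_of` are hypothetical names introduced only for this illustration, and skipping prefixes that have no row of their own is an assumption the example data does not exercise.

```r
# Rebuild the example data from the question.
input <- data.frame(
  PROJ_ID = c("KA0034", "KA0034.10", "KA0034.10.110201",
              "KA0034.10.110201.LOV", "KA0034.10.110201.LOV.MAX",
              "KA0034.FN", "KA0034.FN.010XYZ",
              "KA0034.FN.010XYZ.LEX", "KA0034.FN.010XYZ.LEX.NAT"),
  PROJ_NAME = c("A JST#3", "A JST#3-Dares", "A JST#3-Dares", "VOM",
                "A JST#3-Dares", "Some Invent", "Some Invent",
                "Some Invent", "A JST#3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all dot-separated prefixes of an id, shortest first,
# e.g. "a.b.c" gives c("a", "a.b", "a.b.c").
id_prefixes <- function(id) {
  parts <- strsplit(id, ".", fixed = TRUE)[[1]]
  vapply(seq_along(parts),
         function(i) paste(parts[seq_len(i)], collapse = "."),
         character(1))
}

# Named lookup vector mapping PROJ_ID to its original PROJ_NAME.
name_of <- setNames(input$PROJ_NAME, input$PROJ_ID)

# Hypothetical helper: join the names of every prefix that has a row.
# Assumption: prefixes missing from the data are silently skipped.
full_name <- function(id) {
  prefixes <- id_prefixes(id)
  prefixes <- prefixes[prefixes %in% names(name_of)]
  paste(name_of[prefixes], collapse = ".")
}

result <- input
result$PROJ_NAME <- vapply(input$PROJ_ID, full_name, character(1),
                           USE.NAMES = FALSE)
print(result)
```

Because every lookup reads the original PROJ_NAME values rather than partially updated ones, repeated names are kept and no de-duplication step is needed.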