Hello, I have a sample dataset, shown below.
# Load the tidyverse package
library(tidyverse)
# Create the dataset
id <- 1:6
model <- c("0RB3211", NA, "0RB4191",
           NA, "0RB4033", NA)
UPC <- c("805289119081", "DK_0RB3447CP_RBCP 50", "8053672006360",
         "Green_Classic_G-15_Polar_1.67_PREM_SV", "805289044604",
         "DK_0RB2132CP_RBCP 55")
df <- tibble(id, model, UPC)
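For the missing values in the 'model' column, if the corresponding UPC starts with DK, I need to extract the 7 alphanumeric characters after the first underscore and put them into the 'model' column. For example, for the second row I need to put "0RB3447" into 'model'; the fourth row should be dropped entirely; and for the last row I need to put "0RB2132" into 'model'.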
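Put differently, the result I am after would look roughly like the tibble below. This is a sketch built by hand from the rule above, not output from any code, and the name `expected` is only for illustration.

```r
library(tibble)

# Hand-built target: row 4 dropped, rows 2 and 6 filled in from their UPC
expected <- tibble(
  id    = c(1L, 2L, 3L, 5L, 6L),
  model = c("0RB3211", "0RB3447", "0RB4191", "0RB4033", "0RB2132"),
  UPC   = c("805289119081", "DK_0RB3447CP_RBCP 50", "8053672006360",
            "805289044604", "DK_0RB2132CP_RBCP 55")
)
```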
# Manipulate the dataset
df_cleaned <- df %>%
  rowwise() %>%
  mutate(model = ifelse(is.na(model) & str_detect(UPC, "^DK"),
                        str_extract(UPC, "\\d{2}RB\\d{4}"),
                        model)) %>%
  ungroup() %>%
  filter(!(is.na(model) & str_detect(UPC, "[^0-9]")))
# Display the cleaned dataset
print(df_cleaned)
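However, it returns the wrong result: the rows whose UPC starts with DK are dropped along with the fourth row, instead of having their 'model' value filled in.
How should I modify my code? Any help is much appreciated.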