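Code that runs inside a function only modifies that function's own environment, so the global environment is left unchanged. If you want to write code with "side effects" like this, you need to use a for loop rather than doing the work inside a function: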
variable <- "setosa"
for(sp in c("Species","Species2")) {
  levels(iris_df[[sp]])[levels(iris_df[[sp]]) == variable] <- NA
}
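To see why the same assignment has no lasting effect inside a function, here is a minimal sketch. The helper name `drop_level_local` is hypothetical and exists only for this illustration; it assumes a fresh copy of the built-in `iris` data.

```r
# Start from a fresh copy of the built-in data
iris_df <- iris

# Hypothetical helper: the assignment below creates a *local* copy of iris_df
# inside the function and modifies only that copy
drop_level_local <- function(lev) {
  levels(iris_df$Species)[levels(iris_df$Species) == lev] <- NA
  levels(iris_df$Species)  # levels of the local copy
}

local_levels <- drop_level_local("setosa")
local_levels             # the local copy lost "setosa"
levels(iris_df$Species)  # the global copy still has all three levels
```

The for loop above works because it runs directly in the calling environment, so there is no local copy to throw away.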
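If you want to use purrr::map, your function needs to return something useful, and you need to assign the result with <- or =. That said, when you are modifying columns of a data frame, dplyr::mutate is probably easier: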
library(dplyr)

## reset the sample data
iris_df <- iris
iris_df$Species2 <- iris_df$Species
variable <- "setosa"
iris_df <- iris_df |> ## note that we assign the result, so iris_df is modified
  mutate(across(c("Species","Species2"), \(x) {
    levels(x)[levels(x) == variable] <- NA
    x ## the function returns the modified column
  }
  ))
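For comparison, the purrr::map route mentioned above might look like the following sketch. It assumes the purrr package is installed; the `cols` variable is introduced here only for illustration. The function returns the modified column, and the list that `map` produces is assigned back into the data frame.

```r
library(purrr)

## reset the sample data
iris_df <- iris
iris_df$Species2 <- iris_df$Species
variable <- "setosa"
cols <- c("Species", "Species2")  # illustrative helper variable

## map returns a list of modified columns; assigning it replaces the originals
iris_df[cols] <- map(iris_df[cols], \(x) {
  levels(x)[levels(x) == variable] <- NA
  x ## return the modified column
})

levels(iris_df$Species)
```

Without the `iris_df[cols] <-` assignment, `map` would compute the modified columns and then discard them.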
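If you want a more general "drop level(s) from column(s)" function, either of these approaches can be wrapped in a function, but you need to pass in the data frame and assign the result back to the same data frame or to a new one: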
drop_col_levels_for = function(df, cols, levs) {
  for(i in seq_along(cols)) {
    levels(df[[cols[i]]])[levels(df[[cols[i]]]) %in% levs] = NA
  }
  df
}
drop_col_levels_dplyr = function(df, cols, levs) {
  mutate(df, across(all_of(cols), \(x) {
    levels(x)[levels(x) %in% levs] = NA
    x
  }))
}
drop_col_levels_for(iris_df, c("Species","Species2"), "setosa") |>
  summary()
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species Species2
# Min. :4.300 Min. :2.000 Min. :1.000 Min. :0.100 versicolor:50 versicolor:50
# 1st Qu.:5.100 1st Qu.:2.800 1st Qu.:1.600 1st Qu.:0.300 virginica :50 virginica :50
# Median :5.800 Median :3.000 Median :4.350 Median :1.300 NA's :50 NA's :50
# Mean :5.843 Mean :3.057 Mean :3.758 Mean :1.199
# 3rd Qu.:6.400 3rd Qu.:3.300 3rd Qu.:5.100 3rd Qu.:1.800
# Max. :7.900 Max. :4.400 Max. :6.900 Max. :2.500
drop_col_levels_dplyr(iris_df, c("Species","Species2"), "setosa") |>
  summary()
## same result