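Here are solutions that (1) maintain the pipeline, (2) do not overwrite the input and (3) only require that the condition be specified once: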
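1a) mutate_cond Create a simple function for data frames or data tables that can be incorporated into pipelines. This function is like mutate but only acts on the rows satisfying the condition: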
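library(dplyr)
mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  # evaluate the unquoted condition using the columns of .data
  condition <- eval(substitute(condition), .data, envir)
  # apply mutate only to the matching rows and write them back into the copy
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}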
DF %>% mutate_cond(measure == 'exit', qty.exit = qty, cf = 0, delta.watts = 13)
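As a quick sanity check, here is a minimal sketch that applies mutate_cond to a tiny made-up data frame and confirms that the input is not modified. The data frame small and its values are illustrative assumptions, not part of the question. The function definition is repeated so that the snippet runs on its own.

```r
library(dplyr)

mutate_cond <- function(.data, condition, ..., envir = parent.frame()) {
  condition <- eval(substitute(condition), .data, envir)
  .data[condition, ] <- .data[condition, ] %>% mutate(...)
  .data
}

# hypothetical toy input, only for illustration
small <- data.frame(measure = c("led", "exit", "cfl", "exit"),
                    qty = c(5, 7, 2, 9),
                    qty.exit = 0,
                    cf = c(0.2, 0.4, 0.6, 0.8),
                    stringsAsFactors = FALSE)
small_before <- small

out <- small %>% mutate_cond(measure == "exit", qty.exit = qty, cf = 0)

out$qty.exit                      # updated only on the "exit" rows
identical(small, small_before)    # the input itself is left as it was
```

Only rows 2 and 4 satisfy the condition, so only those rows receive the new qty.exit and cf values.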
1b) mutate_last This is an alternative function for data frames or data tables which again is like mutate but is only used within group_by (as in the example below) and only operates on the last group rather than every group. Note that TRUE > FALSE, so the TRUE group sorts last; if group_by specifies a condition then mutate_last will only operate on rows satisfying that condition.
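mutate_last <- function(.data, ...) {
  n <- n_groups(.data)
  # NOTE: relies on the 0-based "indices" attribute that grouped data frames
  # carried in older dplyr versions (before 0.8.0); hence the + 1
  indices <- attr(.data, "indices")[[n]] + 1
  .data[indices, ] <- .data[indices, ] %>% mutate(...)
  .data
}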
DF %>%
  group_by(is.exit = measure == 'exit') %>%
  mutate_last(qty.exit = qty, cf = 0, delta.watts = 13) %>%
  ungroup() %>%
  select(-is.exit)
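The "indices" attribute used by mutate_last was replaced by a different internal grouping structure in dplyr 0.8.0, so that definition may not work on newer installations. The following is a sketch of one possible adaptation, assuming dplyr 0.8.0 or later, where group_rows() returns 1-based row positions per group. The name mutate_last_rows and the toy data are hypothetical, and unlike mutate_last this variant returns a plain ungrouped data frame.

```r
library(dplyr)

# hypothetical variant of mutate_last for dplyr >= 0.8.0
mutate_last_rows <- function(.data, ...) {
  # 1-based row positions of the last group
  rows <- group_rows(.data)[[n_groups(.data)]]
  # work on an ungrouped plain data frame copy
  out <- as.data.frame(ungroup(.data))
  out[rows, ] <- mutate(out[rows, , drop = FALSE], ...)
  out
}

# hypothetical toy input, only for illustration
small <- data.frame(measure = c("led", "exit", "cfl", "exit"),
                    qty = c(5, 7, 2, 9),
                    qty.exit = 0,
                    cf = c(0.2, 0.4, 0.6, 0.8),
                    stringsAsFactors = FALSE)

out <- small %>%
  group_by(is.exit = measure == "exit") %>%
  mutate_last_rows(qty.exit = qty, cf = 0) %>%
  ungroup() %>%
  select(-is.exit)
```

Because the result is already ungrouped, the trailing ungroup() is a no-op here; it is kept only so the pipeline has the same shape as the one above.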
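2) factor out condition Factor out the condition by making it an extra column which is later removed. Then use ifelse, replace or arithmetic with logicals as illustrated. This also works for data tables.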
library(dplyr)
DF %>% mutate(is.exit = measure == 'exit',
              qty.exit = ifelse(is.exit, qty, qty.exit),
              cf = (!is.exit) * cf,
              delta.watts = replace(delta.watts, is.exit, 13)) %>%
  select(-is.exit)
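The three idioms used above (ifelse, replace, and arithmetic with logicals) are interchangeable for setting a value to 0 on the selected rows. A small base R sketch on made-up vectors shows this, along with one caveat of the arithmetic form: a missing value stays missing even on a selected row, because 0 * NA is NA. The vectors below are illustrative assumptions.

```r
# hypothetical toy vectors, only for illustration
is.exit <- c(FALSE, TRUE, FALSE, TRUE)
cf <- c(0.2, 0.4, 0.6, 0.8)

via_ifelse  <- ifelse(is.exit, 0, cf)   # pick 0 where TRUE, else the old value
via_replace <- replace(cf, is.exit, 0)  # overwrite the TRUE positions with 0
via_arith   <- (!is.exit) * cf          # TRUE/FALSE act as 1/0 in arithmetic

# caveat: with a missing value on a selected row the arithmetic form keeps NA
cf_na <- c(0.2, NA, 0.6, 0.8)
arith_na   <- (!is.exit) * cf_na        # second element is NA
replace_na <- replace(cf_na, is.exit, 0) # second element is 0
```

The arithmetic form only works when the replacement value is 0, whereas ifelse and replace accept any replacement.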
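3) sqldf We could use SQL update via the sqldf package in the pipeline for data frames (but not data tables unless we convert them; this may represent a bug in dplyr. See dplyr issue 1579). It may seem that we are undesirably modifying the input in this code due to the existence of the update, but in fact the update is acting on a copy of the input in the temporarily generated database and not on the actual input.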
library(sqldf)
DF %>%
  do(sqldf(c("update '.'
              set 'qty.exit' = qty, cf = 0, 'delta.watts' = 13
              where measure = 'exit'",
             "select * from '.'")))
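4) row_case_when Also check out row_case_when, which is defined elsewhere and is not part of dplyr itself, so its definition must be loaded before the code below will run.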
library(dplyr)
DF %>%
  row_case_when(
    measure == "exit" ~ data.frame(qty.exit = qty, cf = 0, delta.watts = 13),
    TRUE ~ data.frame(qty.exit, cf, delta.watts)
  )
Note 1: We used this as DF
set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c('cfl', 'led', 'linear', 'exit'), 50,
                                  replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50))
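To check that a solution produces the intended result on this DF, one option is to compare it with a plain base R update made on a copy. The sketch below does this for solution 2. The names expected, exit_rows and result are illustrative assumptions, and stringsAsFactors = FALSE is added only so the snippet behaves the same across R versions.

```r
library(dplyr)

set.seed(1)
DF <- data.frame(site = sample(1:6, 50, replace = TRUE),
                 space = sample(1:4, 50, replace = TRUE),
                 measure = sample(c("cfl", "led", "linear", "exit"), 50,
                                  replace = TRUE),
                 qty = round(runif(50) * 30),
                 qty.exit = 0,
                 delta.watts = sample(10.5:100.5, 50, replace = TRUE),
                 cf = runif(50),
                 stringsAsFactors = FALSE)

# base R reference: update a copy, leaving DF itself untouched
expected <- DF
exit_rows <- expected$measure == "exit"
expected$qty.exit[exit_rows] <- expected$qty[exit_rows]
expected$cf[exit_rows] <- 0
expected$delta.watts[exit_rows] <- 13

# solution 2: factor the condition out into a temporary column
result <- DF %>%
  mutate(is.exit = measure == "exit",
         qty.exit = ifelse(is.exit, qty, qty.exit),
         cf = (!is.exit) * cf,
         delta.watts = replace(delta.watts, is.exit, 13)) %>%
  select(-is.exit)

# compare the pipeline result with the base R reference
same <- isTRUE(all.equal(result, expected))
```

The same comparison can be repeated for the other solutions by swapping in their pipelines for result.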
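Note 2: The problem of how to easily specify updating a subset of rows is also discussed in dplyr issues 134, 631, 1518 and 1573, with 631 being the main thread and 1573 being a review of the answers here.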