Using Position() avoids scanning the entire vector:
df[Position(\(x) x!=0, df$y):Position(\(x) x!=0, df$y, right = TRUE), ]
# x y z
# 3 5 3 6
# 4 0 0 8
# 5 7 0 9
# 6 6 4 4
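The one-liner above fails when the column contains no non-zero value, because Position() then returns NA and NA:NA is not a valid range. Below is a minimal sketch of a guarded version. The data frame is made up for illustration (only rows 3 to 6 mirror the output shown above), trim_zero_rows() is a hypothetical helper name, and the column is assumed to contain no NA values.

```r
# Hypothetical data; only rows 3 to 6 mirror the output shown above.
df <- data.frame(
  x = c(1, 2, 5, 0, 7, 6, 3),
  y = c(0, 0, 3, 0, 0, 4, 0),
  z = c(2, 1, 6, 8, 9, 4, 5)
)

# trim_zero_rows() is an illustrative helper, not part of base R.
# It keeps the rows between the first and last non-zero value of `col`.
trim_zero_rows <- function(data, col) {
  is_nonzero <- function(v) v != 0
  first <- Position(is_nonzero, data[[col]])
  if (is.na(first)) {
    # No non-zero value found: return an empty data frame with the same columns.
    return(data[0, , drop = FALSE])
  }
  last <- Position(is_nonzero, data[[col]], right = TRUE)
  data[first:last, , drop = FALSE]
}

trimmed  <- trim_zero_rows(df, "y")
all_zero <- trim_zero_rows(data.frame(x = 1:3, y = 0), "y")
```

With this sample data, trimmed holds rows 3 to 6 and all_zero has zero rows.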
A dplyr option:
library(dplyr)
df |>
  # group_by(id, date) |>
  slice(foo(y))
where
foo <- function(vec) Position(\(x) x!=0, vec):Position(\(x) x!=0, vec, right = TRUE)
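An example of the performance gain from Position() when a large part of the vector can be skipped (this only shows, in isolation, where Position() can gain).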
set.seed(10)
x <- sample(c(0, 1), prob = c(0.99, 0.01), size = 10e4, replace = TRUE)
microbenchmark::microbenchmark(
head(which(x!=0),1),
head(which(cumsum(!!x) > 0), 1),
Position = Position(\(x) x!=0, x),
Position2 = (\(pred, x) for (i in seq_along(x)) if (pred(x[i])) return(i))(\(x) x!=0, x)
)
# Unit: microseconds
#                            expr   min      lq     mean  median      uq    max neval
#          head(which(x != 0), 1) 327.8  332.10  386.538  342.65  405.75  966.7   100
# head(which(cumsum(!!x) > 0), 1) 993.8 1024.95 1311.003 1077.95 1219.60 8659.3   100
#                        Position  63.1   65.00   78.374   68.15   71.15  719.2   100
#                       Position2  62.4   63.75   97.533   65.40   68.35 2881.3   100
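The gain comes from Position() stopping at the first match, whereas which() evaluates the condition on every element. A small sketch that makes this visible by counting how often the predicate is called; count_calls() is a hypothetical helper written only for this illustration.

```r
x <- c(0, 0, 3, 0, 5)

# count_calls() is a hypothetical helper: it runs Position() with a
# predicate that increments a counter every time it is evaluated.
count_calls <- function(vec, ...) {
  n_calls <- 0L
  pred <- function(v) {
    n_calls <<- n_calls + 1L
    v != 0
  }
  idx <- Position(pred, vec, ...)
  c(index = idx, calls = n_calls)
}

from_left  <- count_calls(x)                # scans from the first element
from_right <- count_calls(x, right = TRUE)  # scans from the last element
```

Here from_left reports index 3 after 3 predicate calls, and from_right reports index 5 after a single call, so the work done depends on how early the match occurs rather than on the length of the vector.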