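(Edit: the first versions of both the base R and the dplyr+tidyr code used duplicated, which would incorrectly remove the 12 in row 1, column 4. The code has been edited so that it no longer uses duplicated.)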
base R
A reduction that compares each column with the (already updated) previous column.
df[,-1] <- Reduce(
function(prev, this) replace(this, is.na(prev) | this == prev, this[NA][1]),
df[,-1], accumulate = TRUE)
df
# Site 2001-12-01 to 2021-12-01 1991-12-01.to 2021-12-01 1981-12-01 to 2021-12-01 1971-12-01 to 2021-12-01
# 1 Att-Bissen 12 5 12 19
# 2 Alz-Ettelbruck 1 4 7 NA
# 3 Our-Gemund/Vianden 12 NA NA NA
# 4 Syre Felsmuhle/Mertert 1 6 20 13
# 5 Ernz Blanche-Larochette 8 14 NA NA
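To see what the reduction does step by step, here is a small sketch on toy vectors. The names step, toy, a, b and c are made up for illustration and are not part of the data. The point it shows is that each step receives the already updated previous column as prev, so an NA produced in one column carries over to the next one.

```r
# Same anonymous function as in the Reduce call, given a name for readability
step <- function(prev, this) replace(this, is.na(prev) | this == prev, this[NA][1])

# Three toy "columns", two rows each (hypothetical values)
toy <- list(a = c(8, 12), b = c(14, 12), c = c(14, 12))

res <- Reduce(step, toy, accumulate = TRUE)
res[[1]]  # 8 12   (the first column is returned unchanged)
res[[2]]  # 14 NA  (12 repeats the previous column, so it becomes NA)
res[[3]]  # NA NA  (14 repeats; the other prev is already NA)
```

this[NA][1] is simply an NA of the same type as the column, so numeric columns stay numeric.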
I hard-coded df[,-1] in both places; it could just as easily be df[,2:5]. It only needs to be the same in both places (on the LHS of <- and inside Reduce).
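One way to keep the two references in sync is to store the column selection in a variable first. This is only a sketch: the name cols is my own, and the data frame from the Data section at the bottom is rebuilt here so that the snippet runs on its own.

```r
# Sample data, copied from the Data section
df <- structure(list(
  Site = c("Att-Bissen", "Alz-Ettelbruck", "Our-Gemund/Vianden",
           "Syre Felsmuhle/Mertert", "Ernz Blanche-Larochette"),
  "2001-12-01 to 2021-12-01" = c(12, 1, 12, 1, 8),
  "1991-12-01.to 2021-12-01" = c(5, 4, 12, 6, 14),
  "1981-12-01 to 2021-12-01" = c(12, 7, 12, 20, 14),
  "1971-12-01 to 2021-12-01" = c(19, 7, 12, 13, 14)),
  class = "data.frame", row.names = c(NA, -5L))

cols <- 2:5  # hypothetical helper: every column except Site

# The same selection is used on the LHS of <- and inside Reduce
df[cols] <- Reduce(
  function(prev, this) replace(this, is.na(prev) | this == prev, this[NA][1]),
  df[cols], accumulate = TRUE)
df
```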
dplyr+tidyr
This loses a little efficiency because it pivots twice.
library(dplyr)
library(tidyr) # pivot_*
df %>%
pivot_longer(cols = -Site) %>%
arrange(Site, desc(name)) %>%
mutate(.by = "Site", value = if_else(value == lag(value, default=-1L), value[NA], value)) %>%
pivot_wider(id_cols = Site) %>%
slice(match(df$Site, Site)) %>% # restore the original row order
select(all_of(names(df))) # restore the original column order
# # A tibble: 5 × 5
# Site `2001-12-01 to 2021-12-01` `1991-12-01.to 2021-12-01` `1981-12-01 to 2021-12-01` `1971-12-01 to 2021-12-01`
# <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Att-Bissen 12 5 12 19
# 2 Alz-Ettelbruck 1 4 7 NA
# 3 Our-Gemund/Vianden 12 NA NA NA
# 4 Syre Felsmuhle/Mertert 1 6 20 13
# 5 Ernz Blanche-Larochette 8 14 NA NA
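One side effect of pivoting is that the row and column order are not guaranteed to be restored, so I added the mostly aesthetic slice(.) %>% select(.) at the end so that the result matches your input data. (This is entirely optional.)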
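The reordering relies on match(): given the desired order and the current order, it returns the positions to pick from the current order. A tiny sketch with made-up vectors (target and current are hypothetical names, standing in for the original Site column and the Site column after pivoting):

```r
target  <- c("b", "c", "a")    # desired order, e.g. the original Site column
current <- c("a", "b", "c")    # order after pivoting (sorted by arrange)

idx <- match(target, current)  # position of each target value within current
idx                            # 2 3 1
current[idx]                   # "b" "c" "a", i.e. the target order
```

Note the argument order: the desired order goes first. Swapping the arguments returns the inverse permutation, which only gives the same result when the two orders already agree.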
Data
df <- structure(list(Site = c("Att-Bissen", "Alz-Ettelbruck", "Our-Gemund/Vianden", "Syre Felsmuhle/Mertert", "Ernz Blanche-Larochette"), "2001-12-01 to 2021-12-01" = c(12, 1, 12, 1, 8), "1991-12-01.to 2021-12-01" = c(5, 4, 12, 6, 14), "1981-12-01 to 2021-12-01" = c(12, 7, 12, 20, 14), "1971-12-01 to 2021-12-01" = c(19, 7, 12, 13, 14)), class = "data.frame", row.names = c(NA, -5L))