As a simplified example of my problem, say I have four data.tables, dt1, ..., dt4, all with the same structure:
head(dt1)
date x y
1: 2000-10-01 0.4527087 -0.11590788
2: 2001-10-01 0.7200252 -0.55722270
3: 2002-10-01 -1.3804472 -1.47030087
4: 2003-10-01 -0.1380225 2.34157766
5: 2004-10-01 -0.9288675 -1.32993998
6: 2005-10-01 -0.9592633 0.76316150
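That is, they all have three columns, called date, x, and y. My desired output is a single merged data.table (merged on date) with five columns: date, followed by the x column from each table, renamed to reflect the data.table it came from: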
head(desired_output)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 0.4527087 -0.11590788 1.1581946 -1.5159040
2: 2001-10-01 0.7200252 -0.55722270 -1.6247254 -0.3325556
3: 2002-10-01 -1.3804472 -1.47030087 -0.9766309 -0.2368857
4: 2003-10-01 -0.1380225 2.34157766 1.1831091 -0.4399184
5: 2004-10-01 -0.9288675 -1.32993998 0.8716144 -0.4086229
6: 2005-10-01 -0.9592633 0.76316150 -0.8860816 -0.4299365
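I assume this can be achieved somehow with the suffixes argument of merge.data.table. So far I have tried to adapt mergeDTs from this answer, but without success. A solution that successfully modifies mergeDTs (or simply a function that can be applied to a list of data.tables) would be great.
I am aware of this very slick dplyr/purrr answer, but I would prefer a data.table solution.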
Example data
library(data.table)
dt1 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt2 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt3 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
dt4 <- data.table(date = seq(from = as.Date("2000-10-01"), to = as.Date("2010-10-01"), by = "years"),
                  x = rnorm(11),
                  y = rnorm(11))
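Solution
Below, I have turned B. Christian Kamgang's answer into a function (so that it is easy to adapt to my actual problem) and removed the dependency on the new pipe (since my organization has not upgraded yet):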
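merge_select <- function(on, vars, ..., suffix = "_") {
  # Collect the tables and name each one after the expression passed in (e.g. "dt1")
  dts <- list(...)
  names(dts) <- sapply(as.list(substitute(list(...)))[-1L], deparse)
  nv <- length(vars)
  ndt <- length(dts)
  # Build one group of old names and one group of new names per table
  old_cols <- split(rep(vars, ndt),
                    ceiling(seq_along(rep(vars, ndt))/nv))
  new_cols <- split(paste0(vars, suffix, rep(names(dts), each = nv)),
                    ceiling(seq_along(paste0(vars,
                                             suffix,
                                             rep(names(dts), each = nv)))/nv))
  # Keep only the key column(s) and the requested columns from each table
  sep_cols <- lapply(dts, function(x) subset(x, select = c(on, vars)))
  # Rename the columns table by table, then merge the tables successively on the key
  Reduce(f = function(x,y) merge(x, y, by = on),
         Map(f = setnames, sep_cols, old_cols, new_cols))
}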
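To make the renaming step concrete, here is a small sketch of how the per-table groups of new names come out. The inputs (two variables and two table names) are made up purely for illustration and are not taken from my actual data.

```r
# Hypothetical inputs, only to illustrate the naming scheme
vars     <- c("x", "y")
dt_names <- c("dt1", "dt2")
suffix   <- "_"
nv       <- length(vars)

# Every variable is paired with every table name, table by table
new_names <- paste0(vars, suffix, rep(dt_names, each = nv))

# Cut the flat vector into one chunk of length nv per table
new_cols <- split(new_names, ceiling(seq_along(new_names) / nv))
print(new_cols)
```

Each list element holds the new names for one table, in the same order as the tables were supplied, which is what lets Map() line them up with the tables.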
For my case, the call and its output are:
merge_select("date", "x", dt1,dt2,dt3,dt4)
date x_dt1 x_dt2 x_dt3 x_dt4
1: 2000-10-01 -0.6365707 0.11804268 -0.01084163 -0.88127011
2: 2001-10-01 -0.2533127 -3.16924568 0.45746415 0.69742537
3: 2002-10-01 2.3069266 -0.82670409 -0.54236745 -1.49613384
4: 2003-10-01 0.7075547 -0.91809007 -0.67888707 -0.26106146
5: 2004-10-01 -0.7165651 -0.45711888 -0.83903416 1.45113260
6: 2005-10-01 0.5703561 0.24587897 0.13862020 0.33928202
7: 2006-10-01 -0.6258097 -0.77652389 -0.49252474 -0.80460241
8: 2007-10-01 -0.4600565 0.55612959 0.86749410 -1.30850411
9: 2008-10-01 -0.8841649 -0.48113848 -1.55858406 0.83076846
10: 2009-10-01 -0.6262272 -0.73618265 0.13350581 0.06640803
11: 2010-10-01 0.1406454 0.08994779 1.28450204 -1.18329081
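Since I originally asked for something that works on a list of data.tables, here is a minimal sketch of a list-based variant. The name merge_list and its arguments are my own hypothetical choices, and it assumes the list is named, because the names become the column suffixes.

```r
library(data.table)

# Hypothetical helper: `dts` must be a *named* list of data.tables
merge_list <- function(dts, on, vars, sep = "_") {
  stopifnot(!is.null(names(dts)))
  renamed <- Map(function(d, nm) {
    out <- d[, c(on, vars), with = FALSE]           # new table with only the needed columns
    setnames(out, vars, paste(vars, nm, sep = sep)) # e.g. "x" becomes "x_dt1"
    out
  }, dts, names(dts))
  # Merge the renamed tables one after another on the key column(s)
  Reduce(function(a, b) merge(a, b, by = on), renamed)
}

# Small made-up example with two tables
set.seed(1)
dates <- seq(as.Date("2000-10-01"), as.Date("2010-10-01"), by = "years")
dts <- list(dt1 = data.table(date = dates, x = rnorm(11), y = rnorm(11)),
            dt2 = data.table(date = dates, x = rnorm(11), y = rnorm(11)))
res <- merge_list(dts, on = "date", vars = "x")
print(names(res))
```

Compared with merge_select, this avoids capturing the argument expressions with substitute(), at the cost of having to name the list elements yourself.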