R data.table way to set column values with functions and in joins without for loops

Posted on January 14

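Is there a way to use R data.table to set column values that need two levels of function calls to compute, and is there a way to set column values in a join between two data.tables?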

Example: this works, but it uses for loops.

library(data.table)

# use something from datasets to illustrate
# build with data.table() rather than cbind(), which would coerce Amt to character
tAmounts<-rbind(data.table(ID="Apple", Amt=as.numeric(EuStockMarkets[1:50,2])),
                data.table(ID="Orange", Amt=as.numeric(EuStockMarkets[1:30,3])),
                data.table(ID="Lemon", Amt=as.numeric(EuStockMarkets[1:60,4])))
setkey(tAmounts, ID, Amt)

# is there a data.table way to do this without the for loops?

# summary table with the full hierarchical cluster for each ID
tSummary<-tAmounts[, .(.N, Clust=list()), keyby="ID"]
for (Id in unique(tAmounts$ID)) {
  D<-dist(tAmounts[ID==Id]$Amt)
  C<-hclust(D, method="average")
  tSummary[ID==Id]$Clust<-list(C) # any way to mapply & lapply?
}
# ID      N      Clust
# Apple  50 <hclust[7]>
# Lemon  60 <hclust[7]>
# Orange 30 <hclust[7]>

Perhaps there is a way to express tSummary[, Clust:=hclust(dist(Amt), method="average"), by="ID"] with some combination of lapply and mapply?
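A minimal sketch of a loop-free version of the first step, assuming Amt is numeric: compute the tree inside j, grouped by ID, and wrap it in list() so each group contributes exactly one element to a list column. This is one common data.table idiom, not necessarily the only one.

```r
library(data.table)

# Rebuild the example data with a numeric Amt column
tAmounts <- rbind(
  data.table(ID = "Apple",  Amt = as.numeric(EuStockMarkets[1:50, 2])),
  data.table(ID = "Orange", Amt = as.numeric(EuStockMarkets[1:30, 3])),
  data.table(ID = "Lemon",  Amt = as.numeric(EuStockMarkets[1:60, 4]))
)
setkey(tAmounts, ID)

# One row per ID: the group size plus the hclust object wrapped in list(),
# so it is stored as a single element of the list column Clust
tSummary <- tAmounts[, .(.N, Clust = list(hclust(dist(Amt), method = "average"))),
                     keyby = ID]
tSummary
```

The list() wrapper is what keeps the seven-component hclust object from being spread across several rows or columns.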

Similarly, is there a way to set a column with a function inside a join? Continuing from the example above:

# table of hierarchical cluster cuts, e.g., height of $20, height of $40
tCuts<-CJ(ID=unique(tAmounts$ID), Cut=seq(20,100,20))
setkey(tCuts, ID, Cut)
# ID    Cut
# Apple  20
# Apple  40
# ...etc...

# table with clusters taken at each cut
tClust<-tCuts[tAmounts, on="ID", allow.cartesian=TRUE]
setkey(tClust, ID, Cut, Amt)
# ID Cut    Amt
# Apple  20 1587.4
# Apple  20 1630.6
# ...etc...
# Orange 100 1789.5 

# set ClustNum for each ID, cut, and amount
for (i in 1:nrow(tCuts)) {
  Id<-tCuts[i]$ID
  tClust[ID==Id & Cut==tCuts[i]$Cut, ClustNum:=cutree(tSummary[ID==Id]$Clust[[1]], h=tCuts[i]$Cut)] # any way to mapply in a join?
}

Is there something like tClust[tCuts, ClustNum:=cutree(Clust, h=Cut)] that joins and sets the values in a single step?
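One possible single-join approach, sketched with by = .EACHI, which evaluates j once for each row of the i table (here tCuts), so cutree() receives that row's tree and height. It assumes tSummary also keeps each ID's amounts in a list column (an addition not in the question), so the amounts and the labels returned by cutree() line up in the original observation order.

```r
library(data.table)

tAmounts <- rbind(
  data.table(ID = "Apple",  Amt = as.numeric(EuStockMarkets[1:50, 2])),
  data.table(ID = "Orange", Amt = as.numeric(EuStockMarkets[1:30, 3])),
  data.table(ID = "Lemon",  Amt = as.numeric(EuStockMarkets[1:60, 4]))
)

# Summary table that also stores each ID's amounts as a list column
# (assumption for this sketch), next to the hclust object built from them
tSummary <- tAmounts[, .(.N,
                         Amt   = list(Amt),
                         Clust = list(hclust(dist(Amt), method = "average"))),
                     keyby = ID]

tCuts <- CJ(ID = unique(tAmounts$ID), Cut = seq(20, 100, 20))

# Join each (ID, Cut) row of tCuts to its tSummary row; with by = .EACHI,
# Cut is the current tCuts row's height and Clust[[1]] is that ID's tree
tClust <- tSummary[tCuts,
                   .(Cut, Amt = Amt[[1]], ClustNum = cutree(Clust[[1]], h = Cut)),
                   on = "ID", by = .EACHI]
tClust
```

The result has one row per (ID, Cut, amount), in tCuts order, with columns ID, Cut, Amt and ClustNum.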

Recommended answer

library(data.table)
tAmounts <- rbind(
  data.frame(ID="Apple",  Amt=as.numeric(EuStockMarkets[1:50,2])),
  data.frame(ID="Orange", Amt=as.numeric(EuStockMarkets[1:30,3])),
  data.frame(ID="Lemon",  Amt=as.numeric(EuStockMarkets[1:60,4]))
) |> setDT()
setkey(tAmounts, ID)

cuts <- seq(20,100,20)

# EITHER lapply over heights (and create the Cut column "manually"):
tClust <- tSummary[, .(Cut = rep(cuts, each=N),
                       ClustNum = unlist(lapply(cuts, function(h) cutree(Clust[[1]], h=h)))),
                   by=ID]

# OR pass the vector of heights to cutree():
tClust <- tSummary[, melt(as.data.table(cutree(Clust[[1]], h=cuts)),
                          variable.name="Cut", variable.factor=F, value.name="ClustNum"),
                   by=ID]

# Add the amounts column
tClust[, Amt := tAmounts[, rep(Amt, times=length(cuts)), by=ID][, ID := NULL]]
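The second approach relies on cutree() accepting a vector of heights: it then returns an integer matrix with one row per observation and one column per height, using the heights as column names. A small check on the Orange series, assuming the same cut heights. Note that melt() leaves Cut as character; converting it back to numeric is an extra step that is not part of the answer.

```r
library(data.table)

hc <- hclust(dist(as.numeric(EuStockMarkets[1:30, 3])), method = "average")
m  <- cutree(hc, h = seq(20, 100, 20))
dim(m)       # 30 rows (observations), 5 columns (heights)
colnames(m)  # "20" "40" "60" "80" "100"

# melt() turns the column names into Cut values, stored as character
long <- melt(as.data.table(m), measure.vars = colnames(m),
             variable.name = "Cut", variable.factor = FALSE,
             value.name = "ClustNum")
long[, Cut := as.numeric(Cut)]  # numeric again, for later joins or comparisons
```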