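Below is the data.table I'll be referring to in this question. My goal is to add the data1 column and the data2 column together and put the result in a new column named sum. Below are some of my recent attempts and the results I got from them. Clearly, none of them gives the right result, or they throw errors. I know there are many questions that address similar problems, but I haven't found a solution to this one. It's probably something simple that I'm just not getting. Does anyone know what I'm doing wrong? In the future, I'd like to add columns giving the SD, mean, median, and SEM of these columns. Thanks in advance for your help!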
> sum_tabnew
location bin time data1 data2 loc_id condition loc_coord
1: Loc01 (-0.24,1] 0.966764 258 0 1 WT_CTL a1
2: Loc01 (1,2] 2.000012 399 0 1 WT_CTL a1
3: Loc01 (2,3] 2.999657 502 0 1 WT_CTL a1
4: Loc01 (3,4] 3.999978 284 0 1 WT_CTL a1
5: Loc01 (4,5] 4.999684 335 0 1 WT_CTL a1
---
8540: Loc96 (114,115] 115.000607 0 90 96 MUT_CTL h12
8541: Loc96 (115,116] 115.984122 0 708 96 MUT_CTL h12
8542: Loc96 (116,117] 116.967636 0 383 96 MUT_CTL h12
8543: Loc96 (117,118] 117.967847 0 0 96 MUT_CTL h12
8544: Loc96 (118,119] 119.000967 0 0 96 MUT_CTL h12
#Get a vector of all the 'data' columns
data_vec <- colnames(sum_tabnew)[grepl("data",colnames(sum_tabnew))]
#Change all the 'data' columns data types to double, so they won't have to be changed later
sum_tabnew[ , (data_vec) := lapply(.SD, as.double), .SDcols = data_vec]
test_sum <- sum_tabnew[, list(sum = sum(test_tab)), by = list(condition, time)]
> test_sum
condition time sum
1: WT_CTL 0.966764 2294112
2: WT_CTL 2.000012 2294112
3: WT_CTL 2.999657 2294112
4: WT_CTL 3.999978 2294112
5: WT_CTL 4.999684 2294112
---
1066: MUT_CTL 115.000607 2294112
1067: MUT_CTL 115.984122 2294112
1068: MUT_CTL 116.967636 2294112
1069: MUT_CTL 117.967847 2294112
1070: MUT_CTL 119.000967 2294112
test_sum <- sum_tabnew[, sum(.SD), .SDcols = data_vec, by = list(condition, time)]
> test_sum
condition time V1
1: WT_CTL 0.966764 3492
2: WT_CTL 2.000012 399
3: WT_CTL 2.999657 2194
4: WT_CTL 3.999978 284
5: WT_CTL 4.999684 2520
---
1066: MUT_CTL 115.000607 3181
1067: MUT_CTL 115.984122 4524
1068: MUT_CTL 116.967636 6925
1069: MUT_CTL 117.967847 2060
1070: MUT_CTL 119.000967 2159
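For reference, `sum(.SD)` collapses every value in every `.SDcols` column, across all rows of the group, into a single number, which is why the attempt above returns one total per `condition`/`time` group rather than one value per row. A minimal sketch of a row-wise sum follows, assuming the goal is `data1 + data2` for each row; the toy table `dt` and the column `sum2` are hypothetical stand-ins, not the real `sum_tabnew`.

```r
library(data.table)

# Toy table standing in for sum_tabnew (illustrative values only)
dt <- data.table(
  condition = c("WT_CTL", "WT_CTL", "MUT_CTL"),
  time      = c(0.966764, 2.000012, 0.966764),
  data1     = c(258, 399, 0),
  data2     = c(0, 0, 90)
)

# Names of all columns containing "data"
data_vec <- grep("data", names(dt), value = TRUE)

# Row-wise sum across the data columns, added by reference as a new column
dt[, sum := rowSums(.SD), .SDcols = data_vec]

# With exactly two columns, plain vectorised addition gives the same values
dt[, sum2 := data1 + data2]
```

No `by` is needed here, because the addition is done per row rather than per group. `rowSums(.SD)` scales to any number of columns picked up by `data_vec`.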
test_sum <- sum_tabnew[, lapply(.SD, function(x) sum(x)), by = list(condition, time)]
Error in sum(x) : invalid 'type' (character) of argument
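This error is consistent with `.SD` containing every non-grouping column when `.SDcols` is omitted, including character columns such as `location` and `loc_coord`, so `sum()` ends up being called on text. A minimal sketch, assuming per-group column totals are what is wanted from this call; the toy table `dt` and the result name `per_group` are hypothetical.

```r
library(data.table)

# Toy table with one character column alongside the numeric data columns
dt <- data.table(
  location  = c("Loc01", "Loc01", "Loc96"),
  condition = c("WT_CTL", "WT_CTL", "MUT_CTL"),
  data1     = c(258, 399, 0),
  data2     = c(0, 0, 90)
)

# Restricting .SD to the numeric data columns keeps sum() away from
# the character column
per_group <- dt[, lapply(.SD, sum), by = condition,
                .SDcols = c("data1", "data2")]
```

This returns one row per `condition` with a separate total for each data column, which is still different from a per-row `data1 + data2`.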
UPDATE: At r2evens's request, here is the data in a usable format:
> dput(head(sum_tabnew,10))
structure(list(location = c("Loc01", "Loc01", "Loc01", "Loc01",
"Loc01", "Loc01", "Loc01", "Loc01", "Loc01", "Loc01"), bin = structure(1:10, levels = c("(-0.24,1]",
"(1,2]", "(2,3]", "(3,4]", "(4,5]", "(5,6]", "(6,7]", "(7,8]",
"(8,9]", "(9,10]"), class = "factor"), time = c(0.966764,
2.000012, 2.999657, 3.999978, 4.999684, 5.999687, 6.999671, 8,
9.00001, 9.999827), data1 = c(258, 399, 502, 284, 335, 309, 0,
82, 1916, 2), data2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), loc_id = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), condition = structure(c(12L,
12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), levels = c("MUT1",
"MUT2", "MUT3", "MUT4", "MUT5", "MUT6", "MUT7", "MUT8", "MUT9",
"MUT10", "MUT_CTL", "WT_CTL"), class = "factor"), loc_coord = c("a1",
"a1", "a1", "a1", "a1", "a1", "a1", "a1", "a1", "a1")), row.names = c(NA,
-10L), class = c("data.table", "data.frame"))
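For the summary statistics mentioned at the top (SD, mean, median, SEM), one possible shape is sketched below. It assumes the statistics are wanted per `condition` over the row-wise total, which the question does not state explicitly. The table `dt`, the column `total`, and the helper `sem()` are all hypothetical names.

```r
library(data.table)

# Toy data (illustrative values only)
dt <- data.table(
  condition = c("A", "A", "A", "B", "B", "B"),
  data1     = c(1, 2, 3, 4, 5, 6),
  data2     = c(1, 0, 2, 0, 0, 0)
)

# Row-wise total of the two data columns
dt[, total := data1 + data2]

# Hypothetical helper: standard error of the mean
sem <- function(x) sd(x) / sqrt(length(x))

# One row of summary statistics per condition
stats <- dt[, .(mean   = mean(total),
                median = median(total),
                sd     = sd(total),
                sem    = sem(total)),
            by = condition]
```

If the statistics are instead wanted per row across the data columns, the same functions could be applied over `.SD` row by row, though with only two columns the SD and SEM carry little information.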