I have a large file and I am trying to sort or cluster its rows by two numeric columns (start and end) that define ranges, but I could not find a function that fits my problem. Could someone who knows how to do this please help me?
Thanks in advance.
My file looks like the sample below, only much bigger. As you can see in the example, the first and second rows are consecutive ranges (the second row starts right where the first one ends, with no gap in between), and the same holds for the third and fourth rows. The fifth and sixth rows, however, are far apart from each other. Therefore I want to treat rows 1 and 2 as one cluster, rows 3 and 4 as another cluster, and rows 5 and 6 as two separate clusters, so that I end up with 4 rows instead of 6.
Example file:
library(data.table)  # needed for setDT()
df <- setDT(data.frame(name = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1"),
                       start = c(8480001, 8480251, 10006251, 10006501, 13910501, 14841751),
                       end = c(8480250, 8480500, 10006500, 10006750, 13910750, 14842000),
                       length = c(250, 250, 250, 250, 250, 250)))
Expected output:
output <- setDT(data.frame(name = c("chr1", "chr1", "chr1", "chr1"),
start = c(8480001, 10006251, 13910501, 14841751),
end = c(8480250, 10006500, 13910750, 14842000),
length = c(250, 250, 250, 250)))
In the output, I only want the first row of each cluster; for example, for rows 1 and 2, only row 1.
Thanks again.
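To make the rule concrete, here is a minimal sketch of the kind of logic I have in mind, not a tested solution for the real file. It assumes that "no gap" means a row's start is exactly the previous row's end plus 1 within the same name, and that the data can be sorted by name and start. The helper columns new_cluster and cluster are names I made up for illustration.

```r
library(data.table)

df <- data.table(
  name   = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1"),
  start  = c(8480001, 8480251, 10006251, 10006501, 13910501, 14841751),
  end    = c(8480250, 8480500, 10006500, 10006750, 13910750, 14842000),
  length = c(250, 250, 250, 250, 250, 250)
)

# Sort so that consecutive ranges sit next to each other.
setorder(df, name, start)

# Assumption: a row opens a new cluster when it is the first row of its name,
# or when its start is not exactly one past the previous row's end.
df[, new_cluster := is.na(shift(end)) | start != shift(end) + 1, by = name]

# Running cluster id within each name (hypothetical helper column).
df[, cluster := cumsum(new_cluster), by = name]

# Keep only the first row of every cluster and drop the helper columns.
output <- df[(new_cluster), .SD, .SDcols = c("name", "start", "end", "length")]
print(output)
```

On the sample data this keeps rows 1, 3, 5 and 6, which matches the four rows of the expected output. If the ranges could overlap or be separated by a small tolerated gap, the comparison in new_cluster would have to change accordingly.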