R: Is there a way to cluster or sort a file based on two columns of numeric range values?

Posted on July 27

I have a large file and I am trying to sort or cluster its rows according to two numeric columns that together define a range, but I could not find a function that fits my problem. Could someone who knows how to do this please help me?
Thanks in advance.

My file looks like the sample below, but is much larger. In this example, the first and second rows are contiguous: the second row starts immediately after the first one ends, with no gap in between. The same is true of the third and fourth rows. The fifth and sixth rows are different, because they are far from each other. I therefore want to treat rows 1 and 2 as one cluster, rows 3 and 4 as one cluster, and rows 5 and 6 as two separate clusters, so that I end up with 4 rows instead of 6.
Example file:

library(data.table)  # provides setDT()

df <- setDT(data.frame(name = c("chr1", "chr1", "chr1", "chr1", "chr1", "chr1"),
  start = c(8480001, 8480251, 10006251, 10006501, 13910501, 14841751),
  end = c(8480250, 8480500, 10006500, 10006750, 13910750, 14842000),
  length = c(250, 250, 250, 250, 250, 250)))

Expected output:

output <- setDT(data.frame(name = c("chr1", "chr1", "chr1", "chr1"),
  start = c(8480001, 10006251, 13910501, 14841751), 
  end = c(8480250, 10006500, 13910750, 14842000), 
  length = c(250, 250, 250, 250)))

In the output, I only want the first row of each cluster; for example, for rows 1 and 2, only row 1.

Thanks again.

     name    start      end length
   <char>    <num>    <num>  <num>
1:   chr1  8480001  8480250    250
2:   chr1 10006251 10006500    250
3:   chr1 13910501 13910750    250
4:   chr1 14841751 14842000    250

Recommended answer
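One possible approach, offered as a minimal sketch rather than a definitive solution: sort by name and start, flag a row as the start of a new cluster whenever its start is not exactly the previous row's end + 1, and keep only the flagged rows. It uses base R only, and the helper names (prev_end, prev_name, new_cluster, result) are illustrative assumptions, not part of the question. It also assumes that "no gap" means start == previous end + 1 within the same name.

```r
# Sample data from the question
df <- data.frame(
  name   = rep("chr1", 6),
  start  = c(8480001, 8480251, 10006251, 10006501, 13910501, 14841751),
  end    = c(8480250, 8480500, 10006500, 10006750, 13910750, 14842000),
  length = rep(250, 6),
  stringsAsFactors = FALSE
)

# Sort so that neighbouring ranges end up on consecutive rows
df <- df[order(df$name, df$start), ]
n  <- nrow(df)

# End and name of the previous row (NA for the very first row)
prev_end  <- c(NA, df$end[-n])
prev_name <- c(NA, df$name[-n])

# A row opens a new cluster if it is the first row, belongs to a
# different name, or does not start right after the previous end
new_cluster <- is.na(prev_end) |
  df$name != prev_name |
  df$start != prev_end + 1

# Cluster id for every row (useful for later grouping)
df$cluster <- cumsum(new_cluster)

# Keep only the first row of each cluster
result <- df[new_cluster, c("name", "start", "end", "length")]
rownames(result) <- NULL
print(result)
```

If overlapping or nearly adjacent ranges should also be merged, the `!=` test could be replaced with something like `df$start > prev_end + max_gap`, where `max_gap` is a hypothetical tolerance you would choose yourself. The result can be converted back to a data.table with `setDT(result)` if needed.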
