在R编程语言中给出以下矩阵:

set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)

下面是一个深度优先搜索(DFS)算法,它识别该矩阵中的1的簇.在这种情况下,"簇"是整数在矩阵上的连续映射,其最小簇大小为3,并且假设8连通性(即,包括对角线).注意:我try 在EBImage包中使用基于图像的方法,但它的执行速度太慢了.我有数千个EBImagexEBImage矩阵要分析!

find_clusters <- function(matrix) {
  rows <- nrow(matrix)
  cols <- ncol(matrix)
  
  # Create a matrix of the same size to mark visited cells
  visited <- matrix(0, nrow = rows, ncol = cols)
  
  # Define all 8 possible movements from a cell (8-connectivity)
  row_nbr <- c(-1, -1, -1,  0, 0,  1, 1, 1)
  col_nbr <- c(-1,  0,  1, -1, 1, -1, 0, 1)
  
  # A function to check if a cell can be included in the DFS
  is_valid <- function(row, col) {
    row >= 1 && row <= rows && col >= 1 && col <= cols &&
      visited[row, col] == 0 && matrix[row, col] == 1
  }
  
  # A function to do a DFS of a 2D boolean matrix. It only considers
  # the 8 cells directly connected to a cell
  DFS <- function(matrix, row, col, visited, cluster) {
    row_stack <- c(row)
    col_stack <- c(col)
    
    while (length(row_stack) > 0) {
      r <- row_stack[length(row_stack)]
      c <- col_stack[length(col_stack)]
      row_stack <- row_stack[-length(row_stack)]
      col_stack <- col_stack[-length(col_stack)]
      
      if (visited[r, c] == 0) {
        visited[r, c] <- 1
        cluster <- rbind(cluster, c(r, c))
        
        for (k in 1:8) {
          if (is_valid(r + row_nbr[k], c + col_nbr[k])) {
            row_stack <- c(row_stack, r + row_nbr[k])
            col_stack <- c(col_stack, c + col_nbr[k])
          }
        }
      }
    }
    return(cluster)
  }
  
  # The main function that returns all clusters
  get_clusters <- function(matrix, visited) {
    clusters <- list()
    for (i in 1:rows) {
      for (j in 1:cols) {
        if (visited[i, j] == 0 && matrix[i, j] == 1) {
          new_cluster <- DFS(matrix, i, j, visited, matrix(, nrow = 0, ncol = 2))
          if (nrow(new_cluster) >= 3) {
            clusters[[length(clusters) + 1]] <- new_cluster
          }
        }
      }
    }
    return(clusters)
  }
  
  return(get_clusters(matrix, visited))
}

效果很好,而且速度很快.但是,此函数返回大小为&gt;3的所有可能的集群(共44个),其中包括嵌套在较大集群中的较小集群.

以二进制图像表示的矩阵:

my_palette <- c("white", "black")

# correct for how image() reads a matrix
rotate <- function(x) t(apply(x, 2, rev))

image(rotate(matrix_1),
      axes = FALSE,
      col = my_palette_2)

我只看到了三个大小为&gt;=3的簇.我如何修改我的函数,使其只"看到"矩阵上最大的未中断的簇?

更新

谢谢你@I_O!我有10000个100x100矩阵来自matlab的模拟,模拟细胞膜上钠ionic 通道的行为.以下函数执行您的建议,并返回通道类型1和2的簇大小:

library(dplyr)
library(terra)

# M: matrix of integers 

find_clusters_2chan <- function(M) {
  
  # Consider only 1s
  ones <- M == 1
  
  # convert matrix to raster
  raster_ones <- ones |> rast()
  
  # find clusters (consider zeros as NA, i. e. discontinuation)
  
  clusters_ones <- patches(raster_ones,
                           directions = 8,
                           zeroAsNA = TRUE)
  
  # generate frequency table
  ones_freq <- the_clusters_ones |> freq()
  
  # return counts >=3
  ones_freq$count %>%
    .[. >= 3] -> ONES
  
  #-------------------------------------------------------------------------------
  
  # Consider only 2s
  twos <- M == 2
  
  # convert matrix to raster
  raster_twos <- twos |> rast()
  
  # find clusters (consider zeros as NA, i. e. discontinuation)
  
  clusters_twos <- patches(raster_twos,
                           directions = 8,
                           zeroAsNA = TRUE)
  
  # generate frequency table
  twos_freq <- clusters_twos |> freq()
  
  # return counts >=3
  twos_freq$count %>%
    .[. >= 3] -> TWOS
  
  clusters_list <- list(channel_1 = ONES,
                        channel_2 = TWOS)
  
  return(clusters_list)
  
}

start <- Sys.time()

clusters_big_list <- lapply(list_of_matrices, find_clusters_2chan)

end <- Sys.time()

end - start

# run time = 3.902859 minutes

推荐答案

如果您不介意使用像{terra}这样的专门的栅格分析包,下面的应该是方便的and快速(edit: wrapper function to generalize across channel values and minimum cluster sizes at the bottom)

library(terra)

set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
  • 将矩阵转换为栅格:
raster_1 <- matrix_1 |> rast()
  • 查找簇(将零视为NA,即停止)
the_clusters <- patches(raster_1, directions = 8, zeroAsNA=TRUE)
  • 判断确定的群集:
the_clusters |> plot()

clusters

  • 列出集群大小(值:集群ID,计数:集群大小)
the_clusters |> freq()
  layer value count
1     1     1    24
2     1     2     8
3     1     3     1
4     1     4     2
5     1     5    12

为任意数量的通道提取最小大小的簇的函数可能如下所示:

find_clusters_2_all_chans <- 
  function(M, channel_numbers, min_count) {
    channel_numbers |> 
      Map(f = \(i){
        M |> 
          rast() |>
          app(fun = \(cell) ifelse(cell == i, cell, NA)) |>
          patches(directions = 8, zeroAsNA = TRUE) |>
          freq()|>
          (\(m) m[m['count'] > min_count,])()
      }) |>
      setNames(paste0('channel_', channel_numbers))
  }

示例:为通道1和通道2拾取大于三个像素的簇

find_clusters_2_all_chans(M, 1:2, 3)
> $channel_1
  layer value count
1     1     1     4
2     1     2    10
3     1     3     6
4     1     5     6
6     1     7     4

$channel_2
  layer value count
5     1     5    12

R相关问答推荐

如何通过r中每20滚动和来创建组将数据视为1:10

仅在ggplot的每个方面绘制最丰富的物种

在R中,将一个函数作为输入传递给另一个函数时进行参数判断

是否可以 Select 安装不带文档的R包以更有效地存储?

判断字符串中数字的连续性

S用事件解决物质平衡问题

在组中添加值增加和减少的行

我如何才能找到FAMILY=POISSON(LINK=&Q;LOG&Q;)中的模型预测指定值的日期?

如何同时从多个列表中获取名字?

如何通过ggplot2添加短轴和删除长轴?

解析R函数中的变量时出现的问题

R中1到n_1,2到n_2,…,n到n_n的所有组合都是列表中的向量?

如何在PackageStatus()中列出&q;不可用的包&q;?

用R ggplot2求上、下三角形中两个变量的矩阵热图

在GG图中绘制射线的自动程序

更改STAT_VALLES/STAT_PEAKS中的箭头线宽/大小

计算使一组输入值最小化的a、b和c的值

如何为混合模型输出绘制不同的线型?

替换在以前工作的代码中有x行&q;错误(geom_sf/gganimate/dow_mark)

GOGPLATE geom_boxploy色彩疯狂