Given the following matrix in R:
set.seed(123)
matrix_1 <- matrix(rbinom(100, 1, 0.5), nrow = 10, ncol = 10)
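Below is a depth-first search (DFS) algorithm that identifies clusters of 1s in this matrix. Here, a "cluster" is a contiguous group of 1s in the matrix, with a minimum cluster size of 3 and assuming 8-connectivity (i.e., diagonal neighbors count). Note: I tried an image-based approach with the EBImage package, but it was too slow. I have thousands of matrices to analyze!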
find_clusters <- function(matrix) {
  rows <- nrow(matrix)
  cols <- ncol(matrix)
  # Create a matrix of the same size to mark visited cells
  visited <- matrix(0, nrow = rows, ncol = cols)
  # Define all 8 possible movements from a cell (8-connectivity)
  row_nbr <- c(-1, -1, -1, 0, 0, 1, 1, 1)
  col_nbr <- c(-1, 0, 1, -1, 1, -1, 0, 1)
  # A function to check if a cell can be included in the DFS
  is_valid <- function(row, col) {
    row >= 1 && row <= rows && col >= 1 && col <= cols &&
      visited[row, col] == 0 && matrix[row, col] == 1
  }
  # A function to do a DFS of a 2D boolean matrix. It only considers
  # the 8 cells directly connected to a cell
  DFS <- function(matrix, row, col, visited, cluster) {
    row_stack <- c(row)
    col_stack <- c(col)
    while (length(row_stack) > 0) {
      r <- row_stack[length(row_stack)]
      c <- col_stack[length(col_stack)]
      row_stack <- row_stack[-length(row_stack)]
      col_stack <- col_stack[-length(col_stack)]
      if (visited[r, c] == 0) {
        visited[r, c] <- 1
        cluster <- rbind(cluster, c(r, c))
        for (k in 1:8) {
          if (is_valid(r + row_nbr[k], c + col_nbr[k])) {
            row_stack <- c(row_stack, r + row_nbr[k])
            col_stack <- c(col_stack, c + col_nbr[k])
          }
        }
      }
    }
    return(cluster)
  }
  # The main function that returns all clusters
  get_clusters <- function(matrix, visited) {
    clusters <- list()
    for (i in 1:rows) {
      for (j in 1:cols) {
        if (visited[i, j] == 0 && matrix[i, j] == 1) {
          new_cluster <- DFS(matrix, i, j, visited, matrix(, nrow = 0, ncol = 2))
          if (nrow(new_cluster) >= 3) {
            clusters[[length(clusters) + 1]] <- new_cluster
          }
        }
      }
    }
    return(clusters)
  }
  return(get_clusters(matrix, visited))
}
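This works well and is fast. However, the function returns every possible cluster of size >= 3 (44 in total), including smaller clusters nested inside larger ones.
The matrix rendered as a binary image: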
my_palette <- c("white", "black")
# correct for how image() reads a matrix
rotate <- function(x) t(apply(x, 2, rev))
image(rotate(matrix_1),
      axes = FALSE,
      col = my_palette)
I can only see three clusters of size >= 3. How can I modify my function so that it only "sees" the largest uninterrupted clusters in the matrix?
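A likely cause of the nested clusters, offered as a sketch rather than a tested fix for the data above: R passes arguments by value (copy-on-modify), so `visited[r, c] <- 1` inside `DFS` only changes a local copy. The `visited` seen by `is_valid` and by `get_clusters` is never updated, so every cell of a connected component starts its own search. The version below keeps `visited` in a single scope and marks cells as they are pushed. The names `find_clusters_fixed`, `m`, and `min_size` are hypothetical and not part of the original code.

```r
# Sketch: iterative DFS with one shared `visited` matrix (8-connectivity).
# `find_clusters_fixed` and `min_size` are illustrative names (assumptions).
find_clusters_fixed <- function(m, min_size = 3) {
  rows <- nrow(m)
  cols <- ncol(m)
  visited <- matrix(FALSE, nrow = rows, ncol = cols)
  row_nbr <- c(-1, -1, -1, 0, 0, 1, 1, 1)
  col_nbr <- c(-1, 0, 1, -1, 1, -1, 0, 1)
  clusters <- list()

  for (i in seq_len(rows)) {
    for (j in seq_len(cols)) {
      # Skip cells that are not 1 or already belong to a cluster
      if (visited[i, j] || m[i, j] != 1) next

      # Start a new search; mark the seed cell immediately
      stack_r <- i
      stack_c <- j
      visited[i, j] <- TRUE
      cells <- list()

      while (length(stack_r) > 0) {
        n <- length(stack_r)
        r <- stack_r[n]
        cc <- stack_c[n]
        stack_r <- stack_r[-n]
        stack_c <- stack_c[-n]
        cells[[length(cells) + 1]] <- c(r, cc)

        for (k in 1:8) {
          nr <- r + row_nbr[k]
          nc <- cc + col_nbr[k]
          if (nr >= 1 && nr <= rows && nc >= 1 && nc <= cols &&
              !visited[nr, nc] && m[nr, nc] == 1) {
            # Mark on push so no cell enters the stack twice
            visited[nr, nc] <- TRUE
            stack_r <- c(stack_r, nr)
            stack_c <- c(stack_c, nc)
          }
        }
      }

      # Keep only clusters that reach the minimum size
      if (length(cells) >= min_size) {
        clusters[[length(clusters) + 1]] <- do.call(rbind, cells)
      }
    }
  }
  clusters
}
```

Each returned element is a two-column matrix of (row, column) coordinates, so `length(find_clusters_fixed(matrix_1))` would give the number of distinct clusters and `sapply(..., nrow)` their sizes.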
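Update
Thank you, @I_O! I have 10,000 matrices of size 100x100 from a MATLAB simulation that models the behavior of sodium ion channels on a cell membrane. The following function implements your suggestion and returns the cluster sizes for channel types 1 and 2: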
library(dplyr)
library(terra)
# M: matrix of integers
find_clusters_2chan <- function(M) {
  # Consider only 1s
  ones <- M == 1
  # convert matrix to raster
  raster_ones <- ones |> rast()
  # find clusters (consider zeros as NA, i.e. discontinuation)
  clusters_ones <- patches(raster_ones,
                           directions = 8,
                           zeroAsNA = TRUE)
  # generate frequency table
  ones_freq <- clusters_ones |> freq()
  # return counts >= 3
  ones_freq$count %>%
    .[. >= 3] -> ONES
  #-----------------------------------------------------------------------------
  # Consider only 2s
  twos <- M == 2
  # convert matrix to raster
  raster_twos <- twos |> rast()
  # find clusters (consider zeros as NA, i.e. discontinuation)
  clusters_twos <- patches(raster_twos,
                           directions = 8,
                           zeroAsNA = TRUE)
  # generate frequency table
  twos_freq <- clusters_twos |> freq()
  # return counts >= 3
  twos_freq$count %>%
    .[. >= 3] -> TWOS
  clusters_list <- list(channel_1 = ONES,
                        channel_2 = TWOS)
  return(clusters_list)
}
start <- Sys.time()
clusters_big_list <- lapply(list_of_matrices, find_clusters_2chan)
end <- Sys.time()
end - start
# run time = 3.902859 minutes
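Since the two halves of `find_clusters_2chan` differ only in the value being matched, the shared steps could be pulled into one helper. This is a minimal sketch under that assumption, not a benchmarked replacement. The names `cluster_sizes`, `value`, and `min_size` are hypothetical, and the logical mask is converted to integers before `rast()` as a precaution.

```r
library(terra)

# Sketch: sizes of 8-connected patches of `value` in matrix M.
# `cluster_sizes`, `value` and `min_size` are illustrative names (assumptions).
cluster_sizes <- function(M, value, min_size = 3) {
  # 1 where M equals `value`, 0 elsewhere
  mask <- (M == value) * 1L
  # Label connected patches; zeros are treated as NA (breaks between patches)
  p <- patches(rast(mask), directions = 8, zeroAsNA = TRUE)
  # One row per patch; `count` is the number of cells in that patch
  counts <- freq(p)$count
  counts[counts >= min_size]
}

# Same output shape as find_clusters_2chan()
find_clusters_2chan_v2 <- function(M) {
  list(channel_1 = cluster_sizes(M, 1),
       channel_2 = cluster_sizes(M, 2))
}
```

It would be called the same way, e.g. `lapply(list_of_matrices, find_clusters_2chan_v2)`.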