如何根据R中的第二个稀疏矩阵修改大型稀疏矩阵的非零元素

发布于07月15日

我有两个大的稀疏矩阵(大小约为41000 x 55000).非零元素的密度约为10%.对于非零元素，它们都具有相同的行索引和列索引.

如果第二个矩阵中的值低于某个阈值，我现在想修改第一个稀疏矩阵中的值.

library(Matrix)

# Generating the example matrices.
set.seed(42)

# Rows with values.
i <- sample(1:41000, 227000000, replace = TRUE)
# Columns with values.
j <- sample(1:55000, 227000000, replace = TRUE)

# Values for the first matrix.
x1 <- runif(227000000)
# Values for the second matrix.
x2 <- sample(1:3, 227000000, replace = TRUE)

# Constructing the matrices.
m1 <- sparseMatrix(i = i, j = j, x = x1)
m2 <- sparseMatrix(i = i, j = j, x = x2)

现在，我从新矩阵中的第一个矩阵中获取行、列和值.这样，我可以简单地将它们子集，只保留我感兴趣的部分.

# Getting the positions and values from the matrices.
position_matrix_from_m1 <- rbind(i = m1@i, j = summary(m1)$j, x = m1@x)
position_matrix_from_m2 <- rbind(i = m2@i, j = summary(m2)$j, x = m2@x)

# Subsetting to get the elements of interest.
position_matrix_from_m1 <- position_matrix_from_m1[,position_matrix_from_m1[3,] > 0 & position_matrix_from_m1[3,] < 0.05]

# We add 1 to the values, since the sparse matrix is 0-based.
position_matrix_from_m1[1,] <- position_matrix_from_m1[1,] + 1
position_matrix_from_m1[2,] <- position_matrix_from_m1[2,] + 1

现在我有麻烦了.覆盖第二个矩阵中的值花费的时间太长.我让它运行了几个小时，但没有完成.

# This takes hours.
m2[position_matrix_from_m1[1,], position_matrix_from_m1[2,]] <- 1
m1[position_matrix_from_m1[1,], position_matrix_from_m1[2,]] <- 0

我考虑将行和列信息粘贴在一起.然后每个值都有一个唯一的标识符.这也需要很长时间，可能只是一种非常糟糕的做法.

# We would get the unique identifiers after the subsetting.
m1_identifiers <- paste0(position_matrix_from_m1[1,], "_", position_matrix_from_m1[2,])
m2_identifiers <- paste0(position_matrix_from_m2[1,], "_", position_matrix_from_m2[2,])

# Now, I could use which and get the position of the values I want to change.
# This also uses to much memory. 
m2_identifiers_of_interest <- which(m2_identifiers %in% m1_identifiers)

# Then I would modify the x values in the position_matrix_from_m2 matrix and overwrite m2@x in the sparse matrix object.

我的方法中是否存在根本错误？我应该怎么做才能有效地运行它？

# Getting (i, j, x) triplet (stored as a data.frame) for both `m1` and `m2` position_matrix_from_m1 <- summary(m1) # you never seem to use `position_matrix_from_m2` so I skip it # Subsetting to get the elements of interest. position_matrix_from_m1 <- subset(position_matrix_from_m1, x > 0 & x < 0.05)

library(Matrix) # Generating the example matrices. set.seed(42) ## reduce problem size to what my laptop can bear with squeeze <- 0.1 # Rows with values. i <- sample(1:(41000 * squeeze), 227000000 * squeeze ^ 2, replace = TRUE) # Columns with values. j <- sample(1:(55000 * squeeze), 227000000 * squeeze ^ 2, replace = TRUE) # Values for the first matrix. x1 <- runif(227000000 * squeeze ^ 2) # Values for the second matrix. x2 <- sample(1:3, 227000000 * squeeze ^ 2, replace = TRUE) # Constructing the matrices. m1 <- sparseMatrix(i = i, j = j, x = x1) m2 <- sparseMatrix(i = i, j = j, x = x2) ## give me more usable RAM rm(i, j, x1, x2)

## ## fix to your code ## m1a <- m1 m2a <- m2 # Getting (i, j, x) triplet (stored as a data.frame) for both `m1` and `m2` position_matrix_from_m1 <- summary(m1) # Subsetting to get the elements of interest. position_matrix_from_m1 <- subset(position_matrix_from_m1, x > 0 & x < 0.05) ij <- as.matrix(position_matrix_from_m1[, 1:2]) m2a[ij] <- 1 m1a[ij] <- 0 ## ## the best solution ## m1b <- m1 m2b <- m2 ind <- m1@x > 0 & m1@x < 0.05 m2b@x[ind] <- 1 m1b@x[ind] <- 0

如何根据R中的第二个稀疏矩阵修改大型稀疏矩阵的非零元素

推荐答案

R相关问答推荐

用dDeliverr用第二个表更新一个表

使用列表列作为case_when LHS的输入

将R data.frame转换为json数组(源代码)

从有序数据中随机抽样

管道末端运行功能

如何将在HW上运行的R中的消息(错误、警告等)作为批处理任务输出

如何使用R Shiny中的条件面板仅隐藏和显示用户输入，同时仍允许运行基础计算？

如何计算前一行的值，直到达到标准？

bslib：：card_header中的shine：：downloadButton，图标而不是文本

非线性混合效应模型(NLME)预测变量的置信区间

用约翰逊分布进行均值比较

如何识别倒排的行并在R中删除它们？

Geom_Hline将不会出现，而它以前出现了

使用RSelenium在R中抓取Reddit时捕获多个标签

将全局环境变量的名称分配给列表中的所有元素

如何使用前缀作为匹配来连接数据帧？

在R中使用列表(作为tibble列)进行向量化？

使用函数从R中的列中删除标高

排序R矩阵的行和列

附加中缀操作符