I am working with the R programming language.

Suppose I have the following data frame:

var_1 = var_2 = var_3 = var_4 = var_5 =  c("1,2,3,4,5,6,7,8,9,10")

my_data = data.frame(var_1,var_2,var_3,var_4,var_5)

my_data = rbind(my_data, my_data[rep(1, 100), ])

rownames(my_data) = 1:nrow(my_data)

The data looks like this:

    head(my_data)

                 var_1                var_2                var_3                var_4                var_5
1 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
2 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
3 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
4 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
5 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
6 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10

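My question: I want to randomly replace elements of this data frame with 0. For example, the end result should look something like this (for brevity, only the first row is shown):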

# desired result

                 var_1                var_2                var_3                var_4                var_5
1 1,0,3,0,5,6,0,0,9,10 1,2,0,4,5,0,0,8,9,0 1,0,3,0,0,0,0,8,9,0 1,2,3,4,0,6,7,0,0,10 1,2,0,4,5,0,7,8,0,10

I tried to do this with the following lines of code (taken from "Replace random values in a column in a dataframe"):

my_data$var_1[sample(nrow(my_data),as.integer(0.5*nrow(my_data)) , replace = TRUE)] <- 0
my_data$var_2[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0
my_data$var_3[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0
my_data$var_4[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0
my_data$var_5[sample(nrow(my_data),as.integer(0.5*nrow(my_data)), replace = TRUE)] <- 0

But this replaces the entire string in a cell with 0 (instead of replacing only some of the elements within it):

head(my_data)
                 var_1                var_2                var_3                var_4                var_5
1 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0                    0                    0
2                    0                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0
3                    0 1,2,3,4,5,6,7,8,9,10                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
4                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0 1,2,3,4,5,6,7,8,9,10
5 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10
6 1,2,3,4,5,6,7,8,9,10                    0 1,2,3,4,5,6,7,8,9,10 1,2,3,4,5,6,7,8,9,10                    0

Can someone please show me what I am doing wrong and how to get the desired result?

Thanks!

Recommended answer
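
As a quick illustration of why the attempt in the question overwrites whole cells: each cell holds a single character string, so indexing a column by row number can only replace that string as a whole. The sketch below rebuilds one column of the example under that assumption; the names `whole_cell`, `parts` and `one_cell` are made up for illustration, and the positions 2 and 4 are fixed only to keep the result predictable.

```r
# Rebuild one column of the example: every cell is ONE character string.
var_1 <- rep("1,2,3,4,5,6,7,8,9,10", 5)

# Assigning by row index overwrites the whole cell, not the numbers inside it.
whole_cell <- var_1
whole_cell[2] <- 0
print(whole_cell[2])

# Splitting on the comma exposes the ten elements of a single cell.
parts <- strsplit(var_1[1], ",", fixed = TRUE)[[1]]
print(length(parts))

# Zero out elements 2 and 4 (fixed positions, chosen for illustration only),
# then glue the pieces back into one comma-separated string.
parts[c(2, 4)] <- "0"
one_cell <- paste(parts, collapse = ",")
print(one_cell)
```

So any solution has to split each string, replace some of the pieces, and paste them back together.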

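Here is a version that uses `Map` so you can specify a vector of probabilities `pnul`, one per column, for elements becoming 0. Multiplying the `length` of each split string by the matching element of `pnul` gives the number of positions that `sample` draws, and those positions are set to zero. You can also set `pnul` to a scalar to get the same probability in all columns.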

pnul <- c(.0, .2, .5, .8, 1)

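res <- Map(\(x, a) {
  # split every cell of the column into its comma-separated elements
  S <- strsplit(x, ',')
  sapply(S, \(s) {
    # draw length(s)*a positions at random and set them to '0'
    s[sample(seq_along(s), length(s)*a)] <- '0'
    # rebuild the comma-separated string
    paste(s, collapse=',')
  })
}, my_data, pnul) |> as.data.frame()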

head(res)
#                  var_1                var_2                var_3                var_4               var_5
# 1 1,2,3,4,5,6,7,8,9,10 0,0,3,4,5,6,7,8,9,10  1,2,0,4,0,0,7,8,0,0  0,0,0,0,0,0,0,8,9,0 0,0,0,0,0,0,0,0,0,0
# 2 1,2,3,4,5,6,7,8,9,10  1,0,3,4,5,6,7,8,9,0 1,0,3,0,5,0,0,0,9,10  0,0,0,0,0,0,7,8,0,0 0,0,0,0,0,0,0,0,0,0
# 3 1,2,3,4,5,6,7,8,9,10 1,0,0,4,5,6,7,8,9,10 1,0,0,0,0,6,7,0,9,10 0,0,0,0,5,0,0,0,0,10 0,0,0,0,0,0,0,0,0,0
# 4 1,2,3,4,5,6,7,8,9,10 1,2,3,0,5,6,7,0,9,10 0,0,3,0,5,0,7,0,9,10  0,0,0,4,0,0,7,0,0,0 0,0,0,0,0,0,0,0,0,0
# 5 1,2,3,4,5,6,7,8,9,10  1,0,3,4,5,6,7,8,9,0 0,2,0,4,5,0,7,0,0,10  1,0,0,0,0,0,0,8,0,0 0,0,0,0,0,0,0,0,0,0
# 6 1,2,3,4,5,6,7,8,9,10 0,2,3,4,5,6,0,8,9,10  1,2,3,0,5,0,7,0,0,0  0,0,0,4,5,0,0,0,0,0 0,0,0,0,0,0,0,0,0,0
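
If you want to check the result or repeat a draw, something along these lines may help. It is only a sketch of the same split, sample, paste idea: the helper names `zero_fraction` and `count_zeros` are hypothetical, the seed is arbitrary, a small six-row data frame stands in for `my_data`, and `round()` is my own addition to avoid a product such as 6.999... being truncated when it is used as a sample size.

```r
# Hypothetical helper (not part of the answer above): zero out a fixed share p
# of the comma-separated elements in every cell of a character vector.
zero_fraction <- function(x, p) {
  vapply(strsplit(x, ",", fixed = TRUE), function(s) {
    n_zero <- round(length(s) * p)           # number of positions to zero
    s[sample.int(length(s), n_zero)] <- "0"  # pick positions without replacement
    paste(s, collapse = ",")                 # rebuild the cell
  }, character(1))
}

set.seed(42)  # arbitrary seed, only so the draw can be repeated

# Small stand-in for my_data (six identical rows instead of 101).
cell <- rep("1,2,3,4,5,6,7,8,9,10", 6)
my_data <- data.frame(var_1 = cell, var_2 = cell, var_3 = cell,
                      var_4 = cell, var_5 = cell, stringsAsFactors = FALSE)

pnul <- c(0, .2, .5, .8, 1)
res <- as.data.frame(Map(zero_fraction, my_data, pnul))

# Count the "0" elements per cell ("10" is not counted, the match is exact).
count_zeros <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE), function(s) sum(s == "0"), integer(1))
}
zeros <- sapply(res, count_zeros)
print(zeros)
```

With this approach every cell of a column gets exactly the same number of zeros (here 0, 2, 5, 8 and 10) and only their positions vary. If you would rather have each element zeroed independently with probability `p`, you could replace the `sample.int` line with a logical mask such as `runif(length(s)) < p`, which makes the count vary from cell to cell.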
