I have a data frame with individual IDs and two traits ("x" and "y"), as follows:

id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x = c(10,4,6,8,9,8,7,6,12,14,11,9,8,4,5,10,14,12,15,7,10,14,24,28)
y = c(1.5,1.2,5,2,0.8,4,1,1.1,1.2,1.4,1.3,1.6,0.9,0.8,1,1.1,1.3,1.5,1.2,1.1,1,1.2,1.1,1)
a = data.frame(id,x,y)

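I would like a loop that iterates over every trait and every individual, so that I can create a new data frame (or new columns in a) in which each individual gets 1 if its value is an outlier and 0 if it is not. Treat as an outlier any point that falls outside the trait mean ± 3 SD.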

In this example, the outlier for "x" is 28 and the outlier for "y" is 5. So the desired result could look like this:

id = c("A1","A2","A3","A4","A5","A6","A7","A8","A9","A10","A11","A12","A13","A14","A15","A16","A17","A18","A19","A20","A21","A22","A23","A24")
x_out = c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1)
y_out = c(0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
a_out = data.frame(id, x_out, y_out)

Do you know how to write such a loop? The idea is that if I add new traits or individuals, I should not need to change the loop. Thanks!

Recommended answer

No loop is needed. Just test all columns at once for whether the absolute z-score (abs(scale())) is >= 3:

a_out <- a
a_out[, -1] <- as.integer(abs(scale(a[, -1])) >= 3)
a_out
#>     id x y
#> 1   A1 0 0
#> 2   A2 0 0
#> 3   A3 0 1
#> 4   A4 0 0
#> 5   A5 0 0
#> 6   A6 0 0
#> 7   A7 0 0
#> 8   A8 0 0
#> 9   A9 0 0
#> 10 A10 0 0
#> 11 A11 0 0
#> 12 A12 0 0
#> 13 A13 0 0
#> 14 A14 0 0
#> 15 A15 0 0
#> 16 A16 0 0
#> 17 A17 0 0
#> 18 A18 0 0
#> 19 A19 0 0
#> 20 A20 0 0
#> 21 A21 0 0
#> 22 A22 0 0
#> 23 A23 0 0
#> 24 A24 1 0
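
To see why exactly these two rows are flagged, it can help to look at the z-scores themselves. The following is a minimal sketch, not part of the original answer: it rebuilds the question's data, inspects the largest absolute z-score per trait, and names the flag columns x_out and y_out as in the desired result. The intermediate names z and flags are only illustrative.

```r
# Rebuild the example data from the question
a <- data.frame(
  id = paste0("A", 1:24),
  x = c(10, 4, 6, 8, 9, 8, 7, 6, 12, 14, 11, 9, 8, 4, 5, 10, 14, 12, 15, 7, 10, 14, 24, 28),
  y = c(1.5, 1.2, 5, 2, 0.8, 4, 1, 1.1, 1.2, 1.4, 1.3, 1.6, 0.9, 0.8, 1, 1.1,
        1.3, 1.5, 1.2, 1.1, 1, 1.2, 1.1, 1)
)

# Absolute z-scores for every column except id:
# scale() centers each column on its mean and divides by its SD
z <- abs(scale(a[, -1]))

# Largest |z| per trait; only values >= 3 count as outliers
apply(z, 2, max)

# Multiplying by 1L turns TRUE/FALSE into 1/0 while keeping the matrix shape,
# so the column names survive and can be given an "_out" suffix
flags <- (z >= 3) * 1L
colnames(flags) <- paste0(colnames(flags), "_out")

a_out <- data.frame(id = a$id, flags)
a_out[rowSums(flags) > 0, ]  # show only the rows with at least one outlier
```

Because everything is computed column-wise on a[, -1], adding more trait columns or more rows to a should not require any change to this code, provided id stays the first column.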

Or using dplyr:

library(dplyr)

a_out <- a %>% 
  mutate(across(!id, \(x) as.integer(abs(scale(x)) >= 3)))
# same output as above
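
If an explicit loop is still preferred, or the original trait columns should be kept next to the flags, something along these lines might work. This is only a sketch in base R under a few assumptions: flag_outliers is a hypothetical helper name (not from the original answer), every numeric column is treated as a trait, and missing values are ignored when computing the mean and SD.

```r
# Hypothetical helper: for every numeric column, add a "<name>_out" column
# holding 1 where the value lies k or more SDs from the column mean, else 0
flag_outliers <- function(df, k = 3) {
  # Pick the trait columns once, before any new columns are added
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  for (col in num_cols) {
    v <- df[[col]]
    # z-score computed by hand: (value - mean) / SD
    z <- (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
    df[[paste0(col, "_out")]] <- as.integer(abs(z) >= k)
  }
  df
}

# Example data from the question
a <- data.frame(
  id = paste0("A", 1:24),
  x = c(10, 4, 6, 8, 9, 8, 7, 6, 12, 14, 11, 9, 8, 4, 5, 10, 14, 12, 15, 7, 10, 14, 24, 28),
  y = c(1.5, 1.2, 5, 2, 0.8, 4, 1, 1.1, 1.2, 1.4, 1.3, 1.6, 0.9, 0.8, 1, 1.1,
        1.3, 1.5, 1.2, 1.1, 1, 1.2, 1.1, 1)
)

a_flagged <- flag_outliers(a)
# Rows where at least one trait is an outlier
a_flagged[a_flagged$x_out == 1 | a_flagged$y_out == 1, ]
```

The loop runs over columns only; the comparison inside is vectorized over individuals, so new rows or new numeric traits are picked up without editing the function.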
