R: classify rows by the median within each group

Posted on February 17

Suppose we have data like the following:

ID  Quantity    Group     Indicator
1   0.93        Red       1
2   0.17        Red       1
3   0.01        Red       0
4   0.44        Red       1
5   0.01        Red       0
6   0.86        Red       1
7   0.07        Red       1
8   0.02        Red       0   
9   1.00        Red       1
1   0.65        Blue      1
2   0.17        Blue      1
3   0.02        Blue      0
4   0.01        Blue      0
5   0.09        Blue      1
6   0.86        Blue      1
7   0.05        Blue      0
8   0.23        Blue      1
9   0.01        Blue      0

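Now I want to create a new column with three values/categories:

1 if Indicator is 0.
3 if Indicator is 1 and Quantity is above the median (the median of Quantity over all rows where Indicator == 1).
2 if Indicator is 1 and Quantity is equal to or below that median.

This logic is applied separately within each Group.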
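A minimal base R sketch of the rule for a single group follows. `classify_by_median()` is a hypothetical helper, not part of the question or of any package. It assigns 2 when Quantity equals the median, which matches Blue ID 8 (0.23) in the expected output.

```r
# Hypothetical helper (not from the question): classifies the rows of ONE group.
# indicator == 0 -> 1
# indicator == 1 and quantity <= median of the indicator == 1 quantities -> 2
# indicator == 1 and quantity >  that median -> 3
classify_by_median <- function(quantity, indicator) {
  med <- median(quantity[indicator == 1])
  ifelse(indicator == 0, 1, ifelse(quantity <= med, 2, 3))
}

# Red group from the table above (median of the Indicator == 1 values is 0.65)
red_q <- c(0.93, 0.17, 0.01, 0.44, 0.01, 0.86, 0.07, 0.02, 1.00)
red_i <- c(1, 1, 0, 1, 0, 1, 1, 0, 1)
classify_by_median(red_q, red_i)
#> [1] 3 2 1 2 1 3 2 1 3
```

Applying it "separately within each Group" means calling it once per group, which is what the grouped `case_when()` answer does in one step.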

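The expected output looks like this.

Median of all Quantity values in the Red group where Indicator == 1: median(0.93, 0.17, 0.44, 0.86, 0.07, 1.00) = 0.65

Median of all Quantity values in the Blue group where Indicator == 1: median(0.65, 0.17, 0.09, 0.86, 0.23) = 0.23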

ID  Quantity    Group     Indicator   Results
1   0.93        Red       1           3   <- Above the median(0.65) for all values in red where Indicator ==1
2   0.17        Red       1           2   <- Below the median(0.65) for all values in red where Indicator ==1
3   0.01        Red       0           1
4   0.44        Red       1           2   <- Below the median(0.65) for all values in red where Indicator ==1
5   0.01        Red       0           1
6   0.86        Red       1           3   <- Above the median(0.65) for all values in red where Indicator ==1
7   0.07        Red       1           2   <- Below the median(0.65) for all values in red where Indicator ==1
8   0.02        Red       0           1
9   1.00        Red       1           3   <- Above the median(0.65) for all values in red where Indicator ==1

1   0.65        Blue      1           3
2   0.17        Blue      1           2
3   0.02        Blue      0           1
4   0.01        Blue      0           1
5   0.09        Blue      1           2
6   0.86        Blue      1           3
7   0.05        Blue      0           1
8   0.23        Blue      1           2
9   0.01        Blue      0           1
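The per-group medians quoted above (0.65 for Red, 0.23 for Blue) can be double-checked in base R. This sketch rebuilds only the columns it needs from the table and uses `split()` plus `sapply()`; it is one possible way to do it, not something taken from the question.

```r
dat <- data.frame(
  Quantity = c(0.93, 0.17, 0.01, 0.44, 0.01, 0.86, 0.07, 0.02, 1.00,
               0.65, 0.17, 0.02, 0.01, 0.09, 0.86, 0.05, 0.23, 0.01),
  Group = rep(c("Red", "Blue"), each = 9),
  Indicator = c(1, 1, 0, 1, 0, 1, 1, 0, 1,
                1, 1, 0, 0, 1, 1, 0, 1, 0)
)

# Keep only the Indicator == 1 rows, then take the median of Quantity per Group.
ind1 <- dat[dat$Indicator == 1, ]
meds <- sapply(split(ind1$Quantity, ind1$Group), median)
meds  # named vector: Blue = 0.23, Red = 0.65
```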

I have tried this with lots of if statements and it is very clumsy. I'm looking for something efficient using case_when. Thanks in advance.

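Recommended answer:

Group by both Group and Indicator, so that median(Quantity) inside case_when() is computed only over the Indicator == 1 rows of each Group; the Indicator == 0 rows are already caught by the first condition.

library(dplyr, warn.conflicts = FALSE)

dat |>
  mutate(
    Results = case_when(
      Indicator == 0 ~ 1,                 # category 1
      Quantity <= median(Quantity) ~ 2,   # at or below the group median
      .default = 3                        # above the group median
    ),
    .by = c(Group, Indicator)
  )
#>    ID Quantity Group Indicator Results
#> 1   1     0.93   Red         1       3
#> 2   2     0.17   Red         1       2
#> 3   3     0.01   Red         0       1
#> 4   4     0.44   Red         1       2
#> 5   5     0.01   Red         0       1
#> 6   6     0.86   Red         1       3
#> 7   7     0.07   Red         1       2
#> 8   8     0.02   Red         0       1
#> 9   9     1.00   Red         1       3
#> 10  1     0.65  Blue         1       3
#> 11  2     0.17  Blue         1       2
#> 12  3     0.02  Blue         0       1
#> 13  4     0.01  Blue         0       1
#> 14  5     0.09  Blue         1       2
#> 15  6     0.86  Blue         1       3
#> 16  7     0.05  Blue         0       1
#> 17  8     0.23  Blue         1       2
#> 18  9     0.01  Blue         0       1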

Data used in the answer:

dat <- data.frame(
  ID = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2, 3, 4, 5, 6, 7, 8, 9),
  Quantity = c(0.93, 0.17, 0.01, 0.44, 0.01, 0.86, 0.07, 0.02, 1.00,
               0.65, 0.17, 0.02, 0.01, 0.09, 0.86, 0.05, 0.23, 0.01),
  Group = c("Red", "Red", "Red", "Red", "Red", "Red", "Red", "Red", "Red",
            "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue"),
  Indicator = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0)
)
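If dplyr 1.1.0 or later is not available (the `.by` argument of `mutate()` and the `.default` argument of `case_when()` were added in that release), a base R sketch using `ave()` should give the same Results column. This is a swapped-in alternative, not part of the recommended answer.

```r
dat <- data.frame(
  ID = rep(1:9, times = 2),
  Quantity = c(0.93, 0.17, 0.01, 0.44, 0.01, 0.86, 0.07, 0.02, 1.00,
               0.65, 0.17, 0.02, 0.01, 0.09, 0.86, 0.05, 0.23, 0.01),
  Group = rep(c("Red", "Blue"), each = 9),
  Indicator = c(1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0)
)

# Blank out Indicator == 0 quantities so they don't enter the median,
# then let ave() broadcast each Group's median back to every row of that Group.
q1  <- ifelse(dat$Indicator == 1, dat$Quantity, NA)
med <- ave(q1, dat$Group, FUN = function(x) median(x, na.rm = TRUE))

# Same three rules as the case_when() answer: 1, then <= median -> 2, else 3.
dat$Results <- ifelse(dat$Indicator == 0, 1,
                      ifelse(dat$Quantity <= med, 2, 3))
dat$Results
#> [1] 3 2 1 2 1 3 2 1 3 3 2 1 1 2 3 1 2 1
```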
