using InMemoryDatasets

classA = Dataset(id = ["id1", "id2", "id3", "id4", "id5"],
                 mark = [50, 69.5, 45.5, 88.0, 98.5]);

grades = Dataset(mark = [0, 49.5, 59.5, 69.5, 79.5, 89.5, 95.5],
                 grade = ["F", "P", "C", "B", "A-", "A", "A+"]);

We can use the InMemoryDatasets package to perform a close join.

How can we achieve the same with the DataFrames package?

closejoin(classA, grades, on = :mark)
closejoin(classA, grades, on = :mark, direction=:forward, border=:nearest)

And how can this be done in R?

Recommended answer

In R, this can be done with findInterval.

classA = data.frame(id = c("id1", "id2", "id3", "id4", "id5"),
                        mark = c(50, 69.5, 45.5, 88.0, 98.5))

grades = data.frame(mark = c(0, 49.5, 59.5, 69.5, 79.5, 89.5, 95.5),
                 grade = c("F", "P", "C", "B", "A-", "A", "A+"))

cbind(classA, grade = grades$grade[findInterval(classA$mark, grades$mark)])
#   id mark grade
#1 id1 50.0     P
#2 id2 69.5     B
#3 id3 45.5     F
#4 id4 88.0    A-
#5 id5 98.5    A+

cbind(classA, grade = grades$grade[findInterval(classA$mark, c(-Inf, grades$mark), all.inside = TRUE, left.open = TRUE)])
#   id mark grade
#1 id1 50.0     C
#2 id2 69.5     B
#3 id3 45.5     P
#4 id4 88.0     A
#5 id5 98.5    A+
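The two findInterval calls above can be wrapped into one reusable helper that mimics the two closejoin calls from the question. This is only a sketch under assumptions: close_join() is a hypothetical name (not a base R or package function), the key column of y must be sorted ascending, and the "forward" direction clamps values above the largest key to the last row, which mirrors border = :nearest only for that edge case.

```r
# Hypothetical helper (not part of base R or any package): a base-R sketch of a
# "close join" that attaches to each row of `x` the row of `y` whose key is the
# closest in the requested direction. Assumes `y[[on]]` is sorted ascending.
close_join <- function(x, y, on, direction = c("backward", "forward")) {
  direction <- match.arg(direction)
  key <- y[[on]]
  if (direction == "backward") {
    # Index of the last key <= value (0 when the value is below every key).
    idx <- findInterval(x[[on]], key)
  } else {
    # Index of the first key >= value; values above every key are clamped to
    # the last row by all.inside = TRUE (a "nearest" border on that side).
    idx <- findInterval(x[[on]], c(-Inf, key),
                        all.inside = TRUE, left.open = TRUE)
  }
  idx[which(idx == 0L)] <- NA_integer_  # no key at or below the value
  matched <- y[idx, setdiff(names(y), on), drop = FALSE]
  rownames(matched) <- NULL             # keep x's row order and names
  cbind(x, matched)
}
```

With classA and grades defined as above, close_join(classA, grades, on = "mark") reproduces the first result (P, B, F, A-, A+) and close_join(classA, grades, on = "mark", direction = "forward") reproduces the second (C, B, P, A, A+).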

In Julia, you can use searchsortedlast or searchsortedfirst.

using DataFrames

classA = DataFrame(id = ["id1", "id2", "id3", "id4", "id5"],
                   mark = [50, 69.5, 45.5, 88.0, 98.5]);
grades = DataFrame(mark = [0, 49.5, 59.5, 69.5, 79.5, 89.5, 95.5],
                   grade = ["F", "P", "C", "B", "A-", "A", "A+"]);

# searchsortedlast returns the index of the last grade boundary <= each mark
classA[!, "Grade"] = grades.grade[[searchsortedlast(grades.mark, x) for x in classA.mark]]
classA
#5×3 DataFrame
# Row │ id      mark     Grade  
#     │ String  Float64  String 
#─────┼─────────────────────────
#   1 │ id1        50.0  P
#   2 │ id2        69.5  B
#   3 │ id3        45.5  F
#   4 │ id4        88.0  A-
#   5 │ id5        98.5  A+

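# searchsortedfirst returns the index of the first boundary >= each mark;
# min.(...) clamps marks above the last boundary (e.g. 98.5) to the last grade
classA[!, "Grade"] = grades.grade[min.(length(grades.grade), [searchsortedfirst(grades.mark, x) for x in classA.mark])]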
classA
#5×3 DataFrame
# Row │ id      mark     Grade  
#     │ String  Float64  String 
#─────┼─────────────────────────
#   1 │ id1        50.0  C
#   2 │ id2        69.5  B
#   3 │ id3        45.5  P
#   4 │ id4        88.0  A
#   5 │ id5        98.5  A+
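To see why the R and Julia versions agree, the two Julia searches can be restated in base R. The helper names below are hypothetical, and they use an O(n) sum() per value for readability instead of a binary search; v is assumed to be sorted ascending.

```r
# Hypothetical helpers (not part of base R): linear-time restatements of
# Julia's searchsortedlast / searchsortedfirst for an ascending vector `v`.
search_sorted_last <- function(v, x) sum(v <= x)       # last i with v[i] <= x; 0 if none
search_sorted_first <- function(v, x) sum(v < x) + 1L  # first i with v[i] >= x; length(v) + 1 if none

v <- c(0, 49.5, 59.5, 69.5, 79.5, 89.5, 95.5)  # grade boundaries
x <- c(50, 69.5, 45.5, 88.0, 98.5)             # student marks

last  <- vapply(x, search_sorted_last,  integer(1), v = v)
first <- vapply(x, search_sorted_first, integer(1), v = v)

# searchsortedlast matches findInterval with its default left-closed intervals
identical(last, findInterval(x, v))
# TRUE

# searchsortedfirst, clamped to length(v), matches the left.open/all.inside call
identical(pmin(first, length(v)),
          findInterval(x, c(-Inf, v), all.inside = TRUE, left.open = TRUE))
# TRUE
```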

For comparison, here is the same task in Julia with InMemoryDatasets, as given in the question, including its results.

using InMemoryDatasets

classA = Dataset(id = ["id1", "id2", "id3", "id4", "id5"],
                 mark = [50, 69.5, 45.5, 88.0, 98.5]);

grades = Dataset(mark = [0, 49.5, 59.5, 69.5, 79.5, 89.5, 95.5],
                 grade = ["F", "P", "C", "B", "A-", "A", "A+"]);

closejoin(classA, grades, on = :mark)
#5×3 Dataset
# Row │ id        mark      grade    
#     │ identity  identity  identity 
#     │ String?   Float64?  String?  
#─────┼──────────────────────────────
#   1 │ id1           50.0  P
#   2 │ id2           69.5  B
#   3 │ id3           45.5  F
#   4 │ id4           88.0  A-
#   5 │ id5           98.5  A+

closejoin(classA, grades, on = :mark, direction=:forward, border=:nearest)
#5×3 Dataset
# Row │ id        mark      grade    
#     │ identity  identity  identity 
#     │ String?   Float64?  String?  
#─────┼──────────────────────────────
#   1 │ id1           50.0  C
#   2 │ id2           69.5  B
#   3 │ id3           45.5  P
#   4 │ id4           88.0  A
#   5 │ id5           98.5  A+
