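I have a data frame like the one below. I want to find duplicate rows based on every column except date, then use the excluded column (date) to decide which rows to drop, keeping only the row with the most recent date. All of this should happen without losing any columns.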
   ID    Fn       Ln       date
1   1   Joe   Schmoe 2001-01-01
2   1   Joe   Schmoe 2010-01-01
3   6   Joe   Schmoe 2001-01-01
4   2 Stacy Fakename 2002-02-02
5   2 Stacy Fakename 2020-02-02
6   3 Craig  Collins 2030-03-03
7   3 Craig  Collins 2003-03-03
8   4   Leo     Fern 2040-04-04
9   4   Leo     Fern 2004-04-04
10  5 Penny  Diamond 2005-05-05
11  5 Penny  Diamond 2050-05-05
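So, for the three Joe Schmoe rows, the code should find that only two are duplicates of each other. One row is distinct because its ID is different; the other two are identical except for the date, and of those two the 2010 row should be kept.
In the end I want the unique entries (like Joe with ID 6) and the most recent copy of each duplicate (Joe with ID 1, dated 2010) to stay in the same table, with the older copy (Joe with ID 1, dated 2001) removed.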
Data
data <- data.frame(
  ID = c(1, 1, 6, 2, 2, 3, 3, 4, 4, 5, 5),
  Fn = c("Joe", "Joe", "Joe", "Stacy", "Stacy", "Craig", "Craig", "Leo", "Leo", "Penny", "Penny"),
  Ln = c("Schmoe", "Schmoe", "Schmoe", "Fakename", "Fakename", "Collins", "Collins", "Fern", "Fern", "Diamond", "Diamond"),
  date = c("2001-01-01", "2010-01-01", "2001-01-01", "2002-02-02", "2020-02-02", "2030-03-03", "2003-03-03", "2040-04-04", "2004-04-04", "2005-05-05", "2050-05-05")
)
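
For reference, here is a minimal base R sketch of the behavior I am after. It is only one possible approach, not necessarily the best one. It assumes the dates are ISO-formatted strings that as.Date() can parse, and the names df, key_cols, sorted and result are purely illustrative. The idea is to sort the rows newest first and then drop any row whose non-date columns have already been seen.

```r
# Rebuild the example data so this sketch runs on its own.
df <- data.frame(
  ID = c(1, 1, 6, 2, 2, 3, 3, 4, 4, 5, 5),
  Fn = c("Joe", "Joe", "Joe", "Stacy", "Stacy", "Craig", "Craig",
         "Leo", "Leo", "Penny", "Penny"),
  Ln = c("Schmoe", "Schmoe", "Schmoe", "Fakename", "Fakename", "Collins",
         "Collins", "Fern", "Fern", "Diamond", "Diamond"),
  date = c("2001-01-01", "2010-01-01", "2001-01-01", "2002-02-02",
           "2020-02-02", "2030-03-03", "2003-03-03", "2040-04-04",
           "2004-04-04", "2005-05-05", "2050-05-05"),
  stringsAsFactors = FALSE
)

# Parse the strings as dates so that ordering is chronological.
df$date <- as.Date(df$date)

# Every column except date defines what counts as a duplicate.
key_cols <- setdiff(names(df), "date")

# Sort newest first. duplicated() flags the second and later occurrences
# of each key, which after this sort are the older rows.
sorted <- df[order(df$date, decreasing = TRUE), ]
result <- sorted[!duplicated(sorted[key_cols]), ]

# Optional: put the rows back in ID order and reset the row names.
result <- result[order(result$ID), ]
rownames(result) <- NULL

print(result)
```

All four columns are kept, Joe with ID 6 survives because his ID makes him a distinct key, and for each remaining key only the row with the latest date is left.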