I have a dataset that looks like this:
name = c("john", "john", "john", "alex","alex", "tim", "tim", "tim", "ralph", "ralph")
year = c(2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2014, 2016)
my_data = data.frame(name, year)
    name year
1   john 2010
2   john 2011
3   john 2012
4   alex 2011
5   alex 2012
6    tim 2010
7    tim 2011
8    tim 2012
9  ralph 2014
10 ralph 2016
I want to count two things in this dataset:
1) Groups based on all years (that is, how many names share each distinct set of years)
2) Of these groups, the number of groups with at least one non-consecutive year
As an example for 1):
# sample output for 1)
              year count
1 2010, 2011, 2012     2
2       2011, 2012     1
3       2014, 2016     1
As an example for 2), only row 3 (in the data frame above) contains a missing year (i.e., 2014 to 2016 with no 2015). The output would therefore look like this:
# sample output for 2)
        year count
1 2014, 2016     1
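Can someone show me how to do this in R? And is there a way to make sure that (2011, 2012) is treated the same as (2012, 2011)?
Edit: For anyone using an older version of R, @Rui Barradas provided an answer for 2). I am including it here so that there is no ambiguity when copying/pasting: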
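# collect each name's years into one vector (a list column when group sizes differ)
agg <- aggregate(year ~ name, my_data, c)
# keep only the year vectors that have a gap between successive years
agg <- agg$year[sapply(agg$year, function(y) any(diff(y) != 1))]
# paste each remaining vector into a label and count identical labels
as.data.frame(table(sapply(agg, paste, collapse = ", ")))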
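For 1), and for the question about (2011, 2012) versus (2012, 2011), here is a minimal base-R sketch in the same spirit. It is not part of the quoted answer. It assumes that sorting and de-duplicating each name's years before pasting them into a label is an acceptable way to make the order irrelevant. The names `years_by_name`, `key`, `has_gap`, `res1` and `res2` are made up for illustration.

```r
# Data from the question
name <- c("john", "john", "john", "alex", "alex", "tim", "tim", "tim", "ralph", "ralph")
year <- c(2010, 2011, 2012, 2011, 2012, 2010, 2011, 2012, 2014, 2016)
my_data <- data.frame(name, year)

# One sorted, de-duplicated vector of years per name.
# split() always returns a list, even if every name has the same number of years.
years_by_name <- lapply(split(my_data$year, my_data$name),
                        function(y) sort(unique(y)))

# Label each name by its set of years; sorting first means that
# (2011, 2012) and (2012, 2011) produce the same label.
key <- vapply(years_by_name, paste, character(1), collapse = ", ")

# 1) number of names per distinct set of years
res1 <- as.data.frame(table(year = key),
                      responseName = "count", stringsAsFactors = FALSE)

# 2) same count, restricted to sets with at least one gap between years
has_gap <- vapply(years_by_name, function(y) any(diff(y) != 1), logical(1))
res2 <- as.data.frame(table(year = key[has_gap]),
                      responseName = "count", stringsAsFactors = FALSE)

print(res1)
print(res2)
```

On the sample data this should give the same groups and counts as the two example outputs above. Because the labels are built from sorted years, the row order of `my_data` does not affect the result.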