I can't imagine that nobody has asked this before, but I spent two hours searching and found nothing.
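Suppose I have 5 separate data frames that contain the same four variables for different years. Every data frame has a common variable named 'ID'. All data frames are already in long format (so the observations are listed below one another) for further analysis. Each data frame has a different number of observations, so not every ID appears in every data frame. The goal is to combine these data frames into panel data, so only complete cases should be kept: if an ID is missing in any year, it should be dropped.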
To make this easier to follow, here is some example code:
#2010
df1=data.frame(ID=c(111,112,113,114),"year"=c(2010, 2010, 2010, 2010),"income"=c(3800, 2200, 1500, 2700),"state"=c("NI", "SH", "BY", "NI"))
df1
ID year income state
1 111 2010 3800 NI
2 112 2010 2200 SH
3 113 2010 1500 BY
4 114 2010 2700 NI
#2011
df2=data.frame(ID=c(112,113,114,115,116),"year"=c(2011, 2011, 2011, 2011, 2011),"income"=c(2300,1500,2500,4200,6000),"state"=c("BY", "BY", "SH", "BY", "HH"))
df2
ID year income state
1 112 2011 2300 BY
2 113 2011 1500 BY
3 114 2011 2500 SH
4 115 2011 4200 BY
5 116 2011 6000 HH
#2012
df3=data.frame(ID=c(109,112,113,114),"year"=c(2012,2012,2012,2012),"income"=c(1200,2500,1500,3000),"state"=c("BW", "BY", "NI", "SH"))
df3
ID year income state
1 109 2012 1200 BW
2 112 2012 2500 BY
3 113 2012 1500 NI
4 114 2012 3000 SH
#Desired result
df_final=data.frame(ID=c(112,112,112,113,113,113,114,114,114),"year"=c(2010,2011,2012,2010,2011,2012,2010,2011,2012),"income"=c(2200,2300,2500,1500,1500,1500,2700,2500,3000),"state"=c("SH", "BY", "BY", "BY", "BY", "NI", "NI", "SH", "SH"))
df_final
ID year income state
1 112 2010 2200 SH
2 112 2011 2300 BY
3 112 2012 2500 BY
4 113 2010 1500 BY
5 113 2011 1500 BY
6 113 2012 1500 NI
7 114 2010 2700 NI
8 114 2011 2500 SH
9 114 2012 3000 SH
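I found a similar question in which a function for panel data from the reshape2 package was recommended. It works well, but unfortunately it does not drop the incomplete cases.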
Does anyone know a solution? I would appreciate any kind of help.
Thanks in advance!
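For reference, one possible way to get the result described above, using only base R: stack the yearly data frames with rbind and keep the IDs that occur in every one of them. This is a minimal sketch, not a definitive solution. It assumes each ID appears at most once per year, and the names dfs, stacked, complete_ids and panel are made up for illustration.

```r
# Example data copied from the question (first three years only)
df1 <- data.frame(ID = c(111, 112, 113, 114), year = 2010,
                  income = c(3800, 2200, 1500, 2700),
                  state = c("NI", "SH", "BY", "NI"))
df2 <- data.frame(ID = c(112, 113, 114, 115, 116), year = 2011,
                  income = c(2300, 1500, 2500, 4200, 6000),
                  state = c("BY", "BY", "SH", "BY", "HH"))
df3 <- data.frame(ID = c(109, 112, 113, 114), year = 2012,
                  income = c(1200, 2500, 1500, 3000),
                  state = c("BW", "BY", "NI", "SH"))

# Collect the yearly data frames in a list so the same steps work for 3 or 5 years
dfs <- list(df1, df2, df3)

# Stack all years into one long data frame
stacked <- do.call(rbind, dfs)

# IDs that are present in every yearly data frame
# (assumption: an ID occurs at most once per year)
complete_ids <- Reduce(intersect, lapply(dfs, function(d) d$ID))

# Keep only the complete IDs and sort by ID, then year
panel <- stacked[stacked$ID %in% complete_ids, ]
panel <- panel[order(panel$ID, panel$year), ]
rownames(panel) <- NULL
panel
```

With five data frames, only the list would need to change, for example dfs <- list(df1, df2, df3, df4, df5), where df4 and df5 stand for the two years not shown in the example.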