I have the following DataFrame:
import numpy as np
import pandas as pd
data={"ID":[1,1,1,1,1,1,1,1,1,2,2,2],
"Year":[2000,2001,2002,2003,2004,1997,1998,2003,2004,1997,1998,2005],
"Firm":["A","A","B","B","A","A","A","A","B","B","A","A"],
"Count":[0,1,0,0,0,0,0,0,0,0,0,0]}
df1=pd.DataFrame(data)
The expected output looks like this:
data={"ID":[1,1,1,1,1,1,1,1,1,2,2,2],
"Year":[2000,2001,2002,2003,2004,1997,1998,2003,2004,1997,1998,2005],
"Firm":["A","A","B","B","A","A","A","A","B","B","A","A"],
"Count":[0,1,0,0,0,0,0,0,0,0,0,0],
"Count_1":[0,1,1,1,1,0,0,1,1,0,0,0]}
df2=pd.DataFrame(data)
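I can produce the expected Count_1 values with my own code:
df_1=df1.sort_values(by=["ID","Year"],ascending=True)
# Mark rows where Count == 1; leave the rest as NaN so they can be forward-filled
df_1["Count_1"]=np.where(df_1["Count"]==1,1,np.nan)
# Carry the 1 forward to all later years within each ID
df_1["Count_1"]=df_1.groupby("ID")["Count_1"].ffill()
df_1.drop(columns=["Count"],inplace=True)
# fillna returns a new object, so the result has to be assigned back
df_1["Count_1"]=df_1["Count_1"].fillna(0).astype(int)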
However, I am looking for shorter, cleaner code.
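
One shorter possibility, offered as a sketch rather than a definitive answer, is a grouped cumulative maximum. It assumes that Count only ever holds 0 or 1, so that "a 1 has been seen in this or an earlier year for this ID" is the same as the running maximum of Count within each ID once the rows are sorted by Year. The result is assigned back to the unsorted frame, relying on pandas aligning the values on the original index.

```python
import pandas as pd

data = {
    "ID": [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Year": [2000, 2001, 2002, 2003, 2004, 1997, 1998, 2003, 2004, 1997, 1998, 2005],
    "Firm": ["A", "A", "B", "B", "A", "A", "A", "A", "B", "B", "A", "A"],
    "Count": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
}
df1 = pd.DataFrame(data)

# Sort by ID and Year, take the running maximum of Count within each ID,
# and let index alignment put the values back in the original row order.
# Assumption: Count contains only 0 and 1.
df1["Count_1"] = (
    df1.sort_values(["ID", "Year"])
       .groupby("ID")["Count"]
       .cummax()
)

print(df1)
```

If Count can hold values other than 0 and 1, the same idea should still apply after turning the column into a flag first, for example with `df1["Count"].eq(1).astype(int)`, and then taking the grouped `cummax` of that flag.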