I have this pandas DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"CALDT": ["1980-01-31", "1980-02-28", "1980-03-31",
               "1980-01-31", "1980-02-28", "1980-03-31",
               "1980-01-31"],
     "ID": [1, 1, 1,
            2, 2, 2,
            3],
     "Return": [0.02, 0.05, 0.10,
                0.05, -0.02, 0.03,
                -0.03]
     })
# Calendar year of each month-end date, used as the second grouping key
df['Year'] = pd.to_datetime(df['CALDT']).dt.year
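My goal: for each ID and year (the grouping), if the ID has been alive for at least 2 months, compute the mean and the median of Return, multiply each by 12, and assign the result back to every row of that group. Groups with fewer than 2 months should get NaN.
The expected output should look like this: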
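A minimal pandas sketch of this computation, assuming "alive for at least 2 months" means at least two distinct CALDT dates within the (ID, Year) group, as in the R code further down. It uses `groupby(...).transform`, which broadcasts each group's statistic back to every row, and `Series.where`, which leaves NaN for groups that fail the condition. The names `keys`, `months_alive` and `enough_history` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CALDT": ["1980-01-31", "1980-02-28", "1980-03-31",
              "1980-01-31", "1980-02-28", "1980-03-31",
              "1980-01-31"],
    "ID": [1, 1, 1, 2, 2, 2, 3],
    "Return": [0.02, 0.05, 0.10, 0.05, -0.02, 0.03, -0.03],
})
df["Year"] = pd.to_datetime(df["CALDT"]).dt.year

keys = ["ID", "Year"]
# Distinct month-end dates per (ID, Year), mirroring length(unique(CALDT)) in R
months_alive = df.groupby(keys)["CALDT"].transform("nunique")
enough_history = months_alive >= 2

returns = df.groupby(keys)["Return"]
# transform() repeats each group's statistic on every row of that group;
# where() keeps it only where enough_history is True and puts NaN elsewhere
df["Mean_Return"] = (returns.transform("mean") * 12).where(enough_history)
df["Median_Return"] = (returns.transform("median") * 12).where(enough_history)
print(df)
```

Because `transform` returns a Series aligned to the original index, no merge is needed and the row order is unchanged.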
df_new = pd.DataFrame(
    {"CALDT": ["1980-01-31", "1980-02-28", "1980-03-31",
               "1980-01-31", "1980-02-28", "1980-03-31",
               "1980-01-31"],
     "Year": [1980, 1980, 1980,
              1980, 1980, 1980,
              1980],
     "ID": [1, 1, 1,
            2, 2, 2,
            3],
     "Return": [0.02, 0.05, 0.10,
                0.05, -0.02, 0.03,
                -0.03],
     # ID 3 has only one month in 1980, so both statistics are NaN
     "Mean_Return": [0.68, 0.68, 0.68,
                     0.24, 0.24, 0.24,
                     np.nan],
     "Median_Return": [0.60, 0.60, 0.60,
                       0.36, 0.36, 0.36,
                       np.nan]
     })
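The expected numbers can be checked by hand. For ID 1 the mean monthly return is (0.02 + 0.05 + 0.10) / 3 ≈ 0.0567, and 0.0567 × 12 ≈ 0.68; the median is 0.05, and 0.05 × 12 = 0.60. A small NumPy check of both IDs with at least two months (the variable names are illustrative):

```python
import numpy as np

# Monthly 1980 returns for the two IDs that have at least two months of data
returns_id1 = np.array([0.02, 0.05, 0.10])
returns_id2 = np.array([0.05, -0.02, 0.03])

for name, r in [("ID 1", returns_id1), ("ID 2", returns_id2)]:
    # Annualise by multiplying the monthly mean and median by 12
    print(name, round(r.mean() * 12, 2), round(np.median(r) * 12, 2))
```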
In R, this is easy to do with group_by from the tidyverse:
library(dplyr)
library(lubridate)

df = df %>%
  mutate(Year = year(ymd(CALDT))) %>%
  group_by(ID, Year) %>%
  mutate(months_alive = length(unique(CALDT))) %>%
  mutate(mean = case_when(months_alive >= 2 ~ mean(Return) * 12,
                          .default = NA)) %>%
  mutate(median = case_when(months_alive >= 2 ~ median(Return) * 12,
                            .default = NA)) %>%
  ungroup()
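A pandas sketch that mirrors the R pipeline step by step instead: summarise each (ID, Year) group once with named aggregation, blank out groups with fewer than two distinct dates (the `case_when(... .default = NA)` step), then left-merge the summary back onto the monthly rows. `summary` and `stats` are illustrative names, and unlike the R code the helper `months_alive` column is dropped before merging.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "CALDT": ["1980-01-31", "1980-02-28", "1980-03-31",
              "1980-01-31", "1980-02-28", "1980-03-31",
              "1980-01-31"],
    "ID": [1, 1, 1, 2, 2, 2, 3],
    "Return": [0.02, 0.05, 0.10, 0.05, -0.02, 0.03, -0.03],
})
df["Year"] = pd.to_datetime(df["CALDT"]).dt.year

# One row per (ID, Year) with the same statistics the R pipeline computes
summary = (
    df.groupby(["ID", "Year"], as_index=False)
      .agg(months_alive=("CALDT", "nunique"),
           Mean_Return=("Return", "mean"),
           Median_Return=("Return", "median"))
)
stats = ["Mean_Return", "Median_Return"]
# Annualise the monthly statistics (the "* 12" in the R code)
summary[stats] = summary[stats] * 12
# Equivalent of case_when(months_alive >= 2 ~ ..., .default = NA)
summary.loc[summary["months_alive"] < 2, stats] = np.nan

# A left merge keeps every original row in its original order
df = df.merge(summary.drop(columns="months_alive"), on=["ID", "Year"], how="left")
print(df)
```

This version computes each statistic once per group rather than once per row, which can be convenient if the per-group summary table is also needed on its own.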
Any help would be greatly appreciated!