I have a dataset like this:
import numpy as np
import pandas as pd

df0 = (pd.DataFrame({'year_minor_renovation': ['2023', '2025', np.nan, '2026'],
                     'year_intermediate_renovation': [np.nan, '2025', '2027', '2030'],
                     'year_major_renovation': ['2030', np.nan, np.nan, np.nan],
                     'costs_minor_renovation': [1000, 3000, np.nan, 2000],
                     'costs_intermediate_renovation': [np.nan, 5000, 5000, 10000],
                     'costs_major_renovation': [75000, np.nan, np.nan, np.nan]}))
| | year_minor_renovation | year_intermediate_renovation | year_major_renovation | costs_minor_renovation | costs_intermediate_renovation | costs_major_renovation |
|---|---|---|---|---|---|---|
| 0 | 2023 | NaN | 2030 | 1000.0 | NaN | 75000.0 |
| 1 | 2025 | 2025 | NaN | 3000.0 | 5000.0 | NaN |
| 2 | NaN | 2027 | NaN | NaN | 5000.0 | NaN |
| 3 | 2026 | 2030 | NaN | 2000.0 | 10000.0 | NaN |
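Each row represents one building to be renovated. The frame can be seen as two concatenated subsets that share the same index:
- the left half, df.iloc[:, :3], gives the year(s) between 2023 and 2030 in which a given building needs one or more renovations
- the right half, df.iloc[:, 3:], holds the corresponding costs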
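The two-halves view described above can be checked with a short sketch. The names `years`, `costs` and `same_pattern` are illustrative only; the data is the frame from the question.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({
    'year_minor_renovation': ['2023', '2025', np.nan, '2026'],
    'year_intermediate_renovation': [np.nan, '2025', '2027', '2030'],
    'year_major_renovation': ['2030', np.nan, np.nan, np.nan],
    'costs_minor_renovation': [1000, 3000, np.nan, 2000],
    'costs_intermediate_renovation': [np.nan, 5000, 5000, 10000],
    'costs_major_renovation': [75000, np.nan, np.nan, np.nan],
})

# Left half: renovation years; right half: the matching costs.
years = df0.iloc[:, :3]
costs = df0.iloc[:, 3:]

# Both halves keep the original row labels, so row i describes the same building.
same_index = years.index.equals(costs.index)

# A cost is present exactly where a year is present (compared position by position).
same_pattern = bool((years.notna().to_numpy() == costs.notna().to_numpy()).all())
```

Because the columns are in the same order in both halves (minor, intermediate, major), column k of `years` pairs with column k of `costs`.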
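What I want
Some buildings need several types of renovation, either in different years or in the same year (e.g. df.iloc[[1]], where the minor and intermediate renovations both fall in 2025).
I need to build new columns, one per year, holding the total cost for each building regardless of the renovation type: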
(pd.DataFrame({'2023': [1000, np.nan, np.nan, np.nan],
               '2024': [np.nan, np.nan, np.nan, np.nan],
               '2025': [np.nan, 8000, np.nan, np.nan],
               '2026': [np.nan, np.nan, np.nan, 2000],
               '2027': [np.nan, np.nan, 5000, np.nan],
               '2028': [np.nan, np.nan, np.nan, np.nan],
               '2029': [np.nan, np.nan, np.nan, np.nan],
               '2030': [75000, np.nan, np.nan, 10000]}))
| | 2023 | 2024 | 2025 | 2026 | 2027 | 2028 | 2029 | 2030 |
|---|---|---|---|---|---|---|---|---|
| 0 | 1000.0 | NaN | NaN | NaN | NaN | NaN | NaN | 75000.0 |
| 1 | NaN | NaN | 8000.0 | NaN | NaN | NaN | NaN | NaN |
| 2 | NaN | NaN | NaN | NaN | 5000.0 | NaN | NaN | NaN |
| 3 | NaN | NaN | NaN | 2000.0 | NaN | NaN | NaN | 10000.0 |
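What I tried
I tried to write a groupby function to create these new columns. The result does provide some data I will need later, but it aggregates too much for what I want here, because the costs are summed over all buildings instead of being kept per building:
def costs_per_year(df):
    dfs = []
    for i in ['year_minor_renovation',
              'year_intermediate_renovation',
              'year_major_renovation']:
        # Matching cost column, e.g. 'costs_minor_renovation'
        j = 'costs' + str(i[4:])
        # Total cost per year for this renovation type, over all buildings
        df_ = (df.groupby(i)
                 .agg({j: 'sum'})
                 .reset_index()
                 .rename({i: 'year'}, axis=1)
               )
        dfs.append(df_)
    # merge the dataframes
    merged_df = dfs[0]
    for df_ in dfs[1:]:
        merged_df = merged_df.merge(df_, on='year', how='outer')
    # One row per cost type, one column per year
    merged_df = (merged_df
                 .set_index('year')
                 .transpose()
                 .reset_index()
                 )
    return merged_df

Calling it on df0 returns: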
| year | index | 2023 | 2025 | 2026 | 2027 | 2030 |
|---|---|---|---|---|---|---|
| 0 | costs_minor_renovation | 1000.0 | 3000.0 | 2000.0 | NaN | NaN |
| 1 | costs_intermediate_renovation | NaN | 5000.0 | NaN | 5000.0 | 10000.0 |
| 2 | costs_major_renovation | NaN | NaN | NaN | NaN | 75000.0 |
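One possible way to keep the costs per building, shown as a minimal sketch rather than a definitive solution: stack the three (year, cost) column pairs into one long table, sum per building and year, then spread the years into columns. The names `kinds`, `long_df`, `all_years` and `per_year` are illustrative, and the 2023 to 2030 column range is an assumption taken from the period stated in the question.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({
    'year_minor_renovation': ['2023', '2025', np.nan, '2026'],
    'year_intermediate_renovation': [np.nan, '2025', '2027', '2030'],
    'year_major_renovation': ['2030', np.nan, np.nan, np.nan],
    'costs_minor_renovation': [1000, 3000, np.nan, 2000],
    'costs_intermediate_renovation': [np.nan, 5000, 5000, 10000],
    'costs_major_renovation': [75000, np.nan, np.nan, np.nan],
})

kinds = ['minor', 'intermediate', 'major']

# Long format: one row per (building, renovation type) with its year and cost.
long_df = pd.concat(
    [pd.DataFrame({'building': df0.index,
                   'year': df0[f'year_{kind}_renovation'].to_numpy(),
                   'cost': df0[f'costs_{kind}_renovation'].to_numpy()})
     for kind in kinds],
    ignore_index=True,
).dropna(subset=['year'])  # drop renovation types a building does not need

# Assumed period: every year from 2023 to 2030, as strings like the input years.
all_years = [str(y) for y in range(2023, 2031)]

per_year = (long_df.groupby(['building', 'year'])['cost']
                   .sum(min_count=1)   # stay NaN if a group has no cost at all
                   .unstack('year')    # years become columns
                   .reindex(index=df0.index, columns=all_years))
```

The final `reindex` is what adds the empty years (2024, 2028, 2029) and restores any building that has no renovation at all. Two renovations of one building in the same year, as for building 1 in 2025, are added together by the groupby sum.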