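I have a dataset of about 600,000 rows. Because it uses pandas `iterrows()`, the code below takes a very long time to run. Is there a faster alternative that works for this specific code?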
%%time
import numpy as np
df_to_inpute = df        # dataframe with many missing values
df_inputed = df.copy()   # write results into a copy; plain `= df` would only alias the same frame
for index, row in df_to_inpute.iterrows():
    sic = row['sic']
    year = row['year']
    quarter = row['quarter']
    for col in cols_to_check:  # all columns except the date and pk columns
        value = row[col]
        if np.isnan(value):
            median = get_median(sic, year, quarter)  # assume this call is O(1)
            if not np.isnan(median):
                df_inputed.at[index, col] = median
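One way to avoid `iterrows()` is sketched below. It is an illustration under stated assumptions, not a drop-in answer. `impute_with_lookup` keeps the loop's semantics: one `get_median(sic, year, quarter)` value per row, used for every column in `cols_to_check`. It calls the function once per distinct key instead of once per missing cell, then fills each column with `Series.fillna`. `impute_with_group_median` is a different technique. It assumes (hypothetically) that `get_median` is really the per-column median within each `(sic, year, quarter)` group, and lets `groupby(...).transform("median")` compute that directly. The `get_median` defined here is a hypothetical stand-in, and the demo data is made up for illustration.

```python
import numpy as np
import pandas as pd

KEY_COLS = ["sic", "year", "quarter"]


def get_median(sic, year, quarter):
    # HYPOTHETICAL stand-in for the question's get_median(); replace with the real one.
    table = {(1, 2020, 1): 10.0, (2, 2020, 1): 20.0}
    return table.get((sic, year, quarter), np.nan)


def impute_with_lookup(df, cols_to_check, median_fn):
    """Same semantics as the iterrows() loop, without iterating over rows.

    median_fn is called once per distinct (sic, year, quarter) key.
    """
    out = df.copy()  # leave the input frame untouched
    keys = out[KEY_COLS].drop_duplicates()
    keys = keys.assign(
        _median=[median_fn(s, y, q) for s, y, q in keys.itertuples(index=False)]
    )
    # A left merge keeps the left frame's row order; realign to the original index.
    row_median = pd.Series(
        out[KEY_COLS].merge(keys, on=KEY_COLS, how="left")["_median"].to_numpy(),
        index=out.index,
    )
    # Series.fillna(Series) fills by index label. A NaN median leaves the cell NaN,
    # matching the original `if not np.isnan(median)` check.
    for col in cols_to_check:
        out[col] = out[col].fillna(row_median)
    return out


def impute_with_group_median(df, cols_to_check):
    """ASSUMES the wanted value is each column's median within its key group."""
    out = df.copy()
    group_medians = out.groupby(KEY_COLS)[cols_to_check].transform("median")
    # DataFrame.fillna(DataFrame) aligns on both index and column labels.
    out[cols_to_check] = out[cols_to_check].fillna(group_medians)
    return out


# Small made-up example.
demo = pd.DataFrame({
    "sic": [1, 1, 2, 3],
    "year": [2020, 2020, 2020, 2020],
    "quarter": [1, 1, 1, 1],
    "a": [np.nan, 5.0, np.nan, np.nan],
    "b": [1.0, np.nan, 2.0, np.nan],
})
filled = impute_with_lookup(demo, ["a", "b"], get_median)
filled_groups = impute_with_group_median(demo, ["a", "b"])
print(filled)
print(filled_groups)
```

The first version still runs Python code once per distinct key. With 600,000 rows, the number of `(sic, year, quarter)` combinations is usually far smaller than the number of missing cells, so most of the work moves into pandas' vectorized merge and `fillna`.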