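I need to replace the empty NaN values in the "area" column with the median area of rows whose "total" is equal to or smaller than the row's own "total". For example, row 3 has total == 8. Select the rows where total == 8, compute area.median(), and write that value in (if there is one). If there is no value, decrease "total" by 1 and search again. Row 6 has total == 59, and no rows with total 59, 58 or 57 have a known area, so we take the median for total == 56, which gives area = 34.
Here is the data, followed by the result it should turn into: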
import pandas as pd
import numpy as np
df = pd.DataFrame({'total': [5, 8, 8, 8, 20, 56, 59],
                   'area': [40, 51, 53, np.nan, np.nan, 34, np.nan]})
df
#  total  area
0      5  40.0
1      8  51.0
2      8  53.0
3      8   NaN
4     20   NaN
5     56  34.0
6     59   NaN
result = pd.DataFrame({'total': [5, 8, 8, 8, 20, 56, 59], 'area': [40, 51, 53, 52, 52, 34, 34]})
result
#  total  area
0      5    40
1      8    51
2      8    53
3      8    52
4     20    52
5     56    34
6     59    34
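To make the expected values easier to check, here is a small sketch of the per-"total" medians that the rule relies on. The name `medians` is my own and only used for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'total': [5, 8, 8, 8, 20, 56, 59],
                   'area': [40, 51, 53, np.nan, np.nan, 34, np.nan]})

# Median of the known areas for each 'total'. NaN values are skipped,
# and a group with no known area at all yields NaN.
medians = df.groupby('total')['area'].median()
print(medians)
```

This gives 52.0 for total 8 and 34.0 for total 56, while totals 20 and 59 come out as NaN, which is why rows 4 and 6 have to fall back to a smaller total.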
I wrote a function, but it does not produce the desired result:
def find_area(total_num, x=1):
    while x > 0:
        y = df.query('total == @total_num')['area'].sum()
        if y > 0:
            return df.query('total == @total_num')['area'].median()
            x=0
            break
        else:
            total_num -= 1

df['area'] = df['area'].fillna(find_area)
df
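For comparison, here is a minimal sketch of how the attempt might be repaired. It assumes "total" holds integers, so stepping down by 1 eventually reaches every smaller value. Two things look off in the attempt: as far as I can tell, `fillna` expects a scalar, dict, or Series rather than a function to call, and the `while` loop has no exit when no smaller "total" has a known area. The names `known`, `lowest_total` and `candidates` are hypothetical helpers, not part of the original code.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'total': [5, 8, 8, 8, 20, 56, 59],
                   'area': [40, 51, 53, np.nan, np.nan, 34, np.nan]})

# Snapshot of the rows with a known area, taken before anything is filled.
known = df.dropna(subset=['area'])
# Assumed stop condition: below the smallest known total there is nothing to find.
lowest_total = known['total'].min()

def find_area(total_num):
    """Step total_num down by 1 until some row with that total has a known area."""
    while total_num >= lowest_total:
        areas = known.loc[known['total'] == total_num, 'area']
        if not areas.empty:
            return areas.median()
        total_num -= 1
    return np.nan  # no total at or below the starting value has a known area

# Compute one candidate per row, then let fillna align it on the index;
# only the rows where 'area' is NaN actually receive a candidate.
candidates = df['total'].apply(find_area)
df['area'] = df['area'].fillna(candidates)
print(df)
```

On the sample data this fills rows 3 and 4 with 52.0 and row 6 with 34.0. The column stays float, so a final `astype(int)` would be needed to match the integer `result` frame exactly. For large gaps between totals, looking up the nearest smaller key in the grouped medians would likely be faster than stepping by 1.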