我有一个数据帧:
import pandas as pd
d = {
'Country': ["Austria", "Austria", "Belgium", "USA", "USA", "USA", "USA"],
'Number2020': [15, None, 18, 20, 22, None, 30],
'Number2021': [20, 25, 18, None, None, None, 32],
}
df = pd.DataFrame(data=d)
df
Country Number2020 Number2021
0 Austria 15.0 20.0
1 Austria NaN 25.0
2 Belgium 18.0 18.0
3 USA 20.0 NaN
4 USA 22.0 NaN
5 USA NaN NaN
6 USA 30.0 32.0
我想总结每个国家的nan值.例如.
Country Count_nans
Austria 1
USA 4
我已经过滤了数据帧,只留下带有NaN的行.
df_nan = df[df.Number2021.isna() | df.Number2020.isna()]
Country Number2020 Number2021
1 Austria NaN 25.0
3 USA 20.0 NaN
4 USA 22.0 NaN
5 USA NaN NaN
看起来像是一个分组操作?我试过这个.
nasum2021 = df_nan['Number2021'].isna().sum()
df_nan['countNames2021'] = df_nan.groupby(['Number2021'])['Number2021'].transform('count').fillna(nasum2021)
df_nan
它给了我1个nan代表奥地利,但给了3个代表美国,而它应该是4.所以这是不对的.