I've got the following DataFrame:
   index  user  default_shipping_cost     category  shipping_cost  shipping_coalesce  estimated_shipping_cost
0      0     1                      1      clothes            NaN                1.0                      6.0
1      1     1                      1  electronics            2.0                2.0                      6.0
2      2     1                     15    furniture            NaN               15.0                      6.0
3      3     2                     15    furniture            NaN               15.0                     15.0
4      4     2                     15    furniture            NaN               15.0                     15.0
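For each user, I want to coalesce shipping_cost with default_shipping_cost and take the mean of the coalesced shipping costs, but only if the user gave at least one shipping_cost.
Explanation:
- user_1 gave a shipping_cost (at least once), so we can compute the mean
- user_2 gave no shipping_cost, so I want to go with NaN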
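The "at least one shipping_cost" condition can be checked per user with `count()`, which counts only non-NaN values, so a user whose shipping_cost is entirely NaN gets 0. A minimal sketch, using just the two columns the check needs from the sample data:

```python
import pandas as pd

# Only the columns needed for the check, taken from the sample data.
df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2],
    'shipping_cost': [None, 2, None, None, None],
})

# count() skips NaN: one value per user (user 1 -> 1, user 2 -> 0).
given_per_user = df.groupby('user')['shipping_cost'].count()

# transform('count') broadcasts the same count back onto every row,
# giving a row-aligned boolean mask of "this user gave a shipping_cost".
has_cost = df.groupby('user')['shipping_cost'].transform('count') > 0
print(has_cost)
```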
Code:
import pandas as pd

# Show every column and row, and don't wrap wide output
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)
pd.set_option('display.width', 1000)

df = pd.DataFrame(
    {
        'user': [1, 1, 1, 2, 2],
        'default_shipping_cost': [1, 1, 15, 15, 15],
        'category': ['clothes', 'electronics', 'furniture', 'furniture', 'furniture'],
        'shipping_cost': [None, 2, None, None, None]
    }
)
df.reset_index(inplace=True)
# Fall back to default_shipping_cost wherever shipping_cost is missing
df['shipping_coalesce'] = df.shipping_cost.combine_first(df.default_shipping_cost)
dfg_user = df.groupby(['user'])
# Per-user mean of the coalesced costs, broadcast back to every row;
# this is also filled in for users who never gave a shipping_cost
df['estimated_shipping_cost'] = dfg_user['shipping_coalesce'].transform("mean")
print(df)
Expected output:
   index  user  default_shipping_cost     category  shipping_cost  shipping_coalesce  estimated_shipping_cost
0      0     1                      1      clothes            NaN                1.0                      6.0
1      1     1                      1  electronics            2.0                2.0                      6.0
2      2     1                     15    furniture            NaN               15.0                      6.0
3      3     2                     15    furniture            NaN               15.0                      NaN
4      4     2                     15    furniture            NaN               15.0                      NaN
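One possible way to get this expected output, sketched against the sample data above (a suggestion, not the only approach): keep the existing per-user mean of `shipping_coalesce`, and mask it with `Series.where` using a per-user count of non-NaN `shipping_cost` values. `where` keeps a value where the condition is True and puts NaN everywhere else.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'user': [1, 1, 1, 2, 2],
        'default_shipping_cost': [1, 1, 15, 15, 15],
        'category': ['clothes', 'electronics', 'furniture', 'furniture', 'furniture'],
        'shipping_cost': [None, 2, None, None, None],
    }
)
df.reset_index(inplace=True)
# Fall back to default_shipping_cost wherever shipping_cost is missing.
df['shipping_coalesce'] = df['shipping_cost'].combine_first(df['default_shipping_cost'])

dfg_user = df.groupby('user')
# Per-user mean of the coalesced costs, broadcast to every row.
mean_cost = dfg_user['shipping_coalesce'].transform('mean')
# True on rows whose user gave at least one non-NaN shipping_cost.
has_cost = dfg_user['shipping_cost'].transform('count') > 0
# Keep the mean where the user gave a cost; NaN otherwise.
df['estimated_shipping_cost'] = mean_cost.where(has_cost)
print(df)
```

If you also want a one-row-per-user summary, the same two aggregations (mean of `shipping_coalesce`, count of `shipping_cost`) can be computed with `groupby(...).agg(...)` and mapped back onto the rows with `df['user'].map(...)`.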