DATAFRAME
It's always good to have a minimal reproducible example, so I'll provide one here:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

np.random.seed(42)
data = {
    'brand': np.random.choice(['Brand A', 'Brand B', 'Brand C'], size=300),
    'sales': np.random.randint(100, 1000, size=300),
    'target': np.random.randint(100, 1000, size=300),
}
df = pd.DataFrame(data)
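Before fitting anything, it can help to confirm that every brand has enough rows for its own regression. The small sketch below rebuilds the same frame and counts rows per brand. Exact counts depend on the seed, so none are quoted here.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({
    'brand': np.random.choice(['Brand A', 'Brand B', 'Brand C'], size=300),
    'sales': np.random.randint(100, 1000, size=300),    # integers in [100, 999]
    'target': np.random.randint(100, 1000, size=300),   # integers in [100, 999]
})

# Rows per brand: each group is fitted separately by lregression
counts = df.groupby('brand').size()
print(counts)
# Observed range of the two numeric columns
print(df[['sales', 'target']].agg(['min', 'max']))
```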
FUNCTION
It isn't clear to me whether you want each regression to return its score (i.e. R²) or its coef_. In either case the function changes only slightly:
Score
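def lregression(group):
    # Single feature: reshape to the 2-D (n_samples, 1) array sklearn expects
    X = group['sales'].values.reshape(-1, 1)
    y = group['target']
    model = LinearRegression()
    result = model.fit(X, y)  # fit() returns the fitted estimator itself
    # R² of the fitted line on this group's own data
    return result.score(X, y)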
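For a regressor, LinearRegression.score returns the coefficient of determination, R² = 1 − SS_res / SS_tot. To show what that number measures, here is a NumPy-only sketch on made-up data. The arrays x and y and the helper r_squared are hypothetical, and the line is fitted with np.polyfit instead of sklearn. It also checks that, for a single predictor with an intercept, R² equals the squared Pearson correlation.

```python
import numpy as np

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y ≈ a*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, deg=1)          # slope, intercept (highest power first)
    residuals = y - (a * x + b)
    ss_res = np.sum(residuals ** 2)          # variation the line does not explain
    ss_tot = np.sum((y - y.mean()) ** 2)     # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Hypothetical synthetic data: a noisy linear relationship
rng = np.random.default_rng(0)
x = rng.integers(100, 1000, size=100)
y = 0.5 * x + rng.normal(0, 100, size=100)

r2 = r_squared(x, y)
pearson_r = np.corrcoef(x, y)[0, 1]
print(r2, pearson_r ** 2)  # equal up to floating-point error
```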
Coefficient
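def lregression(group):
    # Single feature: reshape to the 2-D (n_samples, 1) array sklearn expects
    X = group['sales'].values.reshape(-1, 1)
    y = group['target']
    model = LinearRegression()
    result = model.fit(X, y)  # fit() returns the fitted estimator itself
    # coef_ is an array with one entry per feature, here a single slope
    return result.coef_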
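With a single feature, the value stored in `coef_` is the ordinary least-squares slope, and `intercept_` holds the intercept. Both have a closed form, shown here for reference, where $x_i$ is `sales` and $y_i$ is `target` within one brand:

```latex
\hat\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
```

Because `coef_` is an array, `lregression` returns a one-element array per group. That is why each result prints in brackets and the resulting Series has `dtype: object`. Returning `result.coef_[0]` instead would give a plain float per brand.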
Then the final step (for the coef_ scenario):
>>> df.groupby('brand').apply(lregression)
brand
Brand A [0.20322970187699263]
Brand B [0.09134770152569331]
Brand C [0.043343302335992005]
dtype: object
It works as expected.
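If you want the slope, intercept and R² together as plain floats, the per-group function can return a pd.Series. groupby(...).apply then assembles a DataFrame with one row per brand. This is a minimal sketch with two assumptions. First, it swaps sklearn's LinearRegression for np.polyfit, a degree-1 least-squares fit, so it only needs NumPy and pandas. Second, the names fit_line and summary are hypothetical. With sklearn, the equivalent values would be model.coef_[0], model.intercept_ and model.score(X, y).

```python
import numpy as np
import pandas as pd

def fit_line(group: pd.DataFrame) -> pd.Series:
    """Fit target ~ sales for one group; return slope, intercept and R² as floats."""
    x = group['sales'].to_numpy(dtype=float)
    y = group['target'].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares straight line
    y_hat = slope * x + intercept
    r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
    return pd.Series({'slope': slope, 'intercept': intercept, 'r2': r2})

np.random.seed(42)
df = pd.DataFrame({
    'brand': np.random.choice(['Brand A', 'Brand B', 'Brand C'], size=300),
    'sales': np.random.randint(100, 1000, size=300),
    'target': np.random.randint(100, 1000, size=300),
})

# Selecting the value columns keeps the 'brand' key out of the frames passed
# to fit_line; the result has one row per brand and one column per statistic.
summary = df.groupby('brand')[['sales', 'target']].apply(fit_line)
print(summary)
```

Selecting `[['sales', 'target']]` before `apply` also avoids the warning that recent pandas versions emit when `apply` operates on the grouping column.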