你可以用glm
代替fit_constrained
.
import random
import pandas as pd
import statsmodels.formula.api as smf
df = pd.DataFrame(
{
"y" : [x ** 2 + random.gauss(2, 1) for x in range(10)],
"x" : [x for x in range(10)],
}
)
untrained_glm = smf.glm("y ~ x + I(x ** 2) + I(x ** 3)", df)
trained_glm = untrained_glm.fit_constrained(
([[1, 0, 0, 0], [1, 8, 64, 512]], [8, 60])
)
df["pred"] = trained_glm.predict(df)
结果:
>>> df
y x pred
0 0.191139 0 8.000000
1 3.225092 1 6.110541
2 5.353590 2 7.008272
3 9.367904 3 10.498092
4 16.512384 4 16.384900
5 28.742154 5 24.473595
6 36.584476 6 34.569078
7 51.006869 7 46.476246
8 66.839006 8 60.000000
9 82.163031 9 74.945239
(编辑以添加对约束的解释)
假设模型为y = a + b * x + c * (x ** 2) + d * (x ** 3) + e
,其中e
是误差项,a
是截距,b
、c
、d
是其他次数的系数.
拟合模型f
将满足f(0) = 8
当且仅当8 = a_est + b_est * 0 + c_est * 0 + d_est * 0
,且将满足f(8) = 60
当且仅当60 = a_est + b_est * 8 + c_est * (8 ** 2) + d_est * (8 ** 3)
.
因此,我在模型中添加了以下约束:
1 * a + 0 * b + 0 * c + 0 * d = 8
1 * a + 8 * b + 64 * c + 512 * d = 60