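I have a set of data and I want to compare which curve describes it best (polynomials of different orders, exponential or logarithmic).
I use Python and Numpy, and for polynomial fitting there is a function polyfit(). But I have not found any such function for exponential and logarithmic fitting.
Is there one? Otherwise, how can I solve this?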
For fitting y = A + B log x, just fit y against (log x).
>>> import numpy
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> numpy.polyfit(numpy.log(x), y, 1)
array([ 8.46295607, 6.61867463])
# y ≈ 8.46 log(x) + 6.62
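As a minimal sketch, the same transformation can be wrapped in a small helper. The name fit_log is hypothetical and not part of NumPy, and the approach assumes every x is strictly positive so that log(x) is defined:

```python
import numpy as np

def fit_log(x, y):
    """Fit y = A + B*log(x) by linear least squares on (log x, y)."""
    # polyfit returns the highest-degree coefficient first: [B, A]
    B, A = np.polyfit(np.log(x), y, 1)
    return A, B

x = np.array([1, 7, 20, 50, 79])
y = np.array([10, 19, 30, 35, 51])

A, B = fit_log(x, y)
y_fit = A + B * np.log(x)  # model values at the observed x
print(A, B)
```

Note the coefficient order: polyfit returns the slope first, so the slope is B and the intercept is A.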
For fitting y = Ae^(Bx), take the logarithm of both sides, which gives log y = log A + Bx. So fit (log y) against x.
Note that fitting (log y) as if it were linear emphasizes small values of y and can cause large deviations for large y. This is because polyfit (linear regression) works by minimizing Σᵢ (ΔYᵢ)² = Σᵢ (Yᵢ − Ŷᵢ)². When Yᵢ = log yᵢ, the residuals are ΔYᵢ = Δ(log yᵢ) ≈ Δyᵢ / |yᵢ|. So even if polyfit makes a very bad decision for large y, the "divide-by-|y|" factor compensates for it, and polyfit ends up favoring small values.
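To make the "divide-by-|y|" effect concrete, here is a small numerical sketch. The helper name log_residual and the sample values are illustrative assumptions, not part of the data set used elsewhere in this answer. It shows that the same absolute error in y shrinks in log space as y grows:

```python
import numpy as np

def log_residual(y, dy):
    """Residual seen in log space for an absolute error dy at value y."""
    return np.log(y + dy) - np.log(y)

dy = 1.0
for y_val in (2.0, 20.0, 200.0):
    # Compare the exact log-space residual with the approximation dy / y
    print(y_val, log_residual(y_val, dy), dy / y_val)
```

An error of 1 at y = 2 looks roughly a hundred times larger in log space than the same error at y = 200, which is why an unweighted fit of log y is dominated by the small values.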
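This can be alleviated by giving each entry a "weight" proportional to y. polyfit supports weighted least squares via the w keyword argument. Because w multiplies each residual before it is squared, passing w=numpy.sqrt(y) weights the squared residuals by y.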
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> numpy.polyfit(x, numpy.log(y), 1)
array([ 0.10502711, -0.40116352])
# y ≈ exp(-0.401) * exp(0.105 * x) = 0.670 * exp(0.105 * x)
# (^ biased towards small values)
>>> numpy.polyfit(x, numpy.log(y), 1, w=numpy.sqrt(y))
array([ 0.06009446, 1.41648096])
# y ≈ exp(1.42) * exp(0.0601 * x) = 4.12 * exp(0.0601 * x)
# (^ not so biased)
Note that Excel, LibreOffice and most scientific calculators typically use the unweighted (biased) formula for exponential regression / trend lines. If you want your results to be compatible with these platforms, do not include the weights, even if they give better results.
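One way to check what "less biased" means here is to compare both fits in the original y space. The sketch below is only an illustration. The helper names fit_exp and sse are hypothetical, and the sum of squared errors is just one possible measure of fit quality:

```python
import numpy as np

x = np.array([10, 19, 30, 35, 51])
y = np.array([1, 7, 20, 50, 79])

def fit_exp(x, y, weighted):
    """Fit y = A*exp(B*x) through a linear fit of log(y) against x."""
    w = np.sqrt(y) if weighted else None  # w=None means unweighted
    B, logA = np.polyfit(x, np.log(y), 1, w=w)
    return np.exp(logA), B

def sse(params):
    """Sum of squared errors measured in y space, not in log space."""
    A, B = params
    return float(np.sum((y - A * np.exp(B * x)) ** 2))

plain = fit_exp(x, y, weighted=False)
weighted = fit_exp(x, y, weighted=True)
print("unweighted:", plain, sse(plain))
print("weighted:  ", weighted, sse(weighted))
```

For this data set the weighted fit has a much smaller error in y space, mostly because it no longer overshoots the largest data point.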
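Now, if you can use Scipy, you can use scipy.optimize.curve_fit to fit any model without transformations.
For y = A + B log x, the result is the same as with the transformation method: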
>>> import scipy.optimize
>>> x = numpy.array([1, 7, 20, 50, 79])
>>> y = numpy.array([10, 19, 30, 35, 51])
>>> scipy.optimize.curve_fit(lambda t,a,b: a+b*numpy.log(t), x, y)
(array([ 6.61867467, 8.46295606]),
array([[ 28.15948002, -7.89609542],
[ -7.89609542, 2.9857172 ]]))
# y ≈ 6.62 + 8.46 log(x)
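For y = Ae^(Bx), however, we can get a better fit, because curve_fit minimizes Δy directly rather than Δ(log y). But we need to provide an initial guess so that curve_fit can reach the desired local minimum.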
>>> x = numpy.array([10, 19, 30, 35, 51])
>>> y = numpy.array([1, 7, 20, 50, 79])
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y)
(array([ 5.60728326e-21, 9.99993501e-01]),
array([[ 4.14809412e-27, -1.45078961e-08],
[ -1.45078961e-08, 5.07411462e+10]]))
# oops, definitely wrong.
>>> scipy.optimize.curve_fit(lambda t,a,b: a*numpy.exp(b*t), x, y, p0=(4, 0.1))
(array([ 4.88003249, 0.05531256]),
array([[ 1.01261314e+01, -4.31940132e-02],
[ -4.31940132e-02, 1.91188656e-04]]))
# y ≈ 4.88 exp(0.0553 x). much better.
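If no reasonable initial guess is at hand, one option is to take it from the log-transformed fit. This is a sketch of that idea, not something curve_fit does on its own. The function name exp_model is hypothetical, and the approach assumes all y values are positive:

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([10, 19, 30, 35, 51], dtype=float)
y = np.array([1, 7, 20, 50, 79], dtype=float)

def exp_model(t, a, b):
    """Exponential model y = a * exp(b * t)."""
    return a * np.exp(b * t)

# Step 1: rough estimate from the weighted linear fit of log(y) against x
B0, logA0 = np.polyfit(x, np.log(y), 1, w=np.sqrt(y))
p0 = (np.exp(logA0), B0)

# Step 2: refine in the original y space, starting from that estimate
params, cov = curve_fit(exp_model, x, y, p0=p0)
print(p0, params)
```

With this data the refinement should end up at essentially the same parameters as the hand-picked p0=(4, 0.1).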