Python: fitting a function to a curve and then removing certain points

Posted on February 13

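I am trying to come up with a test that keeps only the points that clearly follow a fitted curve. For the data below, it would cut off every point below roughly x = 0.13, because those points do not follow the curve well; above x = 0.13 you can see that the data follows a clear curve. I want to minimize user input. My first idea was simply to fit a power-law curve to the data, compute the residuals, and then set a maximum residual distance. But that means I would have to supply some value myself, and I am trying to have the value determined automatically, with no user input.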

I have the following data:

import numpy as np

x = np.array([0.03751155, 0.05001541, 0.06251926, 0.07502311, 0.08752696,
              0.10003081, 0.11253466, 0.12503851, 0.13754236, 0.15004622,
              0.16255007, 0.17505392, 0.18755777, 0.20006162, 0.21256547,
              0.22506932, 0.23757318, 0.25007703, 0.26258088, 0.27508473,
              0.28758858, 0.30009243])
y = np.array([0.17091544, 0.2196002, 0.24884891, 0.22784447, 0.365201,
              0.37375478, 0.39257039, 0.37231073, 0.41550739, 0.43636989,
              0.45111672, 0.46662792, 0.48854647, 0.49640163, 0.51887196,
              0.52827061, 0.54437941, 0.54929705, 0.56552202, 0.57508514,
              0.58477563, 0.59755615])

This is the code I use to plot it:

import matplotlib.pyplot as plt

plt.scatter(x, y)

plt.grid(True)
plt.xlabel("X Data")
plt.ylabel("Y Data")

plt.ylim(-0.05, 0.65)
plt.xlim(-0.05, 0.32)
plt.show()
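The residual idea described in the question can be sketched roughly as follows. This is only an illustration under stated assumptions: the model form `y = a * x**b`, the starting guess `p0`, and the rule of flagging points whose residual lies more than three robust standard deviations (estimated from the median absolute deviation) from the median are not from the question. The factor 3 is a conventional default rather than something derived from the data, so it replaces a hand-picked distance with a fixed convention instead of removing the choice entirely.

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.03751155, 0.05001541, 0.06251926, 0.07502311, 0.08752696,
              0.10003081, 0.11253466, 0.12503851, 0.13754236, 0.15004622,
              0.16255007, 0.17505392, 0.18755777, 0.20006162, 0.21256547,
              0.22506932, 0.23757318, 0.25007703, 0.26258088, 0.27508473,
              0.28758858, 0.30009243])
y = np.array([0.17091544, 0.2196002, 0.24884891, 0.22784447, 0.365201,
              0.37375478, 0.39257039, 0.37231073, 0.41550739, 0.43636989,
              0.45111672, 0.46662792, 0.48854647, 0.49640163, 0.51887196,
              0.52827061, 0.54437941, 0.54929705, 0.56552202, 0.57508514,
              0.58477563, 0.59755615])


def power_law(x, a, b):
    """Assumed model y = a * x**b (the question only says 'power law')."""
    return a * x**b


# Ordinary (unweighted) least-squares fit; p0 is an assumed starting guess.
params, _ = curve_fit(power_law, x, y, p0=(1.0, 0.5))
residuals = y - power_law(x, *params)

# Robust spread of the residuals: the median absolute deviation (MAD),
# scaled by 1.4826 so it is comparable to a standard deviation for
# normally distributed residuals.
deviation = np.abs(residuals - np.median(residuals))
robust_sigma = 1.4826 * np.median(deviation)

# Assumed rule: keep points within 3 robust sigmas of the median residual.
keep = deviation <= 3.0 * robust_sigma

print("fitted a, b:", params)
print("x values flagged as off-curve:", x[~keep])
```

Because the threshold is computed from the residuals themselves, the same script can be rerun on a new data set without editing any numbers, although the result still depends on how well a single power law describes the points that do follow the curve.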

Edit

The data above is simplified. As requested, here is some more realistic data.

x = np.array([1.53989397, 2.04460628, 4.18043213, 2.97621482, 2.82642339,
       2.98335023, 2.98964836, 2.12218901, 1.42801972, 1.25930683,
       0.71644077, 0.48220866, 0.21165985, 0.24756609, 0.21123179,
       0.57344999, 0.49362762, 0.20282767, 0.50321186, 0.50347165,
       0.74259408, 0.48493783, 0.81785588, 0.54543666, 0.53218838])

y = np.array([1.53989397, 2.04460628, 4.18043213, 2.97621482, 2.82642339,
       2.98335023, 2.98964836, 2.12218901, 1.42801972, 1.25930683,
       0.71644077, 0.48220866, 0.21165985, 0.24756609, 0.21123179,
       0.57344999, 0.49362762, 0.20282767, 0.50321186, 0.50347165,
       0.74259408, 0.48493783, 0.81785588, 0.54543666, 0.53218838])

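The image below shows the full context of what I am trying to do. I have implemented a weighted least-squares fit, for which I add the point (0, 0) to my data. For physical reasons, the point (0, 0) and the points at higher x values should be weighted heavily, while the points at lower x values should get very little weight, as the following function from my code shows: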

def custom_weights(x):
    weights = np.ones_like(x)  # default weight for all points
    weights[(0 < x) & (x <= 0.2)] = 1  # medium error for 0 < x <= 0.2
    weights[x > 0.2] = 0.1  # low error for x > 0.2
    weights[np.isclose(x, 0, atol=1e-8)] = 0.001  # very low error for x = 0
    return weights

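As you can see, I need to choose a boundary that defines the "lower" x values to which I can give less weight. In this code I eyeballed it and chose 0.2 as the boundary. However, I am trying to come up with a way to choose this boundary without eyeballing it.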
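The question does not show the fitting call itself. A minimal sketch of how the boundary could be exposed as a parameter, assuming the values from `custom_weights` are passed as per-point uncertainties (`sigma`) to `scipy.optimize.curve_fit`, and assuming a power-law model on the simplified data with (0, 0) prepended. `sigma_from_cutoff` is a hypothetical variant of `custom_weights`, and the bounds and starting guess are illustrative choices:

```python
import numpy as np
from scipy.optimize import curve_fit


def sigma_from_cutoff(x, cutoff):
    """Hypothetical variant of custom_weights with the boundary as an argument."""
    sigma = np.ones_like(x, dtype=float)         # large error for 0 < x <= cutoff
    sigma[x > cutoff] = 0.1                      # low error above the boundary
    sigma[np.isclose(x, 0, atol=1e-8)] = 0.001   # very low error for x = 0
    return sigma


def power_law(x, a, b):
    """Assumed model y = a * x**b."""
    return a * x**b


# Simplified data from the question, with the point (0, 0) prepended.
x = np.array([0.0,
              0.03751155, 0.05001541, 0.06251926, 0.07502311, 0.08752696,
              0.10003081, 0.11253466, 0.12503851, 0.13754236, 0.15004622,
              0.16255007, 0.17505392, 0.18755777, 0.20006162, 0.21256547,
              0.22506932, 0.23757318, 0.25007703, 0.26258088, 0.27508473,
              0.28758858, 0.30009243])
y = np.array([0.0,
              0.17091544, 0.2196002, 0.24884891, 0.22784447, 0.365201,
              0.37375478, 0.39257039, 0.37231073, 0.41550739, 0.43636989,
              0.45111672, 0.46662792, 0.48854647, 0.49640163, 0.51887196,
              0.52827061, 0.54437941, 0.54929705, 0.56552202, 0.57508514,
              0.58477563, 0.59755615])

sigma = sigma_from_cutoff(x, cutoff=0.2)

# Keeping b positive avoids evaluating 0**b with a negative exponent.
params, _ = curve_fit(power_law, x, y, p0=(1.0, 0.5), sigma=sigma,
                      bounds=([0.0, 0.01], [np.inf, 5.0]))
print("fitted a, b:", params)
```

With the boundary as an argument, candidate values can be compared in a loop instead of being fixed by eye; how to score the candidates is left open here.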

Recommended answer

from sklearn.linear_model import RANSACRegressor, LinearRegression
from matplotlib import pyplot as plt
import numpy as np

x = np.array([0, 0.03751155, 0.05001541, 0.06251926, 0.07502311, 0.08752696,
              0.10003081, 0.11253466, 0.12503851, 0.13754236, 0.15004622,
              0.16255007, 0.17505392, 0.18755777, 0.20006162, 0.21256547,
              0.22506932, 0.23757318, 0.25007703, 0.26258088, 0.27508473,
              0.28758858, 0.30009243, 0.32, 0.33, 0.35]).reshape(-1, 1)
y = np.array([0, 1.53989397, 2.04460628, 4.18043213, 2.97621482, 2.82642339,
              2.98335023, 2.98964836, 2.12218901, 1.42801972, 1.25930683,
              0.71644077, 0.48220866, 0.21165985, 0.24756609, 0.21123179,
              0.57344999, 0.49362762, 0.20282767, 0.50321186, 0.50347165,
              0.74259408, 0.48493783, 0.81785588, 0.54543666, 0.53218838])

# View data
plt.plot(x, y, linewidth=1, linestyle='-', color='lightgray', zorder=0)
plt.scatter(x, y, color='lightgray', marker='o', s=100, label='original data')

#
# Fit RANSAC linear regression
#
ransac_lr = RANSACRegressor(
    estimator=LinearRegression(),
    min_samples=len(x) // 3,  # Use at least 1/3 of the data per model
    max_trials=500,
    stop_score=0.95  # Stop if R2 is >= 95%
).fit(x, y)

# Get the trendline
trendline = ransac_lr.predict([[x.min()], [x.max()]])

#
# Visualise results
#
plt.scatter(x[~ransac_lr.inlier_mask_], y[~ransac_lr.inlier_mask_],
            marker='x', color='tab:red', label='outlier')
plt.scatter(x[ransac_lr.inlier_mask_], y[ransac_lr.inlier_mask_],
            marker='^', color='tab:green', label='inlier')
plt.plot([x.min(), x.max()], trendline,
         linewidth=8, color='tab:green', alpha=0.3, label='RANSAC regressor')
plt.gcf().legend()
plt.gca().set(xlabel='x', ylabel='y')

# Optional formatting
plt.gcf().set_size_inches(7, 3)
# remove right, top spines
[plt.gca().spines[spine].set_visible(False) for spine in ['right', 'top']]
# trim remaining spines
plt.gca().spines.left.set_bounds(y.min(), y.max())
plt.gca().spines.bottom.set_bounds(x.min(), x.max())
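A fitted `RANSACRegressor` exposes a boolean `inlier_mask_`, and one possible way to turn such a mask into an x boundary is sketched below. The helper `cutoff_from_inliers` is hypothetical, and the mask in the usage line is made up for illustration (it is not the output of the RANSAC fit above). The assumed rule is to return the smallest x from which every point with a larger x is an inlier:

```python
import numpy as np


def cutoff_from_inliers(x, inlier_mask):
    """Return the smallest x such that it and all larger x values are inliers.

    Hypothetical helper. Returns None when the largest-x point is itself an
    outlier, because then no trailing run of inliers exists.
    """
    x = np.asarray(x, dtype=float).ravel()
    mask = np.asarray(inlier_mask, dtype=bool).ravel()

    # Sort by x so that "above the boundary" means "later in the array".
    order = np.argsort(x)
    x_sorted, mask_sorted = x[order], mask[order]

    outlier_positions = np.flatnonzero(~mask_sorted)
    if outlier_positions.size == 0:
        return x_sorted[0]          # every point is an inlier
    last_outlier = outlier_positions[-1]
    if last_outlier == len(x_sorted) - 1:
        return None                 # the highest-x point is an outlier
    return x_sorted[last_outlier + 1]


# Illustrative mask only: the last outlier sits at x = 0.3.
example_cutoff = cutoff_from_inliers([0.1, 0.2, 0.3, 0.4, 0.5],
                                     [False, True, False, True, True])
print(example_cutoff)  # 0.4
```

With a real fit, the call would take `x` and `ransac_lr.inlier_mask_`. Since RANSAC samples randomly, the mask (and therefore the boundary) can change between runs unless `random_state` is fixed.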
