Python: how to train a linear regression for each row of a pandas DataFrame and generate the slope

Posted on March 17

I have created the following pandas DataFrame:

import numpy as np
import pandas as pd
    
ds = {'col1' : [11,22,33,24,15,6,7,68,79,10,161,12,113,147,115]}
df = pd.DataFrame(data=ds)

predFeature = []

for i in range(len(df)):
    predFeature.append(0)
    predFeature[i] = predFeature[i-1]+1

df['predFeature'] = predFeature

                
arrayTarget = []
arrayPred = []
target = np.array(df['col1'])
predFeature = np.array(df['predFeature'])

for i in range(len(df)):

    arrayTarget.append(target[i-4:i])
    arrayPred.append(predFeature[i-4:i])
        
df['arrayTarget'] = arrayTarget
df['arrayPred'] = arrayPred

It looks like this:

    col1  predFeature          arrayTarget         arrayPred
0     11            1                   []                []
1     22            2                   []                []
2     33            3                   []                []
3     24            4                   []                []
4     15            5     [11, 22, 33, 24]      [1, 2, 3, 4]
5      6            6     [22, 33, 24, 15]      [2, 3, 4, 5]
6      7            7      [33, 24, 15, 6]      [3, 4, 5, 6]
7     68            8       [24, 15, 6, 7]      [4, 5, 6, 7]
8     79            9       [15, 6, 7, 68]      [5, 6, 7, 8]
9     10           10       [6, 7, 68, 79]      [6, 7, 8, 9]
10   161           11      [7, 68, 79, 10]     [7, 8, 9, 10]
11    12           12    [68, 79, 10, 161]    [8, 9, 10, 11]
12   113           13    [79, 10, 161, 12]   [9, 10, 11, 12]
13   147           14   [10, 161, 12, 113]  [10, 11, 12, 13]
14   115           15  [161, 12, 113, 147]  [11, 12, 13, 14]

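I need to generate a new column called slope, holding the coefficient of a linear regression trained separately for each row, where:

target = each array contained in arrayTarget
predictive feature = each array contained in arrayPred

For example:

The slope for the first 4 rows is null.
The slope for the 5th row is given by the coefficient of the linear regression over the following values:
- independent values (or predictive feature): [1, 2, 3, 4]
- dependent values (or target): [11, 22, 33, 24]
The result is: 0.10204081632653061.
The slope for the 6th row is given by the coefficient of the linear regression over the following values:
- independent values (or predictive feature): [2, 3, 4, 5]
- dependent values (or target): [22, 33, 24, 15]
The result is: -0.09090909090909091.

And so on.

Can anyone help me?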
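The two values quoted in the question can be reproduced with a plain least-squares fit. The following is a minimal sketch, assuming `np.polyfit` for the fit; the helper name `fit_slope` is hypothetical and not part of the original post. It suggests that the quoted figures come out when the arrayTarget values are passed as the regressor and the arrayPred values as the response; the opposite orientation gives 5.0 and -3.0 for the same two rows.

```python
import numpy as np

def fit_slope(x, y):
    """Least-squares slope of y regressed on x (hypothetical helper)."""
    slope, _intercept = np.polyfit(x, y, deg=1)
    return slope

# Values for the 5th and 6th rows, copied from the question
pred_5, target_5 = [1, 2, 3, 4], [11, 22, 33, 24]
pred_6, target_6 = [2, 3, 4, 5], [22, 33, 24, 15]

# arrayTarget as regressor, arrayPred as response: matches the quoted values
slope_5 = fit_slope(target_5, pred_5)      # approx. 0.102041
slope_6 = fit_slope(target_6, pred_6)      # approx. -0.090909

# Opposite orientation (arrayPred as regressor), shown for comparison
slope_5_rev = fit_slope(pred_5, target_5)  # approx. 5.0
slope_6_rev = fit_slope(pred_6, target_6)  # approx. -3.0

print(slope_5, slope_6, slope_5_rev, slope_6_rev)
```

Which orientation is wanted depends on which column is meant to be explained by the other, so it is worth checking one row by hand before filling the whole column.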

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

lr = LinearRegression()

def calculate_slope(x, y):
    if len(x) < 1:
        return np.nan
    lr.fit(x.reshape(-1, 1), y)
    return lr.coef_[0]

df["slope"] = df.apply(
    lambda x: calculate_slope(x["arrayTarget"], x["arrayPred"]), axis=1
)

    col1  predFeature          arrayTarget         arrayPred      slope
0     11            1                   []                []        NaN
1     22            2                   []                []        NaN
2     33            3                   []                []        NaN
3     24            4                   []                []        NaN
4     15            5     [11, 22, 33, 24]      [1, 2, 3, 4]   0.102041
5      6            6     [22, 33, 24, 15]      [2, 3, 4, 5]  -0.090909
6      7            7      [33, 24, 15, 6]      [3, 4, 5, 6]  -0.111111
7     68            8       [24, 15, 6, 7]      [4, 5, 6, 7]  -0.142857
8     79            9       [15, 6, 7, 68]      [5, 6, 7, 8]   0.030418
9     10           10       [6, 7, 68, 79]      [6, 7, 8, 9]   0.030769
10   161           11      [7, 68, 79, 10]     [7, 8, 9, 10]   0.002331
11    12           12    [68, 79, 10, 161]    [8, 9, 10, 11]   0.009048
12   113           13    [79, 10, 161, 12]   [9, 10, 11, 12]  -0.001640
13   147           14   [10, 161, 12, 113]  [10, 11, 12, 13]   0.004698
14   115           15  [161, 12, 113, 147]  [11, 12, 13, 14]   0.002174
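For a fixed window length, the same column can probably be computed without scikit-learn or a row-wise `apply`. The following is only a sketch under that assumption. It uses the closed-form simple-regression slope, cov(x, y) / var(x), with the arrayTarget window as x and the arrayPred window as y (the orientation used in the `calculate_slope` call above), and builds the windows with `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later). The names `rolling_slope` and `window` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def rolling_slope(x, y, window=4):
    """Slope of y regressed on x over the `window` values preceding each row."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.full(len(x), np.nan)
    if len(x) <= window:
        return out  # no row has a full window of preceding values
    # Row i uses values i-window .. i-1, so the final window is dropped.
    xw = sliding_window_view(x, window)[:-1]
    yw = sliding_window_view(y, window)[:-1]
    xd = xw - xw.mean(axis=1, keepdims=True)
    yd = yw - yw.mean(axis=1, keepdims=True)
    sxx = (xd * xd).sum(axis=1)
    sxy = (xd * yd).sum(axis=1)
    # A window with constant x has zero variance; the division then gives NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        out[window:] = sxy / sxx
    return out

df = pd.DataFrame(
    {"col1": [11, 22, 33, 24, 15, 6, 7, 68, 79, 10, 161, 12, 113, 147, 115]}
)
df["predFeature"] = np.arange(1, len(df) + 1)

# col1 windows play the role of arrayTarget, predFeature windows of arrayPred
df["slope"] = rolling_slope(df["col1"], df["predFeature"], window=4)
print(df)
```

This skips the intermediate arrayTarget and arrayPred columns entirely, which may matter on larger frames; the scikit-learn version above is easier to adapt if the per-row arrays have varying lengths.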
