I wrote a very simple program that reads columns of data from a CSV file. Here is a short preview of the file's data:

,matchId,blue_win,blueGold,blueMinionsKilled,blueJungleMinionsKilled,blueAvgLevel,redGold,redMinionsKilled,redJungleMinionsKilled,redAvgLevel,blueChampKills,blueHeraldKills,blueDragonKills,blueTowersDestroyed,redChampKills,redHeraldKills,redDragonKills,redTowersDestroyed
0,3493250918.0,0,24575.0,349.0,89.0,8.6,25856.0,346.0,80.0,9.2,6.0,1.0,0.0,1.0,12.0,2.0,0.0,1.0
1,3464936341.0,0,27210.0,290.0,36.0,9.0,28765.0,294.0,92.0,9.4,20.0,0.0,0.0,0.0,19.0,2.0,0.0,0.0
2,3428425921.0,1,32048.0,346.0,92.0,9.4,25305.0,293.0,84.0,9.4,17.0,3.0,0.0,0.0,11.0,0.0,0.0,4.0
3,3428347390.0,0,20261.0,223.0,60.0,8.2,30429.0,356.0,107.0,9.4,7.0,0.0,0.0,3.0,16.0,3.0,0.0,0.0
4,3428350940.0,1,30217.0,376.0,110.0,9.8,23889.0,334.0,60.0,8.8,16.0,3.0,0.0,0.0,8.0,0.0,0.0,2.0
5,3494458885.0,1,25470.0,362.0,82.0,9.2,22856.0,319.0,86.0,8.8,9.0,1.0,0.0,0.0,7.0,1.0,0.0,0.0
6,3463320642.0,1,25391.0,350.0,96.0,9.2,23236.0,345.0,80.0,8.6,8.0,2.0,0.0,0.0,5.0,1.0,0.0,1.0
...

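I dropped the unnecessary columns and ran tests using 30% of the data as test data, to measure how accurately the model predicts whether the blue team wins the match: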

import pandas as pd
import numpy as np
import sklearn
from sklearn import linear_model

df = pd.read_csv('MatchTimelinesFirst15.csv', delimiter=',')

predict = "blue_win"

df = df.drop('Unnamed: 0', axis=1)
df = df.drop('redDragonKills', axis=1)
df = df.drop('blueDragonKills', axis=1)
# print(df.describe())

x = np.array(df.drop([predict], axis=1))
y = np.array(df[predict])


for _ in range(500):
    x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.30)

    # print('{0}, {1}'.format(type(x_train), x_train))

    linear = linear_model.LinearRegression()

    # trains model
    linear.fit(x_train, y_train)

    acc = linear.score(x_test, y_test)

    print('Accuracy: {0}'.format(acc))

But my accuracy does not improve, even after training through 500 loop iterations. I keep getting results in the same range:

Accuracy: 0.39030223064480596
Accuracy: 0.3980014684661366
Accuracy: 0.3840247556358104
Accuracy: 0.3939949181269252
Accuracy: 0.38657487661026535
Accuracy: 0.3950506154649621
Accuracy: 0.3925506648304995
...

I am very new to Python and machine learning, so I would really appreciate any help, including suggestions for improvement.

Recommended answer

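You are not training the model any further by using the loop. Each of the 500 iterations starts over with a brand-new model; the only thing that differs between iterations is the random train/test split.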
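To illustrate the point, here is a minimal sketch that repeats the fit on fresh splits and compares the scores from early and late iterations. It uses synthetic data from `make_classification` as a stand-in for the match CSV (an assumption, since the real file is not available here), and fixed seeds only so the run is repeatable.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption: not the real CSV).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

scores = []
for seed in range(20):
    # A different random split on every iteration...
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=0.30, random_state=seed
    )
    # ...and a brand-new, untrained model, so nothing carries over.
    model = LinearRegression()
    model.fit(x_train, y_train)
    scores.append(model.score(x_test, y_test))

scores = np.array(scores)
first_half_mean = scores[:10].mean()
second_half_mean = scores[10:].mean()

print(f"mean={scores.mean():.3f}  std={scores.std():.3f}")
print(f"first 10: {first_half_mean:.3f}  last 10: {second_half_mean:.3f}")
```

The scores should wobble around a common value with no upward trend, which matches the output in the question. A loop like this is useful for estimating how much the score depends on the split, not for improving the model.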

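As for improving the classifier, I would steer clear of linear regression. Regression is not the same thing as classification: classification predicts a categorical class label, while regression predicts a continuous quantity.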
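One consequence worth spelling out: `LinearRegression.score` returns the coefficient of determination (R²), not classification accuracy, so the numbers printed in the question are not the fraction of correctly predicted matches. The sketch below, again on synthetic stand-in data (an assumption), shows the difference. The 0.5 cut-off used to turn the continuous output into labels is an illustrative choice, not something from the original code.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the match data (assumption: not the real CSV).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

reg = LinearRegression().fit(x_train, y_train)

# Regression output is a continuous number, not a 0/1 label.
raw = reg.predict(x_test)

# score() on a regressor is R^2, which is what the question printed.
r2 = reg.score(x_test, y_test)

# To get an accuracy, the output must first be turned into labels
# (hypothetical 0.5 threshold chosen for illustration).
labels = (raw >= 0.5).astype(int)
acc = accuracy_score(y_test, labels)

print(f"R^2: {r2:.3f}")
print(f"Accuracy after thresholding: {acc:.3f}")
```

The two numbers measure different things, so an R² around 0.39 says little about how often the winner is predicted correctly.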

Because you want to know whether the blue team wins, you have a binary classification problem: either the blue team wins or it does not.

Try a classification model, such as an SVM.
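A minimal sketch of what that could look like with scikit-learn's `SVC`. The synthetic data, the RBF kernel, and the `StandardScaler` step are assumptions made for illustration (SVMs tend to be sensitive to feature scale, and the gold columns are far larger than the kill counts); none of them come from the original answer.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the match data (assumption: not the real CSV).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Scale the features, then fit a support vector classifier.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(x_train, y_train)

# For classifiers, score() is the mean accuracy on the given data.
acc = clf.score(x_test, y_test)
predicted = clf.predict(x_test)

print(f"Accuracy: {acc:.3f}")
```

To try this on the real data, the `x` and `y` arrays built in the question could be passed to `train_test_split` in place of the synthetic `X` and `y`.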

Good luck.
