
# KNN Algorithm - Automatic Workflows

## ML Data Preparation

```python
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Load the Pima Indians diabetes dataset
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:, 0:8]
Y = array[:, 8]

# Chain standardization and LDA in a pipeline so the scaler is fitted
# only on the training folds of each cross-validation split
estimators = []
estimators.append(('standardize', StandardScaler()))
estimators.append(('lda', LinearDiscriminantAnalysis()))
model = Pipeline(estimators)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

Output:

```
0.7790148448043184
```
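Since the Pima CSV path above is machine-specific, the same pipeline can be exercised on synthetic data. The sketch below (the use of `make_classification` is an assumption, not part of the original tutorial) shows the leakage-prevention point directly: fitting the pipeline on a training split computes the scaler's statistics from that split alone, so the test split never influences them.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the Pima data: 200 samples, 8 features, 2 classes
X, Y = make_classification(n_samples=200, n_features=8, random_state=7)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=7)

model = Pipeline([('standardize', StandardScaler()),
                  ('lda', LinearDiscriminantAnalysis())])

# fit() runs the scaler only on the training data; transform() is then
# applied to the held-out data using those training statistics
model.fit(X_train, Y_train)
print(model.score(X_test, Y_test))
```

The fitted scaler's statistics are available via `model.named_steps['standardize'].mean_`, which confirms they were computed over the 8 training-split columns only.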

## ML Feature Extraction

Data leakage can also occur in the feature-extraction step of an ML model, so feature extraction should likewise be restricted to the training dataset. As with data preparation, an ML pipeline prevents this kind of leakage; the FeatureUnion tool it provides combines the outputs of several feature-extraction methods into a single feature set. In the example below, a FeatureUnion joins PCA with univariate feature selection; a logistic regression model is then created, and the pipeline is evaluated with 10-fold cross-validation.

```python
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.pipeline import FeatureUnion
from sklearn.linear_model import LogisticRegression
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest

# Load the Pima Indians diabetes dataset
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:, 0:8]
Y = array[:, 8]

# Combine PCA components and the k best original features
features = []
features.append(('pca', PCA(n_components=3)))
features.append(('select_best', SelectKBest(k=6)))
feature_union = FeatureUnion(features)

# Run feature extraction inside the pipeline so it is fitted
# only on the training folds of each cross-validation split
estimators = []
estimators.append(('feature_union', feature_union))
estimators.append(('logistic', LogisticRegression()))
model = Pipeline(estimators)

kfold = KFold(n_splits=10, shuffle=True, random_state=7)
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
```

Output:

```
0.7789811066126855
```
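To see what the FeatureUnion actually produces, it can be fitted on its own. This sketch (again on synthetic stand-in data, since the CSV path above is machine-specific) shows that the union concatenates its transformers' outputs column-wise: 3 PCA components plus the 6 best-scoring original features give 9 columns in total.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest
from sklearn.pipeline import FeatureUnion

# Synthetic stand-in data: 200 samples, 8 features
X, Y = make_classification(n_samples=200, n_features=8, random_state=7)

feature_union = FeatureUnion([('pca', PCA(n_components=3)),
                              ('select_best', SelectKBest(k=6))])

# fit_transform stacks both transformers' outputs side by side:
# 3 PCA components + 6 selected original features = 9 columns
transformed = feature_union.fit_transform(X, Y)
print(transformed.shape)   # → (200, 9)
```

Note that the PCA components and the selected columns can overlap in the information they carry; the union simply concatenates them without deduplication.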
