Bagging ensemble methods work best with algorithms that have high variance, and the classic example is the decision tree algorithm. In the following Python recipe, we are going to build a bagged decision tree ensemble model by using the BaggingClassifier function of sklearn with DecisionTreeClassifier (a classification and regression trees, or CART, algorithm) on the Pima Indians diabetes dataset.
First, import the required packages as shown below:
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
Now, we need to load the Pima diabetes dataset as we did in the previous examples:
path = r"C:\pima-indians-diabetes.csv"
headernames = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
data = read_csv(path, names=headernames)
array = data.values
X = array[:, 0:8]
Y = array[:, 8]
Next, provide the input for 10-fold cross-validation as follows:
seed = 7
# shuffle=True is required when random_state is set in recent versions of scikit-learn
kfold = KFold(n_splits=10, shuffle=True, random_state=seed)
cart = DecisionTreeClassifier()
We need to provide the number of trees to build. Here, we are building 150 trees:
num_trees = 150
Next, build the model with the help of the following script:
# in scikit-learn >= 1.2 the parameter is named estimator; older versions used base_estimator
model = BaggingClassifier(estimator=cart, n_estimators=num_trees, random_state=seed)
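As a side note, the bootstrap sampling that defines bagging can be tuned through BaggingClassifier's max_samples and bootstrap parameters. The following is a minimal sketch (not part of the original recipe) that reuses the cart estimator and seed defined above:

# each of the 150 trees is fit on a bootstrap sample drawn with replacement;
# max_samples controls the size of each bootstrap sample (here 80% of the rows)
model_small_bags = BaggingClassifier(
    estimator=cart,
    n_estimators=num_trees,
    max_samples=0.8,   # fraction of the training rows drawn for each tree
    bootstrap=True,    # sample with replacement (the defining property of bagging)
    random_state=seed
)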
Calculate and print the result as follows:
results = cross_val_score(model, X, Y, cv=kfold)
print(results.mean())
Output
0.7733766233766234
The output above shows that our bagged decision tree classifier model achieves an accuracy of about 77%.
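To see the variance-reduction benefit mentioned at the start of this recipe, a quick follow-up sketch (not part of the original recipe) cross-validates a single, unbagged decision tree on the same folds so its mean accuracy can be compared with the bagged result. It reuses the variables defined above:

# evaluate one decision tree alone, using the same folds as before
single_tree_results = cross_val_score(cart, X, Y, cv=kfold)
print("single decision tree:", single_tree_results.mean())
print("bagged 150 trees    :", results.mean())

On this dataset the single tree typically scores a few points lower than the bagged ensemble, which illustrates why bagging pairs well with high-variance learners.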