Comparison of Classifiers in sklearn

tsinghuazhuoqing, posted 2021/12/25 14:52:17

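Abstract: This post runs the code from "Classifier comparison" and compares the performance of several classifiers in sklearn. The results provide a basis for understanding the characteristics of machine learning classifiers.

Keywords: sklearn, python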

 


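  "Classifier comparison" compares the performance of several classifiers in the sklearn Python package.

1.1 sklearn classifiers

  The example compares several scikit-learn classifiers on synthetic datasets. Its purpose is to illustrate the nature of the decision boundaries of different classifiers. The intuition these examples convey does not necessarily carry over to real datasets, so it should be taken with a grain of salt.

  In high-dimensional spaces in particular, data can more easily be separated linearly, and the simplicity of classifiers such as naive Bayes and linear SVMs may lead to better generalization than other classifiers achieve.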
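  As a rough illustration of the linear-separability remark (this sketch is not part of the original example), the code below assigns random labels to 50 Gaussian points and fits a least-squares linear classifier, once in 2 dimensions and once in 100. The helper name `linear_fit_accuracy` and all sizes are assumptions chosen for illustration, and least squares stands in for a linear SVM only to keep the sketch limited to NumPy.

```python
import numpy as np


def linear_fit_accuracy(n_samples, n_features, seed=0):
    """Training accuracy of a least-squares linear classifier on random labels."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, n_features))
    t = rng.choice([-1.0, 1.0], size=n_samples)    # labels unrelated to X
    A = np.hstack([X, np.ones((n_samples, 1))])    # append a bias column
    w, *_ = np.linalg.lstsq(A, t, rcond=None)      # minimum-norm least-squares fit
    return float(np.mean(np.sign(A @ w) == t))


acc_low = linear_fit_accuracy(50, 2)      # few dimensions: random labels cannot all be fit
acc_high = linear_fit_accuracy(50, 100)   # more dimensions than points: a separating plane exists
print(acc_low, acc_high)
```

  Note that this only shows separability of the training points. Whether a simple linear model also generalizes well depends on the data.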

1.1.1 Classifiers

  Ten classifiers are tested in total:

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA"]

* Nearest Neighbors
* Linear SVM
* RBF SVM
* Gaussian Process
* Decision Tree
* Random Forest
* Neural Net
* AdaBoost
* Naive Bayes
* QDA

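1.2 Datasets

  The figures show training points in solid colors and test points as semi-transparent. The classification accuracy on the test set is shown in the lower right corner of each panel.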

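▲ Figure 1.1.1 Training datasets
Left: moons dataset; middle: circles dataset; right: linearly separable dataset

▲ Figure 1.2.2 Left: Nearest Neighbors; middle: Linear SVM; right: RBF SVM

▲ Figure 1.2.3 Left: Gaussian Process; middle: Decision Tree; right: Random Forest

▲ Figure 1.2.4 Left: Neural Net; center-left: AdaBoost; center-right: Naive Bayes; right: QDA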
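
  The accuracy printed in each panel is the value returned by `score` on the held-out 40% of the data. A minimal sketch of that computation without any plotting, using three of the ten classifiers on the moons dataset, could look like the following. The dictionary name `candidates` and the `random_state=0` passed to the decision tree are assumptions added here for reproducibility, not settings from the original script.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Same dataset, scaling and 60/40 split as the full script
X, y = make_moons(noise=0.3, random_state=0)
X = StandardScaler().fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

candidates = {
    "Nearest Neighbors": KNeighborsClassifier(3),
    "Linear SVM": SVC(kernel="linear", C=0.025),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

scores = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)   # mean accuracy on the test set
    # Same label format as the figures: two decimals, leading zero stripped
    print(name, ("%.2f" % scores[name]).lstrip("0"))
```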

1.3 Test code

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# test2.py                     -- by dr. zhuoqing 2021-12-24
#
# note:
#============================================================
# from headm import *               # author's local helper module; nothing below depends on it
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
#------------------------------------------------------------
h = .02  # step size in the mesh
names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA"]
#------------------------------------------------------------
classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    GaussianProcessClassifier(1.0 * RBF(1.0)),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1, max_iter=1000),
    AdaBoostClassifier(),
    GaussianNB(),
    QuadraticDiscriminantAnalysis()]
#------------------------------------------------------------
# linearly separable dataset: two informative features plus uniform noise
X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)
linearly_separable = (X, y)
datasets = [make_moons(noise=0.3, random_state=0),
            make_circles(noise=0.2, factor=0.5, random_state=1),
            linearly_separable
            ]
#------------------------------------------------------------
figure = plt.figure(figsize=(27, 9))
i = 1
#------------------------------------------------------------
# iterate over datasets
for ds_cnt, ds in enumerate(datasets):
    # preprocess dataset, split into training and test part
    X, y = ds
    X = StandardScaler().fit_transform(X)
    X_train, X_test, y_train, y_test = \
        train_test_split(X, y, test_size=.4, random_state=42)
    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # just plot the dataset first
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#ff0000', '#0000ff'])
    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
    if ds_cnt == 0:
        ax.set_title("Input data")
    # plot the training points
    ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
               edgecolors='k')
    # plot the testing points
    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6,
               edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())
    i += 1
    # iterate over classifiers
    for name, clf in zip(names, classifiers):
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        clf.fit(X_train, y_train)
        score = clf.score(X_test, y_test)
        # plot the decision boundary. for that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        if hasattr(clf, "decision_function"):
            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
        else:
            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
        # put the result into a color plot
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)
        # plot the training points
        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright,
                   edgecolors='k')
        # plot the testing points
        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright,
                   edgecolors='k', alpha=0.6)
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        if ds_cnt == 0:
            ax.set_title(name)
        # test accuracy in the lower right corner, leading zero stripped
        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                size=15, horizontalalignment='right')
        i += 1
#------------------------------------------------------------
plt.tight_layout()
plt.show()
#------------------------------------------------------------
#        end of file : test2.py
#============================================================
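
  The script colors each mesh point with `decision_function` when the classifier provides it and falls back to the class-1 column of `predict_proba` otherwise. The sketch below isolates that step on a coarse mesh so the two branches can be inspected separately. The helper name `mesh_scores` and the mesh bounds are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC


def mesh_scores(clf, xx, yy):
    """Score every mesh point with a fitted binary classifier."""
    grid = np.c_[xx.ravel(), yy.ravel()]       # one row per mesh point
    if hasattr(clf, "decision_function"):
        z = clf.decision_function(grid)        # signed score, sign gives the class
    else:
        z = clf.predict_proba(grid)[:, 1]      # probability of class 1
    return z.reshape(xx.shape)


X, y = make_moons(noise=0.3, random_state=0)
xx, yy = np.meshgrid(np.arange(-2, 3, 0.5), np.arange(-2, 2, 0.5))

z_svm = mesh_scores(SVC(gamma=2, C=1).fit(X, y), xx, yy)   # uses decision_function
z_nb = mesh_scores(GaussianNB().fit(X, y), xx, yy)         # falls back to predict_proba
print(z_svm.shape, z_nb.shape)
```

  Because the two branches return values on different scales (unbounded scores versus probabilities in [0, 1]), the color intensity of the filled contours is not directly comparable between panels.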

 


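  This post ran the code from "Classifier comparison" and compared the performance of several classifiers in sklearn. The results provide a basis for understanding the characteristics of machine learning classifiers.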

