
Comparison of Classifiers in sklearn

tsinghuazhuoqing, posted 2021/12/25 14:52:17


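Introduction: This post runs the scikit-learn "Classifier comparison" example, which compares the performance of the classifiers in sklearn. It provides a basis for understanding how these machine-learning methods behave.

Keywords: sklearn, python

The "Classifier comparison" example compares the performance of several kinds of classifiers from the sklearn Python package.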

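1.1 sklearn classifiers

Several scikit-learn classifiers are compared on synthetic datasets. The point of the example is to illustrate the nature of the decision boundaries of different classifiers. The intuition these examples convey does not necessarily carry over to real datasets, so it should be taken with a grain of salt.

In high-dimensional spaces in particular, data are more easily separated linearly, and the simplicity of classifiers such as naive Bayes and linear SVMs may lead to better generalization than other classifiers achieve.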
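The claim about linear separability in high dimensions can be checked with a small experiment. The sketch below is not part of the original example: it assumes Gaussian random points with purely random labels, and uses a plain least-squares fit (`np.linalg.lstsq`) as a stand-in for a linear classifier. The helper name `linear_fit_accuracy` is made up for this illustration. It measures training accuracy only, so it says something about separability, not about generalization.

```python
import numpy as np

def linear_fit_accuracy(n_samples, n_features, seed=0):
    """Fit a least-squares linear separator (no intercept) to random labels
    and return its accuracy on the same points."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, n_features))
    y = rng.choice([-1.0, 1.0], size=n_samples)   # labels carry no signal
    w, *_ = np.linalg.lstsq(X, y, rcond=None)     # minimizes ||X w - y||
    return float(np.mean(np.sign(X @ w) == y))

acc_low = linear_fit_accuracy(n_samples=50, n_features=2)
acc_high = linear_fit_accuracy(n_samples=50, n_features=100)
print(f"2 features:   {acc_low:.2f}")
print(f"100 features: {acc_high:.2f}")
```

With more features than samples, the linear system can be solved exactly, so even random labels are separated perfectly. With only two features, a linear rule cannot do that.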

1.1.1 Classifiers

Ten classifiers are tested in total:

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA"]

* Nearest Neighbors
* Linear SVM
* RBF SVM
* Gaussian Process
* Decision Tree
* Random Forest
* Neural Net
* AdaBoost
* Naive Bayes
* QDA
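As a quick text-only check, a few of these classifiers can be fitted and scored without any plotting. This is a minimal sketch, not the author's script: it uses only five of the ten classifiers, skips the standardization step, and fixes `random_state=0` on the decision tree for repeatability (an assumption added here). The hyperparameters are those used in the test code of section 1.3.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Illustrative subset of the ten classifiers.
models = {
    "Nearest Neighbors": KNeighborsClassifier(3),
    "Linear SVM": SVC(kernel="linear", C=0.025),
    "RBF SVM": SVC(gamma=2, C=1),
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "Naive Bayes": GaussianNB(),
}

X, y = make_moons(noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42)

scores = {}
for name, clf in models.items():
    clf.fit(X_train, y_train)
    scores[name] = clf.score(X_test, y_test)   # mean accuracy on the test split
    print(f"{name:<18s} {scores[name]:.2f}")
```

A dictionary keeps each display name next to its estimator, which avoids the two parallel lists drifting out of sync.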

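1.2 Datasets

The plots show training points in solid colors and test points as semi-transparent. The lower right corner of each plot shows the classification accuracy on the test set.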

▲ Figure 1.1.1 Training datasets
Left: moons dataset; middle: circles dataset; right: linearly separable dataset
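The three datasets can be generated and inspected on their own. The sketch below assumes the same generator calls and parameters as the test code in section 1.3, and it applies the same `StandardScaler` step. The dictionary keys are labels chosen here for readability.

```python
import numpy as np
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
rng = np.random.RandomState(2)
X += 2 * rng.uniform(size=X.shape)   # add uniform noise to the two clusters

datasets = {
    "moons": make_moons(noise=0.3, random_state=0),
    "circles": make_circles(noise=0.2, factor=0.5, random_state=1),
    "linearly separable": (X, y),
}

for name, (X_ds, y_ds) in datasets.items():
    X_std = StandardScaler().fit_transform(X_ds)   # zero mean, unit variance per feature
    print(name, X_std.shape, np.bincount(y_ds))
```

Each generator returns 100 two-dimensional points by default, with binary labels.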

▲ Figure 1.2.2 Left: Nearest Neighbors; middle: Linear SVM; right: RBF SVM

▲ Figure 1.2.3 Left: Gaussian Process; middle: Decision Tree; right: Random Forest

▲ Figure 1.2.4 Left: Neural Net; center left: AdaBoost; center right: Naive Bayes; right: QDA

1.3 Test code

#!/usr/local/bin/python
# -*- coding: gbk -*-
#============================================================
# test2.py                     -- by dr. zhuoqing 2021-12-24
#
# note:
#============================================================
from headm import *                 # author's local helper module; remove if unavailable (nothing below appears to use it)
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
#------------------------------------------------------------
h = .02  # step size in the mesh
names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA"]
#------------------------------------------------------------
classifiers = [
    KNeighborsClassifier(3),
    SVC(kernel="linear", C=0.025),
    SVC(gamma=2, C=1),
    GaussianProcessClassifier(1.0 * RBF(1.0)),
    DecisionTreeClassifier(max_depth=5),
    RandomForestClassifier(max_depth=5, n_estimators=10, max_features=1),
    MLPClassifier(alpha=1, max_iter=1000),
    AdaBoostClassifier(),
    GaussianNB(),
    QuadraticDiscriminantAnalysis()]
#------------------------------------------------------------
x, y = make_classification(n_features=2, n_redundant=0, n_informative=2,
                           random_state=1, n_clusters_per_class=1)
rng = np.random.RandomState(2)
x += 2 * rng.uniform(size=x.shape)  # add uniform noise to the generated points
linearly_separable = (x, y)
datasets = [make_moons(noise=0.3, random_state=0),
            make_circles(noise=0.2, factor=0.5, random_state=1),
            linearly_separable
            ]
#------------------------------------------------------------
figure = plt.figure(figsize=(27, 9))
i = 1
#------------------------------------------------------------
# iterate over datasets
for ds_cnt, ds in enumerate(datasets):
    # preprocess dataset, split into training and test part
    x, y = ds
    x = standardscaler().fit_transform(x)
    x_train, x_test, y_train, y_test = \
        train_test_split(x, y, test_size=.4, random_state=42)
    x_min, x_max = x[:, 0].min() - .5, x[:, 0].max() + .5
    y_min, y_max = x[:, 1].min() - .5, x[:, 1].max() + .5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    # just plot the dataset first
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#ff0000', '#0000ff'])
    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
    if ds_cnt == 0:
        ax.set_title("input data")
    # plot the training points
    ax.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=cm_bright,
               edgecolors='k')
    # plot the testing points
    ax.scatter(x_test[:, 0], x_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6,
               edgecolors='k')
    ax.set_xlim(xx.min(), xx.max())
    ax.set_ylim(yy.min(), yy.max())
    ax.set_xticks(())
    ax.set_yticks(())
    i += 1
    # iterate over classifiers
    for name, clf in zip(names, classifiers):
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        clf.fit(x_train, y_train)
        score = clf.score(x_test, y_test)
        # plot the decision boundary. for that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].
        if hasattr(clf, "decision_function"):
            z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
        else:
            z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1]
        # put the result into a color plot
        z = z.reshape(xx.shape)
        ax.contourf(xx, yy, z, cmap=cm, alpha=.8)
        # plot the training points
        ax.scatter(x_train[:, 0], x_train[:, 1], c=y_train, cmap=cm_bright,
                   edgecolors='k')
        # plot the testing points
        ax.scatter(x_test[:, 0], x_test[:, 1], c=y_test, cmap=cm_bright,
                   edgecolors='k', alpha=0.6)
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        if ds_cnt == 0:
            ax.set_title(name)
        ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'),
                size=15, horizontalalignment='right')
        i += 1
#------------------------------------------------------------
plt.tight_layout()
plt.show()
#------------------------------------------------------------
#        end of file : test2.py
#============================================================
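
The plotting loop colors the mesh with `decision_function` when the classifier has one and falls back to `predict_proba` otherwise. The short sketch below shows which branch a few of the estimators take. The helper `score_source` is a name introduced here for illustration and is not part of the script. The two branches produce values on different scales (a signed score versus a class-1 probability), so contour levels are not directly comparable across panels.

```python
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

def score_source(clf):
    """Name the method the plotting loop would call for this estimator."""
    return "decision_function" if hasattr(clf, "decision_function") else "predict_proba"

for clf in (SVC(gamma=2, C=1), QuadraticDiscriminantAnalysis(),
            KNeighborsClassifier(3), GaussianNB()):
    print(f"{type(clf).__name__:<32s} {score_source(clf)}")
```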

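This post ran the scikit-learn "Classifier comparison" example and compared the performance of the classifiers in sklearn. The results provide a basis for understanding how these machine-learning methods behave.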

