Generalized Multiclass Support Vector Machines

GenSVM Python Package

This is the Python package for the GenSVM multiclass classifier by Gerrit J.J. van den Burg and Patrick J.F. Groenen.

Installation

GenSVM requires a working NumPy installation before it can be installed, so install both with the following command:

$ pip install numpy && pip install gensvm

If you encounter any errors, please open an issue on GitHub. Don't hesitate: you're helping to make this project better!

Citing

If you use this package in your research, please cite the paper, for instance using the following BibTeX entry:

@article{JMLR:v17:14-526,
        author  = {{van den Burg}, G. J. J. and Groenen, P. J. F.},
        title   = {{GenSVM}: A Generalized Multiclass Support Vector Machine},
        journal = {Journal of Machine Learning Research},
        year    = {2016},
        volume  = {17},
        number  = {225},
        pages   = {1-42},
        url     = {http://jmlr.org/papers/v17/14-526.html}
}

Usage

The package contains two classes to fit the GenSVM model: GenSVM and GenSVMGridSearchCV. These classes respectively fit a single GenSVM model or fit a series of models for a parameter grid search. The interface to these classes is the same as that of classifiers in Scikit-Learn so users familiar with Scikit-Learn should have no trouble using this package. Below we will show some examples of using the GenSVM classifier and the GenSVMGridSearchCV class in practice.

In the examples we assume that we have loaded the iris dataset from Scikit-Learn as follows:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.preprocessing import MaxAbsScaler
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> scaler = MaxAbsScaler().fit(X_train)
>>> X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

Note that we scale the data using the MaxAbsScaler class. This scales each column of the data matrix to the range [-1, 1] by dividing it by its maximum absolute value, which preserves sparsity because zeros stay zero. Scaling the dataset can have a significant effect on the computation time of GenSVM and is generally recommended for SVMs.
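To make the effect of this scaling concrete, here is a minimal pure-Python sketch of max-abs scaling. The helper max_abs_scale and the toy matrix X_toy are hypothetical names introduced only for illustration; in practice you would use MaxAbsScaler as shown above.

```python
# Minimal sketch of max-abs scaling (illustrative only, not the
# Scikit-Learn implementation).

def max_abs_scale(rows):
    """Divide every column by its largest absolute value."""
    n_cols = len(rows[0])
    # Largest absolute value per column; fall back to 1.0 for an all-zero
    # column so that we never divide by zero.
    scales = [max(abs(row[j]) for row in rows) or 1.0 for j in range(n_cols)]
    return [[row[j] / scales[j] for j in range(n_cols)] for row in rows]

# Hypothetical toy data: note that zero entries remain zero after scaling.
X_toy = [[0.0, -4.0], [2.0, 0.0], [-1.0, 8.0]]
X_scaled = max_abs_scale(X_toy)
```

Every value in X_scaled lies in [-1, 1], and the zero entries of X_toy are unchanged, which is why this kind of scaling does not destroy sparsity.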

Example 1: Fitting a single GenSVM model

Let's start by fitting the most basic GenSVM model on the training data:

>>> from gensvm import GenSVM
>>> clf = GenSVM()
>>> clf.fit(X_train, y_train)
GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
max_iter=100000000.0, p=1.0, random_state=None, verbose=0,
weights='unit')

With the model fitted, we can predict the test dataset:

>>> y_pred = clf.predict(X_test)

Next, we can compute a score for the predictions. The GenSVM class has a score method which computes the accuracy_score for the predictions. In the GenSVM paper, the adjusted Rand index is often used to compare performance. We illustrate both options below (your results may be different depending on the exact train/test split):

>>> clf.score(X_test, y_test)
1.0
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(clf.predict(X_test), y_test)
1.0
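To show how the two scores differ, the following is a small pure-Python sketch of accuracy and the adjusted Rand index. The function names and the toy labels are assumptions made for this illustration; for real use, prefer the Scikit-Learn functions shown above.

```python
from collections import Counter
from math import comb

def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two labelings (symmetric in its arguments)."""
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to compare
    # Pairs of objects grouped together in both labelings.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of objects grouped together in each labeling separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical toy labels: the grouping is perfect, but two class names are swapped.
y_true_toy = [0, 0, 1, 1, 2, 2]
y_pred_toy = [1, 1, 0, 0, 2, 2]
acc = accuracy(y_true_toy, y_pred_toy)
ari = adjusted_rand_index(y_true_toy, y_pred_toy)
```

Here the accuracy is only 1/3 while the adjusted Rand index is 1.0, because the index compares how objects are grouped and ignores the names of the groups. The index is also symmetric, so the order of its two arguments does not matter.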

We can try this again with different model parameters. For instance, we can turn on verbosity and use the Euclidean norm in the GenSVM model by setting p = 2:

>>> clf2 = GenSVM(verbose=True, p=2)
>>> clf2.fit(X_train, y_train)
Starting main loop.
Dataset:
    n = 112
    m = 4
    K = 3
Parameters:
    kappa = 0.000000
    p = 2.000000
    lambda = 0.0000100000000000
    epsilon = 1e-06

iter = 0, L = 3.4499531579689533, Lbar = 7.3369415851139745, reldiff = 1.1266786095824437
...
Optimization finished, iter = 4046, loss = 0.0230726364692517, rel. diff. = 0.0000009998645783
Number of support vectors: 9
GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
    kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
    max_iter=100000000.0, p=2, random_state=None, verbose=True,
    weights='unit')

For other parameters that can be tuned in the GenSVM model, see the documentation of the GenSVM class.

Example 2: Fitting a GenSVM model with a "warm start"

One of the key features of the GenSVM classifier is that training can be accelerated by using so-called "warm-starts". This way the optimization can be started in a location that is closer to the final solution than a random starting position would be. To support this, the fit method of the GenSVM class has an optional seed_V parameter. We'll illustrate how this can be used below.

We start with a relatively large value for the epsilon parameter in the model. This is the stopping parameter that determines how long the optimization continues (and therefore how exact the fit is).

>>> clf1 = GenSVM(epsilon=1e-3)
>>> clf1.fit(X_train, y_train)
...
>>> clf1.n_iter_
163

The n_iter_ attribute tells us how many iterations the optimization ran. Now, we can use the solution of this model to start the training for the next model:

>>> clf2 = GenSVM(epsilon=1e-8)
>>> clf2.fit(X_train, y_train, seed_V=clf1.combined_coef_)
...
>>> clf2.n_iter_
3196

Compare this to a model with the same stopping parameter, but without the warm start:

>>> clf2.fit(X_train, y_train)
...
>>> clf2.n_iter_
3699

So we saved about 500 iterations! This effect will be especially significant with large datasets and when you try out many parameter configurations. Therefore this technique is built into the GenSVMGridSearchCV class that can be used to do a grid search of parameters.
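The idea behind a warm start can be sketched without GenSVM at all. The toy example below uses plain gradient descent on a one-dimensional quadratic, which is an assumption chosen for brevity and is not the optimization algorithm used by GenSVM. The function gradient_descent and its parameters are hypothetical.

```python
def gradient_descent(start, epsilon, step=0.1, max_iter=100000):
    """Minimize f(x) = (x - 3)**2, stopping once an update is smaller than epsilon."""
    x = start
    for n_iter in range(1, max_iter + 1):
        x_new = x - step * 2.0 * (x - 3.0)  # gradient of f is 2 * (x - 3)
        if abs(x_new - x) < epsilon:
            return x_new, n_iter
        x = x_new
    return x, max_iter

# Rough solution with a loose stopping criterion.
x_rough, rough_iters = gradient_descent(start=0.0, epsilon=1e-3)
# Tight stopping criterion, warm-started from the rough solution.
x_warm, warm_iters = gradient_descent(start=x_rough, epsilon=1e-8)
# Tight stopping criterion, started from scratch.
x_cold, cold_iters = gradient_descent(start=0.0, epsilon=1e-8)
```

Both tight runs end up at essentially the same minimizer, but warm_iters is smaller than cold_iters because the warm-started run begins close to the solution. The seed_V parameter plays the role of start in this analogy.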

Example 3: Running a GenSVM grid search

Often when we're fitting a machine learning model such as GenSVM, we have to try several parameter configurations to figure out which one performs best on our given dataset. This is usually combined with cross validation to avoid overfitting. To do this efficiently and to make use of warm starts, the GenSVMGridSearchCV class is available. This class works in the same way as the GridSearchCV class of Scikit-Learn, but uses the GenSVM C library for speed.

To do a grid search, we first have to define the parameters that we want to vary and what values we want to try:

>>> from gensvm import GenSVMGridSearchCV
>>> param_grid = {'p': [1.0, 2.0], 'lmd': [1e-8, 1e-6, 1e-4, 1e-2, 1.0], 'kappa': [-0.9, 0.0] }

For the values that are not varied in the parameter grid, the default values will be used. This means that if you want to change a specific value (such as epsilon for instance), you can add this to the parameter grid as a parameter with a single value to try (e.g. 'epsilon': [1e-8]).
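To get a feel for the size of this grid, the sketch below expands it into individual configurations. It assumes the grid is expanded as a Cartesian product of the listed values, as Scikit-Learn does for its own grid search; the variable names are hypothetical and this is not code from the GenSVM package.

```python
from itertools import product

param_grid = {'p': [1.0, 2.0], 'lmd': [1e-8, 1e-6, 1e-4, 1e-2, 1.0], 'kappa': [-0.9, 0.0]}

# Fix an ordering of the parameter names, then take every combination of values.
names = sorted(param_grid)
configurations = [
    dict(zip(names, values))
    for values in product(*(param_grid[name] for name in names))
]
n_configurations = len(configurations)  # 2 * 5 * 2 combinations
```

With cross validation, each of these 20 configurations is fitted once per fold, which is why reusing solutions through warm starts can pay off.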

Running the grid search is now straightforward:

>>> gg = GenSVMGridSearchCV(param_grid)
>>> gg.fit(X_train, y_train)
GenSVMGridSearchCV(cv=None, iid=True,
      param_grid={'p': [1.0, 2.0], 'lmd': [1e-08, 1e-06, 0.0001, 0.01, 1.0], 'kappa': [-0.9, 0.0]},
      refit=True, return_train_score=True, scoring=None, verbose=0)

Note that if we have set refit=True (the default), then we can use the GenSVMGridSearchCV instance to predict or score using the best estimator found in the grid search:

>>> y_pred = gg.predict(X_test)
>>> gg.score(X_test, y_test)
1.0

A nice feature borrowed from Scikit-Learn is that the results from the grid search can be represented as a pandas DataFrame:

>>> from pandas import DataFrame
>>> df = DataFrame(gg.cv_results_)

This can make it easier to explore the results of the grid search.

Known Limitations

The following are known limitations that are on the roadmap for a future release of the package. If you need any of these features, please vote on them on the linked GitHub issues (this can make us add them sooner!).

  1. Support for sparse matrices. SciPy supports sparse matrices, as does the GenSVM C library. Getting them to work together requires some additional effort. In the meantime, if you really want to use sparse data with GenSVM (this can lead to significant speedups!), check out the GenSVM C library.
  2. Specification of class misclassification weights. Currently, incorrectly classifying an object from class A as class C is as bad as incorrectly classifying an object from class B as class C. Depending on the application, this may not be the desired effect. Adding class misclassification weights can solve this issue.
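To illustrate what class misclassification weights would express, the sketch below evaluates a set of predictions under two cost tables. This is not a feature of the package: the cost values, the class names, and the helper total_cost are all hypothetical and serve only to show the idea.

```python
def total_cost(y_true, y_pred, cost):
    """Sum the cost of each (true class, predicted class) pair."""
    return sum(cost[t][p] for t, p in zip(y_true, y_pred))

classes = ['A', 'B', 'C']

# Current behavior: every misclassification is equally bad.
unit_cost = {t: {p: 0.0 if t == p else 1.0 for p in classes} for t in classes}

# Hypothetical weights: mistaking A for C is five times as bad as other errors.
weighted_cost = {t: dict(row) for t, row in unit_cost.items()}
weighted_cost['A']['C'] = 5.0

y_true_toy = ['A', 'B', 'C']
y_pred_toy = ['C', 'C', 'C']  # one A -> C error and one B -> C error

cost_unit = total_cost(y_true_toy, y_pred_toy, unit_cost)
cost_weighted = total_cost(y_true_toy, y_pred_toy, weighted_cost)
```

Under unit costs both errors count the same, while under the weighted table the A to C error dominates the total.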

Questions and Issues

If you have any questions or encounter any issues with using this package, please ask them on GitHub.

License

This package is licensed under the GNU General Public License version 3.

Copyright (c) G.J.J. van den Burg, excluding the sections of the code that are explicitly marked to come from Scikit-Learn.
