Generalized Multiclass Support Vector Machines
GenSVM Python Package
This is the Python package for the GenSVM multiclass classifier by Gerrit J.J. van den Burg and Patrick J.F. Groenen.
Useful links:
 PyGenSVM on GitHub
 PyGenSVM on PyPI
 Package documentation
 Journal paper: GenSVM: A Generalized Multiclass Support Vector Machine, JMLR, 17(225):1-42, 2016.
 There is also an R package
 Or you can directly use the C library
Installation
Before GenSVM can be installed, a working NumPy installation is required, so GenSVM can be installed using the following command:
$ pip install numpy && pip install gensvm
If you encounter any errors, please open an issue on GitHub. Don't hesitate; you're helping to make this project better!
Citing
If you use this package in your research, please cite the paper, for instance using the following BibTeX entry:

@article{JMLR:v17:14-526,
  author  = {{van den Burg}, G. J. J. and Groenen, P. J. F.},
  title   = {{GenSVM}: A Generalized Multiclass Support Vector Machine},
  journal = {Journal of Machine Learning Research},
  year    = {2016},
  volume  = {17},
  number  = {225},
  pages   = {1--42},
  url     = {http://jmlr.org/papers/v17/14-526.html}
}
Usage
The package contains two classes to fit the GenSVM model: GenSVM and GenSVMGridSearchCV. These classes respectively fit a single GenSVM model or fit a series of models for a parameter grid search. The interface to these classes is the same as that of classifiers in scikit-learn, so users familiar with scikit-learn should have no trouble using this package. Below we will show some examples of using the GenSVM classifier and the GenSVMGridSearchCV class in practice.
In the examples we assume that we have loaded the iris dataset from scikit-learn as follows:

>>> from sklearn.datasets import load_iris
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.preprocessing import MaxAbsScaler
>>> X, y = load_iris(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y)
>>> scaler = MaxAbsScaler().fit(X_train)
>>> X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
Note that we scale the data using the MaxAbsScaler function. This scales the columns of the data matrix to [-1, 1] without breaking sparsity. Scaling the dataset can have a significant effect on the computation time of GenSVM and is generally recommended for SVMs.
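To see the sparsity-preserving behaviour concretely, here is a small standalone sketch (using scipy.sparse, which is not needed for the examples below) showing that MaxAbsScaler divides each column by its maximum absolute value while leaving zero entries untouched:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import MaxAbsScaler

# A small sparse matrix with four non-zero entries.
X_sparse = csr_matrix([[0.0, 4.0], [2.0, 0.0], [-1.0, 2.0]])

# Each column is divided by its maximum absolute value (2 and 4 here),
# so all values end up in [-1, 1].
X_scaled = MaxAbsScaler().fit_transform(X_sparse)

print(X_scaled.toarray())  # [[ 0.   1. ] [ 1.   0. ] [-0.5  0.5]]
print(X_scaled.nnz)        # 4 -- the zero entries are still zero
```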
Example 1: Fitting a single GenSVM model
Let's start by fitting the most basic GenSVM model on the training data:
>>> from gensvm import GenSVM
>>> clf = GenSVM()
>>> clf.fit(X_train, y_train)
GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
    kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
    max_iter=100000000.0, p=1.0, random_state=None, verbose=0,
    weights='unit')
With the model fitted, we can predict the test dataset:
>>> y_pred = clf.predict(X_test)
Next, we can compute a score for the predictions. The GenSVM class has a score method which computes the accuracy_score for the predictions. In the GenSVM paper, the adjusted Rand index is often used to compare performance. We illustrate both options below (your results may be different depending on the exact train/test split):
>>> clf.score(X_test, y_test)
1.0
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(clf.predict(X_test), y_test)
1.0
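The two scores differ in what they compare: accuracy matches labels directly, while the adjusted Rand index only compares the induced groupings and is invariant to label permutations. A minimal standalone illustration (not using the fitted model above):

```python
from sklearn.metrics import accuracy_score, adjusted_rand_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [1, 1, 2, 2, 0, 0]  # same grouping, but every label shifted

print(accuracy_score(y_true, y_pred))       # 0.0 -- no label matches
print(adjusted_rand_score(y_true, y_pred))  # 1.0 -- identical partitions
```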
We can try this again by changing the model parameters; for instance, we can turn on verbosity and use the Euclidean norm in the GenSVM model by setting p = 2:
>>> clf2 = GenSVM(verbose=True, p=2)
>>> clf2.fit(X_train, y_train)
Starting main loop.
Dataset:
    n = 112
    m = 4
    K = 3
Parameters:
    kappa = 0.000000
    p = 2.000000
    lambda = 0.0000100000000000
    epsilon = 1e-06
iter = 0, L = 3.4499531579689533, Lbar = 7.3369415851139745, reldiff = 1.1266786095824437
...
Optimization finished, iter = 4046, loss = 0.0230726364692517, rel. diff. = 0.0000009998645783
Number of support vectors: 9
GenSVM(coef=0.0, degree=2.0, epsilon=1e-06, gamma='auto', kappa=0.0,
    kernel='linear', kernel_eigen_cutoff=1e-08, lmd=1e-05,
    max_iter=100000000.0, p=2, random_state=None, verbose=True,
    weights='unit')
For other parameters that can be tuned in the GenSVM model, see GenSVM.
Example 2: Fitting a GenSVM model with a "warm start"
One of the key features of the GenSVM classifier is that training can be accelerated by using so-called "warm starts". This way the optimization can be started in a location that is closer to the final solution than a random starting position would be. To support this, the fit method of the GenSVM class has an optional seed_V parameter. We'll illustrate how this can be used below.
We start with a relatively large value for the epsilon parameter in the model. This is the stopping parameter that determines how long the optimization continues (and therefore how exact the fit is).
>>> clf1 = GenSVM(epsilon=1e-3)
>>> clf1.fit(X_train, y_train)
...
>>> clf1.n_iter_
163
The n_iter_ attribute tells us how many iterations the model did. Now, we can use the solution of this model to start the training for the next model:
>>> clf2 = GenSVM(epsilon=1e-8)
>>> clf2.fit(X_train, y_train, seed_V=clf1.combined_coef_)
...
>>> clf2.n_iter_
3196
Compare this to a model with the same stopping parameter, but without the warm start:
>>> clf2.fit(X_train, y_train)
...
>>> clf2.n_iter_
3699
So we saved about 500 iterations! This effect will be especially significant with large datasets and when you try out many parameter configurations. Therefore this technique is built into the GenSVMGridSearchCV class that can be used to do a grid search of parameters.
Example 3: Running a GenSVM grid search
Often when we're fitting a machine learning model such as GenSVM, we have to try several parameter configurations to figure out which one performs best on our given dataset. This is usually combined with cross-validation to avoid overfitting. To do this efficiently and to make use of warm starts, the GenSVMGridSearchCV class is available. This class works in the same way as the GridSearchCV class of scikit-learn, but uses the GenSVM C library for speed.
To do a grid search, we first have to define the parameters that we want to vary and what values we want to try:
>>> from gensvm import GenSVMGridSearchCV
>>> param_grid = {'p': [1.0, 2.0], 'lmd': [1e-8, 1e-6, 1e-4, 1e-2, 1.0], 'kappa': [-0.9, 0.0]}
For the values that are not varied in the parameter grid, the default values will be used. This means that if you want to change a specific value (such as epsilon for instance), you can add this to the parameter grid as a parameter with a single value to try (e.g. 'epsilon': [1e-8]).
Running the grid search is now straightforward:
>>> gg = GenSVMGridSearchCV(param_grid)
>>> gg.fit(X_train, y_train)
GenSVMGridSearchCV(cv=None, iid=True,
    param_grid={'p': [1.0, 2.0], 'lmd': [1e-08, 1e-06, 0.0001, 0.01, 1.0],
        'kappa': [-0.9, 0.0]},
    refit=True, return_train_score=True, scoring=None, verbose=0)
Note that if we have set refit=True (the default), then we can use the GenSVMGridSearchCV instance to predict or score using the best estimator found in the grid search:
>>> y_pred = gg.predict(X_test)
>>> gg.score(X_test, y_test)
1.0
A nice feature borrowed from scikit-learn is that the results from the grid search can be represented as a pandas DataFrame:
>>> from pandas import DataFrame
>>> df = DataFrame(gg.cv_results_)
This can make it easier to explore the results of the grid search.
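For instance, the DataFrame can be sorted by rank to inspect the best configurations. The sketch below uses a hand-made dictionary mimicking the structure of cv_results_ (the column names follow the scikit-learn convention; the values are made up purely for illustration):

```python
import pandas as pd

# Hypothetical excerpt of a cv_results_ dict, for illustration only.
cv_results = {
    "param_p": [1.0, 2.0, 1.0],
    "param_lmd": [1e-6, 1e-6, 1e-2],
    "mean_test_score": [0.93, 0.95, 0.90],
    "rank_test_score": [2, 1, 3],
}
df = pd.DataFrame(cv_results)

# The best configuration is the row with rank 1.
print(df.sort_values("rank_test_score").head(1))
```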
Known Limitations
The following are known limitations that are on the roadmap for a future release of the package. If you need any of these features, please vote on them on the linked GitHub issues (this can make us add them sooner!).
 Support for sparse matrices. SciPy supports sparse matrices, as does the GenSVM C library. Getting them to work together requires some additional effort. In the meantime, if you really want to use sparse data with GenSVM (this can lead to significant speedups!), check out the GenSVM C library.
 Specification of class misclassification weights. Currently, incorrectly classifying an object from class A to class C is as bad as incorrectly classifying an object from class B to class C. Depending on the application, this may not be the desired effect. Adding class misclassification weights can solve this issue.
Questions and Issues
If you have any questions or encounter any issues with using this package, please ask them on GitHub.
License
This package is licensed under the GNU General Public License version 3.
Copyright (c) G.J.J. van den Burg, excluding the sections of the code that are explicitly marked to come from scikit-learn.