
Python binding of multi-core LIBLINEAR


------------------------------------------------
--- Multi-core Python Interface of LIBLINEAR ---
------------------------------------------------

Table of Contents
=================

- Introduction
- Installation via PyPI
- Installation via Sources
- Quick Start
- Quick Start with Scipy
- Design Description
- Data Structures
- Utility Functions
- Additional Information

Introduction
============

Python (http://www.python.org/) is a programming language suitable for rapid
development. This tool provides a simple Python interface to multi-core LIBLINEAR,
a library for large-scale linear classification on multi-core machines
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multicore-liblinear). The
interface is very easy to use, as the usage is the same as that of LIBLINEAR. The
interface is developed with the built-in Python library "ctypes."

Details of multi-core implementations can be found at http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multicore-liblinear

Installation via PyPI
=====================

To install the interface from PyPI, execute the following command:

> pip install -U liblinear-multicore

Installation via Sources
========================

Alternatively, you may install the interface from sources by
generating the LIBLINEAR shared library.

Depending on your use cases, you can choose between local-directory
and system-wide installation.

- Local-directory installation:

On Unix systems, type

> make

This generates a .so file in the LIBLINEAR main directory, and you can then
run the interface from the python directory of the source tree.
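
For example, a quick check of this setup (assuming the standard source
layout, where this Python interface lives under the python subdirectory):

> make
> cd python
> python
>>> from liblinear.liblinearutil import *  # picks up the shared library built by make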

- System-wide installation:

Type

> pip install -e .

or

> pip install --user -e .

The option --user would install the package in the home directory
instead of the system directory, and thus does not require the
root privilege.

Please note that you must keep the sources after the installation.

In addition, DO NOT use the following commands, which fail:

> python setup.py install (fails when run from the python directory)
> pip install .

Quick Start
===========

"Quick Start with Scipy" is in the next section.

There are two levels of usage. The high-level one uses utility
functions in liblinearutil.py and commonutil.py (shared with LIBSVM,
where it is imported by svmutil.py). The usage is the same as that of the
LIBLINEAR MATLAB interface.

Specify the option '-m nr_thread' to use nr_thread threads for parallelizing solvers
(only for -s 0, -s 1, -s 2, -s 3, -s 5, -s 6, and -s 11)

>>> from liblinear.liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale')
>>> m = train(y[:200], x[:200], '-c 4 -m 8')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:], m)

# Construct problem in python format
# Dense data
>>> y, x = [1,-1], [[1,0,1], [-1,0,-1]]
# Sparse data
>>> y, x = [1,-1], [{1:1, 3:1}, {1:-1,3:-1}]
>>> prob = problem(y, x)
>>> param = parameter('-s 0 -c 4 -B 1')
>>> m = train(prob, param)

# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(train)

The low-level usage directly calls the C interfaces imported by liblinear.py. Note
that all arguments and return values are in ctypes format. You need to handle them
carefully.

>>> from liblinear.liblinear import *
>>> prob = problem([1,-1], [{1:1, 3:1}, {1:-1,3:-1}])
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a Python-format instance to feature_nodearray, a ctypes structure
>>> x0, max_idx = gen_feature_nodearray({1:1, 3:1})
>>> label = liblinear.predict(m, x0)

Quick Start with Scipy
======================

Make sure you have Scipy installed to proceed in this section.
If numba (http://numba.pydata.org) is installed, some operations will be much faster.

There are two levels of usage. The high-level one uses utility functions in
liblinearutil.py, and the usage is the same as that of the LIBLINEAR MATLAB interface.

>>> import scipy
>>> from liblinear.liblinearutil import *
# Read data in LIBSVM format
>>> y, x = svm_read_problem('../heart_scale', return_scipy = True) # y: ndarray, x: csr_matrix
>>> m = train(y[:200], x[:200, :], '-c 4')
>>> p_label, p_acc, p_val = predict(y[200:], x[200:, :], m)

# Construct problem in Scipy format
# Dense data: numpy ndarray
>>> y, x = scipy.asarray([1,-1]), scipy.asarray([[1,0,1], [-1,0,-1]])
# Sparse data: scipy.sparse.csr_matrix((data, (row_ind, col_ind)))
>>> y, x = scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2])))
>>> prob = problem(y, x)
>>> param = parameter('-s 0 -c 4 -B 1')
>>> m = train(prob, param)

# Apply data scaling in Scipy format
>>> y, x = svm_read_problem('../heart_scale', return_scipy=True)
>>> scale_param = csr_find_scale_param(x, lower=0)
>>> scaled_x = csr_scale(x, scale_param)

# Other utility functions
>>> save_model('heart_scale.model', m)
>>> m = load_model('heart_scale.model')
>>> p_label, p_acc, p_val = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = evaluations(y, p_label)

# Getting online help
>>> help(train)

The low-level usage directly calls the C interfaces imported by liblinear.py. Note
that all arguments and return values are in ctypes format. You need to handle them
carefully.

>>> from liblinear.liblinear import *
>>> prob = problem(scipy.asarray([1,-1]), scipy.sparse.csr_matrix(([1, 1, -1, -1], ([0, 0, 1, 1], [0, 2, 0, 2]))))
>>> param = parameter('-c 4')
>>> m = liblinear.train(prob, param) # m is a ctype pointer to a model
# Convert a tuple of ndarray (index, data) to feature_nodearray, a ctypes structure
# Note that index starts from 0, though the following example will be changed to 1:1, 3:1 internally
>>> x0, max_idx = gen_feature_nodearray((scipy.asarray([0,2]), scipy.asarray([1,1])))
>>> label = liblinear.predict(m, x0)

Design Description
==================

There are two files liblinear.py and liblinearutil.py, which respectively correspond to
low-level and high-level use of the interface.

In liblinear.py, we adopt the Python built-in library "ctypes," so that
Python can directly access C structures and interface functions defined
in linear.h.

While advanced users can use structures/functions in liblinear.py, to
avoid handling ctypes structures, liblinearutil.py provides some easy-to-use
functions. The usage is similar to that of the LIBLINEAR MATLAB interface.

Data Structures
===============

Four data structures derived from linear.h are feature_node, problem,
parameter, and model. They all contain fields with the same names as in
linear.h. Access these fields carefully because you are directly using a C
structure instead of a Python object. The following description introduces
additional fields and methods.

Before using the data structures, execute the following command to load the
LIBLINEAR shared library:

>>> from liblinear.liblinear import *

- class feature_node:

Construct a feature_node.

>>> node = feature_node(idx, val)

idx: an integer indicating the feature index.

val: a float indicating the feature value.

Show the index and the value of a node.

>>> print(node)

- Function: gen_feature_nodearray(xi [,feature_max=None])

Generate a feature vector from a Python list/tuple/dictionary, numpy ndarray or tuple of (index, data):

>>> xi_ctype, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2})

xi_ctype: the returned feature_nodearray (a ctypes structure)

max_idx: the maximal feature index of xi

feature_max: if feature_max is assigned, features with indices larger than
feature_max are removed.
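
For example, continuing the instance above (a minimal illustration of the
feature_max option):

>>> xi_ctype, max_idx = gen_feature_nodearray({1:1, 3:1, 5:-2}, feature_max=4)
# feature 5 has an index larger than feature_max and is therefore dropped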

- class problem:

Construct a problem instance

>>> prob = problem(y, x [,bias=-1])

y: a Python list/tuple/ndarray of l labels (type must be int/double).

x: 1. a list/tuple of l training instances. Feature vector of
each training instance is a list/tuple or dictionary.

2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

bias: if bias >= 0, instance x becomes [x; bias]; if < 0, no bias term is
added (default -1).

You can also modify the bias value by

>>> prob.set_bias(1)

Note that if your x contains sparse data (i.e., dictionary), the internal
ctypes data format is still sparse.

- class parameter:

Construct a parameter instance

>>> param = parameter('training_options')

If 'training_options' is empty, LIBLINEAR default values are applied.

Set param to LIBLINEAR default values.

>>> param.set_to_default_values()

Parse a string of options.

>>> param.parse_options('training_options')

Show values of parameters.

>>> print(param)

- class model:

There are two ways to obtain an instance of model:

>>> model_ = train(y, x)
>>> model_ = load_model('model_file_name')

Note that the returned structure of the interface functions
liblinear.train and liblinear.load_model is a ctypes pointer to a
model, which is different from the model object returned
by train and load_model in liblinearutil.py. We provide a
function toPyModel for the conversion:

>>> model_ptr = liblinear.train(prob, param)
>>> model_ = toPyModel(model_ptr)

If you obtain a model in a way other than the above approaches,
handle it carefully to avoid memory leaks or segmentation faults.

Some interface functions to access LIBLINEAR models are wrapped as
members of the class model:

>>> nr_feature = model_.get_nr_feature()
>>> nr_class = model_.get_nr_class()
>>> class_labels = model_.get_labels()
>>> is_prob_model = model_.is_probability_model()
>>> is_regression_model = model_.is_regression_model()

The decision function is W*x + b, where
W is an nr_class-by-nr_feature matrix, and
b is a vector of size nr_class.
To access W_kj (i.e., coefficient for the k-th class and the j-th feature)
and b_k (i.e., bias for the k-th class), use the following functions.

>>> W_kj = model_.get_decfun_coef(feat_idx=j, label_idx=k)
>>> b_k = model_.get_decfun_bias(label_idx=k)

We also provide a function to extract w_k (i.e., the k-th row of W) and
b_k directly as follows.

>>> [w_k, b_k] = model_.get_decfun(label_idx=k)

Note that w_k is a Python list of length nr_feature, which means that
w_k[0] = W_k1.
For regression models, W is just a vector of length nr_feature. Either
set label_idx=0 or omit the label_idx parameter to access the coefficients.

>>> W_j = model_.get_decfun_coef(feat_idx=j)
>>> b = model_.get_decfun_bias()
>>> [W, b] = model_.get_decfun()

For one-class SVM models, label_idx is ignored and b=-rho is
returned from get_decfun(). That is, the decision function is
w*x+b = w*x-rho.

>>> rho = model_.get_decfun_rho()
>>> [W, b] = model_.get_decfun()

Note that in get_decfun_coef, get_decfun_bias, and get_decfun, feat_idx
starts from 1, while label_idx starts from 0. If label_idx is not in the
valid range (0 to nr_class-1), then a NaN will be returned; and if feat_idx
is not in the valid range (1 to nr_feature), then a zero value will be
returned. For regression models, label_idx is ignored.
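
As a small illustration of these conventions (assuming the binary
classification model model_ trained above):

>>> W_11 = model_.get_decfun_coef(feat_idx=1, label_idx=0) # first feature of the first label
>>> model_.get_decfun_coef(feat_idx=0, label_idx=0) # feat_idx out of range: returns 0
>>> model_.get_decfun_bias(label_idx=5) # label_idx out of range: returns NaN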

Utility Functions
=================

To use utility functions, type

>>> from liblinear.liblinearutil import *

The above command loads
train() : train a linear model.
predict() : predict testing data.
svm_read_problem() : read the data from a LIBSVM-format file.
load_model() : load a LIBLINEAR model.
save_model() : save a model to a file.
evaluations() : evaluate prediction results.
csr_find_scale_param() : find scaling parameters for data in csr format.
csr_scale() : apply data scaling to data in csr format.

- Function: train

There are three ways to call train()

>>> model = train(y, x [, 'training_options'])
>>> model = train(prob [, 'training_options'])
>>> model = train(prob, param)

y: a list/tuple/ndarray of l training labels (type must be int/double).

x: 1. a list/tuple of l training instances. Feature vector of
each training instance is a list/tuple or dictionary.

2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

training_options: a string in the same form as that for the LIBLINEAR
command line.

prob: a problem instance generated by calling
problem(y, x).

param: a parameter instance generated by calling
parameter('training_options')

model: the returned model instance. See linear.h for details of this
structure. If '-v' is specified, cross validation is
conducted and the returned model is just a scalar: cross-validation
accuracy for classification and mean-squared error for regression.

If the '-C' option is specified, best parameters are found
by cross validation. The parameter selection utility is supported
only by -s 0, -s 2 (for finding C) and -s 11 (for finding C, p).
The returned structure is a triple with the best C, the best p,
and the corresponding cross-validation accuracy or mean squared
error. The returned best p for -s 0 and -s 2 is set to -1 because
the p parameter is not used by classification models.


To train the same data many times with different
parameters, the second and the third ways should be faster.

Examples:

>>> y, x = svm_read_problem('../heart_scale')
>>> prob = problem(y, x)
>>> param = parameter('-s 3 -c 5 -q')
>>> m = train(y, x, '-c 5')
>>> m = train(prob, '-w1 5 -c 5')
>>> m = train(prob, param)
>>> CV_ACC = train(y, x, '-v 3')
>>> best_C, best_p, best_rate = train(y, x, '-C -s 0') # best_p is only for -s 11
>>> m = train(y, x, '-c {0} -s 0'.format(best_C)) # use the same solver: -s 0
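
The '-C' option can also be combined with '-s 11' to search both C and p (a
hypothetical continuation; -s 11 treats the labels as regression targets):

>>> best_C, best_p, best_rate = train(y, x, '-C -s 11')
>>> m = train(y, x, '-c {0} -p {1} -s 11'.format(best_C, best_p)) # use the selected C and p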

- Function: predict

To predict testing data with a model, use

>>> p_labs, p_acc, p_vals = predict(y, x, model [,'predicting_options'])

y: a list/tuple/ndarray of l true labels (type must be int/double).
It is used for calculating the accuracy. Use [] if true labels are
unavailable.

x: 1. a list/tuple of l testing instances. Feature vector of
each testing instance is a list/tuple or dictionary.

2. an l * n numpy ndarray or scipy spmatrix (n: number of features).

predicting_options: a string of predicting options in the same format as
that of LIBLINEAR.

model: a model instance.

p_labels: a list of predicted labels

p_acc: a tuple including accuracy (for classification), mean
squared error, and squared correlation coefficient (for
regression).

p_vals: a list of decision values or probability estimates (if '-b 1'
is specified). If k is the number of classes, for decision values,
each element includes results of predicting k binary-class
SVMs. If k = 2 and solver is not MCSVM_CS, only one decision value
is returned. For probabilities, each element contains k values
indicating the probability that the testing instance is in each class.
Note that the order of classes here is the same as 'model.label'
field in the model structure.

Example:

>>> m = train(y, x, '-c 5')
>>> p_labels, p_acc, p_vals = predict(y, x, m)
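
To obtain probability estimates instead of decision values (a minimal
sketch; '-b 1' assumes a logistic regression solver such as -s 0):

>>> m = train(y, x, '-c 5 -s 0')
>>> p_labels, p_acc, p_vals = predict(y, x, m, '-b 1')
>>> ACC, MSE, SCC = p_acc # unpack the accuracy tuple described above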

- Functions: svm_read_problem/load_model/save_model

See the usage by examples:

>>> y, x = svm_read_problem('data.txt')
>>> m = load_model('model_file')
>>> save_model('model_file', m)

- Function: evaluations

Calculate some evaluations using the true values (ty) and the predicted
values (pv):

>>> (ACC, MSE, SCC) = evaluations(ty, pv, useScipy)

ty: a list/tuple/ndarray of true values.

pv: a list/tuple/ndarray of predicted values.

useScipy: convert ty, pv to ndarray, and use scipy functions to do the evaluation

ACC: accuracy.

MSE: mean squared error.

SCC: squared correlation coefficient.
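
For example (a small illustration with made-up values):

>>> ty = [1, -1, 1, 1]
>>> pv = [1, -1, -1, 1]
>>> ACC, MSE, SCC = evaluations(ty, pv)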

- Functions: csr_find_scale_param/csr_scale

Scale data in csr format.

>>> param = csr_find_scale_param(x [, lower=l, upper=u])
>>> x = csr_scale(x, param)

x: a csr_matrix of data.

l: x scaling lower limit; default -1.

u: x scaling upper limit; default 1.

The scaling process is: x * diag(coef) + ones(l, 1) * offset'

param: a dictionary of scaling parameters, where param['coef'] = coef and param['offset'] = offset.

coef: a scipy array of scaling coefficients.

offset: a scipy array of scaling offsets.
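
As a sanity check of the formula above (a minimal sketch comparing against a
dense equivalent; it assumes the csr_matrix x read earlier and requires numpy):

>>> import numpy as np
>>> param = csr_find_scale_param(x, lower=0)
>>> scaled_x = csr_scale(x, param)
>>> dense = x.toarray() * param['coef'] + param['offset']
>>> np.allclose(scaled_x.toarray(), dense) # should hold up to floating-point error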

Additional Information
======================

This interface was originally written by Hsiang-Fu Yu from the Department of
Computer Science, National Taiwan University. If you find this tool useful,
please cite LIBLINEAR as follows:

R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin.
LIBLINEAR: A Library for Large Linear Classification, Journal of
Machine Learning Research 9(2008), 1871-1874. Software available at
http://www.csie.ntu.edu.tw/~cjlin/liblinear

For any questions, please contact Chih-Jen Lin <cjlin@csie.ntu.edu.tw>,
or check the following pages:

http://www.csie.ntu.edu.tw/~cjlin/liblinear/faq.html
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/multicore-liblinear

