
lisbon


lisbon aims to be a drop-in replacement for liblinear, which scikit-learn leverages for linear classification problems. It currently only supports L2-regularised hinge loss for binary classification, solved via the dual problem (liblinear's solver 3). The APIs follow scikit-learn's liblinear wrapper, and importing the Python library monkey-patches scikit-learn's svm module to use lisbon for the supported calculations:

from sklearn import svm
import lisbon

Once imported, the following computations will use lisbon where supported. To switch back, call lisbon.unload(), which restores the original fit function.

See lisbon/__init__.py for how the runtime patching is done, and bench.py for an example.

Install from source if your platform does not support the AVX2 instruction set, as the packaged version on PyPI assumes AVX2 support.

Installation

Install from PyPI

pip install lisbon

Install from source

  • Make sure you have the Rust toolchain (rustc, cargo, rust-std) installed. The quickest way to do this is curl https://sh.rustup.rs -sSf | sh -s
    • For a minimal installation: curl https://sh.rustup.rs -sSf | sh -s -- --profile minimal
  • In your desired Python environment, run pip install maturin
  • Clone this repository and, from lisbon's project root, run RUSTFLAGS='-C target-cpu=native' maturin develop --release to build and install lisbon as a package in your Python environment
    • Note that the RUSTFLAGS='-C target-cpu=native' environment variable makes rustc compile for the instruction sets your CPU supports, enabling more SIMD optimisations (e.g. AVX2, FMA).
  • For dev/benchmark purposes, consider installing the packages listed in requirements-dev.txt

For Windows

To set the rustc flags on Windows with PowerShell:

$Env:RUSTFLAGS = "-C target-cpu=native"
maturin develop --release

Limitations

lisbon's speed-up comes from vector (SIMD) instruction sets, so some platforms are only supported when lisbon is built from source.

Currently, lisbon only supports L2-regularised hinge loss and does not support:

  1. sample weights
  2. class weights
  3. a different penalty C per label
  4. multiclass classification
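How lisbon decides that a call is "supported" lives in lisbon/__init__.py and is not spelled out here. As a rough sketch under that caveat, the hypothetical helper below checks `LinearSVC`-style keyword arguments against the limitations listed above. `penalty`, `loss`, `dual` and `class_weight` are real `LinearSVC` parameters, but `is_supported` and its exact rules are assumptions for illustration.

```python
from typing import Any, Mapping, Optional, Sequence


def is_supported(
    params: Mapping[str, Any],
    y: Sequence[Any],
    sample_weight: Optional[Sequence[float]] = None,
) -> bool:
    """Hypothetical check of a LinearSVC-style configuration against the
    limitations above; not part of lisbon's API."""
    # Only L2-regularised hinge loss solved in the dual is handled.
    # LinearSVC's default loss is "squared_hinge", so it must be explicit.
    if params.get("penalty", "l2") != "l2":
        return False
    if params.get("loss", "squared_hinge") != "hinge":
        return False
    # Assumes dual=True when unspecified (older scikit-learn default).
    if params.get("dual", True) is not True:
        return False
    # No sample weights.
    if sample_weight is not None:
        return False
    # In scikit-learn, class_weight also scales C per class, so it covers
    # both class weights and per-label penalties.
    if params.get("class_weight") is not None:
        return False
    # Binary classification only.
    return len(set(y)) == 2


print(is_supported({"loss": "hinge"}, [0, 1, 1, 0]))     # True
print(is_supported({"loss": "hinge"}, [0, 1, 2]))        # False
```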

Deviations from the source implementation

  1. As with scikit-learn's modification of liblinear, the label order is flipped to be consistent with the rest of the scikit-learn family
    • liblinear uses [+1, -1] ordering
    • scikit-learn uses [-1, +1] ordering
  2. Uses MT19937 with a tweaked Lemire post-processor to generate random numbers within a range
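CPython's own `random` module is built on MT19937, so the second deviation can be sketched in a few lines. The code below shows Lemire's standard "nearly divisionless" method for drawing an unbiased integer in [0, s) from 32-bit MT19937 outputs, plus a Fisher-Yates shuffle of the kind used to randomise the order of coordinate updates. lisbon's particular tweak to the post-processor is not described here, so this is the textbook method rather than lisbon's; `lemire_bounded` and `shuffle_in_place` are hypothetical names.

```python
import random
from typing import List


def lemire_bounded(rng: random.Random, s: int) -> int:
    """Unbiased integer in [0, s) via Lemire's nearly divisionless method,
    using 32-bit outputs of rng (MT19937 in CPython)."""
    if not 1 <= s <= 2**32:
        raise ValueError("s must be in [1, 2**32]")
    m = rng.getrandbits(32) * s  # 64-bit product of a 32-bit draw and s
    low = m & 0xFFFFFFFF         # low 32 bits decide whether to reject
    if low < s:
        # Only here is the (rare) modulo needed: reject the (2**32 mod s)
        # smallest low values, which would otherwise bias the result.
        threshold = (2**32 - s) % s
        while low < threshold:
            m = rng.getrandbits(32) * s
            low = m & 0xFFFFFFFF
    return m >> 32               # high 32 bits are the bounded value


def shuffle_in_place(rng: random.Random, items: List[int]) -> None:
    """Fisher-Yates shuffle driven by lemire_bounded."""
    for i in range(len(items) - 1):
        j = i + lemire_bounded(rng, len(items) - i)
        items[i], items[j] = items[j], items[i]


rng = random.Random(2024)
order = list(range(10))
shuffle_in_place(rng, order)
print(order)  # a permutation of 0..9
```

Compared with `rand() % s`, the multiply-and-shift avoids a division in the common case and removes the modulo bias.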

Why is lisbon faster

  • liblinear uses a sparse matrix representation for the dot/norm operations, so scikit-learn must first convert a dense NumPy matrix to sparse form before passing it to liblinear. lisbon uses the dense matrix directly, since a sparse representation of dense data is inefficient and prevents some SIMD optimisations.
  • Reading the underlying NumPy C array directly avoids copying or duplicating data, which saves memory.
  • Specialisation: some array reads and computations are optimised away because their values are known in advance for the L2-regularised hinge loss binary classification routine.
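To make the dual approach and the dense-data point concrete, here is a minimal pure-Python sketch of dual coordinate descent for the L2-regularised hinge loss, following the algorithm in "A Dual Coordinate Descent Method for Large-scale Linear SVM" and working directly on dense rows. It is an illustration only: it omits the bias term, liblinear's shrinking heuristic and all SIMD work, uses Python's `random.shuffle` for the per-iteration permutation, stops when every projected gradient is below `tol`, and the name `dcd_hinge_svm` is hypothetical rather than lisbon's API.

```python
import random
from typing import List, Sequence


def dcd_hinge_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    C: float = 1.0,
    max_iter: int = 1000,
    tol: float = 1e-4,
    seed: int = 0,
) -> List[float]:
    """Dual coordinate descent for the L2-regularised hinge-loss SVM on a
    dense, row-major X with labels in {-1, +1}. Returns primal weights w
    (no bias term)."""
    n, d = len(X), len(X[0])
    rng = random.Random(seed)
    alpha = [0.0] * n  # dual variables, each constrained to [0, C]
    w = [0.0] * d      # maintained as sum_i alpha_i * y_i * x_i
    # Diagonal of Q: x_i . x_i (the hinge loss adds no diagonal term).
    qii = [sum(v * v for v in row) for row in X]
    order = list(range(n))
    for _ in range(max_iter):
        rng.shuffle(order)  # visit coordinates in random order
        max_abs_pg = 0.0
        for i in order:
            if qii[i] == 0.0:
                continue
            xi, yi = X[i], y[i]
            # Gradient of the dual objective with respect to alpha_i.
            g = yi * sum(wj * xj for wj, xj in zip(w, xi)) - 1.0
            # Projected gradient respecting the box 0 <= alpha_i <= C.
            if alpha[i] == 0.0:
                pg = min(g, 0.0)
            elif alpha[i] == C:
                pg = max(g, 0.0)
            else:
                pg = g
            max_abs_pg = max(max_abs_pg, abs(pg))
            if pg != 0.0:
                old = alpha[i]
                # Exact one-variable minimisation, clipped to [0, C].
                alpha[i] = min(max(old - g / qii[i], 0.0), C)
                step = (alpha[i] - old) * yi
                w = [wj + step * xj for wj, xj in zip(w, xi)]
        # Optimality (KKT) conditions hold to within tol.
        if max_abs_pg < tol:
            break
    return w


X = [[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]]
y = [1, 1, -1, -1]
print(dcd_hinge_svm(X, y, C=10.0))  # close to [1/3, 1/3]
```

Each update reads one row and the current `w`, which is why a contiguous dense layout lends itself to vectorised dot products.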

Ref

  1. 2-norm
  2. A Dual Coordinate Descent Method for Large-scale Linear SVM

License

This project is licensed under either of

at your option.

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in lisbon by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.



Download files

Download the file for your platform.

Source Distribution

lisbon-0.1.0.tar.gz (14.0 kB)

Built Distributions

  • lisbon-0.1.0-cp310-none-win_amd64.whl (127.5 kB): CPython 3.10, Windows x86-64
  • lisbon-0.1.0-cp310-cp310-manylinux_2_24_x86_64.whl (187.3 kB): CPython 3.10, manylinux glibc 2.24+ x86-64
  • lisbon-0.1.0-cp310-cp310-macosx_10_7_x86_64.whl (176.0 kB): CPython 3.10, macOS 10.7+ x86-64
  • lisbon-0.1.0-cp39-none-win_amd64.whl (127.7 kB): CPython 3.9, Windows x86-64
  • lisbon-0.1.0-cp39-cp39-manylinux_2_24_x86_64.whl (187.4 kB): CPython 3.9, manylinux glibc 2.24+ x86-64
  • lisbon-0.1.0-cp39-cp39-macosx_10_7_x86_64.whl (176.1 kB): CPython 3.9, macOS 10.7+ x86-64
  • lisbon-0.1.0-cp38-none-win_amd64.whl (128.0 kB): CPython 3.8, Windows x86-64
  • lisbon-0.1.0-cp38-cp38-manylinux_2_24_x86_64.whl (187.7 kB): CPython 3.8, manylinux glibc 2.24+ x86-64
  • lisbon-0.1.0-cp38-cp38-macosx_10_7_x86_64.whl (175.9 kB): CPython 3.8, macOS 10.7+ x86-64
  • lisbon-0.1.0-cp37-none-win_amd64.whl (127.9 kB): CPython 3.7, Windows x86-64
  • lisbon-0.1.0-cp37-cp37m-manylinux_2_24_x86_64.whl (187.3 kB): CPython 3.7m, manylinux glibc 2.24+ x86-64
  • lisbon-0.1.0-cp37-cp37m-macosx_10_7_x86_64.whl (175.5 kB): CPython 3.7m, macOS 10.7+ x86-64
  • lisbon-0.1.0-cp36-none-win_amd64.whl (127.7 kB): CPython 3.6, Windows x86-64
  • lisbon-0.1.0-cp36-cp36m-manylinux_2_24_x86_64.whl (186.9 kB): CPython 3.6m, manylinux glibc 2.24+ x86-64
  • lisbon-0.1.0-cp36-cp36m-macosx_10_7_x86_64.whl (176.0 kB): CPython 3.6m, macOS 10.7+ x86-64
