
A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules


BOOMER - Gradient Boosted Multi-Label Classification Rules


Important links: Documentation | Issue Tracker | Changelog | Contributors | Code of Conduct

This software package provides the official implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules that integrates with the popular scikit-learn machine learning framework.

The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The BOOMER algorithm uses gradient boosting to learn an ensemble of rules that is built with respect to a given multivariate loss function. To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. To ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements.
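To make the data format concrete, the sketch below builds a toy multi-label dataset, where each example is assigned a set of labels encoded as a binary indicator vector. The NumPy part is self-contained; the commented-out estimator usage is an assumption based on the documentation (the class name `Boomer` and its module path may differ between versions), shown only to illustrate the usual scikit-learn fit/predict protocol:

```python
import numpy as np

# A toy multi-label dataset: 6 examples, 4 numerical features, 3 labels.
# Each row of y is the SET of labels assigned to an example, encoded as
# a binary indicator vector (1 = label applies, 0 = it does not).
rng = np.random.default_rng(42)
X = rng.normal(size=(6, 4))
y = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 1],
])

# With the package installed, training follows the usual scikit-learn
# protocol (class name taken from the documentation; treat it as an
# assumption, not a guaranteed API):
#
#   from mlrl.boosting import Boomer
#   clf = Boomer()
#   clf.fit(X, y)
#   y_pred = clf.predict(X)   # binary matrix of shape (6, 3)

print(X.shape, y.shape)
```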

References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen, and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section "References" of the documentation.

Functionalities

The algorithm provided by this project currently supports the following core functionalities for learning an ensemble of boosted classification rules:

  • Label-wise decomposable or non-decomposable loss functions can be minimized in expectation.
  • L1 and L2 regularization can be used.
  • Single-label, partial, or complete heads can be used by rules, i.e., they can predict for an individual label, a subset of the available labels, or all labels. Predicting for multiple labels simultaneously enables rules to model local dependencies between labels.
  • Various strategies for predicting regression scores, labels or probabilities are available.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a model.
  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Sampling techniques and stratification methods can be used to learn new rules on a subset of the available training examples, features, or labels.
  • Shrinkage (a.k.a. the learning rate) can be adjusted to control the impact of individual rules on the overall ensemble.
  • Fine-grained control over the specificity/generality of rules is provided via hyper-parameters.
  • Incremental reduced error pruning can be used to remove overly specific conditions from rules and prevent overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) make it possible to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
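Several of the points above (rule construction via greedy search, additive ensembles, shrinkage) can be illustrated with a deliberately simplified, single-output stand-in for the actual algorithm. The sketch below boosts an ensemble of threshold rules under a squared-error loss; the real algorithm minimizes multivariate losses over label vectors and supports far richer rule bodies and heads, so this only shows how shrinkage scales each rule's contribution:

```python
import numpy as np

# Toy single-output boosting of rules of the form "IF x > t THEN add s".
# Each round greedily picks the threshold with the largest absolute
# residual mass, then adds the rule's score to the ensemble, scaled by
# the shrinkage (learning rate).
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 1))
y = (X[:, 0] > 0.5).astype(float)

shrinkage = 0.3
n_rules = 20
pred = np.zeros(len(y))
rules = []  # (threshold, shrunk score) pairs

for _ in range(n_rules):
    residual = y - pred  # negative gradient of the squared-error loss
    best = None
    for t in np.linspace(0.1, 0.9, 9):
        covered = X[:, 0] > t
        if covered.any():
            score = residual[covered].mean()
            gain = abs(score) * covered.sum()
            if best is None or gain > best[2]:
                best = (t, score, gain)
    t, score, _ = best
    rules.append((t, shrinkage * score))
    pred[X[:, 0] > t] += shrinkage * score

mse = np.mean((y - pred) ** 2)
print(f"MSE after {n_rules} rules: {mse:.4f}")
```

Each round provably decreases the training error, and a smaller shrinkage makes individual rules contribute less, typically requiring more rules but yielding smoother ensembles.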

Runtime and Memory Optimizations

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Gradient-based label binning (GBLB) can be used to assign the available labels to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some data sets.
  • Sparse label matrices can be used for training. This may reduce the memory footprint in case of large data sets.
  • Sparse prediction matrices can be used to store predicted labels. This may reduce the memory footprint in case of large data sets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on data sets with many labels.
  • Multi-threading can be used to parallelize the evaluation of a rule's potential refinements across several features, to update the gradients and Hessians of individual examples in parallel, or to obtain predictions for several examples in parallel.
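The effect of unsupervised feature binning, the first point above, can be sketched with equal-frequency (quantile) binning: instead of evaluating a candidate condition at every distinct value of a numerical feature, only the bin boundaries are considered. The bin count and quantile strategy below are illustrative choices, not the package's defaults:

```python
import numpy as np

# Equal-frequency binning of one numerical feature: boundaries are
# placed at empirical quantiles, so each bin covers roughly the same
# number of examples.
rng = np.random.default_rng(7)
feature = rng.normal(size=10_000)

n_bins = 16
boundaries = np.quantile(feature, np.linspace(0, 1, n_bins + 1)[1:-1])
bin_ids = np.digitize(feature, boundaries)

n_exact = np.unique(feature).size       # candidate thresholds without binning
n_binned = np.unique(bin_ids).size - 1  # candidate thresholds with binning
print(n_exact, "->", n_binned, "candidate thresholds")
```

The search over a rule's conditions now scales with the number of bins rather than the number of distinct feature values, at the cost of a coarser set of possible thresholds.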

Documentation

An extensive user guide, as well as API documentation for developers, is available at https://mlrl-boomer.readthedocs.io. If you are new to the project, the user guide is a good place to start.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

Download files


Source Distributions

No source distribution files are available for this release.

Built Distributions

  • mlrl_boomer-0.9.0-cp310-cp310-win_amd64.whl (570.9 kB): CPython 3.10, Windows x86-64
  • mlrl_boomer-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.7 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.9.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.6 MB): CPython 3.10, manylinux (glibc 2.17+), ARM64
  • mlrl_boomer-0.9.0-cp310-cp310-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.10, macOS 10.9+, x86-64
  • mlrl_boomer-0.9.0-cp39-cp39-win_amd64.whl (580.7 kB): CPython 3.9, Windows x86-64
  • mlrl_boomer-0.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.9.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.4 MB): CPython 3.9, manylinux (glibc 2.17+), ARM64
  • mlrl_boomer-0.9.0-cp39-cp39-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.9, macOS 10.9+, x86-64
  • mlrl_boomer-0.9.0-cp38-cp38-win_amd64.whl (579.2 kB): CPython 3.8, Windows x86-64
  • mlrl_boomer-0.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.1 MB): CPython 3.8, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.9.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.1 MB): CPython 3.8, manylinux (glibc 2.17+), ARM64
  • mlrl_boomer-0.9.0-cp38-cp38-macosx_10_9_x86_64.whl (2.2 MB): CPython 3.8, macOS 10.9+, x86-64

File hashes

mlrl_boomer-0.9.0-cp310-cp310-win_amd64.whl
  • SHA256: bbe57d12b64dbd3dfe3fdef9d738516cefbdea38536ee50ae27615157a70a1c5
  • MD5: 4c2574f6f79516b15117be5a485b9085
  • BLAKE2b-256: 926b40199ccc65a88b030b3cb5f0c1c899ee6baef553efd0bb552fbeae605d4e

mlrl_boomer-0.9.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • SHA256: c775031c7b7953d5545c32beddceef08c2e0830b931f283718a6aa9045e0c2a9
  • MD5: f3c3652fcaf0d6174202195e2165eb50
  • BLAKE2b-256: 263e35b10692c03378d50cc96d62365f478138a56532d48be71d5b7a4bc965b5

mlrl_boomer-0.9.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • SHA256: 27dd7e5c899741f58737de3b67603b4e83a659d7a8b8585fa2590afce8f461e3
  • MD5: 9e9ad3337fff684c273f62ba1ec0a58f
  • BLAKE2b-256: 9a686436334ffb51b1510afe13b3f5fb5f96a558d341806da0bf2b6dab802c80

mlrl_boomer-0.9.0-cp310-cp310-macosx_10_9_x86_64.whl
  • SHA256: 907ab548988d857711d361d0f135d82ced5ff6c5edb9afb069c058979146f52b
  • MD5: 7430dbecff26c7ee371533498bc4662e
  • BLAKE2b-256: a3e878a99a0a807dc055cbf31ff38ad668b61c6f0eed78300f2131f24531c686

mlrl_boomer-0.9.0-cp39-cp39-win_amd64.whl
  • SHA256: d8ac9964de1660b7405ec85c19558cc2696668ab5e617899538a20d5d945db0a
  • MD5: 83b36e9a853089002f11d3829a6ff646
  • BLAKE2b-256: 55fd1cdc28eb1e54ba77968281518f5845386ae632573c7b8c3f19391e4d0ef1

mlrl_boomer-0.9.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • SHA256: 8671479e8327f77979f4bdf36cdef42fcda8048aa1eb0a4a5b29a3443355ce40
  • MD5: 4ff41a3ec7bf05bfe7bf98de23601a41
  • BLAKE2b-256: 34b7f2d294e86c1ec89ecafbf456b7ea6f8845432ec05e07d9335fc337c1982d

mlrl_boomer-0.9.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • SHA256: e6a7459f406e8ae998186eb652b91760209c0203aa9a562b70ae7cb9bd83e88b
  • MD5: 94b906b45ef4b0c0a5e4d37a9f4f000e
  • BLAKE2b-256: b6c17a7532adf53063507b49e7838c8e968df557d057aa090689c89e44697d20

mlrl_boomer-0.9.0-cp39-cp39-macosx_10_9_x86_64.whl
  • SHA256: a724da29ef28ec63077fca3b06a23546e211f0e80011b2f143c11fd67a2ef419
  • MD5: d599f582bee666af65e86cf9587eee41
  • BLAKE2b-256: 842835bcdc8b5f6d8cec137512f3fbfa122660c2e7431616d13402827a2029f1

mlrl_boomer-0.9.0-cp38-cp38-win_amd64.whl
  • SHA256: 8c13fb91ae928d513f709b8898cc9a65b50ed6eabd81bdd6c31cee690ad8fd77
  • MD5: cb4c0b2eb5beb932009c6e19dbd69341
  • BLAKE2b-256: 444f5e74a49a34c91184adbd84ba6360e8a0c1b1e7f472495e6222ead4c2cd9d

mlrl_boomer-0.9.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  • SHA256: 9658aa2c3a7e5a263f125efe9414914f75da1987c375ef1fe7125cb0b4a8ba60
  • MD5: 23cc38fcf38b8b19b8f337717d4f1c69
  • BLAKE2b-256: 2b7f8b8fc252b1ecd49c3f5fc2e65e200090935de8c0c04fd81e1bd8c8695ecf

mlrl_boomer-0.9.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  • SHA256: b2d95f176c97eebb915518dfd430eb8aa926d3ae8adf72cbb062ee3d0e23873c
  • MD5: 4b6f85b0580291eef4d88a9c2d49d663
  • BLAKE2b-256: 04c82a337ac8da2f306f830b639858de28607cf0a8004aa5d1ae30110df4678b

mlrl_boomer-0.9.0-cp38-cp38-macosx_10_9_x86_64.whl
  • SHA256: 4d5d96c9179a6a25d75b530635c5e6c637a763eb5d065082710c6d697815ce1e
  • MD5: fad486f76e1da3afdd65cf5eefab6ad2
  • BLAKE2b-256: d8725802f17fcfeddbb7bbab2e61a6b034f5f80596d53c42b09d6dba1fc6c926
