A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules

BOOMER - Gradient Boosted Multi-Label Classification Rules

Important links: Documentation | Issue Tracker | Changelog | Contributors | Code of Conduct | License

This software package provides the official implementation of BOOMER, an algorithm for learning gradient boosted multi-label classification rules. The implementation integrates with the popular scikit-learn machine learning framework.

The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The BOOMER algorithm uses gradient boosting to learn an ensemble of rules that is built with respect to a given multivariate loss function.
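To make the idea of a rule ensemble more concrete, the following minimal sketch shows how a handful of rules could assign a set of topics to a text document. It is an illustration only and does not use this package's API: the `Rule` class, the helper functions, the word-count features, and all scores are hypothetical and chosen by hand rather than learned by gradient boosting.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Rule:
    """A hypothetical rule: a body (condition) and a head (scores per label)."""
    body: Callable[[Dict[str, int]], bool]
    head: Dict[str, float]


def predict_scores(rules: List[Rule], example: Dict[str, int],
                   labels: List[str]) -> Dict[str, float]:
    """Sum the head scores of all rules whose body covers the example."""
    scores = {label: 0.0 for label in labels}
    for rule in rules:
        if rule.body(example):
            for label, score in rule.head.items():
                scores[label] += score
    return scores


def predict_labels(rules: List[Rule], example: Dict[str, int],
                   labels: List[str]) -> Set[str]:
    """Assign every label whose aggregated score is positive."""
    scores = predict_scores(rules, example, labels)
    return {label for label, score in scores.items() if score > 0}


LABELS = ["sports", "politics", "economy"]

# Hand-written rules over word counts (illustrative values, not learned).
RULES = [
    # A default rule that covers every example and predicts for all labels.
    Rule(lambda x: True, {"sports": -0.5, "politics": -0.5, "economy": -0.5}),
    # A rule that predicts for a single label.
    Rule(lambda x: x.get("goal", 0) >= 1, {"sports": 1.2}),
    # A rule that predicts for a subset of the labels.
    Rule(lambda x: x.get("election", 0) >= 1, {"politics": 0.9, "economy": 0.2}),
    Rule(lambda x: x.get("tax", 0) >= 2, {"economy": 0.8}),
]

document = {"election": 3, "tax": 2}
predicted = predict_labels(RULES, document, LABELS)
```

In this sketch, the third rule predicts for two labels at once, which is how a single rule can express a local dependency between labels.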

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms. For example, this project also provides a Separate-and-Conquer (SeCo) algorithm based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.

References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section References of the documentation.

Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification rules:

  • Label-wise decomposable or non-decomposable loss functions can be minimized in expectation.
  • L1 and L2 regularization can be used.
  • Single-label, partial, or complete heads can be used by rules, i.e., they can predict for an individual label, a subset of the available labels, or all labels. Predicting for multiple labels simultaneously enables rules to model local dependencies between labels.
  • Various strategies for predicting regression scores, labels or probabilities are available.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a model.
  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Sampling techniques and stratification methods can be used to learn new rules on a subset of the available training examples, features, or labels.
  • Shrinkage (a.k.a. the learning rate) can be adjusted to control the impact of individual rules on the overall ensemble.
  • Fine-grained control over the specificity/generality of rules is provided via hyper-parameters.
  • Incremental reduced error pruning can be used to remove overly specific conditions from rules and prevent overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) can be used to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
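
As a rough illustration of how regularization and shrinkage interact, the sketch below computes the score that a rule with a single-label head could predict for the examples it covers. It assumes a label-wise logistic loss and the regularized Newton step that is common in gradient boosting; the function names and the exact formula are assumptions made for this example and are not taken from the package's implementation.

```python
import math
from typing import List, Tuple


def logistic_gradient_and_hessian(true_label: int, score: float) -> Tuple[float, float]:
    """First and second derivative of the logistic loss for a single label.

    `true_label` is 1 (relevant) or 0 (irrelevant), `score` is the current
    real-valued prediction of the ensemble for that label.
    """
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability - true_label, probability * (1.0 - probability)


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink `value` towards zero by `threshold` (effect of L1 regularization)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def head_score(gradients: List[float], hessians: List[float],
               l1: float = 0.0, l2: float = 0.0, shrinkage: float = 1.0) -> float:
    """Regularized Newton step over the covered examples, scaled by the shrinkage."""
    gradient_sum = sum(gradients)
    denominator = sum(hessians) + l2
    if denominator <= 0.0:
        return 0.0
    return shrinkage * -soft_threshold(gradient_sum, l1) / denominator


# Three covered examples (two relevant, one irrelevant), all currently scored 0.
statistics = [logistic_gradient_and_hessian(y, 0.0) for y in (1, 1, 0)]
gradients = [g for g, _ in statistics]
hessians = [h for _, h in statistics]

unregularized = head_score(gradients, hessians)
with_l2 = head_score(gradients, hessians, l2=0.25)
with_l2_and_shrinkage = head_score(gradients, hessians, l2=0.25, shrinkage=0.3)
```

Under these assumptions, L2 regularization enlarges the denominator and thereby dampens the predicted score, L1 regularization pushes small gradient sums to exactly zero, and shrinkage scales the final score so that each individual rule contributes less to the ensemble.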

Runtime and Memory Optimizations

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Gradient-based label binning (GBLB) can be used to assign the available labels to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some data sets.
  • Sparse label matrices can be used for training. This may reduce the memory footprint in case of large data sets.
  • Sparse prediction matrices can be used to store predicted labels. This may reduce the memory footprint in case of large data sets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on data sets with many labels.
  • Multi-threading can be used to parallelize the evaluation of a rule's potential refinements across several features, to update the gradients and Hessians of individual examples in parallel, or to obtain predictions for several examples in parallel.
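
To illustrate why unsupervised feature binning may speed up training, the following sketch assigns the values of a numerical feature to equal-width bins. Only the boundaries between bins then need to be considered as thresholds for a rule's conditions, instead of one threshold per distinct feature value. Equal-width binning is just one possible strategy, and the function below is a hypothetical helper written for this example, not part of the package. It assumes that the values contain no NaN.

```python
from typing import List, Tuple


def equal_width_bins(values: List[float], num_bins: int) -> Tuple[List[int], List[float]]:
    """Assign each value to one of `num_bins` equal-width bins.

    Returns the bin index of each value and the thresholds that separate
    neighboring bins. Assumes `values` is non-empty and free of NaN.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values are identical, so there is nothing to split on.
        return [0] * len(values), []
    width = (highest - lowest) / num_bins
    # The maximum value would fall into bin `num_bins`, so clamp it.
    indices = [min(int((value - lowest) / width), num_bins - 1) for value in values]
    thresholds = [lowest + i * width for i in range(1, num_bins)]
    return indices, thresholds


feature_values = [0.0, 1.0, 2.5, 4.0, 7.5, 10.0]
bin_indices, candidate_thresholds = equal_width_bins(feature_values, num_bins=4)
```

With four bins, the six values above yield three candidate thresholds, regardless of how many distinct values the feature takes on a larger data set.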

Documentation

An extensive user guide, as well as API documentation for developers, is available at https://mlrl-boomer.readthedocs.io. If you are new to the project, you probably want to read about the following topics:

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.
