Skip to main content

A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label output rules

Project description

BOOMER - Gradient Boosted Multi-Label Classification Rules

License: MIT PyPI version Documentation Status

🔗 Important links: Documentation | Issue Tracker | Changelog | License

This software package provides the official implementation of BOOMER - an algorithm for learning gradient boosted multi-output rules that uses gradient boosting for learning an ensemble of rules that is built with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require to predict for more than a single numerical output variable.

The BOOMER Algorithm

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kind of rule learning algorithms (see packages mlrl-common and mlrl-seco).

📖 References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper.

🔧 Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules.

Deliberate Loss Optimization

  • Decomposable or non-decomposable loss functions can be optimized in expectation.
  • L1 and L2 regularization can be used.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.

Different Prediction Strategies

  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.

Flexible Handling of Input Data

  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.

Fine-grained Control over Model Characteristics

  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously enables to model local dependencies between them.
  • Fine-grained control over the specificity/generality of rules is provided via hyperparameters.

Support for Post-Optimization and Pruning

  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) allows to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help improving the predictive performance of a model by reconstructing each rule in the context of the other rules.

⌚ Runtime and Memory Optimizations

In addition to the features mentioned above, several techniques that may speed up training or reduce the memory footprint are currently implemented.

Approximation Techniques

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification dataset to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.

Sparse Data Structures

  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some datasets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large datasets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large datasets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on datasets with many output variables.

Parallelization

  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.

📚 Documentation

Our documentation provides an extensive user guide, as well as Python and C++ API references for developers. If you are new to the project, you probably want to read about the following topics:

A collection of benchmark datasets that are compatible with the algorithm are provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

📜 License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mlrl_boomer-0.14.1-cp313-cp313-win_amd64.whl (1.2 MB view details)

Uploaded CPython 3.13Windows x86-64

mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (7.1 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ ARM64manylinux: glibc 2.28+ ARM64

mlrl_boomer-0.14.1-cp313-cp313-macosx_11_0_arm64.whl (2.9 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

mlrl_boomer-0.14.1-cp312-cp312-win_amd64.whl (1.2 MB view details)

Uploaded CPython 3.12Windows x86-64

mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (7.1 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ ARM64manylinux: glibc 2.28+ ARM64

mlrl_boomer-0.14.1-cp312-cp312-macosx_11_0_arm64.whl (3.0 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file mlrl_boomer-0.14.1-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 b11e31ee09abad4242b76b6330c1e48fa93c78477dcc03e7b23f7b6acda38d7a
MD5 c02c72bda6b2901f06e753391625466a
BLAKE2b-256 4fbc6b23029db95cecda61fe4857701f318d4df2efc3aa301916efff100282e4

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp313-cp313-win_amd64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d0817ddd5b260500e51c06ccc6eaf8bef3dea12745ada371d5188d8905a262de
MD5 ad4d58d5b0ae03648442347cc84d54bd
BLAKE2b-256 8e7136377c22f61fdb1a9ae0d5ac26cf42fb05e95d3ada97d87dcf53711c7650

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 8bb2c0db76fbea63e951a8f3ffcebd039bd1bbc9d461a36657c5ccdb38f6d98e
MD5 0a183ff2df83970ebd3e78e263530b46
BLAKE2b-256 572ee235aaac57aa483d7123fb3e3604ad6ac8b582117fb4841e81d8311dac8a

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 2b8ba39572c4fd04abca97ef262e3d5920e0db0b9607ac14203f5dab55474ca7
MD5 0109e9b92594051f7ec81e12af20b7cf
BLAKE2b-256 4da8bf5ba7d81ac763c1674351462498e166d7dfa6dbe69df53028159cc7cfcd

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 3ffaac1718723e35a7cb6baa71d97c7ff364182bb0bd30ed80711a121f8e11c0
MD5 f89ea3ff4b6b6cf69dd3cdcc1136e76c
BLAKE2b-256 06ee121a5690f9aa947da328acbb883ad29ae3855f6c1e08196093d16a017da9

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 8fcd9f4c567399f2702d551a18d0a9c5fce6a4cfe3d7bab2db2c565bc5056976
MD5 8fa70cc0018c6f3d6aa25deffd0daab8
BLAKE2b-256 c926e6a841cb1df06f420d7ea7b605c6c3c94df634978b7a153b941eb17d73b1

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 9b460adb9e84bddbcfe7bfe58b94c03380b3c64bc17e00b6a133a0be33d02fc2
MD5 91e7e194b78f2e53d5c4074b790e5636
BLAKE2b-256 3484ec10e42dfdbf1b327a58227da67bab9ce9e74a80961c1e50cc174e59ed27

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.1-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 281b1abc6aafd56a48e972edf3f1832ee12b27e13deb3a3660e5e89f4f06eae7
MD5 a22fc058d186a7cb6472f6c4bff45b63
BLAKE2b-256 ae216d0d927f2e961ae93e4f9ff6fea56368640f577b57487a7e23712248fcf6

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.1-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page