A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label output rules

BOOMER - Gradient Boosted Multi-Label Classification Rules

🔗 Important links: Documentation | Issue Tracker | Changelog | License

This software package provides the official implementation of BOOMER, an algorithm that uses gradient boosting to learn an ensemble of multi-output rules that is built with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require predicting more than a single numerical output variable.
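
As a minimal illustration of the multi-label setting (not taken from the project itself), the sets of labels assigned to individual data points are commonly encoded as a binary indicator matrix with one row per example and one column per label. The helper name `to_indicator_matrix` and the example topics are hypothetical:

```python
def to_indicator_matrix(label_sets):
    """Encode a list of label sets as a binary matrix (one row per example)."""
    labels = sorted(set().union(*label_sets))
    column = {label: j for j, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in label_sets]
    for i, label_set in enumerate(label_sets):
        for label in label_set:
            matrix[i][column[label]] = 1
    return labels, matrix


# Three text documents annotated with topics; the last one has no topic at all.
documents = [{"politics", "economy"}, {"sports"}, set()]
labels, y = to_indicator_matrix(documents)
print(labels)  # ['economy', 'politics', 'sports']
print(y)       # [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
```

In multi-output regression, the matrix has the same shape but holds real-valued targets instead of zeros and ones.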

The BOOMER Algorithm

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms (see packages mlrl-common and mlrl-seco).

📖 References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper.

🔧 Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules.

Deliberate Loss Optimization

  • Decomposable or non-decomposable loss functions can be optimized in expectation.
  • L1 and L2 regularization can be used.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.
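
The following sketch shows how L2 regularization and shrinkage typically enter a second-order gradient boosting update. It uses the textbook Newton step for a single output and the logistic loss; it is an illustration under these assumptions, not necessarily the exact formula used by the implementation. The function names and the default values for `l2` and `shrinkage` are hypothetical:

```python
import math


def logistic_gradient_hessian(y_true, score):
    """First and second derivative of the logistic loss w.r.t. the score.

    y_true is 0 or 1, score is the current real-valued prediction.
    """
    p = 1.0 / (1.0 + math.exp(-score))
    return p - y_true, p * (1.0 - p)


def rule_prediction(y_true, scores, covered, l2=1.0, shrinkage=0.3):
    """Regularized Newton step over the examples covered by a rule."""
    sum_g = sum_h = 0.0
    for i in covered:
        g, h = logistic_gradient_hessian(y_true[i], scores[i])
        sum_g += g
        sum_h += h
    # The L2 weight enlarges the denominator and pulls the score towards zero;
    # shrinkage then scales down the contribution of the rule.
    return shrinkage * (-sum_g / (sum_h + l2))


y_true = [1, 1, 0]
scores = [0.0, 0.0, 0.0]   # current predictions of the ensemble
covered = [0, 1]           # indices of the examples covered by the new rule
print(rule_prediction(y_true, scores, covered, l2=0.0, shrinkage=1.0))  # 2.0
print(rule_prediction(y_true, scores, covered))
```

With regularization and shrinkage enabled, the second call yields a much smaller score (about 0.2), so a single rule has less influence on the ensemble.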

Different Prediction Strategies

  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.
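
To give an idea of what isotonic calibration does, the sketch below fits a non-decreasing sequence to binary outcomes with the pool-adjacent-violators algorithm. It is a generic, simplified version of the technique and not the project's implementation; `isotonic_fit` is a hypothetical name. The input is assumed to be the ground truth of a held-out set, ordered by the uncalibrated probability predicted for each example:

```python
def isotonic_fit(values):
    """Pool adjacent violators: least-squares non-decreasing fit to `values`."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge blocks as long as the previous mean exceeds the latest mean.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted


# Ground truth of five examples, sorted by increasing predicted probability.
outcomes = [0, 1, 0, 1, 1]
print(isotonic_fit(outcomes))  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

The fitted values can serve as calibrated probabilities for the corresponding ranges of uncalibrated predictions.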

Flexible Handling of Input Data

  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.

Fine-grained Control over Model Characteristics

  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously makes it possible to model local dependencies between them.
  • Fine-grained control over the specificity/generality of rules is provided via hyperparameters.
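
A rule can be pictured as a body, i.e., a conjunction of conditions on feature values, and a head that holds the scores predicted for one or several outputs. The data structures below are a hypothetical sketch of this idea and do not mirror the classes of the actual implementation. In particular, treating a missing value (NaN) as never satisfying a condition is an assumption made for this example:

```python
import math
from dataclasses import dataclass


@dataclass
class Condition:
    feature: int
    operator: str      # "<=" or ">" for numerical/ordinal, "==" for nominal
    threshold: float

    def covers(self, x):
        value = x[self.feature]
        if math.isnan(value):
            return False  # assumption: missing values never satisfy a condition
        if self.operator == "<=":
            return value <= self.threshold
        if self.operator == ">":
            return value > self.threshold
        return value == self.threshold


@dataclass
class Rule:
    body: list   # conjunction of conditions
    head: dict   # output index -> predicted score

    def apply(self, x, scores):
        """Add the head's scores to `scores` if all conditions are satisfied."""
        if all(condition.covers(x) for condition in self.body):
            for output, score in self.head.items():
                scores[output] += score


# A rule with a partial head: it predicts for outputs 0 and 2 out of 3.
rule = Rule(body=[Condition(0, "<=", 2.0), Condition(1, "==", 2.0)],
            head={0: 0.4, 2: -0.1})

scores = [0.0, 0.0, 0.0]
rule.apply([1.5, 2.0], scores)
print(scores)  # [0.4, 0.0, -0.1]

scores_missing = [0.0, 0.0, 0.0]
rule.apply([float("nan"), 2.0], scores_missing)
print(scores_missing)  # [0.0, 0.0, 0.0]
```

A single-output head would contain exactly one entry, a complete head one entry per available output.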

Support for Post-Optimization and Pruning

  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post-pruning and pre-pruning (a.k.a. early stopping) can be used to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
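
The difference between pre-pruning and post-pruning of an ensemble can be sketched as follows, assuming that the loss on a holdout set has been recorded after adding each rule. Both functions and the `patience` parameter are hypothetical and only illustrate the general idea:

```python
def post_pruning_size(holdout_losses):
    """Post-pruning: train all rules, then keep the prefix with the lowest loss."""
    best = min(range(len(holdout_losses)), key=holdout_losses.__getitem__)
    return best + 1


def early_stopping_size(holdout_losses, patience=2):
    """Pre-pruning: stop once the loss did not improve for `patience` rules."""
    best_loss, best_size, waited = float("inf"), 0, 0
    for size, loss in enumerate(holdout_losses, start=1):
        if loss < best_loss:
            best_loss, best_size, waited = loss, size, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_size


# Holdout loss after adding the 1st, 2nd, ... rule to the ensemble.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.55, 0.58]
print(early_stopping_size(losses))  # 3
print(post_pruning_size(losses))    # 6
```

Early stopping saves training time but may miss a later improvement, whereas post-pruning inspects the whole sequence at the cost of learning all rules first.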

⌚ Runtime and Memory Optimizations

In addition to the features mentioned above, several techniques that may speed up training or reduce the memory footprint are currently implemented.

Approximation Techniques

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification dataset to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
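
As an example of unsupervised feature binning, the sketch below assigns numerical feature values to equal-width bins, so that only the bin boundaries need to be considered as thresholds of potential conditions. Equal-width binning is one common scheme and is used here as an assumption; the function name is hypothetical:

```python
def equal_width_bins(values, num_bins):
    """Assign each numerical feature value to one of `num_bins` equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    if width == 0:
        return [0] * len(values)  # all values are identical
    # The maximum would fall into bin `num_bins`, so it is clamped to the last bin.
    return [min(int((v - low) / width), num_bins - 1) for v in values]


feature_values = [0.0, 1.0, 2.5, 4.0, 9.9, 10.0]
bins = equal_width_bins(feature_values, num_bins=4)
print(bins)  # [0, 0, 1, 1, 3, 3]
```

Six distinct values would allow five different thresholds, whereas four bins allow at most three, which reduces the number of candidates to be evaluated.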

Sparse Data Structures

  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some datasets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large datasets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large datasets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on datasets with many output variables.
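
Why sparse ground truth matrices reduce the memory footprint can be seen from a small, self-contained example. It stores only the column indices of the relevant labels per example, similar in spirit to row-wise sparse formats. The helper names are hypothetical and the layout is not claimed to match the one used internally:

```python
def to_sparse_rows(dense):
    """Keep only the column indices of the non-zero entries in each row."""
    return [[j for j, value in enumerate(row) if value] for row in dense]


def is_relevant(sparse_rows, example, label):
    """Check whether a label is relevant to an example."""
    return label in sparse_rows[example]


dense = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
]
sparse = to_sparse_rows(dense)
print(sparse)                            # [[2], [], [0, 5]]
print(sum(len(row) for row in dense))    # 18 stored entries
print(sum(len(row) for row in sparse))   # 3 stored entries
print(is_relevant(sparse, 2, 5))         # True
```

The savings grow with the number of labels, since in typical multi-label datasets only a few labels are relevant to each example.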

Parallelization

  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.
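
The idea of evaluating potential refinements for several features in parallel can be sketched with a thread pool. The quality measure below is a deliberately simple, hypothetical stand-in (the largest absolute sum of gradients over all "value <= threshold" splits), and Python threads are used for illustration only; the sketch says nothing about how the multi-threading is realized in the actual implementation:

```python
from concurrent.futures import ThreadPoolExecutor


def best_split_quality(column, gradients):
    """Hypothetical quality: largest |sum of gradients| among all prefix splits."""
    order = sorted(range(len(column)), key=column.__getitem__)
    best, running = 0.0, 0.0
    for i in order[:-1]:  # the last position would cover all examples
        running += gradients[i]
        best = max(best, abs(running))
    return best


def best_feature(columns, gradients, num_threads=2):
    """Evaluate each feature column in its own task and pick the best one."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        qualities = list(pool.map(lambda c: best_split_quality(c, gradients),
                                  columns))
    index = max(range(len(columns)), key=qualities.__getitem__)
    return index, qualities


gradients = [-0.5, -0.5, 0.5, 0.5]
columns = [
    [1.0, 2.0, 3.0, 4.0],   # feature 0 separates the examples well
    [1.0, 3.0, 2.0, 4.0],   # feature 1 mixes them
]
print(best_feature(columns, gradients))  # (0, [1.0, 0.5])
```

Since the evaluation of one feature does not depend on the others, the tasks can run independently and only the final comparison requires their results.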

📚 Documentation

Our documentation provides an extensive user guide, as well as Python and C++ API references for developers. If you are new to the project, you probably want to read about the following topics:

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

📜 License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.


