
A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label output rules


BOOMER - Gradient Boosted Multi-Label Classification Rules


🔗 Important links: Documentation | Issue Tracker | Changelog | License

This software package provides the official implementation of BOOMER, an algorithm that uses gradient boosting to learn an ensemble of multi-output rules with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require predicting more than a single numerical output variable.
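
To make the two problem domains concrete, the following minimal sketch shows how their ground truth is commonly laid out: one row per example and one column per output. The helper `to_indicator_matrix` and the toy data are hypothetical and only serve as an illustration; they are not part of this package.

```python
def to_indicator_matrix(label_sets, labels):
    """Encode one set of labels per example as a row of 0/1 indicators."""
    return [[1 if label in assigned else 0 for label in labels]
            for assigned in label_sets]


# Hypothetical toy data: topics assigned to three text documents.
labels = ["politics", "sports", "economy"]
label_sets = [{"politics", "economy"}, {"sports"}, set()]

# Multi-label classification: a binary matrix with one column per label.
y_classification = to_indicator_matrix(label_sets, labels)

# Multi-output regression: one real-valued target per example and output.
y_regression = [[0.5, 12.0], [1.5, 8.25], [0.0, 10.0]]

print(y_classification)
```

Note that an example may have no labels at all (the third row), which distinguishes multi-label from multi-class classification.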

The BOOMER Algorithm

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms (see packages mlrl-common and mlrl-seco).

📖 References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper.

🔧 Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules.

Deliberate Loss Optimization

  • Decomposable or non-decomposable loss functions can be optimized in expectation.
  • L1 and L2 regularization can be used.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.
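
As an illustration of how regularization and shrinkage interact, the sketch below computes the score of a rule that predicts for a single output, using the standard second-order (Newton) update known from gradient boosting. It assumes a decomposable logistic loss and labels in {0, 1}. The function names are hypothetical and the code is not taken from this package, whose actual implementation may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logistic_gradient_and_hessian(y_true, score):
    """First and second derivative of the log-loss w.r.t. the current score."""
    p = sigmoid(score)
    return p - y_true, p * (1.0 - p)


def rule_score(gradients, hessians, l1=0.0, l2=0.0, shrinkage=1.0):
    """Regularized Newton step for the examples covered by a rule."""
    g = sum(gradients)
    h = sum(hessians)
    # L1 regularization shrinks the summed gradient towards zero (soft thresholding).
    if g > l1:
        g -= l1
    elif g < -l1:
        g += l1
    else:
        g = 0.0
    # L2 regularization is added to the summed Hessians in the denominator.
    denominator = h + l2
    if denominator <= 0.0:
        return 0.0
    # Shrinkage scales the step and thus the rule's impact on the ensemble.
    return shrinkage * (-g / denominator)


# Three covered examples with true labels 1, 1, 0 and a current score of 0.
stats = [logistic_gradient_and_hessian(y, 0.0) for y in (1, 1, 0)]
gradients = [g for g, _ in stats]
hessians = [h for _, h in stats]
print(rule_score(gradients, hessians, l2=0.25, shrinkage=0.3))
```

Larger values of `l2` or smaller values of `shrinkage` make each rule more conservative, whereas a sufficiently large `l1` sets the score to zero.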

Different Prediction Strategies

  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.
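
The idea behind isotonic calibration can be sketched with the pool-adjacent-violators algorithm, which fits a non-decreasing step function that maps uncalibrated scores to observed outcome frequencies. The functions `isotonic_fit` and `isotonic_predict` below are hypothetical and written for illustration only; they are not this package's calibration code.

```python
def isotonic_fit(scores, outcomes):
    """Fit a non-decreasing step function via pool-adjacent-violators."""
    blocks = []  # each block: [sum of outcomes, count, largest score in block]
    for score, outcome in sorted(zip(scores, outcomes)):
        blocks.append([float(outcome), 1, score])
        # Merge neighboring blocks as long as their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, upper = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = upper
    return [(upper, total / count) for total, count, upper in blocks]


def isotonic_predict(model, score):
    """Return the value of the first block whose upper bound is >= score."""
    for upper, value in model:
        if score <= upper:
            return value
    return model[-1][1]


# Hypothetical uncalibrated scores and the corresponding true labels.
scores = [0.1, 0.3, 0.5, 0.7, 0.9]
outcomes = [0, 1, 0, 1, 1]
model = isotonic_fit(scores, outcomes)
print(model)
print(isotonic_predict(model, 0.4))
```

In this example, the scores 0.3 and 0.5 violate monotonicity and are pooled into a single block with a calibrated probability of 0.5.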

Flexible Handling of Input Data

  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.

Fine-grained Control over Model Characteristics

  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously makes it possible to model local dependencies between them.
  • Fine-grained control over the specificity/generality of rules is provided via hyperparameters.
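
The following sketch illustrates what rules with different types of heads may look like and how their predictions add up. The classes `Condition` and `Rule` are hypothetical and do not mirror this package's data structures. The treatment of missing values (a condition is never satisfied by NaN) is an assumption made for this example, not a statement about how the algorithm handles them.

```python
import math
import operator
from dataclasses import dataclass

COMPARATORS = {"<=": operator.le, ">": operator.gt,
               "==": operator.eq, "!=": operator.ne}


@dataclass
class Condition:
    feature: int     # index of the tested feature
    comparator: str  # "<=" or ">" (numerical/ordinal), "==" or "!=" (nominal)
    threshold: float

    def covers(self, example):
        value = example[self.feature]
        # Assumption: a missing value never satisfies a condition.
        if math.isnan(value):
            return False
        return COMPARATORS[self.comparator](value, self.threshold)


@dataclass
class Rule:
    body: list  # conditions; an empty body covers every example
    head: dict  # output index -> predicted score

    def covers(self, example):
        return all(condition.covers(example) for condition in self.body)


def predict_scores(rules, example, num_outputs):
    """Sum up the heads of all rules that cover the given example."""
    scores = [0.0] * num_outputs
    for rule in rules:
        if rule.covers(example):
            for output, score in rule.head.items():
                scores[output] += score
    return scores


rules = [
    Rule([], {0: -0.5, 1: 0.25, 2: -0.25}),            # complete head
    Rule([Condition(0, "<=", 3.0)], {1: 0.5}),         # single-output head
    Rule([Condition(0, ">", 1.0), Condition(1, "==", 2.0)],
         {0: 1.0, 2: 0.5}),                            # partial head
]
print(predict_scores(rules, [2.0, 2.0], num_outputs=3))
print(predict_scores(rules, [float("nan"), 2.0], num_outputs=3))
```

The third rule predicts for outputs 0 and 2 at once, which is how a partial head can capture a dependency between these two outputs in the region it covers.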

Support for Post-Optimization and Pruning

  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) can be used to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
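
The difference between pre- and post-pruning of an ensemble can be sketched as follows, assuming that a loss on held-out data is recorded after each rule is added. Both functions, the loss values, and the `patience` parameter are hypothetical; the stopping criteria actually used by the algorithm may differ.

```python
def best_ensemble_size(holdout_losses):
    """Post-pruning: train all rules, then keep the prefix with the lowest loss."""
    best_index = min(range(len(holdout_losses)), key=holdout_losses.__getitem__)
    return best_index + 1


def early_stopping_size(holdout_losses, patience):
    """Pre-pruning: stop once the loss has not improved for `patience` rules."""
    best_loss = float("inf")
    best_size = 0
    for size, loss in enumerate(holdout_losses, start=1):
        if loss < best_loss:
            best_loss, best_size = loss, size
        elif size - best_size >= patience:
            break  # no improvement for `patience` consecutive rules
    return best_size


# Hypothetical holdout loss after adding the 1st, 2nd, ... rule.
losses = [0.9, 0.7, 0.6, 0.62, 0.61, 0.55, 0.58, 0.6, 0.63]
print(best_ensemble_size(losses))
print(early_stopping_size(losses, patience=2))
print(early_stopping_size(losses, patience=3))
```

With a small `patience`, training stops at three rules and misses the later improvement at six rules, which post-pruning finds at the cost of training the full ensemble.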

⌚ Runtime and Memory Optimizations

In addition to the features mentioned above, several techniques that may speed up training or reduce the memory footprint are currently implemented.

Approximation Techniques

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification dataset to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
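
To show why feature binning can reduce the number of conditions that must be evaluated, the sketch below applies equal-width binning to a numerical feature. Equal-width binning is used here merely as one common unsupervised scheme, and the function names are hypothetical; the text above does not specify which binning methods the algorithm provides.

```python
from bisect import bisect_left


def equal_width_thresholds(values, num_bins):
    """Return the num_bins - 1 boundaries that split the value range evenly."""
    lower, upper = min(values), max(values)
    width = (upper - lower) / num_bins
    return [lower + width * i for i in range(1, num_bins)]


def assign_bin(value, thresholds):
    """Index of the bin a value falls into (values <= threshold go left)."""
    return bisect_left(thresholds, value)


# Hypothetical values of a single numerical feature.
values = [0.0, 1.0, 2.5, 4.0, 7.5, 8.0]
thresholds = equal_width_thresholds(values, num_bins=4)
bins = [assign_bin(value, thresholds) for value in values]
print(thresholds)
print(bins)
```

Without binning, a threshold between each pair of adjacent distinct values is a candidate (five in this example). With four bins, only the three bin boundaries need to be considered.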

Sparse Data Structures

  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some datasets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large datasets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large datasets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on datasets with many output variables.
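
The memory savings of sparse data structures stem from storing only the non-zero entries. The following sketch converts a dense binary label matrix into a compressed sparse row (CSR) layout using plain Python lists. It is a simplified, hypothetical illustration and not the data structure used internally by this package.

```python
def to_csr(dense):
    """Store only the column indices of non-zero entries, row by row."""
    indptr = [0]   # indptr[i]:indptr[i + 1] delimits the entries of row i
    indices = []   # column indices of all non-zero entries
    for row in dense:
        indices.extend(column for column, value in enumerate(row) if value)
        indptr.append(len(indices))
    return indptr, indices


def row_labels(indptr, indices, row):
    """Return the labels (column indices) that are relevant for a given row."""
    return indices[indptr[row]:indptr[row + 1]]


# Hypothetical label matrix: 3 examples, 4 labels, mostly zeros.
dense = [[0, 1, 0, 0],
         [0, 0, 0, 0],
         [1, 0, 0, 1]]
indptr, indices = to_csr(dense)
print(indptr, indices)
print(row_labels(indptr, indices, 2))
```

For binary matrices, no values need to be stored at all, so the memory required grows with the number of relevant labels rather than with the number of examples times the number of labels.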

Parallelization

  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.

📚 Documentation

Our documentation provides an extensive user guide, as well as Python and C++ API references for developers.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

📜 License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

