A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label output rules

BOOMER - Gradient Boosted Multi-Label Classification Rules

🔗 Important links: Documentation | Issue Tracker | Changelog | License

This software package provides the official implementation of BOOMER, an algorithm that uses gradient boosting to learn an ensemble of multi-output rules with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require predicting more than one numerical output variable.
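To make the multi-label setting concrete, the following minimal sketch encodes sets of labels as a binary indicator matrix with one row per data point and one column per label. The helper `to_label_matrix` and the example topics are hypothetical and only illustrate the data layout; they are not part of this package.

```python
def to_label_matrix(label_sets, all_labels):
    """Encode sets of labels as rows of a binary indicator matrix."""
    column = {label: j for j, label in enumerate(all_labels)}
    matrix = [[0] * len(all_labels) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            matrix[i][column[label]] = 1
    return matrix


# Hypothetical example: three text documents annotated with topics.
topics = ["politics", "sports", "science"]
documents = [{"politics"}, {"sports", "science"}, set()]
y = to_label_matrix(documents, topics)
print(y)  # [[1, 0, 0], [0, 1, 1], [0, 0, 0]]
```

In multi-output regression, the same layout applies, except that each cell holds a real number instead of 0 or 1.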

The BOOMER Algorithm

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms (see packages mlrl-common and mlrl-seco).

📖 References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper.

🔧 Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules.

Deliberate Loss Optimization

  • Decomposable or non-decomposable loss functions can be optimized in expectation.
  • L1 and L2 regularization can be used.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.
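The interplay of L2 regularization and shrinkage can be sketched with the standard second-order (Newton) update used in gradient boosting. The example below assumes a single output and the logistic loss, i.e., a decomposable loss; the function `rule_score` and its parameter names are hypothetical, L1 regularization is omitted for brevity, and the code is not taken from the BOOMER implementation.

```python
import math


def sigmoid(score):
    return 1.0 / (1.0 + math.exp(-score))


def rule_score(labels, current_scores, l2=1.0, shrinkage=0.3):
    """Score predicted by a new rule for the examples it covers.

    Uses the logistic loss: gradient = p - y, Hessian = p * (1 - p).
    """
    grad_sum = 0.0
    hess_sum = 0.0
    for y, f in zip(labels, current_scores):
        p = sigmoid(f)
        grad_sum += p - y
        hess_sum += p * (1.0 - p)
    # L2 regularization enlarges the denominator and shrinks the step.
    newton_step = -grad_sum / (hess_sum + l2)
    # Shrinkage scales the contribution of the rule to the ensemble.
    return shrinkage * newton_step


labels = [1, 1, 1, 0]            # ground truth of the covered examples
scores = [0.0, 0.0, 0.0, 0.0]    # scores predicted by the ensemble so far
full = rule_score(labels, scores, l2=0.0, shrinkage=1.0)
damped = rule_score(labels, scores, l2=1.0, shrinkage=0.3)
print(full, damped)
```

With the values above, the unregularized step is 1.0, whereas an L2 weight of 1.0 combined with a shrinkage of 0.3 reduces it to 0.15, so more rules are needed to reach the same scores.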

Different Prediction Strategies

  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.
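As an illustration of the idea behind isotonic calibration, the following sketch fits a non-decreasing function to observed outcomes using the pool-adjacent-violators algorithm. It assumes the outcomes are already sorted by the uncalibrated predicted probability; `isotonic_fit` is a hypothetical helper written for this example, not the calibration routine of this package.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool adjacent violators."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([float(v), 1])
        # Merge neighbouring blocks while their means violate monotonicity.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Observed binary outcomes, ordered by increasing predicted probability.
observed = [0, 1, 0, 1, 1]
calibrated = isotonic_fit(observed)
print(calibrated)  # [0.0, 0.5, 0.5, 1.0, 1.0]
```

The fitted values can then serve as calibrated probabilities for the corresponding ranges of uncalibrated predictions.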

Flexible Handling of Input Data

  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.

Fine-grained Control over Model Characteristics

  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously makes it possible to model local dependencies between them.
  • Fine-grained control over the specificity/generality of rules is provided via hyperparameters.
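The difference between single-output, partial, and complete heads can be sketched with a small data structure. The class `Rule`, the function `predict_scores`, and all thresholds and scores below are hypothetical and chosen for illustration only; they do not mirror the internal representation used by this package.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: list  # (feature_index, operator, threshold) triples
    head: dict        # output_index -> predicted score

    def covers(self, x):
        for feature, op, threshold in self.conditions:
            if op == "<=" and not x[feature] <= threshold:
                return False
            if op == ">" and not x[feature] > threshold:
                return False
        return True


def predict_scores(rules, x, num_outputs):
    """Sum the heads of all rules that cover the example `x`."""
    scores = [0.0] * num_outputs
    for rule in rules:
        if rule.covers(x):
            for output, score in rule.head.items():
                scores[output] += score
    return scores


rules = [
    # Complete head: predicts for all three outputs (no conditions).
    Rule(conditions=[], head={0: -0.5, 1: -0.5, 2: -0.5}),
    # Single-output head: predicts for output 1 only.
    Rule(conditions=[(0, ">", 2.0)], head={1: 1.0}),
    # Partial head: predicts for outputs 0 and 2 jointly.
    Rule(conditions=[(0, ">", 2.0), (1, "<=", 0.5)], head={0: 0.8, 2: 0.7}),
]
scores = predict_scores(rules, [3.0, 0.1], num_outputs=3)
labels = [int(s > 0) for s in scores]
print(scores, labels)
```

Because the partial head assigns scores to outputs 0 and 2 under the same conditions, a single rule can express that these two outputs tend to occur together in the region it covers.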

Support for Post-Optimization and Pruning

  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) make it possible to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
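A common way to determine the number of rules is to track the loss on a holdout set after each rule has been added. The sketch below assumes such a sequence of losses is available; the function `early_stopping_point` and its `patience` parameter are hypothetical names, and the stopping criterion shown is a generic one rather than the exact criterion implemented by this package.

```python
def early_stopping_point(holdout_losses, patience=3):
    """Return the number of rules to keep.

    Stops once the holdout loss has not improved for `patience` rules.
    """
    best_index, best_loss = 0, float("inf")
    for i, loss in enumerate(holdout_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= patience:
            break
    return best_index + 1


# Hypothetical holdout losses measured after adding each rule.
losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.5]
stopped_early = early_stopping_point(losses, patience=3)
pruned_afterwards = early_stopping_point(losses, patience=len(losses))
print(stopped_early, pruned_afterwards)  # 4 8
```

The two results illustrate the trade-off: stopping during training saves time but may miss a later improvement, whereas inspecting the full sequence after training finds the global minimum at the cost of learning all rules first.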

⌚ Runtime and Memory Optimizations

In addition to the features mentioned above, several techniques that may speed up training or reduce the memory footprint are currently implemented.

Approximation Techniques

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification dataset to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
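To show why unsupervised feature binning reduces the work per feature, the following sketch assigns numerical values to equal-width bins, so that only the bin boundaries remain as candidate thresholds. Equal-width binning is one possible strategy and is assumed here for simplicity; `equal_width_bins` is a hypothetical helper, not a function of this package.

```python
def equal_width_bins(values, num_bins):
    """Assign each value to one of `num_bins` bins of equal width.

    Returns the bin index of each value and the thresholds between bins.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so there is nothing to split on.
        return [0] * len(values), []
    indices = [min(int((v - lo) / width), num_bins - 1) for v in values]
    thresholds = [lo + width * k for k in range(1, num_bins)]
    return indices, thresholds


values = [0.0, 1.0, 2.5, 4.0, 7.5, 10.0]
indices, thresholds = equal_width_bins(values, num_bins=4)
print(indices)     # [0, 0, 1, 1, 3, 3]
print(thresholds)  # [2.5, 5.0, 7.5]
```

Instead of one candidate threshold between every pair of adjacent distinct values (five in this example), only three thresholds need to be evaluated, at the price of a coarser search.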

Sparse Data Structures

  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some datasets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large datasets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large datasets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on datasets with many output variables.
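The memory savings of sparse ground truth matrices can be illustrated with a compressed sparse row (CSR) layout, which stores only the column indices of the relevant labels. The helpers `to_csr` and `row_labels` are hypothetical and written in plain Python for readability; in practice, a library such as SciPy would typically provide the sparse matrix types.

```python
def to_csr(dense_rows):
    """Convert a dense binary matrix to CSR row pointers and column indices."""
    indptr, indices = [0], []
    for row in dense_rows:
        indices.extend(j for j, value in enumerate(row) if value != 0)
        indptr.append(len(indices))
    return indptr, indices


def row_labels(indptr, indices, i):
    """Column indices of the non-zero entries in row `i`."""
    return indices[indptr[i]:indptr[i + 1]]


y_dense = [
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
]
indptr, indices = to_csr(y_dense)
print(indptr, indices)                 # [0, 1, 1, 3] [2, 0, 5]
print(row_labels(indptr, indices, 2))  # [0, 5]
```

Here, 7 integers replace 18 dense cells. The saving grows with the number of labels, since most examples in multi-label datasets are associated with only a few of them.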

Parallelization

  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.
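The idea of evaluating potential refinements across several features in parallel can be sketched as follows. The quality measure (squared sum of gradients divided by the number of covered examples), the function names, and the toy data are assumptions made for this example. Note also that threads in pure Python do not run CPU-bound code concurrently; the sketch only shows how the work can be partitioned by feature.

```python
from concurrent.futures import ThreadPoolExecutor


def best_condition(feature_values, gradients):
    """Best threshold t for the condition `value <= t`, as (gain, t)."""
    best = (0.0, None)
    for t in sorted(set(feature_values))[:-1]:
        covered = [g for v, g in zip(feature_values, gradients) if v <= t]
        gain = sum(covered) ** 2 / len(covered)
        if gain > best[0]:
            best = (gain, t)
    return best


def best_refinement(columns, gradients, num_threads=2):
    """Evaluate every feature in its own task and return the overall best."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        results = list(
            pool.map(lambda col: best_condition(col, gradients), columns))
    feature = max(range(len(columns)), key=lambda f: results[f][0])
    gain, threshold = results[feature]
    return feature, threshold, gain


# Hypothetical data: two feature columns and one gradient per example.
columns = [[1, 2, 3, 4], [1, 2, 1, 2]]
gradients = [-1, -1, 1, 1]
print(best_refinement(columns, gradients))  # (0, 2, 2.0)
```

Since the evaluation of one feature does not depend on any other feature, the tasks need no synchronization until the best candidates are compared at the end.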

📚 Documentation

Our documentation provides an extensive user guide, as well as Python and C++ API references for developers.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

📜 License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.
