Skip to main content

A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label output rules

Project description

BOOMER - Gradient Boosted Multi-Label Classification Rules

License: MIT PyPI version Documentation Status

🔗 Important links: Documentation | Issue Tracker | Changelog | License

This software package provides the official implementation of BOOMER - an algorithm for learning gradient boosted multi-output rules that uses gradient boosting for learning an ensemble of rules that is built with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require to predict for more than a single numerical output variable.

The BOOMER Algorithm

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kind of rule learning algorithms (see packages mlrl-common and mlrl-seco).

📖 References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper.

🔧 Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules.

Deliberate Loss Optimization

  • Decomposable or non-decomposable loss functions can be optimized in expectation.
  • L1 and L2 regularization can be used.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.

Different Prediction Strategies

  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.

Flexible Handling of Input Data

  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.

Fine-grained Control over Model Characteristics

  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously enables to model local dependencies between them.
  • Fine-grained control over the specificity/generality of rules is provided via hyperparameters.

Support for Post-Optimization and Pruning

  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) allows to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help improving the predictive performance of a model by reconstructing each rule in the context of the other rules.

⌚ Runtime and Memory Optimizations

In addition to the features mentioned above, several techniques that may speed up training or reduce the memory footprint are currently implemented.

Approximation Techniques

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification dataset to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.

Sparse Data Structures

  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some datasets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large datasets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large datasets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on datasets with many output variables.

Parallelization

  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.

📚 Documentation

Our documentation provides an extensive user guide, as well as Python and C++ API references for developers. If you are new to the project, you probably want to read about the following topics:

A collection of benchmark datasets that are compatible with the algorithm are provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

📜 License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

mlrl_boomer-0.14.2-cp313-cp313-win_amd64.whl (1.2 MB view details)

Uploaded CPython 3.13Windows x86-64

mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (7.1 MB view details)

Uploaded CPython 3.13manylinux: glibc 2.27+ ARM64manylinux: glibc 2.28+ ARM64

mlrl_boomer-0.14.2-cp313-cp313-macosx_11_0_arm64.whl (2.9 MB view details)

Uploaded CPython 3.13macOS 11.0+ ARM64

mlrl_boomer-0.14.2-cp312-cp312-win_amd64.whl (1.2 MB view details)

Uploaded CPython 3.12Windows x86-64

mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (7.3 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ x86-64manylinux: glibc 2.28+ x86-64

mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl (7.1 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.27+ ARM64manylinux: glibc 2.28+ ARM64

mlrl_boomer-0.14.2-cp312-cp312-macosx_11_0_arm64.whl (3.0 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

File details

Details for the file mlrl_boomer-0.14.2-cp313-cp313-win_amd64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 e6c341681806e6b70a12a016b18b6a65736d193aefa07c3c151fbd04f5acf2e8
MD5 23ef1e28847cdb8841db4bd49652c873
BLAKE2b-256 cadb94c54a44f0043d98472c16672d6c40afb4e7a785c4e25d05d0483edcc72d

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp313-cp313-win_amd64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 92111ef35e5213e220ed25198b7f80ecee30f52426aeb35150d25a9036b6e08b
MD5 28bb27819dd34aea6cf9f618de3253f1
BLAKE2b-256 94144b28665e2145e47fba4974b2b438f3e9df075e0e504c42b59a6033f8f9f0

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 f385ba0a9bee2bbf423db2d00b9e5c082512e92709d2c23aac2e73aeaee5e3d8
MD5 3a5c82f5e23f505a6d8488ba0d822be7
BLAKE2b-256 ecdee264bdcc6971f19a9e8cf9a9cb76ac491ed9e63207f5aed346b994996b5d

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp313-cp313-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 40209c0e73b6095554851ea560d49e480c54c2a36e4cb672398fd74b630b7537
MD5 8b55ffeac7efb2af3270ab850b48f708
BLAKE2b-256 6f6048f9468cb3939e17317f0d11db3e741404150f883254114a8df783dcc6c0

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp312-cp312-win_amd64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 7e8e45e4aca869cb9b4db336aa63149edc20909a0431850a94b6f8e1da9ed6d9
MD5 061550fd9e8d48edc8b70ca2317a32ac
BLAKE2b-256 52de4681b44fd52ea8b07ce7362fb080549c356240c07ec7f340d8218074bfad

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp312-cp312-win_amd64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 69f7848e6ce2d752165c3f514a98e2d64e4c2fc069667df961b808b0715a5a1c
MD5 9823f9a32815633f02f200e557f1684e
BLAKE2b-256 f34a850a9e3720b0a00bc4c763becdcd531d2b1287dbc51ca2f62906a374643c

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl
Algorithm Hash digest
SHA256 774ddc4f5b8d1b1786cc186ee2c7d0ffa208e2cac06ae68bcf4f4dfc45c442c0
MD5 45eb587d3face1341ed61053046f0238
BLAKE2b-256 5027badbe65eb5b864afcc871c1581de2d0f08ac2ba6d24d569626e9514965f3

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp312-cp312-manylinux_2_27_aarch64.manylinux_2_28_aarch64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlrl_boomer-0.14.2-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for mlrl_boomer-0.14.2-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1316dcf79a9e281257eeea78bdf10e765b2a7e436bdfdf7dd93f40da4a722b6f
MD5 6d1b604c9e3ac42b16f53b6c32ead453
BLAKE2b-256 2f8fe260f72a62088b60c619e840e50b3402a07a0fa2d9e9718f2a1db8403c32

See more details on using hashes here.

Provenance

The following attestation bundles were made for mlrl_boomer-0.14.2-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on mrapp-ke/MLRL-Boomer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page