
A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label output rules

Project description

BOOMER - Gradient Boosted Multi-Label Classification Rules


:link: Important links: Documentation | Issue Tracker | Changelog | Contributors | Code of Conduct | License

This software package provides the official implementation of BOOMER, an algorithm that uses gradient boosting to learn an ensemble of multi-output rules built with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require predicting more than one numerical output variable.
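
To make the two problem domains concrete, the following minimal sketch shows how the ground truth is typically laid out in each case: a binary indicator matrix for multi-label classification and a real-valued matrix for multi-output regression. The topic names, the toy data, and the helper `to_indicator_matrix` are hypothetical and not part of this package.

```python
# Hypothetical toy data: three text documents annotated with sets of topics.
topics = ["politics", "sports", "science"]
annotations = [{"politics"}, {"sports", "science"}, set()]


def to_indicator_matrix(label_sets, labels):
    """Encode sets of labels as rows of 0/1 values, one column per label."""
    return [[1 if label in label_set else 0 for label in labels]
            for label_set in label_sets]


# Multi-label classification: one row per document, one column per topic.
Y_classification = to_indicator_matrix(annotations, topics)

# Multi-output regression: each row holds several numerical target values.
Y_regression = [[0.5, 12.0], [1.25, 7.5], [0.0, 3.0]]

print(Y_classification)
```

Note that a data point may be associated with no label at all, as the third document shows.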

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms. For example, this project also provides a Separate-and-Conquer (SeCo) algorithm based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.

:book: References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section References of the documentation.

:wrench: Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules.

Deliberate Loss Optimization

  • Decomposable or non-decomposable loss functions can be optimized in expectation.
  • $L_1$ and $L_2$ regularization can be used.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.
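
The interplay of regularization and shrinkage can be illustrated for a single output. The sketch below assumes the common second-order formulation of gradient boosting, in which the score of a rule minimizes `G*p + 0.5*(H + l2)*p**2 + l1*|p|`, where `G` and `H` are the sums of the gradients and Hessians of the covered examples. The function names and the exact scaling of the penalty terms are assumptions made for illustration; the library's internal formulas may differ.

```python
def soft_threshold(value, threshold):
    """Shrink a value towards zero by the given threshold (effect of the L1 term)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def rule_score(gradients, hessians, l1=0.0, l2=0.0, shrinkage=1.0):
    """Score predicted by a rule for one output, given the covered examples."""
    sum_of_gradients = sum(gradients)
    sum_of_hessians = sum(hessians)
    score = -soft_threshold(sum_of_gradients, l1) / (sum_of_hessians + l2)
    # Shrinkage scales the score and thereby limits the impact of the rule.
    return shrinkage * score


# Squared error 0.5 * (f - y)^2: the gradient is f - y, the Hessian is 1.
targets = [1.0, 2.0, 3.0]
current_predictions = [0.0, 0.0, 0.0]
gradients = [f - y for f, y in zip(current_predictions, targets)]
hessians = [1.0] * len(targets)

unregularized = rule_score(gradients, hessians)
regularized = rule_score(gradients, hessians, l1=1.0, l2=1.0, shrinkage=0.5)
print(unregularized, regularized)
```

Without regularization, the score equals the mean residual (2.0). With both penalties and a shrinkage of 0.5, the rule contributes a much more conservative score (0.625).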

Different Prediction Strategies

  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.
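
Isotonic regression fits a non-decreasing function that maps raw predictions to calibrated probabilities. The following sketch implements the standard pool-adjacent-violators algorithm on observed outcomes that are assumed to be sorted by the predicted score. It illustrates the general technique only; the function name and data are hypothetical, and the calibration routine used by the library may differ.

```python
def isotonic_fit(values, weights=None):
    """Return the non-decreasing least-squares fit to a sequence of values."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # Each block stores [mean, total weight, number of values].
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Merge adjacent blocks as long as they violate the monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, weight2, count2 = blocks.pop()
            mean1, weight1, count1 = blocks.pop()
            total = weight1 + weight2
            merged_mean = (mean1 * weight1 + mean2 * weight2) / total
            blocks.append([merged_mean, total, count1 + count2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Observed outcomes (1.0 = label relevant), sorted by ascending predicted score.
outcomes = [0.0, 1.0, 0.0, 1.0, 1.0]
calibrated = isotonic_fit(outcomes)
print(calibrated)
```

The second and third outcomes violate the ordering and are pooled into a shared calibrated probability of 0.5.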

Flexible Handling of Input Data

  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
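
The sketch below illustrates how conditions on different feature types can be evaluated without one-hot encoding. It assumes, purely for illustration, that an example with a missing value never satisfies a condition on the affected feature; this convention, the operator strings, and the function names are hypothetical and do not describe the library's internals.

```python
import math


def satisfies(value, operator, threshold):
    """Check a single condition; a missing value (NaN) never satisfies it."""
    if isinstance(value, float) and math.isnan(value):
        return False
    if operator == "<=":
        return value <= threshold
    if operator == ">":
        return value > threshold
    if operator == "==":  # Nominal features are compared for equality.
        return value == threshold
    if operator == "!=":
        return value != threshold
    raise ValueError("Unknown operator: " + operator)


def covers(example, conditions):
    """A rule body covers an example if all of its conditions are satisfied."""
    return all(satisfies(example[feature], operator, threshold)
               for feature, operator, threshold in conditions)


# Feature 0 is numerical, feature 1 is nominal (encoded as integer codes).
body = [(0, "<=", 20.5), (1, "==", 2)]
examples = [[18.0, 2], [float("nan"), 2], [25.0, 2]]
coverage = [covers(example, body) for example in examples]
print(coverage)
```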

Fine-grained Control over Model Characteristics

  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously makes it possible to model local dependencies between them.
  • Fine-grained control over the specificity/generality of rules is provided via hyperparameters.
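
The difference between the three types of heads can be sketched with a tiny hand-written ensemble. Each rule is represented here as a pair of a body (a predicate) and a head (a mapping from output indices to scores), and the scores of all rules that cover an example are summed. This representation and all names are assumptions chosen for illustration, not the library's data structures.

```python
def predict_scores(example, rules, num_outputs):
    """Sum the scores of all rules whose body covers the given example."""
    scores = [0.0] * num_outputs
    for body, head in rules:
        if body(example):
            for output_index, score in head.items():
                scores[output_index] += score
    return scores


rules = [
    # Single-output head: predicts for one output only.
    (lambda x: x[0] <= 2.0, {0: 0.5}),
    # Partial head: predicts for a subset of the outputs.
    (lambda x: x[1] > 3.0, {0: -0.25, 2: 1.0}),
    # Complete head: predicts for all outputs (here with an empty body).
    (lambda x: True, {0: 0.25, 1: 0.25, 2: 0.25}),
]

scores = predict_scores([1.0, 5.0], rules, num_outputs=3)
print(scores)
```

Because a partial or complete head assigns scores to several outputs at once, a single rule can express that these outputs tend to be relevant (or irrelevant) together for the examples it covers.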

Support for Post-Optimization and Pruning

  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) make it possible to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
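
The difference between post-pruning and pre-pruning of an ensemble can be sketched as follows, assuming that the loss on a holdout set is recorded after each rule is added. Post-pruning inspects the complete loss curve, whereas early stopping aborts training once the loss has not improved for a given number of rules. The function names, the `patience` parameter, and the loss values are hypothetical; the stopping criteria offered by the library may be defined differently.

```python
def post_pruning_size(holdout_losses):
    """Pick the ensemble size with the lowest holdout loss after training."""
    best_index = min(range(len(holdout_losses)), key=lambda i: holdout_losses[i])
    return best_index + 1


def early_stopping_size(holdout_losses, patience=2):
    """Stop as soon as the loss did not improve for `patience` rules in a row."""
    best_loss = float("inf")
    best_size = 0
    for size, loss in enumerate(holdout_losses, start=1):
        if loss < best_loss:
            best_loss, best_size = loss, size
        elif size - best_size >= patience:
            break
    return best_size


# Hypothetical holdout loss after adding the 1st, 2nd, ... rule.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.5]
print(post_pruning_size(losses), early_stopping_size(losses))
```

In this example, early stopping settles for three rules and saves training time, while post-pruning finds the later minimum at six rules because it sees the whole curve.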

:watch: Runtime and Memory Optimizations

In addition to the features mentioned above, several techniques that may speed up training or reduce the memory footprint are currently implemented.

Approximation Techniques

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification data set to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
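
The idea behind unsupervised feature binning can be sketched with equal-width bins, one common binning method that is used here for illustration. Instead of considering every distinct feature value as a potential threshold, only the boundaries between bins need to be evaluated. The function name and data are hypothetical.

```python
def equal_width_bins(values, num_bins):
    """Assign numerical values to bins of equal width.

    Returns the bin index of each value and the thresholds between the bins.
    """
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0] * len(values), []
    width = (highest - lowest) / num_bins
    # The maximum value falls into the last bin rather than a new one.
    indices = [min(int((value - lowest) / width), num_bins - 1)
               for value in values]
    thresholds = [lowest + width * i for i in range(1, num_bins)]
    return indices, thresholds


feature_values = [0.0, 1.0, 2.0, 3.0, 4.0, 8.0]
bin_indices, thresholds = equal_width_bins(feature_values, num_bins=4)
print(bin_indices, thresholds)
```

With four bins, only three candidate thresholds remain, regardless of how many distinct values the feature takes.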

Sparse Data Structures

  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some data sets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large data sets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large data sets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on data sets with many output variables.
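
Why sparse data structures reduce the memory footprint can be seen from the compressed sparse row (CSR) layout, one widely used sparse format. Only non-zero values are stored, together with their column indices and one offset per row. The sketch below is independent of the concrete formats the library accepts, and the helper names are hypothetical.

```python
def to_csr(dense):
    """Convert a dense matrix (list of rows) into CSR arrays."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for column, value in enumerate(row):
            if value != 0:
                data.append(value)
                indices.append(column)
        # Offset at which the next row starts in `data` and `indices`.
        indptr.append(len(data))
    return data, indices, indptr


def row_nonzeros(csr, row):
    """Return the (column, value) pairs stored for a given row."""
    data, indices, indptr = csr
    start, end = indptr[row], indptr[row + 1]
    return list(zip(indices[start:end], data[start:end]))


# A sparse ground truth matrix: most labels are irrelevant (0).
labels = [[0, 0, 1, 0], [0, 0, 0, 0], [1, 0, 0, 1]]
csr = to_csr(labels)
print(csr, row_nonzeros(csr, 2))
```

The twelve entries of the dense matrix are reduced to three stored values, which pays off when the number of outputs is large and most entries are zero.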

Parallelization

  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.
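
Parallelizing the search for refinements across features follows a simple pattern: each feature can be evaluated independently, and the best candidate is selected afterwards. The sketch below shows this structure with Python's `concurrent.futures`. The quality measure (the absolute sum of the gradients of the covered examples) is a deliberately simplified, hypothetical stand-in, and threads in pure Python do not speed up CPU-bound work, so this only illustrates how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor


def best_condition(feature_values, gradients):
    """Find the threshold t of a condition `value <= t` with the highest quality."""
    best_quality, best_threshold = 0.0, None
    for threshold in sorted(set(feature_values)):
        covered = sum(gradient
                      for value, gradient in zip(feature_values, gradients)
                      if value <= threshold)
        if abs(covered) > best_quality:
            best_quality, best_threshold = abs(covered), threshold
    return best_quality, best_threshold


# One list of values per feature (column-wise) and one gradient per example.
feature_columns = [[1.0, 2.0, 3.0], [5.0, 1.0, 4.0]]
gradients = [-1.0, 2.0, -3.0]

# Evaluate the features in parallel, one task per feature.
with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(
        lambda column: best_condition(column, gradients), feature_columns))

# Select the feature whose best condition has the highest quality.
best_feature = max(range(len(results)), key=lambda i: results[i][0])
print(results, best_feature)
```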

:books: Documentation

An extensive user guide, as well as API documentation for developers, is available at https://mlrl-boomer.readthedocs.io.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

:scroll: License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

Download files

Built Distributions

Release 0.11.3 is published as wheels only; no source distribution is available. Files are named mlrl_boomer-0.11.3-<python tag>-<python tag>-<platform tag>.whl, for example mlrl_boomer-0.11.3-cp313-cp313-win_amd64.whl.

  • CPython 3.13: Windows x86-64 (887.2 kB), manylinux glibc 2.17+ x86-64 (3.4 MB), manylinux glibc 2.17+ ARM64 (3.3 MB), macOS 11.0+ ARM64 (1.7 MB)
  • CPython 3.12: Windows x86-64 (897.2 kB), manylinux glibc 2.17+ x86-64 (3.4 MB), manylinux glibc 2.17+ ARM64 (3.3 MB), macOS 11.0+ ARM64 (1.7 MB)
  • CPython 3.11: Windows x86-64 (899.3 kB), manylinux glibc 2.17+ x86-64 (3.4 MB), manylinux glibc 2.17+ ARM64 (3.3 MB), macOS 11.0+ ARM64 (1.7 MB)
  • CPython 3.10: Windows x86-64 (905.3 kB), manylinux glibc 2.17+ x86-64 (3.4 MB), manylinux glibc 2.17+ ARM64 (3.3 MB), macOS 11.0+ ARM64 (1.7 MB)

