A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules

BOOMER - Gradient Boosted Multi-Label Classification Rules

This software package provides the official implementation of BOOMER, an algorithm for learning gradient boosted multi-label classification rules. It integrates with the popular scikit-learn machine learning framework.

The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The BOOMER algorithm uses gradient boosting to learn an ensemble of rules that is built with respect to a given multivariate loss function.
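To make the idea of an ensemble of rules more concrete, here is a minimal, purely illustrative sketch in plain Python. It does not use this package: the `Rule` class, the helper functions, and the convention that a label is assigned when its aggregated score is positive are all assumptions made for the example, not the library's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Rule:
    """Hypothetical rule: if `condition` holds for an example, add `head` scores."""
    condition: Callable[[Sequence[float]], bool]
    head: Dict[int, float]  # label index -> score contributed by this rule


def predict_scores(rules: List[Rule], x: Sequence[float], num_labels: int) -> List[float]:
    """Sum the head scores of all rules whose condition covers example `x`."""
    scores = [0.0] * num_labels
    for rule in rules:
        if rule.condition(x):
            for label, score in rule.head.items():
                scores[label] += score
    return scores


def predict_labels(rules: List[Rule], x: Sequence[float], num_labels: int) -> List[int]:
    """Assign every label whose aggregated score is positive (assumed convention)."""
    return [1 if s > 0 else 0 for s in predict_scores(rules, x, num_labels)]


rules = [
    Rule(lambda x: True, {0: -0.5, 1: -0.5, 2: -0.5}),  # default rule, covers everything
    Rule(lambda x: x[0] > 3.0, {0: 1.2}),               # predicts for a single label
    Rule(lambda x: x[1] <= 0.5, {1: 0.9, 2: 0.8}),      # predicts for two labels at once
]

print(predict_labels(rules, [4.0, 0.2], num_labels=3))  # covered by all three rules
print(predict_labels(rules, [1.0, 2.0], num_labels=3))  # covered by the default rule only
```

Each example receives a set of labels rather than a single class, and a rule that predicts for several labels at once contributes to all of them when it fires.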

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms. For example, this project also provides a Separate-and-Conquer (SeCo) algorithm based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.

References

The algorithm was first published in the following paper, for which a preprint version is publicly available.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen, and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the References section of the documentation.

Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification rules:

  • Label-wise decomposable or non-decomposable loss functions can be minimized in expectation.
  • L1 and L2 regularization can be used.
  • Single-label, partial, or complete heads can be used by rules, i.e., they can predict for an individual label, a subset of the available labels, or all labels. Predicting for multiple labels simultaneously enables rules to model local dependencies between labels.
  • Various strategies for predicting regression scores, labels or probabilities are available.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a model.
  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Sampling techniques and stratification methods can be used to learn new rules on a subset of the available training examples, features, or labels.
  • Shrinkage (a.k.a. the learning rate) can be adjusted to control the impact of individual rules on the overall ensemble.
  • Fine-grained control over the specificity/generality of rules is provided via hyper-parameters.
  • Incremental reduced error pruning can be used to remove overly specific conditions from rules and prevent overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) can be used to determine the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
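How a loss function, L2 regularization, and shrinkage interact can be illustrated with the textbook Newton step used in gradient boosting. The sketch below is not the library's internal code: the function `rule_score`, its parameter names, and the choice of a label-wise logistic loss are assumptions made for illustration. It computes the score that a rule with a single-label head would predict for one label, given the examples it covers.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def rule_score(y_true: Sequence[int], current_scores: Sequence[float],
               covered: List[int], l2: float = 1.0, shrinkage: float = 0.3) -> float:
    """Newton step for the logistic loss on one label (hypothetical helper).

    y_true holds 0/1 ground truth values, current_scores the scores predicted
    by the ensemble so far, and covered the indices of the covered examples.
    """
    gradient_sum = 0.0
    hessian_sum = 0.0
    for i in covered:
        p = sigmoid(current_scores[i])
        gradient_sum += p - y_true[i]    # first derivative of the logistic loss
        hessian_sum += p * (1.0 - p)     # second derivative of the logistic loss
    # The L2 term enlarges the denominator and shrinks the score towards zero;
    # shrinkage then scales the impact of the rule on the ensemble.
    return shrinkage * (-gradient_sum / (hessian_sum + l2))


y_true = [1, 1, 0, 1]
current_scores = [0.0, 0.0, 0.0, 0.0]
covered = [0, 1, 3]  # the rule covers three relevant examples

print(rule_score(y_true, current_scores, covered, l2=0.0, shrinkage=1.0))
print(rule_score(y_true, current_scores, covered, l2=1.0, shrinkage=0.3))
```

With neither regularization nor shrinkage, the rule predicts a large positive score for the covered examples. Increasing `l2` or lowering `shrinkage` makes the prediction more conservative, so more rules are needed to fit the training data.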

Runtime and Memory Optimizations

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Gradient-based label binning (GBLB) can be used to assign the available labels to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some data sets.
  • Sparse label matrices can be used for training. This may reduce the memory footprint in case of large data sets.
  • Sparse prediction matrices can be used to store predicted labels. This may reduce the memory footprint in case of large data sets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on data sets with many labels.
  • Multi-threading can be used to parallelize the evaluation of a rule's potential refinements across several features, to update the gradients and Hessians of individual examples in parallel, or to obtain predictions for several examples in parallel.
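The effect of unsupervised feature binning on the number of conditions that must be evaluated can be sketched as follows. Equal-width binning is assumed here as one common unsupervised scheme. The helper functions are hypothetical and only illustrate the idea; they do not reflect the package's actual implementation.

```python
from typing import List


def equal_width_bins(values: List[float], num_bins: int) -> List[int]:
    """Map each value to the index of one of `num_bins` equally wide intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / num_bins
    # The maximum value would fall into bin `num_bins`, so it is clamped.
    return [min(int((v - lo) / width), num_bins - 1) for v in values]


def candidate_thresholds(values: List[float]) -> List[float]:
    """Midpoints between adjacent distinct values, i.e. possible split points."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


feature = [0.1, 0.4, 0.35, 2.2, 3.9, 1.7, 2.8, 4.0]
bins = equal_width_bins(feature, num_bins=4)

print(bins)
print(len(candidate_thresholds(feature)), "thresholds without binning")
print(len(candidate_thresholds([float(b) for b in bins])), "thresholds with binning")
```

Because thresholds are only considered between bins rather than between all distinct feature values, fewer candidate conditions have to be evaluated per feature, at the cost of a coarser search.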

Documentation

An extensive user guide, as well as API documentation for developers, is available at https://mlrl-boomer.readthedocs.io.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.
