
A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules

Project description

BOOMER - Gradient Boosted Multi-Label Classification Rules


Important links: Documentation | Issue Tracker | Changelog | Contributors | Code of Conduct | License

This software package provides the official implementation of BOOMER - an algorithm that uses gradient boosting to learn an ensemble of multi-output rules that is built with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require predicting more than a single numerical output variable for each data point.
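
The two settings differ only in the type of the ground truth matrix that accompanies the feature matrix: binary for multi-label classification, real-valued for multi-output regression. A minimal illustration of the expected data layout (plain Python lists standing in for the array types accepted by scikit-learn-compatible estimators; the values are made up):

```python
# Feature matrix: 3 examples, 2 numerical features.
X = [[0.5, 1.2],
     [3.1, 0.7],
     [2.2, 2.9]]

# Multi-label classification: binary ground truth matrix with one
# column per label (e.g. the topics assigned to a text document).
Y_labels = [[1, 0, 1],
            [0, 1, 0],
            [1, 1, 0]]

# Multi-output regression: real-valued ground truth matrix with one
# column per output variable.
Y_regression = [[0.3, 1.7],
                [2.4, 0.1],
                [1.1, 0.9]]
```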

To provide a versatile tool for different use cases, great emphasis is placed on the efficiency of the implementation. Moreover, to ensure flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach makes it possible to implement different kinds of rule learning algorithms. For example, this project also provides a Separate-and-Conquer (SeCo) algorithm based on traditional rule learning techniques, which are particularly well-suited for learning interpretable models.

References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section References of the documentation.

Functionalities

The algorithm provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules:

  • Decomposable or non-decomposable loss functions can be minimized in expectation.
  • L1 and L2 regularization can be used.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously makes it possible to model local dependencies between them.
  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.
  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.
  • Fine-grained control over the specificity/generality of rules is provided via hyper-parameters.
  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) allow determining the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
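
Conceptually, a prediction of the ensemble is the shrinkage-weighted sum of the heads of all rules that cover an example. The following stdlib-only sketch is purely illustrative (the names and data structures are hypothetical, not the project's API); it shows how single-output and partial heads contribute to an aggregated score vector:

```python
# Illustrative sketch of how boosted rules aggregate predictions;
# names and structure are hypothetical, not the project's actual API.

def rule(condition, head):
    """A rule pairs a coverage test with a head: a dict mapping
    output indices to predicted scores (single-output, partial,
    or complete, depending on how many outputs it covers)."""
    return {'covers': condition, 'head': head}

def predict_scores(rules, x, num_outputs, shrinkage=0.3):
    """Sum the heads of all covering rules, scaled by the shrinkage."""
    scores = [0.0] * num_outputs
    for r in rules:
        if r['covers'](x):
            for output, value in r['head'].items():
                scores[output] += shrinkage * value
    return scores

rules = [
    rule(lambda x: x[0] > 1.0, {0: 1.5}),           # single-output head
    rule(lambda x: x[1] <= 2.0, {0: -0.5, 2: 2.0})  # partial head
]

print(predict_scores(rules, x=[2.0, 1.0], num_outputs=3))
```

A real model additionally passes the aggregated scores through the configured prediction strategy, e.g. thresholding them into binary labels or calibrating them into probabilities.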

Runtime and Memory Optimizations

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification data set to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some data sets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large data sets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large data sets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on data sets with many output variables.
  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.
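
The benefit of sparse matrices is easy to quantify: a compressed sparse row (CSR) layout stores only the non-zero entries plus two index arrays, instead of every cell. A minimal stdlib-only sketch of the idea (illustrative only, not the representation used internally by this package):

```python
def to_csr(dense):
    """Convert a dense row-major matrix into CSR arrays:
    values, column indices, and row pointers."""
    values, col_indices, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_indices.append(j)
        row_ptr.append(len(values))
    return values, col_indices, row_ptr

# A mostly-zero feature matrix: 3 examples, 5 features.
dense = [[0,   0, 4.2, 0, 0],
         [1.5, 0, 0,   0, 0],
         [0,   0, 0,   0, 7.0]]

values, col_indices, row_ptr = to_csr(dense)
# Only 3 stored values (plus indices) instead of 15 cells.
print(values, col_indices, row_ptr)
```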

Documentation

An extensive user guide, as well as API documentation for developers, is available at https://mlrl-boomer.readthedocs.io. If you are new to the project, the user guide is the best place to start.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

  • mlrl_boomer-0.11.1-cp312-cp312-win_amd64.whl (898.4 kB): CPython 3.12, Windows x86-64
  • mlrl_boomer-0.11.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.11.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB): CPython 3.12, manylinux (glibc 2.17+), ARM64
  • mlrl_boomer-0.11.1-cp312-cp312-macosx_11_0_arm64.whl (1.7 MB): CPython 3.12, macOS 11.0+, ARM64
  • mlrl_boomer-0.11.1-cp311-cp311-win_amd64.whl (900.4 kB): CPython 3.11, Windows x86-64
  • mlrl_boomer-0.11.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.11.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB): CPython 3.11, manylinux (glibc 2.17+), ARM64
  • mlrl_boomer-0.11.1-cp311-cp311-macosx_11_0_arm64.whl (1.7 MB): CPython 3.11, macOS 11.0+, ARM64
  • mlrl_boomer-0.11.1-cp310-cp310-win_amd64.whl (907.1 kB): CPython 3.10, Windows x86-64
  • mlrl_boomer-0.11.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.11.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB): CPython 3.10, manylinux (glibc 2.17+), ARM64
  • mlrl_boomer-0.11.1-cp310-cp310-macosx_11_0_arm64.whl (1.7 MB): CPython 3.10, macOS 11.0+, ARM64

File details

Details for the file mlrl_boomer-0.11.1-cp312-cp312-win_amd64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 f56e83a6f54c66d1e5797fedbfcdddf953f9b7522540dee9909fb3a58bf6b722
MD5 3652201aab52900180af94f99cb60fbc
BLAKE2b-256 342191b2c45f58a1512621a90884e4c253a119a4aa1b4ef00f65759a65bab153

See more details on using hashes here.

File details

Details for the file mlrl_boomer-0.11.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 78f3ff77e4e88b37143064f78bc2de5facad6075d5188d1e2561d147400e0857
MD5 932f3783692a01fb0b40a887c6cffa7d
BLAKE2b-256 0b535137c9658421b4be5ec0392b8988144dc0b66d1878656a6b2af4e5072ea7


File details

Details for the file mlrl_boomer-0.11.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ab55ef8f43670382fe16459babc3c5ffb957172a1af18df735a0623a637fc1cf
MD5 6bd8b7c789eea9e11e374e91d0695c73
BLAKE2b-256 028ea4af900bcb2158932c3acc57a14ba9a56be7182ee0d98c8ac20149a9284a


File details

Details for the file mlrl_boomer-0.11.1-cp312-cp312-macosx_11_0_arm64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 48593b3464c688d1b2ec3840ff20dbe4260015af4584317535f98d38cd931dc2
MD5 9a85d2382d76b0729818968546fd15dd
BLAKE2b-256 e52404ee97226df80d25a7cf43113c29a751b39f02b26152debd6eb14958dcd5


File details

Details for the file mlrl_boomer-0.11.1-cp311-cp311-win_amd64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 50bb99957f20f21a31baccd6d42cde89947db00341d4a783a4dad7593bbb57fa
MD5 855059c5dee818f5a903da8e215239c7
BLAKE2b-256 15f62a70ea813ab7e4eec3131a9483ca2cd1348558288f9a91eab4f9dd16bcb9


File details

Details for the file mlrl_boomer-0.11.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 efb21520e8700bc62009b24fff779ead03f6c84f64672311bce1aaf7859cacab
MD5 f4b830ca6b045631f5412b4ec02ed5c9
BLAKE2b-256 df7c4d4e48474914aff4343335f5a725620b7c32e74245f34a783f5ec79b555c


File details

Details for the file mlrl_boomer-0.11.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 ebb411e1b9cc1524dd3f8b2e2638e4e9f1baff8605b15d893023968d7f44de31
MD5 9c6403bf21e35b73ecc5976dbe42246e
BLAKE2b-256 32f5093921489614d5072b208d8d08bf699c427a99216487ddc66b050872bcfd


File details

Details for the file mlrl_boomer-0.11.1-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1ac1016546e68de122168b7e71d45dc01e7a38321dcfd191634ebcb80a923fbc
MD5 383afdeeb4b4af52ec74756596994578
BLAKE2b-256 e4ad7be7b128b64d13ff616a6284e118e1f99fbb1ea7c3349a1757eae6e198c1


File details

Details for the file mlrl_boomer-0.11.1-cp310-cp310-win_amd64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 6ee56546c81f8f64b1f7f77f74234e4fd07e872ac932bad4134fc5db32d3b216
MD5 401be4b76869da008b32960dcd196eb2
BLAKE2b-256 b5e4723e81d7ebeffe9b01efa3deb92b1ea2a86dbc99b4baa33997cc8da88583


File details

Details for the file mlrl_boomer-0.11.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 92461c6093b309683b64e935d5c53d5995f855cdd70cc0d868981bd63745ac95
MD5 14c6467f5a373f91112ad013be3e6975
BLAKE2b-256 5fcda4861fb7073659e98f83b63699dabb569e976b6e51c18eb3c2eb6d49a08e


File details

Details for the file mlrl_boomer-0.11.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 bcae83a08270d9fd919baca843a7a69accd166a255b13df7f8ea9a8e5a021a7b
MD5 b38e2acf44b56ca02b10b0ec19a4f2fe
BLAKE2b-256 13dcd443b09bc8b6f6b3bd4ad6f0fe09262f83d25179b294db5ba3c2c20bfed3


File details

Details for the file mlrl_boomer-0.11.1-cp310-cp310-macosx_11_0_arm64.whl.


File hashes

Hashes for mlrl_boomer-0.11.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d6ea336db20f704cb74470dcc9b3e7b9090e61770bdf7959029be4c641497439
MD5 a80e36eea481e8532c06ceb7738d3819
BLAKE2b-256 453bd18497f3942c4eccad898ee978736c2c5dc61fb1b09240e662d05039055a

