
A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules

Project description

BOOMER - Gradient Boosted Multi-Label Classification Rules


Important links: Documentation | Issue Tracker | Changelog | Contributors | Code of Conduct | License

This software package provides the official implementation of BOOMER - an algorithm that uses gradient boosting to learn an ensemble of multi-output rules that is built with respect to a specific multivariate loss function. It integrates with the popular scikit-learn machine learning framework.
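To illustrate the boosting objective described above, the following sketch computes the first- and second-order derivatives of a label-wise (decomposable) logistic loss, the kind of quantities a boosting algorithm fits each new rule to. This is a conceptual example only, not the package's actual API, and the real implementation supports further losses, including non-decomposable ones.

```python
import math

def logistic_gradients(scores, labels):
    """Gradients and Hessians of a label-wise (decomposable) logistic loss.

    For each output, the loss is log(1 + exp(-y * s)) with ground truth
    y in {-1, +1} and aggregated score s. Boosting fits each new rule to
    these derivatives. (Illustrative sketch, not the package's internals.)
    """
    gradients, hessians = [], []
    for s, y in zip(scores, labels):
        p = 1.0 / (1.0 + math.exp(-s))  # predicted probability of the positive class
        t = 1.0 if y == 1 else 0.0      # ground truth mapped to {0, 1}
        gradients.append(p - t)         # first derivative w.r.t. the score
        hessians.append(p * (1.0 - p))  # second derivative w.r.t. the score
    return gradients, hessians

# Scores predicted so far for the three labels of one training example:
grads, hess = logistic_gradients([0.0, 2.0, -1.0], [1, -1, 1])
```

A rule that corrects the largest gradients the most reduces the loss the most, which is why subsequent rules focus on the examples and outputs the ensemble still gets wrong.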

The problem domains addressed by this algorithm include the following:

  • Multi-label classification: The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
  • Multi-output regression: Multivariate regression problems require predicting more than a single numerical output variable per data point.

To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. Moreover, to ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements. This modular approach enables implementing different kinds of rule learning algorithms. For example, this project also provides a Separate-and-Conquer (SeCo) algorithm based on traditional rule learning techniques that are particularly well-suited for learning interpretable models.

References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section References of the documentation.

Functionalities

The algorithm that is provided by this project currently supports the following core functionalities for learning ensembles of boosted classification or regression rules:

  • Decomposable or non-decomposable loss functions can be minimized in expectation.
  • L1 and L2 regularization can be used.
  • Single-output, partial, or complete heads can be used by rules, i.e., they can predict for a single output, a subset of the available outputs, or all of them. Predicting for multiple outputs simultaneously makes it possible to model local dependencies between them.
  • Various strategies for predicting scores, binary labels or probabilities are available, depending on whether a classification or regression model is used.
  • Isotonic regression models can be used to calibrate marginal and joint probabilities predicted by a classification model.
  • Rules can be constructed via a greedy search or a beam search. The latter may help to improve the quality of individual rules.
  • Sampling techniques and stratification methods can be used for learning new rules on a subset of the available training examples, features, or output variables.
  • Shrinkage (a.k.a. the learning rate) can be adjusted for controlling the impact of individual rules on the overall ensemble.
  • Fine-grained control over the specificity/generality of rules is provided via hyper-parameters.
  • Incremental reduced error pruning can be used for removing overly specific conditions from rules and preventing overfitting.
  • Post- and pre-pruning (a.k.a. early stopping) allow determining the optimal number of rules to be included in an ensemble.
  • Sequential post-optimization may help to improve the predictive performance of a model by reconstructing each rule in the context of the other rules.
  • Native support for numerical, ordinal, and nominal features eliminates the need for pre-processing techniques such as one-hot encoding.
  • Handling of missing feature values, i.e., occurrences of NaN in the feature matrix, is implemented by the algorithm.
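Several of the notions above (rules consisting of conditions, single-output or partial heads, and shrinkage) can be made concrete with a small sketch. The `Rule` class and `predict_scores` function below are hypothetical illustrations of these concepts, not the package's internals or API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    """A rule with a body of conditions and a head assigning scores to outputs."""
    conditions: List[Callable[[Dict[str, float]], bool]]  # the rule's body
    head: Dict[int, float]  # output index -> score (single-output or partial head)

    def covers(self, x: Dict[str, float]) -> bool:
        return all(cond(x) for cond in self.conditions)

def predict_scores(rules: List[Rule], x: Dict[str, float],
                   num_outputs: int, shrinkage: float = 0.3) -> List[float]:
    """Sum the shrinkage-weighted scores of all rules that cover example x."""
    scores = [0.0] * num_outputs
    for rule in rules:
        if rule.covers(x):
            for output, score in rule.head.items():
                scores[output] += shrinkage * score
    return scores

rules = [
    Rule(conditions=[lambda x: x["length"] > 10.0], head={0: 1.2}),          # single-output head
    Rule(conditions=[lambda x: x["width"] <= 3.0], head={0: -0.4, 2: 0.8}),  # partial head
]
scores = predict_scores(rules, {"length": 12.0, "width": 2.5}, num_outputs=3)
```

A smaller shrinkage value dampens the contribution of each individual rule, so more rules are needed, but the ensemble typically generalizes better.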

Runtime and Memory Optimizations

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

  • Unsupervised feature binning can be used to speed up the evaluation of a rule's potential conditions when dealing with numerical features.
  • Gradient-based label binning (GBLB) can be used for assigning the labels included in a multi-label classification data set to a limited number of bins. This may speed up training significantly when minimizing a non-decomposable loss function using rules with partial or complete heads.
  • Sparse feature matrices can be used for training and prediction. This may speed up training significantly on some data sets.
  • Sparse ground truth matrices can be used for training. This may reduce the memory footprint in case of large data sets.
  • Sparse prediction matrices can be used for storing predicted labels. This may reduce the memory footprint in case of large data sets.
  • Sparse matrices for storing gradients and Hessians can be used if supported by the loss function. This may speed up training significantly on data sets with many output variables.
  • Multi-threading can be used for parallelizing the evaluation of a rule's potential refinements across several features, updating the gradients and Hessians of individual examples in parallel, or obtaining predictions for several examples in parallel.
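The idea behind unsupervised feature binning, mentioned above, is that only bin boundaries need to be considered as candidate thresholds instead of every pair of adjacent feature values. The equal-width binning below is an illustrative, hypothetical helper, not the package's API.

```python
def equal_width_bins(values, num_bins):
    """Unsupervised equal-width binning of a numerical feature.

    Only the num_bins - 1 interior boundaries are considered as candidate
    thresholds for a rule's conditions, instead of one threshold between
    every pair of adjacent feature values. (Illustrative sketch only.)
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    boundaries = [lo + i * width for i in range(1, num_bins)]
    # Assign each value to a bin index in [0, num_bins - 1]
    assignments = [min(int((v - lo) / width), num_bins - 1) if width > 0 else 0
                   for v in values]
    return boundaries, assignments

boundaries, bins = equal_width_bins([0.5, 1.0, 2.5, 4.0, 9.5, 10.0], num_bins=4)
# 3 interior boundaries serve as candidate thresholds instead of 5 adjacent pairs.
```

For skewed feature distributions, equal-frequency binning is a common alternative, since equal-width bins may then be mostly empty.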

Documentation

An extensive user guide, as well as an API documentation for developers, is available at https://mlrl-boomer.readthedocs.io. If you are new to the project, the documentation is the best place to start.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release. See the tutorial on generating distribution archives.

Built Distributions

  • mlrl_boomer-0.11.0-cp312-cp312-win_amd64.whl (892.9 kB): CPython 3.12, Windows x86-64
  • mlrl_boomer-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.12, manylinux: glibc 2.17+ x86-64
  • mlrl_boomer-0.11.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB): CPython 3.12, manylinux: glibc 2.17+ ARM64
  • mlrl_boomer-0.11.0-cp312-cp312-macosx_11_0_arm64.whl (1.7 MB): CPython 3.12, macOS 11.0+ ARM64
  • mlrl_boomer-0.11.0-cp311-cp311-win_amd64.whl (895.5 kB): CPython 3.11, Windows x86-64
  • mlrl_boomer-0.11.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.11, manylinux: glibc 2.17+ x86-64
  • mlrl_boomer-0.11.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB): CPython 3.11, manylinux: glibc 2.17+ ARM64
  • mlrl_boomer-0.11.0-cp311-cp311-macosx_11_0_arm64.whl (1.7 MB): CPython 3.11, macOS 11.0+ ARM64
  • mlrl_boomer-0.11.0-cp310-cp310-win_amd64.whl (902.1 kB): CPython 3.10, Windows x86-64
  • mlrl_boomer-0.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.4 MB): CPython 3.10, manylinux: glibc 2.17+ x86-64
  • mlrl_boomer-0.11.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (3.3 MB): CPython 3.10, manylinux: glibc 2.17+ ARM64
  • mlrl_boomer-0.11.0-cp310-cp310-macosx_11_0_arm64.whl (1.7 MB): CPython 3.10, macOS 11.0+ ARM64

File details

Hashes for each built distribution are listed below. See more details on using hashes here.

mlrl_boomer-0.11.0-cp312-cp312-win_amd64.whl
  SHA256: 593c9c47734a563d35b2b304cb8bc5d8ad5df9a1d9383bfb42006147fbd94330
  MD5: 1d2be17b78fb648392aacc9444c435cc
  BLAKE2b-256: f673694750616681780f150da04cf02c760d61092ff1cb1669e5180ba0b60976

mlrl_boomer-0.11.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  SHA256: 9eb3ee1a0211f5754780ebeec489d87a9efe6985efb7e05d0865c2ee0dc644dd
  MD5: e5884ec0fb343ed3dc5f9285a2418485
  BLAKE2b-256: 096b06f4d40537eb8bf95318024fd7e158457df9893137cb2bd92ab1e803fad2

mlrl_boomer-0.11.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  SHA256: 5e363f604c2d65d0fb2969d3be0b208bb75f272ef8b104fb7a0f238402b7bb69
  MD5: 5950a420e277e4429527eef39fcb6d0a
  BLAKE2b-256: 4d348ba9872cf5d42f5604155d4ad86981804c09a4513fd34342e7b232a99d04

mlrl_boomer-0.11.0-cp312-cp312-macosx_11_0_arm64.whl
  SHA256: a0bbc00f33ccfa4e2b035c8997c047701fb8cb0b8911b91fd7511303bd8e63b2
  MD5: ed89bcaa4b0d566e338847cf4d3b25e1
  BLAKE2b-256: 7deb4544aacfba650deab93775aa53d896a61b74cfd538913fc243b886695195

mlrl_boomer-0.11.0-cp311-cp311-win_amd64.whl
  SHA256: 8c451ada1ca7c97bc6d33775aabc8afc0b997238679c84e3f1d0c0ce2aef3609
  MD5: 7569dbb74d5daa472f2554ef55215801
  BLAKE2b-256: 4f4dfac2f0e99aa53e114947b7045ab4ff3c57bee12dcf4439f4f099bdfd3782

mlrl_boomer-0.11.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  SHA256: dc4a5d9bec83e1f483bd79834215551af8d059d6d485446b114e8d385bff92d6
  MD5: 4f5d5334731d7bfef5ac8f374b05cca1
  BLAKE2b-256: d6eb20176f343c9a15af5763ebf224df6a824d0d6b3b67e1188b89ac8a0c55c8

mlrl_boomer-0.11.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  SHA256: dc88436a4cb623443dfacdb1d2f51eab8c32539c8fa8b16d781af82a95173eae
  MD5: 9c616d287df60540400bbc2c22e4e355
  BLAKE2b-256: 46a444e4e50db2ab4596f5ede7217143f50a2e9c7dfa9be7f4109096b213d957

mlrl_boomer-0.11.0-cp311-cp311-macosx_11_0_arm64.whl
  SHA256: 7314dc67b99851396969bd6126244cfe2cfd04628ed642370465dccd8655cc5e
  MD5: d65b287fe55b6e6d9dd5c2f0da06b54c
  BLAKE2b-256: 504f1eb73000ff0150aad42dfce0956227b19a71cb3fcec3f4228be4f5942594

mlrl_boomer-0.11.0-cp310-cp310-win_amd64.whl
  SHA256: 8ec25eda86da67ebfa171d72638d77aec755c41c71af05e6a9b47b9dfd8408ba
  MD5: e9d9226793dd0b621a4cb4d83509591f
  BLAKE2b-256: 1abe432db6de5bd2eaa4d6ac9266545c07d97dfa88833d4e1acbed21ef92c9e5

mlrl_boomer-0.11.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
  SHA256: 48ff802df3b6ecc0c35e8f924a7b686720561b9217e8315adc0c7ccfa942340d
  MD5: 9d753f3f18016939104b76016b588cbf
  BLAKE2b-256: 9af2946af0d06571326a878c5c495b0adbff412ceabc6a384ca248de17ad73e9

mlrl_boomer-0.11.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
  SHA256: e9a7e3e11440a665d2e7288d9ec33d444ba110734dba6a9802f6a0c9dd779e9a
  MD5: efbfb3c0a55d7de0a94108fde7fe15a0
  BLAKE2b-256: 14aa180825d7b5d34aed90f9f0cfb73c6591d13e8b98ffe1e455bc39c09c6aec

mlrl_boomer-0.11.0-cp310-cp310-macosx_11_0_arm64.whl
  SHA256: 558e0ba928a1c11276410f7fdc31b01e2c462e3ea84c9ceb8d55361adef0a928
  MD5: ac38e2c2b0d312cd528ae2b1a3aaae30
  BLAKE2b-256: 1088d38ccf11073e61023c637672b0af753836b178bf65502f78f4aab4bfc3a4
