A scikit-learn implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules

BOOMER - Gradient Boosted Multi-Label Classification Rules

This software package provides an implementation of BOOMER - an algorithm for learning gradient boosted multi-label classification rules that integrates with the popular scikit-learn machine learning framework.

The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics. The BOOMER algorithm uses gradient boosting to learn an ensemble of rules that is built with respect to a given multivariate loss function. To provide a versatile tool for different use cases, great emphasis is put on the efficiency of the implementation. To ensure its flexibility, it is designed in a modular fashion and can therefore easily be adjusted to different requirements.
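To make the problem setting concrete, the following minimal sketch (not part of this package) shows how sets of labels are commonly encoded as a binary label matrix with one row per data point and one column per label. The function name `to_label_matrix` and the toy topics are illustrative assumptions.

```python
def to_label_matrix(label_sets, all_labels):
    """Encode one set of labels per example as a row of 0/1 indicators."""
    index = {label: i for i, label in enumerate(all_labels)}
    matrix = []
    for labels in label_sets:
        row = [0] * len(all_labels)
        for label in labels:
            row[index[label]] = 1
        matrix.append(row)
    return matrix


# Three text documents annotated with zero or more topics.
topics = ["politics", "sports", "economy"]
documents = [{"politics", "economy"}, {"sports"}, set()]
y = to_label_matrix(documents, topics)
print(y)  # [[1, 0, 1], [0, 1, 0], [0, 0, 0]]
```

Since most entries of such a matrix are typically zero, it is a natural candidate for a sparse representation.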

References

The algorithm was first published in the following paper. A preprint version is publicly available here.

Michael Rapp, Eneldo Loza Mencía, Johannes Fürnkranz, Vu-Linh Nguyen and Eyke Hüllermeier. Learning Gradient Boosted Multi-label Classification Rules. In: Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases (ECML-PKDD), 2020, Springer.

If you use the algorithm in a scientific publication, we would appreciate citations to the mentioned paper. An overview of publications that are concerned with the BOOMER algorithm, together with information on how to cite them, can be found in the section References of the documentation.

Features

The algorithm that is provided by this project currently supports the following core functionalities to learn an ensemble of boosted classification rules:

  • Different label-wise or example-wise loss functions can be minimized during training (optionally using L1 or L2 regularization).
  • The rules may predict for a single label or for all labels (which makes it possible to model local label dependencies).
  • When learning a new rule, random samples of the training examples, features or labels may be used (including different techniques such as sampling with or without replacement or stratification methods).
  • The impact of individual rules on the ensemble can be controlled using shrinkage.
  • Hyper-parameters that provide fine-grained control over the specificity/generality of rules are available.
  • The conditions of rules can be pruned based on a hold-out set.
  • The algorithm can natively handle numerical, ordinal and nominal features (without the need for pre-processing techniques such as one-hot encoding).
  • The algorithm is able to deal with missing feature values, i.e., occurrences of NaN in the feature matrix.
  • Different strategies for prediction, which can be tailored to the loss function in use, are available.
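
As a rough illustration of how several of the features above interact, the sketch below models a rule as a list of conditions plus per-label scores, computes a score from summed gradients and Hessians with L2 regularization (the usual Newton step in gradient boosting), and scales it by a shrinkage factor. All names (`Rule`, `regularized_score`, `predict_scores`) are hypothetical and do not reflect the package's API. Treating NaN as never satisfying a condition and thresholding scores at zero are simplifying assumptions as well.

```python
import math
from dataclasses import dataclass


@dataclass
class Rule:
    # Each condition is (feature_index, operator, threshold) with operator "<=" or ">".
    conditions: list
    # Maps label index -> score; a single entry corresponds to a single-label rule.
    scores: dict

    def covers(self, x):
        for feature, op, threshold in self.conditions:
            value = x[feature]
            if math.isnan(value):
                return False  # assumption: missing values never satisfy a condition
            if op == "<=" and not value <= threshold:
                return False
            if op == ">" and not value > threshold:
                return False
        return True


def regularized_score(gradients, hessians, l2=1.0, shrinkage=0.3):
    """Newton step -G / (H + l2), scaled by the shrinkage factor."""
    return shrinkage * -sum(gradients) / (sum(hessians) + l2)


def predict_scores(rules, x, num_labels):
    """Sum the scores of all rules that cover the example x."""
    totals = [0.0] * num_labels
    for rule in rules:
        if rule.covers(x):
            for label, score in rule.scores.items():
                totals[label] += score
    return totals


rules = [
    Rule(conditions=[], scores={0: 0.5, 1: -0.5}),      # default rule, all labels
    Rule(conditions=[(0, ">", 2.0)], scores={1: 1.0}),  # single-label rule
]
scores = predict_scores(rules, [3.0, float("nan")], num_labels=2)
labels = [int(s > 0) for s in scores]  # assumption: predict a label if its score is positive
```

A smaller shrinkage value reduces the contribution of each individual rule, which usually means more rules are needed but the ensemble overfits less readily.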

In addition, the following features that may speed up training or reduce the memory footprint are currently implemented:

  • Approximate methods for evaluating potential conditions of rules, based on unsupervised binning methods, can be used.
  • Gradient-based label binning (GBLB) can be used to assign the available labels to a limited number of bins. The use of label binning may speed up training significantly when using rules that predict for multiple labels to minimize a non-decomposable loss function.
  • Dense or sparse feature matrices can be used for training and prediction. The use of sparse matrices may speed up training significantly on some data sets.
  • Dense or sparse label matrices can be used for training. The use of sparse matrices may reduce the memory footprint in case of large data sets.
  • Dense or sparse matrices can be used to store predictions. The use of sparse matrices may reduce the memory footprint in case of large data sets.
  • Multi-threading can be used to parallelize the evaluation of a rule's potential refinements across multiple CPU cores.
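
The idea behind approximate, binning-based evaluation of conditions can be sketched as follows: instead of considering a threshold between every pair of adjacent feature values, the values are grouped into a small number of bins and only the bin boundaries serve as candidate thresholds. The example uses equal-width binning as one possible unsupervised method. The function name is hypothetical, and the package's own implementation may differ.

```python
def equal_width_thresholds(values, num_bins):
    """Return the inner bin boundaries of an equal-width binning of `values`."""
    lo, hi = min(values), max(values)
    if lo == hi or num_bins < 2:
        return []  # a constant feature offers no useful thresholds
    width = (hi - lo) / num_bins
    return [lo + i * width for i in range(1, num_bins)]


feature = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
exact = len(set(feature)) - 1  # thresholds between adjacent distinct values
approx = equal_width_thresholds(feature, num_bins=4)
print(exact, approx)  # 8 [2.0, 4.0, 6.0]
```

With fewer candidate thresholds per feature, fewer potential conditions have to be evaluated for each rule, at the cost of possibly missing the best split point.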

Documentation

An extensive user guide, as well as API documentation for developers, is available at https://mlrl-boomer.readthedocs.io. If you are new to the project, the user guide is the best place to start.

A collection of benchmark datasets that are compatible with the algorithm is provided in a separate repository.

For an overview of changes and new features that have been included in past releases, please refer to the changelog.

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.


Built Distributions

  • mlrl_boomer-0.7.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (755.5 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.7.1-cp310-cp310-macosx_10_9_x86_64.whl (660.3 kB): CPython 3.10, macOS 10.9+, x86-64
  • mlrl_boomer-0.7.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (755.5 kB): CPython 3.9, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.7.1-cp39-cp39-macosx_10_9_x86_64.whl (660.3 kB): CPython 3.9, macOS 10.9+, x86-64
  • mlrl_boomer-0.7.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (755.5 kB): CPython 3.8, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.7.1-cp38-cp38-macosx_10_9_x86_64.whl (660.3 kB): CPython 3.8, macOS 10.9+, x86-64
  • mlrl_boomer-0.7.1-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (755.5 kB): CPython 3.7m, manylinux (glibc 2.17+), x86-64
  • mlrl_boomer-0.7.1-cp37-cp37m-macosx_10_9_x86_64.whl (660.3 kB): CPython 3.7m, macOS 10.9+, x86-64
