Provides common modules to be used by different types of multi-label rule learning algorithms

"MLRL-Common": Building-Blocks for Multi-label Rule Learning Algorithms

License: MIT

This software package provides common modules to be used by different types of multi-label rule learning (MLRL) algorithms that integrate with the popular scikit-learn machine learning framework.

The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
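As a rough illustration of what "sets of labels" means in practice, the following sketch turns label sets into a binary indicator matrix, a common input format for multi-label learners. The documents and topics are made-up toy data and the code does not use any function of this package.

```python
# Hypothetical toy data: each document is annotated with a set of topics.
documents = {
    "doc1": {"politics", "economy"},
    "doc2": {"sports"},
    "doc3": set(),  # a data point may have no relevant labels at all
}

# Fix an order of the labels so that each label corresponds to one column.
labels = sorted(set().union(*documents.values()))

# Binary indicator matrix: one row per document, one column per label.
label_matrix = [
    [int(label in topics) for label in labels]
    for topics in documents.values()
]

for name, row in zip(documents, label_matrix):
    print(name, dict(zip(labels, row)))
```

Each row of `label_matrix` is the ground truth of one data point. A multi-label classifier predicts such a row for every unseen example.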

The library serves as the basis for the implementation of the following rule learning algorithms:

  • BOOMER (Gradient Boosted Multi-label Classification Rules): A state-of-the-art algorithm that uses gradient boosting to learn an ensemble of rules that is built with respect to a given multivariate loss function.

Features

This package follows a unified and modular framework for the implementation of different types of MLRL algorithms. An instantiation of the framework consists of the following modules:

  • A module for rule induction that is responsible for the construction of individual rules. Each rule consists of a body and a head. The former specifies the region of the input space to which the rule applies. The latter provides predictions for one or several labels.
  • A strategy for the assemblage of a rule model that consists of several rules.
  • A notion of (label space) statistics that serve as the basis for assessing the quality of potential rules and determining their predictions.
  • Implementations of pruning techniques that can optionally be applied to a rule after its construction to improve the generalization to unseen data.
  • Post-processing techniques that may alter the predictions of a rule after it has been learned.
  • One or several stopping criteria that are used to decide whether more rules should be added to a model.
  • Optional sampling techniques that may be used to obtain a subset of the available training examples, features or labels.
  • An algorithm for the aggregation of predictions that are provided by the rules in a model for previously unseen test examples.
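To make the notions of body, head and aggregation more concrete, here is a minimal sketch in plain Python. The classes `Condition` and `Rule` and the functions `predict_scores` and `predict_labels` are hypothetical names chosen for this illustration; they are not the classes provided by this package. The sketch also assumes that predictions are aggregated by summing the scores of all rules that cover an example, which is only one possible aggregation scheme.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Condition:
    """A single test on one feature, e.g. 'feature 0 > 2.0' (hypothetical)."""
    feature: int
    operator: str  # either "<=" or ">"
    threshold: float

    def covers(self, x: Sequence[float]) -> bool:
        value = x[self.feature]
        return value <= self.threshold if self.operator == "<=" else value > self.threshold


@dataclass
class Rule:
    """The body selects a region of the input space, the head predicts scores."""
    body: List[Condition]
    head: Dict[int, float]  # label index -> predicted score

    def covers(self, x: Sequence[float]) -> bool:
        # An empty body covers every example (a "default rule").
        return all(condition.covers(x) for condition in self.body)


def predict_scores(rules: List[Rule], x: Sequence[float], num_labels: int) -> List[float]:
    """Assumed aggregation: sum the heads of all rules that cover the example."""
    scores = [0.0] * num_labels
    for rule in rules:
        if rule.covers(x):
            for label, score in rule.head.items():
                scores[label] += score
    return scores


def predict_labels(rules: List[Rule], x: Sequence[float], num_labels: int) -> List[int]:
    """A label is predicted as relevant if its aggregated score is positive."""
    return [int(score > 0.0) for score in predict_scores(rules, x, num_labels)]


rules = [
    Rule(body=[], head={0: -0.5, 1: -0.5}),  # default rule with a multi-label head
    Rule(body=[Condition(0, ">", 2.0)], head={0: 1.0}),  # single-label head
    Rule(body=[Condition(0, ">", 2.0), Condition(1, "<=", 1.0)], head={1: 0.75}),
]
print(predict_labels(rules, [3.0, 0.5], num_labels=2))
```

The first rule has an empty body and therefore applies to every example, whereas the other two only contribute to the prediction if all of their conditions are satisfied.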

This library defines APIs for all the aforementioned modules and provides default implementations for the following ones:

  • Top-down hill climbing for the greedy induction of rules. It supports numerical, ordinal and nominal features, as well as missing feature values. Optionally, a histogram-based algorithm, where training examples with similar feature values are assigned to bins, can be used to reduce the complexity of training. Both types of algorithms support the use of multi-threading.
  • A strategy for the sequential assemblage of rule models, where one rule is learned after the other.
  • Incremental reduced error pruning (IREP), where conditions are removed from a rule's body if this results in increased performance as measured on a holdout set of the training data.
  • Simple stopping criteria that stop the induction of rules after a certain amount of time or when a predefined number of rules has been reached, as well as an early stopping mechanism that terminates training as soon as the performance of a model on a holdout set stagnates or declines.
  • Methods for sampling with or without replacement, as well as stratified sampling techniques.
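The interplay between the sequential assemblage of rules and the stopping criteria can be sketched as a simple loop. This is an illustration under stated assumptions rather than the implementation used by this package: `learn_sequentially`, `size_criterion`, `time_criterion`, `induce_rule`, `holdout_loss` and `patience` are hypothetical names, and the early stopping shown here simply gives up after the loss on the holdout set has failed to improve a fixed number of times in a row.

```python
import time
from typing import Any, Callable, List, Optional

# A stopping criterion receives the current model and the start time.
StoppingCriterion = Callable[[List[Any], float], bool]


def size_criterion(max_rules: int) -> StoppingCriterion:
    """Stop as soon as the model contains a predefined number of rules."""
    return lambda model, start: len(model) >= max_rules


def time_criterion(time_limit_seconds: float) -> StoppingCriterion:
    """Stop as soon as a certain amount of time has passed."""
    return lambda model, start: time.monotonic() - start >= time_limit_seconds


def learn_sequentially(
    induce_rule: Callable[[List[Any]], Any],
    stopping_criteria: List[StoppingCriterion],
    holdout_loss: Optional[Callable[[List[Any]], float]] = None,
    patience: int = 3,
) -> List[Any]:
    """Learn one rule after the other until any stopping criterion is met."""
    model: List[Any] = []
    start = time.monotonic()
    best_loss, stale_iterations = float("inf"), 0

    while not any(should_stop(model, start) for should_stop in stopping_criteria):
        # The next rule may depend on the rules learned so far.
        model.append(induce_rule(model))

        if holdout_loss is not None:
            loss = holdout_loss(model)
            if loss < best_loss:
                best_loss, stale_iterations = loss, 0
            else:
                stale_iterations += 1
                if stale_iterations >= patience:
                    break  # performance on the holdout set stagnates or declines

    return model


# Usage with a dummy "rule" that is just the index of the iteration.
model = learn_sequentially(
    induce_rule=lambda current_model: len(current_model),
    stopping_criteria=[size_criterion(5), time_criterion(60.0)],
)
print(model)
```

Passing the criteria as a list keeps them independent of each other, so they can be combined freely, and training ends as soon as any one of them is satisfied.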

Furthermore, the library provides classes for the representation of individual rules, as well as dense and sparse data structures that may be used to store the feature values and ground truth labels of training and test examples.
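The difference between dense and sparse storage can be illustrated with the widely used compressed sparse row (CSR) layout. The sketch below is generic and written in plain Python. Whether the package uses exactly this layout is not stated here, and `to_csr` and `csr_get` are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

Csr = Tuple[List[float], List[int], List[int]]


def to_csr(dense_rows: Sequence[Sequence[float]]) -> Csr:
    """Convert a dense matrix into CSR arrays that store only non-zero values."""
    values: List[float] = []      # the non-zero values, row by row
    col_indices: List[int] = []   # the column of each stored value
    row_pointers: List[int] = [0]  # row i occupies values[row_pointers[i]:row_pointers[i + 1]]
    for row in dense_rows:
        for col, value in enumerate(row):
            if value != 0:
                values.append(value)
                col_indices.append(col)
        row_pointers.append(len(values))
    return values, col_indices, row_pointers


def csr_get(csr: Csr, row: int, col: int) -> float:
    """Look up a single element; elements that are not stored are zero."""
    values, col_indices, row_pointers = csr
    for i in range(row_pointers[row], row_pointers[row + 1]):
        if col_indices[i] == col:
            return values[i]
    return 0.0


dense = [
    [0.0, 0.0, 1.5],
    [0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
]
csr = to_csr(dense)
print(csr)
```

A dense matrix stores every element, including zeros, whereas the sparse layout only needs memory proportional to the number of non-zero elements. This pays off for data sets where most feature values or labels are zero.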

License

This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.

Download files

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

File details

Details for the file mlrl_common-0.7.0-cp39-cp39-manylinux2014_x86_64.whl.

File metadata

  • Download URL: mlrl_common-0.7.0-cp39-cp39-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.9
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for mlrl_common-0.7.0-cp39-cp39-manylinux2014_x86_64.whl:

  • SHA256: 728da36280558eb41ec504cc013df5b36621c83163244d74ff7fdb1fd10f33c7
  • MD5: e6a5d5af3878cb7895088a9c8be6a5cf
  • BLAKE2b-256: f71d4512ae37b16b1722e0883b375ed758ece50358dbe78649d490af54f3e39e

File details

Details for the file mlrl_common-0.7.0-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

  • Download URL: mlrl_common-0.7.0-cp38-cp38-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.8
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for mlrl_common-0.7.0-cp38-cp38-manylinux2014_x86_64.whl:

  • SHA256: 7bec42437ee85e5fb5b601c2ee1baad85119553a5acb4ca5f41f84ac5f9989f2
  • MD5: aa641f2d29ef5c08922d16b9dc712415
  • BLAKE2b-256: 9f485ccbf22ce15bc1cb1532cacf8e137481941bb5639a6fbe6df6ed7e7e9eda

File details

Details for the file mlrl_common-0.7.0-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

  • Download URL: mlrl_common-0.7.0-cp37-cp37m-manylinux2014_x86_64.whl
  • Upload date:
  • Size: 1.4 MB
  • Tags: CPython 3.7m
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9

File hashes

Hashes for mlrl_common-0.7.0-cp37-cp37m-manylinux2014_x86_64.whl:

  • SHA256: 2460e8e38b014738f53c60985bda6f7efc7f41f43ae78169f30ec0867254772e
  • MD5: 5a0b962ad24509d4a3b7aea4f03ad679
  • BLAKE2b-256: e35aebf21fd68f36d7f38c9f11d70375b338d9fa8b52aeba8149db2bbbc6ff1a
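
A downloaded wheel can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The following is a minimal sketch. The helper names `sha256_of_file` and `verify` are hypothetical, and the path passed to them is assumed to point to a file that has already been downloaded.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 digest of a file without loading it into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # Read the file in chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_sha256: str) -> bool:
    """Return True if the file's digest matches the expected hexadecimal digest."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify("mlrl_common-0.7.0-cp39-cp39-manylinux2014_x86_64.whl", "728da36280558eb41ec504cc013df5b36621c83163244d74ff7fdb1fd10f33c7")` is expected to return `True` for an intact download of the CPython 3.9 wheel.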
