Provides common modules to be used by different types of multi-label rule learning algorithms
Project description
"MLRL-Common": Building-Blocks for Multi-label Rule Learning Algorithms
This software package provides common modules to be used by different types of multi-label rule learning (MLRL) algorithms that integrate with the popular scikit-learn machine learning framework.
The goal of multi-label classification is the automatic assignment of sets of labels to individual data points, for example, the annotation of text documents with topics.
The library serves as the basis for the implementation of the following rule learning algorithms:
- BOOMER (Gradient Boosted Multi-label Classification Rules): A state-of-the-art algorithm that uses gradient boosting to learn an ensemble of rules that is built with respect to a given multivariate loss function.
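Because the resulting estimators integrate with scikit-learn, they can be used like any other scikit-learn classifier. The following is a minimal sketch, assuming the companion mlrl-boomer package exposes the BOOMER estimator as `Boomer` in the `mlrl.boosting` module; the exact class name and its parameters may differ between versions:

```python
# Minimal sketch of scikit-learn style usage. Assumes the companion
# "mlrl-boomer" package provides a Boomer estimator in the mlrl.boosting
# module; class name and defaults may differ between versions.
import numpy as np
from mlrl.boosting import Boomer  # assumption: estimator name as in the BOOMER docs

# Toy multi-label data: 4 examples, 3 features, 2 labels
x = np.array([[0.1, 0.2, 0.3],
              [0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9],
              [1.0, 1.1, 1.2]])
y = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 1]])

clf = Boomer()         # uses the default configuration
clf.fit(x, y)          # learns an ensemble of rules
print(clf.predict(x))  # binary predictions, one column per label
```

The `fit` and `predict` methods follow the usual scikit-learn conventions, so the estimator can be combined with standard scikit-learn utilities wherever they support multi-label data.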
Features
This package follows a unified and modular framework for the implementation of different types of MLRL algorithms. An instantiation of the framework consists of the following modules (a schematic sketch follows the list):
- A module for rule induction that is responsible for the construction of individual rules. Each rule consists of a body and a head. The former specifies the region of the input space to which the rule applies. The latter provides predictions for one or several labels.
- A strategy for the assemblage of a rule model that consists of several rules.
- A notion of (label space) statistics that serve as the basis for assessing the quality of potential rules and determining their predictions.
- Implementations of pruning techniques that can optionally be applied to a rule after its construction to improve the generalization to unseen data.
- Post-processing techniques that may alter the predictions of a rule after it has been learned.
- One or several stopping criteria that are used to decide whether more rules should be added to a model.
- Optional sampling techniques that may be used to obtain a subset of the available training examples, features or labels.
- An algorithm for the aggregation of predictions that are provided by the rules in a model for previously unseen test examples.
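The hypothetical sketch below illustrates how several of these modules relate to each other. All class and method names are invented for illustration only and do not correspond to the library's actual (C++/Cython) API:

```python
# Hypothetical sketch of how the listed modules fit together. The names below
# are invented for illustration and do NOT correspond to the library's API.
from abc import ABC, abstractmethod

class RuleInduction(ABC):
    """Constructs a single rule (body + head) based on the current statistics."""
    @abstractmethod
    def induce_rule(self, statistics, examples): ...

class Pruning(ABC):
    """Optionally removes conditions from a rule's body, guided by a holdout set."""
    @abstractmethod
    def prune(self, rule, holdout): ...

class StoppingCriterion(ABC):
    """Decides whether more rules should be added to the model."""
    @abstractmethod
    def should_stop(self, model, statistics): ...

class SequentialAssemblage:
    """Assembles a rule model by learning one rule after the other."""

    def __init__(self, induction, pruning, stopping_criteria, sampling=None):
        self.induction = induction
        self.pruning = pruning
        self.stopping_criteria = stopping_criteria
        self.sampling = sampling  # optional sampling of examples, features or labels

    def assemble(self, statistics, examples, holdout):
        model = []
        while not any(c.should_stop(model, statistics) for c in self.stopping_criteria):
            sample = self.sampling(examples) if self.sampling else examples
            rule = self.induction.induce_rule(statistics, sample)
            if self.pruning is not None:
                rule = self.pruning.prune(rule, holdout)
            model.append(rule)  # post-processing and statistic updates would happen here
        return model
```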
This library defines APIs for all the aforementioned modules and provides default implementations for the following ones (a configuration sketch follows the list):
- Top-down hill climbing for the greedy induction of rules. It supports numerical, ordinal and nominal features, as well as missing feature values. Optionally, a histogram-based algorithm, where training examples with similar feature values are assigned to bins, can be used to reduce the complexity of training. Both types of algorithms support the use of multi-threading.
- A strategy for the sequential assemblage of rule models, where one rule is learned after the other.
- Incremental reduced error pruning (IREP), where conditions are removed from a rule's body if this results in increased performance as measured on a holdout set of the training data.
- Simple stopping criteria that stop the induction of rules after a certain amount of time or when a predefined number of rules has been reached, as well as an early stopping mechanism that terminates training as soon as the performance of a model on a holdout set stagnates or declines.
- Methods for sampling with or without replacement, as well as stratified sampling techniques.
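As a rough illustration of how these defaults might be selected, the sketch below configures the assumed `Boomer` estimator. All parameter names and values are assumptions modeled after the BOOMER documentation and may differ between versions; they are shown only to indicate that stopping criteria, pruning, sampling and feature binning are controlled via estimator parameters:

```python
# Configuration sketch for the assumed Boomer estimator. Parameter names and
# values are assumptions and may differ between versions; consult the
# documentation of the installed version for the exact options.
from mlrl.boosting import Boomer  # assumption: estimator name

clf = Boomer(
    max_rules=500,                            # stop after a predefined number of rules
    time_limit=60,                            # stop after a time budget (in seconds)
    holdout='random',                         # holdout set used for pruning / early stopping
    pruning='irep',                           # incremental reduced error pruning
    instance_sampling='without-replacement',  # sample training examples for each rule
    feature_sampling='without-replacement',   # sample features for each rule
    feature_binning='equal-width',            # histogram-based rule induction
)
```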
Furthermore, the library provides classes for the representation of individual rules, as well as dense and sparse data structures that may be used to store the feature values and ground truth labels of training and test examples.
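For example, assuming the estimators follow the scikit-learn convention of accepting SciPy sparse matrices in addition to dense NumPy arrays, sparse feature and label matrices could be passed as follows (the supported formats may depend on the installed version):

```python
# Sketch of passing sparse data. Assumes the estimators accept SciPy sparse
# matrices as well as dense NumPy arrays; supported formats may vary.
import numpy as np
from scipy.sparse import csr_matrix
from mlrl.boosting import Boomer  # assumption: estimator name

x_dense = np.array([[0.0, 1.5, 0.0],
                    [2.0, 0.0, 0.0],
                    [0.0, 0.0, 3.0]])
y_dense = np.array([[1, 0],
                    [0, 1],
                    [1, 1]])

x_sparse = csr_matrix(x_dense)  # sparse feature matrix
y_sparse = csr_matrix(y_dense)  # sparse ground truth label matrix

clf = Boomer()
clf.fit(x_sparse, y_sparse)
print(clf.predict(x_sparse))
```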
License
This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.
Download files
File details
Details for the file mlrl_common-0.7.0-cp39-cp39-manylinux2014_x86_64.whl.
File metadata
- Download URL: mlrl_common-0.7.0-cp39-cp39-manylinux2014_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.9
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 728da36280558eb41ec504cc013df5b36621c83163244d74ff7fdb1fd10f33c7
MD5 | e6a5d5af3878cb7895088a9c8be6a5cf
BLAKE2b-256 | f71d4512ae37b16b1722e0883b375ed758ece50358dbe78649d490af54f3e39e
File details
Details for the file mlrl_common-0.7.0-cp38-cp38-manylinux2014_x86_64.whl.
File metadata
- Download URL: mlrl_common-0.7.0-cp38-cp38-manylinux2014_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.8
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7bec42437ee85e5fb5b601c2ee1baad85119553a5acb4ca5f41f84ac5f9989f2
MD5 | aa641f2d29ef5c08922d16b9dc712415
BLAKE2b-256 | 9f485ccbf22ce15bc1cb1532cacf8e137481941bb5639a6fbe6df6ed7e7e9eda
File details
Details for the file mlrl_common-0.7.0-cp37-cp37m-manylinux2014_x86_64.whl.
File metadata
- Download URL: mlrl_common-0.7.0-cp37-cp37m-manylinux2014_x86_64.whl
- Upload date:
- Size: 1.4 MB
- Tags: CPython 3.7m
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.7.0 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2460e8e38b014738f53c60985bda6f7efc7f41f43ae78169f30ec0867254772e
MD5 | 5a0b962ad24509d4a3b7aea4f03ad679
BLAKE2b-256 | e35aebf21fd68f36d7f38c9f11d70375b338d9fa8b52aeba8149db2bbbc6ff1a