Provides utilities for the training and evaluation of machine learning algorithms
🔗 Important links: Documentation | Issue Tracker | Changelog | License
This software package provides mlrl-testbed, a command line utility for running machine learning experiments. It implements a straightforward, easily configurable, and extensible workflow for conducting experiments, including, but not restricted to, the following steps:
- loading a dataset
- splitting it into training and test sets
- training one or several models
- evaluating the models' performance
- saving experimental results to output files
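The sketch below makes these five steps concrete with a minimal workflow in plain Python that has no dependencies. It does not use mlrl-testbed's API. All function names (`load_dataset`, `split`, `train`, `evaluate`, `save_results`, `run_experiment`) are hypothetical, the dataset is a toy stand-in, and the "model" always predicts the most frequent class.

```python
import csv
import random
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical data representation: (feature vector, class label)
Example = Tuple[List[float], str]


def load_dataset() -> List[Example]:
    # Step 1 (stand-in): a tiny in-memory dataset instead of a file on disk
    return [([float(i)], 'a' if i % 3 else 'b') for i in range(30)]


def split(data: List[Example], test_fraction: float = 0.2,
          seed: int = 42) -> Tuple[List[Example], List[Example]]:
    # Step 2: shuffle a copy of the data and cut it into training and test sets
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def train(training_data: List[Example]) -> Callable[[List[float]], str]:
    # Step 3: a trivial "model" that always predicts the majority class
    majority = Counter(label for _, label in training_data).most_common(1)[0][0]
    return lambda features: majority


def evaluate(model: Callable[[List[float]], str], test_data: List[Example]) -> float:
    # Step 4: accuracy, i.e., the fraction of correctly predicted test examples
    correct = sum(model(features) == label for features, label in test_data)
    return correct / len(test_data)


def save_results(results: Dict[str, float], path: str) -> None:
    # Step 5: write one row per evaluation measure to a CSV file
    with open(path, 'w', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['measure', 'value'])
        for measure, value in results.items():
            writer.writerow([measure, value])


def run_experiment(output_path: str) -> Dict[str, float]:
    # Chain the five steps into a single experiment
    training_data, test_data = split(load_dataset())
    model = train(training_data)
    results = {'accuracy': evaluate(model, test_data)}
    save_results(results, output_path)
    return results
```

Each step is a separate function, so any one of them can be swapped out, for example by loading a real dataset or training a different model, while the rest of the workflow stays the same.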
MLRL-Testbed
On its own, this package provides only limited functionality. It is intended as a foundation for other packages that build upon it. In particular, it makes no assumptions about the problem domain or the type of machine learning algorithm used in an experiment. Instead, domain- or algorithm-specific functionality is provided by the extensions discussed below.
Tabular Machine Learning
The package mlrl-testbed-sklearn adds support for tabular machine learning problems by making use of the scikit-learn framework. It can be installed via the following command, which also installs mlrl-testbed as a dependency:
```bash
pip install mlrl-testbed-sklearn
```
Optionally, support for the Slurm Workload Manager can be installed via the package mlrl-testbed-slurm.
💡 Example
By writing just a small amount of code, any scikit-learn compatible estimator can be integrated with MLRL-Testbed and used in experiments. For example, the following code integrates scikit-learn's RandomForestClassifier:
```python
from argparse import Namespace
from typing import Optional, Set

from mlrl.testbed_sklearn.runnables import SkLearnRunnable
from mlrl.util.cli import Argument, IntArgument
from sklearn.base import ClassifierMixin, RegressorMixin
from sklearn.ensemble import RandomForestClassifier


class Runnable(SkLearnRunnable):

    # Command line argument for controlling the number of trees
    N_ESTIMATORS = IntArgument(
        '--n-estimators',
        description='The number of trees in the forest',
        default=100,
    )

    def get_algorithmic_arguments(self, known_args: Namespace) -> Set[Argument]:
        # Make the algorithm-specific arguments available via the command line API
        return {self.N_ESTIMATORS}

    def create_classifier(self, args: Namespace) -> Optional[ClassifierMixin]:
        # Pass the value given via --n-estimators on to the estimator
        return RandomForestClassifier(n_estimators=args.n_estimators)

    def create_regressor(self, args: Namespace) -> Optional[RegressorMixin]:
        return None  # Not needed in this case
```
The previously integrated algorithm can then be used in experiments controlled via a command line API. Assuming that the source code shown above is saved to a file named custom_runnable.py in the working directory, we can now fit a RandomForestClassifier to a dataset using the command below.
```bash
mlrl-testbed custom_runnable.py \
    --data-dir path/to/datasets/ \
    --dataset dataset-name \
    --n-estimators 50
```
The above command not only trains a model, but also evaluates it in terms of common measures and prints the evaluation results. It also demonstrates how algorithmic parameters can be controlled via command line arguments.
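The sketch below shows how a command line flag such as `--n-estimators 50` typically becomes a value that can be passed to an estimator. It uses Python's standard `argparse` module, which stores a flag like `--n-estimators` under the attribute name `n_estimators` and falls back to the default when the flag is omitted. We assume, but have not confirmed, that mlrl-testbed's `IntArgument` follows the same convention. `parse` and `classifier_kwargs` are hypothetical helpers.

```python
from argparse import ArgumentParser, Namespace
from typing import Dict, List


def parse(argv: List[str]) -> Namespace:
    parser = ArgumentParser()
    # argparse stores '--n-estimators' under the attribute name 'n_estimators'
    parser.add_argument('--n-estimators', type=int, default=100,
                        help='The number of trees in the forest')
    return parser.parse_args(argv)


def classifier_kwargs(args: Namespace) -> Dict[str, int]:
    # Hypothetical helper: map parsed arguments to estimator constructor arguments
    return {'n_estimators': args.n_estimators}


print(classifier_kwargs(parse(['--n-estimators', '50'])))  # prints {'n_estimators': 50}
print(classifier_kwargs(parse([])))  # prints {'n_estimators': 100}
```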
It is also possible to run multiple experiments at once by defining the datasets and algorithmic parameters to be used in the different runs in a YAML file:
```bash
mlrl-testbed custom_runnable.py --mode batch --config path/to/config.yaml
```
An example YAML file is shown below. Each combination of the specified parameter values is applied to each dataset defined in the file.
```yaml
datasets:
  - directory: path/to/datasets/
    names:
      - first-dataset
      - second-dataset
parameters:
  - name: --n-estimators
    values:
      - 50
      - 100
```
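Applying every combination of parameter values to every dataset amounts to a Cartesian product. The sketch below illustrates these semantics with `itertools.product`. It is not mlrl-testbed's implementation. It assumes that the YAML file has already been parsed into plain dicts and lists, which are written out by hand here to avoid a YAML dependency. `expand_runs` and the format of the argument lists it returns are hypothetical.

```python
from itertools import product
from typing import Any, Dict, List

# The YAML example above as plain Python data (assumption: already parsed)
CONFIG: Dict[str, Any] = {
    'datasets': [
        {'directory': 'path/to/datasets/', 'names': ['first-dataset', 'second-dataset']},
    ],
    'parameters': [
        {'name': '--n-estimators', 'values': [50, 100]},
    ],
}


def expand_runs(config: Dict[str, Any]) -> List[List[str]]:
    """Return one (hypothetical) argument list per dataset and parameter combination."""
    names = [parameter['name'] for parameter in config.get('parameters', [])]
    value_lists = [parameter['values'] for parameter in config.get('parameters', [])]
    runs = []
    for dataset in config['datasets']:
        for dataset_name in dataset['names']:
            # product() yields every combination of values, one value per parameter
            for combination in product(*value_lists):
                arguments = ['--data-dir', dataset['directory'], '--dataset', dataset_name]
                for name, value in zip(names, combination):
                    arguments += [name, str(value)]
                runs.append(arguments)
    return runs


for arguments in expand_runs(CONFIG):
    print(' '.join(arguments))
```

For the example configuration, this produces four runs: two datasets times two values of `--n-estimators`. Adding a second parameter with three values would triple that number.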
🏁 Advantages
MLRL-Testbed not only eases the burden of training and evaluating machine learning models, but can also make your own methods and algorithms more accessible to users. This is demonstrated by the rule learning algorithms mlrl-boomer and mlrl-seco, which can be run via the command line API described above and even extend it with rule-specific functionality.
🔧 Functionalities
The package mlrl-testbed-sklearn provides a command line API for configuring and running machine learning algorithms. It can apply machine learning algorithms to different datasets and evaluate their predictive performance in terms of commonly used measures. In detail, it supports the following functionalities:
- Single- and multi-output datasets in the Mulan and MEKA formats are supported (with the help of the package mlrl-testbed-arff).
- Datasets can automatically be split into training and test data, optionally using cross validation. Alternatively, predefined splits can be provided as separate files.
- One-hot encoding can be applied to nominal or binary features.
- Binary predictions, scores, or probability estimates can be obtained from machine learning algorithms, if supported. Evaluation measures that are suited for the respective type of predictions are picked automatically.
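Cross validation, mentioned in the list above, partitions the examples into k folds and uses each fold once as the test set while training on the remaining folds. The dependency-free sketch below shows only the index bookkeeping. The function name `k_fold_indices` is hypothetical. Fold sizes follow the same contiguous scheme as scikit-learn's `KFold` with `shuffle=False`. How mlrl-testbed itself assigns examples to folds, for example whether it shuffles them first, is not stated here.

```python
from typing import Iterator, List, Tuple


def k_fold_indices(n_examples: int, n_folds: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, test indices) for each of the n_folds folds."""
    if not 2 <= n_folds <= n_examples:
        raise ValueError('n_folds must be between 2 and n_examples')
    # The first n_examples % n_folds folds receive one additional example
    base_size, remainder = divmod(n_examples, n_folds)
    start = 0
    for fold in range(n_folds):
        size = base_size + (1 if fold < remainder else 0)
        test_indices = list(range(start, start + size))
        # All examples outside the current fold are used for training
        training_indices = list(range(start)) + list(range(start + size, n_examples))
        yield training_indices, test_indices
        start += size


for training_indices, test_indices in k_fold_indices(10, 3):
    print(len(training_indices), test_indices)
```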
Furthermore, the command line API provides many options for controlling which experimental results are gathered during an experiment. Depending on the configuration, the following experimental results can be saved to output files or printed on the console:
- Evaluation scores according to commonly used measures
- Characteristics, i.e., statistical properties, of datasets
- Predictions and their characteristics
- Unique label vectors contained in a classification dataset
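The sketch below shows what dataset characteristics and unique label vectors can look like for a multi-label classification dataset. It computes three quantities: label cardinality, which is the mean number of relevant labels per example; label density, which is the cardinality divided by the number of labels; and the distinct label vectors together with their frequencies. These standard multi-label statistics were chosen for illustration, and the characteristics that mlrl-testbed actually reports may differ. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List

# Rows are examples, columns are labels (1 = relevant, 0 = irrelevant)
LabelMatrix = List[List[int]]


def unique_label_vectors(labels: LabelMatrix) -> Counter:
    # Count how often each distinct label vector occurs in the dataset
    return Counter(tuple(row) for row in labels)


def label_statistics(labels: LabelMatrix) -> Dict[str, float]:
    if not labels or not labels[0]:
        raise ValueError('expected a non-empty label matrix')
    n_examples, n_labels = len(labels), len(labels[0])
    # Label cardinality: mean number of relevant labels per example
    cardinality = sum(sum(row) for row in labels) / n_examples
    return {
        'cardinality': cardinality,
        # Label density: cardinality normalized by the number of labels
        'density': cardinality / n_labels,
        'distinct_label_vectors': len(unique_label_vectors(labels)),
    }


Y = [[1, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0]]
print(label_statistics(Y))
print(unique_label_vectors(Y).most_common())
```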
If the following are written to output files, they can be loaded and reused in future experiments:
- The machine learning models that have been learned
- Algorithmic parameters used for training
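Reusing models and parameters across experiments comes down to serializing them to files and reading them back. The sketch below uses JSON for parameters and Python's `pickle` for model objects. These formats, the toy model, and all function names are assumptions for illustration, since the file formats mlrl-testbed actually writes are not described here. Note that `pickle` files should only be loaded from trusted sources.

```python
import json
import pickle
import tempfile
from collections import Counter
from pathlib import Path
from typing import Any, Dict


def save_parameters(params: Dict[str, Any], path: Path) -> None:
    # Parameters are simple key-value pairs, so a human-readable format suffices
    path.write_text(json.dumps(params, indent=2, sort_keys=True))


def load_parameters(path: Path) -> Dict[str, Any]:
    return json.loads(path.read_text())


def save_model(model: Any, path: Path) -> None:
    # Arbitrary picklable Python objects can be serialized this way
    with path.open('wb') as file:
        pickle.dump(model, file)


def load_model(path: Path) -> Any:
    # Only unpickle files from trusted sources
    with path.open('rb') as file:
        return pickle.load(file)


def round_trip() -> bool:
    """Save a toy model and its parameters, then check that both load back unchanged."""
    params = {'n_estimators': 50}
    # Hypothetical stand-in for a learned model: the class counts of a majority-class model
    model = Counter({'a': 20, 'b': 10})
    with tempfile.TemporaryDirectory() as directory:
        params_path = Path(directory) / 'parameters.json'
        model_path = Path(directory) / 'model.pickle'
        save_parameters(params, params_path)
        save_model(model, model_path)
        return load_parameters(params_path) == params and load_model(model_path) == model
```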
📚 Documentation
Our documentation provides an extensive user guide, as well as an API reference for developers.
- Examples of how to save experimental results to output files.
- Instructions for using your own algorithms with the command line API.
- An overview of available command line arguments for controlling experiments.
For an overview of changes and new features that have been included in past releases, please refer to the changelog.
📜 License
This project is open source software licensed under the terms of the MIT license. We welcome contributions to the project to enhance its functionality and make it more accessible to a broader audience. A frequently updated list of contributors is available here.
All contributions to the project and discussions on the issue tracker are expected to follow the code of conduct.
File details
Details for the file mlrl_testbed-0.14.0-py3-none-any.whl.
File metadata
- Download URL: mlrl_testbed-0.14.0-py3-none-any.whl
- Upload date:
- Size: 80.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ca2a5882acbc36f331a79ae2813721f11e3c2997238a18d6aacbec046e054915 |
| MD5 | 91cb5f2c362c78d6b58fcfa443655609 |
| BLAKE2b-256 | 7663cc492a306dd308cf7caac9f3ba2383ed55518402241070ee153d9b49297f |
Provenance
The following attestation bundles were made for mlrl_testbed-0.14.0-py3-none-any.whl:
Publisher: publish.yml on mrapp-ke/MLRL-Boomer
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: mlrl_testbed-0.14.0-py3-none-any.whl
  - Subject digest: ca2a5882acbc36f331a79ae2813721f11e3c2997238a18d6aacbec046e054915
- Sigstore transparency entry: 423627196
- Sigstore integration time:
- Permalink: mrapp-ke/MLRL-Boomer@178ea7ece9cd77ed4991720ec6adbc79bd574daf
- Branch / Tag: refs/tags/0.14.0
- Owner: https://github.com/mrapp-ke
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@178ea7ece9cd77ed4991720ec6adbc79bd574daf
- Trigger Event: release
Statement type: