Machine Learning Experiment Toolbox
Coming up with the right research hypotheses is hard - testing them should be easy.
ML researchers need to coordinate different types of experiments on separate remote resources. The Machine Learning Experiment (MLE)-Toolbox is designed to facilitate this workflow by providing a simple interface, standardized logging and many common ML experiment types (multi-seed/multi-configuration runs, grid searches and hyperparameter optimization pipelines). You can run experiments on your local machine, on Slurm and Sun Grid Engine clusters, as well as on Google Cloud compute instances. The results are archived (locally or in a GCS bucket) and can easily be retrieved or reported in a .md/.html file.
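To make the "multi-seed/configurations" and "grid-search" experiment types concrete, here is a minimal, toolbox-independent sketch of how a hyperparameter grid crossed with several random seeds expands into individual jobs. The parameter names (lrate, batch_size) and the helper expand_grid are hypothetical illustrations, not part of the mle-toolbox API.

```python
import itertools
from typing import Dict, List, Sequence


def expand_grid(param_grid: Dict[str, Sequence], seeds: Sequence[int]) -> List[Dict]:
    """Expand a hyperparameter grid x random seeds into one dict per job.

    Hypothetical helper for illustration only; not the mle-toolbox API.
    """
    names = sorted(param_grid)  # fixed key order -> reproducible job list
    jobs = []
    for values in itertools.product(*(param_grid[n] for n in names)):
        config = dict(zip(names, values))
        for seed in seeds:
            # Every configuration is repeated once per random seed.
            jobs.append({**config, "seed": seed})
    return jobs


if __name__ == "__main__":
    grid = {"lrate": [1e-3, 1e-4], "batch_size": [32, 64]}
    jobs = expand_grid(grid, seeds=[0, 1, 2])
    print(len(jobs))  # 2 x 2 configurations x 3 seeds = 12 jobs
    print(jobs[0])
```

The number of jobs grows multiplicatively with every grid dimension and the number of seeds, which is why dispatching them to a cluster rather than running them sequentially pays off quickly.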
Here are 4 steps to get started with running your distributed jobs:
- Follow the instructions below to install the mle-toolbox and to set up your credentials/configurations.
- Read the documentation explaining the pillars of the toolbox & how to compose the meta-configuration job .yaml files for your experiments.
- Check out the examples to get started with a toy ODE integration, training PyTorch MNIST-CNNs or VAEs in JAX.
- Start up your own experiments using the template files.
Installing mle_toolbox & dependencies
If you want to use the toolbox on your local machine, follow the instructions below locally. Otherwise, follow them on your respective remote resource (Slurm, SGE or GCP). A simple PyPI installation can be done via:
pip install mle-toolbox
Alternatively, you can clone this repository and afterwards 'manually' install the toolbox (preferably in a clean Python 3.6 environment):
git clone https://github.com/RobertTLange/mle-toolbox.git
cd mle-toolbox
pip install -e .
This will install all required dependencies. Please note that the toolbox has only been tested with Python 3.6.
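Because the toolbox is only tested with Python 3.6, it can help to check the active interpreter before installing. The sketch below is a hedged convenience check; the helper check_python_version is hypothetical and nothing in the toolbox calls it. It warns rather than fails, since other versions may work but are untested.

```python
import sys
import warnings

# Python version the toolbox is documented as being tested with.
TESTED_VERSION = (3, 6)


def check_python_version(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor matches the tested version.

    Hypothetical helper: warns instead of raising on a mismatch.
    """
    current = tuple(version_info[:2])
    matches = current == TESTED_VERSION
    if not matches:
        warnings.warn(
            "mle-toolbox is only tested on Python %d.%d; you are running %d.%d"
            % (TESTED_VERSION + current)
        )
    return matches


if __name__ == "__main__":
    check_python_version()
```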
Setting up Remote Credentials & Toolbox Config
By default, the toolbox only runs locally and does not store your experiments in GCS. If you want to integrate the mle-toolbox with your remote resources, please edit the template_config.toml template. This consists of 4 optional steps:
- Set whether you want to store all results and your database locally or remotely in a Google Cloud Storage (GCS) bucket.
- Add the Slurm credentials as well as cluster-specific details (headnode names, partitions, proxy server for internet) and default job arguments.
- Add the SGE credentials as well as cluster-specific details (headnode names, queues, proxy server for internet) and default job arguments.
- Add the path to your GCP credentials .json file, as well as the project and GCS bucket name used to store your experiment data (and the protocol database).
Afterwards, please move the template to your home directory and rename it to mle_config.toml:
mv templates/template_config.toml ~/mle_config.toml
Note: If you only intend to use a single resource, you only need to update the configuration for that resource.
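The mv command above can also be scripted in Python, for example when preparing several remote resources from one setup script. This minimal sketch assumes the template sits at templates/template_config.toml inside the cloned repository and that the toolbox reads ~/mle_config.toml, as described above; the helper install_config is hypothetical. Unlike a plain mv, it refuses to overwrite a config you have already edited.

```python
import shutil
from pathlib import Path


def install_config(repo_dir: Path, home_dir: Path = Path.home()) -> Path:
    """Move templates/template_config.toml to <home_dir>/mle_config.toml.

    Hypothetical helper mirroring:
        mv templates/template_config.toml ~/mle_config.toml
    """
    template = Path(repo_dir) / "templates" / "template_config.toml"
    target = Path(home_dir) / "mle_config.toml"
    if not template.is_file():
        raise FileNotFoundError("Template not found: %s" % template)
    if target.exists():
        # Do not silently clobber an already edited configuration.
        raise FileExistsError("Config already exists: %s" % target)
    shutil.move(str(template), str(target))
    return target


if __name__ == "__main__":
    print(install_config(Path("mle-toolbox")))
```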
The 4 Core Commands of the Toolbox
You are now ready to dive deeper into the specifics of job configuration and can start running your first experiments from the cluster (or locally on your machine) with the following commands:
- Start up an experiment:
mle run <experiment_config>.yaml
- Monitor resource utilisation:
mle monitor
- Retrieve the experiment results:
mle retrieve
- Create an experiment report with figures:
mle report
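If you drive experiments from a Python script or notebook, the four commands can be wrapped with subprocess. This is a hedged sketch: it only builds the command forms listed above (no extra flags), assumes the mle executable is on your PATH, and the helpers build_mle_command and run_mle are hypothetical. Rejecting a config file for the other three subcommands is an assumption of this sketch based on the forms shown above, not documented toolbox behaviour.

```python
import subprocess
from typing import List, Optional

# The four core subcommands listed above.
MLE_SUBCOMMANDS = ("run", "monitor", "retrieve", "report")


def build_mle_command(subcommand: str, config: Optional[str] = None) -> List[str]:
    """Build the argument list for one of the four core mle commands."""
    if subcommand not in MLE_SUBCOMMANDS:
        raise ValueError("Unknown subcommand: %r" % subcommand)
    if subcommand == "run":
        # `mle run` is shown taking an <experiment_config>.yaml file.
        if config is None or not config.endswith(".yaml"):
            raise ValueError("`mle run` expects an <experiment_config>.yaml file")
        return ["mle", "run", config]
    if config is not None:
        # Assumption: the other commands are used without a config argument.
        raise ValueError("Only `mle run` takes a configuration file here")
    return ["mle", subcommand]


def run_mle(subcommand: str, config: Optional[str] = None) -> int:
    """Run the command and return its exit code (requires mle on PATH)."""
    return subprocess.call(build_mle_command(subcommand, config))


if __name__ == "__main__":
    print(build_mle_command("run", "experiment_config.yaml"))
```

Separating command construction from execution keeps the wrapper testable without the toolbox installed.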
Examples & Getting Your First Job Running
- Euler ODE - Integrate a simple ODE using forward Euler & get to know the toolbox.
- MNIST CNN - Train CNNs on multiple random seeds & different training configs.
- JAX VAE - Search through the hyperparameter space of an MNIST VAE.
- Sklearn SVM - Train an SVM classifier to classify low-dimensional digits.
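The Euler ODE example integrates a simple ODE with forward Euler. As an illustration of the method itself (not the notebook's code), the sketch below integrates dy/dt = -y with y(0) = 1 for several step sizes, the kind of multi-configuration sweep the toolbox is meant to orchestrate. The ODE, the step sizes and the helper forward_euler are assumptions chosen for illustration.

```python
import math
from typing import Callable


def forward_euler(
    f: Callable[[float, float], float], y0: float, t_end: float, dt: float
) -> float:
    """Integrate dy/dt = f(t, y) from t = 0 to t_end using forward Euler."""
    n_steps = int(round(t_end / dt))
    t, y = 0.0, y0
    for _ in range(n_steps):
        y = y + dt * f(t, y)  # y_{n+1} = y_n + dt * f(t_n, y_n)
        t += dt
    return y


if __name__ == "__main__":
    # Hypothetical "configurations": different step sizes for dy/dt = -y.
    for dt in (0.1, 0.01, 0.001):
        y_end = forward_euler(lambda t, y: -y, y0=1.0, t_end=1.0, dt=dt)
        error = abs(y_end - math.exp(-1.0))
        print("dt=%g  y(1)=%.6f  abs. error=%.2e" % (dt, y_end, error))
```

Since forward Euler is a first-order method, the error at t = 1 should shrink roughly in proportion to dt across the sweep.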
Notes, Development & Questions
- If you find a bug or would like to see a feature implemented, feel free to contact me @RobertTLange or create an issue.
- You can run all unit/integration tests from the mle-toolbox/ directory with pytest (both locally and remotely).
Hashes for mle_toolbox-0.2.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | d8493b6f9bca4668d766aafa0600c25275071be669bd80d81273d5cb57b22985
MD5 | bea39303e854d3f2df5d71479dbdab8f
BLAKE2b-256 | f63d12ba1db6151a3d55f476b305fae3b5735bc4b9c7bcb43476686cd38fa1a8
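After downloading the wheel, you can check it against the digests listed above with Python's standard hashlib module. In this minimal sketch, the file name and the SHA256 value come from the table, while the helper compute_digest is hypothetical. BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for mle_toolbox-0.2.5-py3-none-any.whl.
EXPECTED_SHA256 = "d8493b6f9bca4668d766aafa0600c25275071be669bd80d81273d5cb57b22985"


def compute_digest(path: Path, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    else:
        hasher = hashlib.new(algorithm)  # e.g. "sha256" or "md5"
    with open(str(path), "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    wheel = Path("mle_toolbox-0.2.5-py3-none-any.whl")
    ok = compute_digest(wheel) == EXPECTED_SHA256
    print("SHA256 matches" if ok else "SHA256 MISMATCH: do not install this file")
```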