A Python library for running computationally expensive experiments
Memento
Memento is a Python library for running computationally expensive experiments.
If you need to run a large number of time-consuming experiments, Memento can help:
- Structure your configuration
- Parallelize experiments across CPUs
- Save and restore results
- Checkpoint in-progress experiments
- Send notifications when experiments fail or finish
Getting Started
Install
pip install memento-ml
The Configuration Matrix
The core of Memento is a configuration matrix that describes the list of experiments you want Memento to run. It must contain a "parameters" key, which is itself a dict describing each parameter you want to vary across your experiments and the values it can take.
As an example, let's say you wanted to test a few simple linear classifiers on a number of image recognition datasets. You might write something like this:
Don't worry if you're not working on machine learning; this is just an example.
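import sklearn.linear_model
import sklearn.svm

matrix = {
    "parameters": {
        # Each key is a parameter to vary; each list holds its candidate values.
        "model": [
            sklearn.svm.SVC,
            sklearn.linear_model.Perceptron,
            sklearn.linear_model.LogisticRegression
        ],
        "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"]
    }
}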
Memento would then generate 12 configurations (3 models × 4 datasets) by taking the Cartesian product of the parameters.
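The expansion can be reproduced with the standard library alone. The sketch below is not Memento's implementation; it only illustrates the Cartesian product. It uses plain strings in place of the scikit-learn classes so that it runs without extra dependencies, and the helper name `expand` is hypothetical.

```python
from itertools import product


def expand(parameters):
    """Return one dict per combination of parameter values (hypothetical helper)."""
    keys = list(parameters)
    return [dict(zip(keys, values)) for values in product(*parameters.values())]


# Strings stand in for the scikit-learn classes used in the matrix above.
parameters = {
    "model": ["SVC", "Perceptron", "LogisticRegression"],
    "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"],
}

configurations = expand(parameters)
print(len(configurations))  # 3 models x 4 datasets = 12
print(configurations[0])    # {'model': 'SVC', 'dataset': 'imagenet'}
```

The last parameter varies fastest, which matches the ordering shown in the dry-run listing later in this document.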
Frequently you might also want to set some global configuration values, such as a regularization parameter or potentially even change your preprocessing pipeline. In this case Memento also accepts a "settings" key. These settings apply to all experiments and can be accessed from the configuration list as well as individual configurations.
matrix = {
    "parameters": ...,
    "settings": {
        "regularization": 1e-1,
        "preprocessing": make_preprocessing_pipeline()
    }
}
You can also exclude specific parameter configurations. Returning to our machine learning example, if you know SVCs perform poorly on cifar10 you might decide to skip that experiment entirely. This is done with the "exclude" key:
matrix = {
    "parameters": ...,
    "exclude": [
        {"model": sklearn.svm.SVC, "dataset": "cifar10"}
    ]
}
Running an experiment
Along with a configuration matrix you need some code to run your experiments. This can be any Callable, such as a function, lambda, class, or class method.
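from memento import Memento, Config, Context

def experiment(context: Context, config: Config):
    # Instantiate the model class chosen for this configuration.
    classifier = config.model()
    # fetch_dataset is assumed to be your own helper returning (X, y).
    dataset = fetch_dataset(config.dataset)
    classifier.fit(*dataset)
    return classifier

Memento(experiment).run(matrix)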
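To make "any Callable" concrete, the standalone sketch below checks several kinds of objects with Python's built-in `callable()`. It does not use Memento's API. The names `run_experiment` and `ExperimentRunner` are hypothetical, and a `SimpleNamespace` stands in for the config object; only the two-argument `(context, config)` shape is taken from the example above.

```python
from types import SimpleNamespace


def run_experiment(context, config):
    """A plain function taking (context, config)."""
    return config.dataset


class ExperimentRunner:
    """A class whose instances are callable through __call__."""

    def __call__(self, context, config):
        return config.dataset

    def run(self, context, config):
        return config.dataset


# Hypothetical stand-in for a single configuration.
config = SimpleNamespace(dataset="mnist")

candidates = {
    "function": run_experiment,
    "lambda": lambda context, config: config.dataset,
    "callable instance": ExperimentRunner(),
    "bound method": ExperimentRunner().run,
}

results = {}
for kind, candidate in candidates.items():
    # callable() is True for anything that can be invoked with (...).
    assert callable(candidate)
    results[kind] = candidate(None, config)

print(results)
```

Each candidate accepts the same two positional arguments, so any of them could be used interchangeably wherever a two-argument experiment callable is expected.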
You can also perform a dry run to check you've gotten the matrix correct.
Memento(experiment).run(matrix, dry_run=True)
Running configurations:
{'model': sklearn.svm.SVC, 'dataset': 'imagenet'}
{'model': sklearn.svm.SVC, 'dataset': 'mnist'}
{'model': sklearn.svm.SVC, 'dataset': 'cifar10'}
{'model': sklearn.svm.SVC, 'dataset': 'quickdraw'}
{'model': sklearn.linear_model.Perceptron, 'dataset': 'imagenet'}
...
Exiting due to dry run
Code demo
- Code demo can be found here.
Memento does not depend on scikit-learn. The scikit-learn and jupyterlab packages are required to run the demo (./demo/*).
pip install scikit-learn jupyterlab
Developing
Install as a local package in editable mode
pip install -e .
Install development dependencies
pip install memento-ml[dev]
Tests
pytest
Alternatively, to run only the tests that haven't been marked as slow, use:
pytest -m "not slow"
Linters
pylint memento
Format code
black .
Build Documentation
sphinx-apidoc -o docs memento -f
sphinx-build -W -b html docs docs/_build
Bump up version
# The `--dry` flag is for testing only. Remove `--dry` to update the version number.
# Use `minor` instead of `patch` for feature updates.
bumpver update --patch --dry
Run CI locally
Install act, then:
act
Roadmap
- Finish HPC support
- Improve result serialisation
- Production testing & fleshed-out integration test suite
Contributors
- Zac Pullar-Strecker
- Feras Albaroudi
- Liam Scott-Russell
- Joshua de Wet
- Nipun Jasti
- James Lamberton
- Joerg Wicker
- Xinglong (Luke) Chang
License
Memento is licensed under the 3-Clause BSD License.