
A Python library for running computationally expensive experiments


MEMENTO

MEMENTO is a Python library for running computationally expensive experiments.

Running complex sets of machine learning experiments is challenging and time-consuming due to the lack of a unified framework. This leaves researchers forced to spend time implementing necessary features such as parallelization, caching, and checkpointing themselves instead of focussing on their project. To simplify the process, we introduce MEMENTO, a Python package that is designed to aid researchers and data scientists in the efficient management and execution of computationally intensive experiments. MEMENTO has the capacity to streamline any experimental pipeline by providing a straightforward configuration matrix and the ability to concurrently run experiments across multiple threads.

If you need to run a large number of time-consuming experiments, MEMENTO can help you:

  • Structure your configuration
  • Parallelize experiments across CPUs
  • Save and restore results
  • Checkpoint in-progress experiments
  • Send notifications when experiments fail or finish

Getting Started

MEMENTO is officially available on PyPI. To install the package:

Install

pip install memento-ml

The Configuration Matrix

The core of MEMENTO is a configuration matrix that describes the list of experiments you want MEMENTO to run. The matrix must contain a "parameters" key whose value is itself a dict. This dict maps each parameter you want to vary across your experiments to the list of values it should take.

As an example, let's say you want to test a few simple linear classifiers on a number of image recognition datasets. (Don't worry if you're not working on machine learning; this is just an example.) You might write something like this:

import sklearn.linear_model
import sklearn.svm

matrix = {
  "parameters": {
    "model": [
      sklearn.svm.SVC,
      sklearn.linear_model.Perceptron,
      sklearn.linear_model.LogisticRegression
    ],
    "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"]
  }
}

MEMENTO would then generate 12 configurations (3 models × 4 datasets) by taking the Cartesian product of the parameters.
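The expansion can be illustrated with the standard library alone. The sketch below is not MEMENTO's implementation: it uses plain strings in place of the scikit-learn classes, and `expand` is a hypothetical helper written only to show how three models and four datasets yield 12 configurations.

```python
from itertools import product

# Stand-in for the "parameters" dict; strings replace the scikit-learn
# classes so the sketch runs without extra dependencies.
parameters = {
    "model": ["SVC", "Perceptron", "LogisticRegression"],
    "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"],
}


def expand(parameters):
    """Hypothetical helper: return one dict per combination of values."""
    keys = list(parameters)
    return [dict(zip(keys, values)) for values in product(*parameters.values())]


configurations = expand(parameters)
print(len(configurations))  # 3 models x 4 datasets = 12
print(configurations[0])
```

Each additional parameter multiplies the number of configurations, so the matrix can grow quickly.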

Frequently you might also want to set some global configuration values, such as a regularization parameter or potentially even change your preprocessing pipeline. In this case MEMENTO also accepts a "settings" key. These settings apply to all experiments and can be accessed from the configuration list as well as individual configurations.

matrix = {
  "parameters": ...,
  "settings": {
    "regularization": 1e-1,
    "preprocessing": make_preprocessing_pipeline()
  }
}

You can also exclude specific parameter configurations. Returning to our machine learning example, if you know SVCs perform poorly on cifar10 you might decide to skip that experiment entirely. This is done with the "exclude" key:

matrix = {
  "parameters": ...,
  "exclude": [
    {"model": sklearn.svm.SVC, "dataset": "cifar10"}
  ]
}
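The effect of an exclusion can be sketched as a filter over the expanded configurations. This is an illustration only, not MEMENTO's code: `is_excluded` is a hypothetical helper, strings stand in for the scikit-learn classes, and the matching rule (a configuration is dropped when it agrees with every key in an exclusion entry) is an assumption about how such a filter might behave.

```python
from itertools import product

parameters = {
    "model": ["SVC", "Perceptron", "LogisticRegression"],
    "dataset": ["imagenet", "mnist", "cifar10", "quickdraw"],
}
exclude = [{"model": "SVC", "dataset": "cifar10"}]


def is_excluded(config, exclude):
    """Hypothetical rule: True if config matches every key/value of any entry."""
    return any(
        all(config.get(key) == value for key, value in rule.items())
        for rule in exclude
    )


keys = list(parameters)
all_configs = [dict(zip(keys, values)) for values in product(*parameters.values())]
remaining = [config for config in all_configs if not is_excluded(config, exclude)]

print(len(all_configs), len(remaining))
```

With the single exclusion above, one of the 12 combinations is removed and 11 remain.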

Running an experiment

Along with a configuration matrix you need some code to run your experiments. This can be any Callable such as a function, lambda, class, or class method.

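from memento import Memento, Config, Context

def experiment(context: Context, config: Config):
  # Instantiate the model class chosen for this configuration.
  classifier = config.model()
  # fetch_dataset is your own helper (not part of MEMENTO); it is expected
  # to return the arguments for fit(), e.g. an (X, y) tuple.
  dataset = fetch_dataset(config.dataset)

  classifier.fit(*dataset)

  return classifier

# Run the experiment once for every configuration in the matrix.
Memento(experiment).run(matrix)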
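To make the calling convention concrete, the following sketch imitates it without MEMENTO installed. Everything in it is an assumption for illustration: `run_all` is a hypothetical runner, `SimpleNamespace` stands in for `Config`, and the context is simply `None`. The real library additionally handles caching, checkpointing, and notifications.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from types import SimpleNamespace


def experiment(context, config):
    # Toy experiment: a real one would train config.model on config.dataset.
    return f"{config.model} on {config.dataset}"


def run_all(func, matrix, workers=4):
    """Hypothetical runner: call func once per configuration, in parallel."""
    params = matrix["parameters"]
    configs = [
        SimpleNamespace(**dict(zip(params, values)))
        for values in product(*params.values())
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as configs.
        return list(pool.map(lambda cfg: func(None, cfg), configs))


matrix = {
    "parameters": {
        "model": ["SVC", "Perceptron"],
        "dataset": ["mnist", "cifar10"],
    }
}
results = run_all(experiment, matrix)
print(results)
```

Because the experiment is just a callable taking a context and a configuration, the same function can be exercised directly in a unit test before handing it to MEMENTO.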

You can also perform a dry run to check you've gotten the matrix correct.

Memento(experiment).run(matrix, dry_run=True)
Running configurations:
  {'model': sklearn.svm.SVC, 'dataset': 'imagenet'}
  {'model': sklearn.svm.SVC, 'dataset': 'mnist'}
  {'model': sklearn.svm.SVC, 'dataset': 'cifar10'}
  {'model': sklearn.svm.SVC, 'dataset': 'quickdraw'}
  {'model': sklearn.linear_model.Perceptron, 'dataset': 'imagenet'}
  ...
Exiting due to dry run

Code demo

  • A code demo is available in the project's demo directory (./demo/*).
  • MEMENTO does not depend on any ML packages such as scikit-learn. Running the demo, however, requires the scikit-learn and jupyterlab packages:

pip install memento-ml scikit-learn jupyterlab

Cite

If you find MEMENTO useful and use it in your research, please cite

Memento: Facilitating Effortless, Efficient, and Reliable ML Experiments - Z Pullar-Strecker, X Chang, L Brydon, I Ziogas, K Dost, J Wicker - Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2023 - Springer - https://link.springer.com/chapter/10.1007/978-3-031-43430-3_21

Roadmap

  • Finish HPC support
  • Improve result serialisation
  • Improve customization for notifications

License

MEMENTO is licensed under the 3-Clause BSD License.
