API supporting machine learning experiment tracking on Kubernetes
# Experiments API
Training a deep neural network requires finding a good combination of model hyperparameters. The process of finding good values for each is called hyperparameter optimization. The number of jobs required for each such experiment typically ranges from a handful to hundreds.
Individual workflows for optimization vary, but this is typically an ad hoc manual process involving custom job submission scripts or even pen and paper.
This project provides an API to support machine learning experiments on Kubernetes. It does so by moving the experiment context into a shared API and standardizing experiment job metadata, which promotes sharing results and tool development. Decoupling parameter space search from job execution further promotes reuse. The project also eases job integration with the experiment tracking system by providing a Python client library.
[![overview figure](docs/images/overview.png)](https://docs.google.com/drawings/d/1CGDVt9finf_QC_H6lAIW9StmYiNOCLoemAmpNRN47tg/edit)
## Prerequisites
- git
- make
- python
- kubectl and a connected cluster (minikube or a full cluster)
## Installation
To install the most recent release, run the following:

```
$ pip install experiments
```
## Development
To check out and install the latest development release, run:

```
$ git clone https://github.com/IntelAI/experiments.git
$ cd experiments
$ pip install .
```

To test the Experiments API, run the following:

```
$ pip install -r requirements_tests.txt
$ make test
```
## Appendix
### Concepts
- **Experiment**: Describes a hyperparameter space and how to launch a job for a sample in that space. Has a unique name.
- **Optimizer**: A program that reads an experiment and creates jobs with different hyperparameter settings. This can be done all in one shot, or the optimizer could be a long-running coordinator that monitors the performance of various samples to direct the hyperparameter optimization process. This program is supplied by the user.
- **Result**: Encodes metadata about a single job run for an experiment. For example, a handful of high-level metrics per training epoch and a pointer to an output directory on shared storage. There is one result resource per job. Each result has the same name as the job it represents.
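
To make the relationship between these three concepts concrete, here is a minimal, self-contained sketch of a one-shot optimizer that enumerates a grid over an experiment's hyperparameter space and plans one job and one result per sample. It does not use this project's client library: the `Experiment` and `Result` classes, their fields, the `plan_jobs` helper, and the `train.py` command are all hypothetical stand-ins chosen for illustration, and a real optimizer would submit the jobs to the cluster rather than only listing them.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Experiment:
    """Hypothetical stand-in: a named hyperparameter space plus a job template."""
    name: str
    parameters: dict          # hyperparameter name -> list of candidate values
    command_template: str     # how to launch a job for one sample


@dataclass
class Result:
    """Hypothetical stand-in: metadata for one job run, named after its job."""
    name: str
    experiment: str
    metrics: list = field(default_factory=list)  # e.g. one dict per epoch
    output_dir: str = ""                         # pointer to shared storage


def grid_samples(experiment):
    """Yield every combination of hyperparameter values (exhaustive grid)."""
    names = sorted(experiment.parameters)
    value_lists = (experiment.parameters[n] for n in names)
    for values in itertools.product(*value_lists):
        yield dict(zip(names, values))


def plan_jobs(experiment):
    """Return (job_name, command) pairs, one per sample in the grid."""
    jobs = []
    for index, sample in enumerate(grid_samples(experiment)):
        job_name = f"{experiment.name}-{index}"
        jobs.append((job_name, experiment.command_template.format(**sample)))
    return jobs


experiment = Experiment(
    name="mnist-sweep",
    parameters={"learning_rate": [0.1, 0.01], "batch_size": [32, 64]},
    command_template="python train.py --lr {learning_rate} --batch-size {batch_size}",
)

jobs = plan_jobs(experiment)

# One result per job, sharing the job's name.
results = [Result(name=job_name, experiment=experiment.name) for job_name, _ in jobs]

for job_name, command in jobs:
    print(job_name, "->", command)
```

Because the search (`grid_samples`) is separate from job planning (`plan_jobs`), the grid could be swapped for random sampling or a long-running coordinator without touching how jobs are launched, which mirrors the decoupling described above.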