A framework to define a machine learning pipeline

These details have not been verified by PyPI

Project links

Homepage

Environment
- Console
License
- OSI Approved :: MIT License
Natural Language
- English
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3
Topic
- Scientific/Engineering
- Scientific/Engineering :: Artificial Intelligence

Project description

Note that this documentation is a tad outdated. I will update it as soon as I can.

ml-pipeline

I use this pipeline to simplify my life when working on ML projects.

Installation

This can be installed using pip

pip install mlpipeline

Usage (tl;dr version)

1. Extend mlpipeline.helper.Experiment and mlpipeline.helper.DataLoader to suit your needs.
2. Define the versions using the interface provided by mlpipeline.utils.Versions.
   - Version parameters that must be defined:
     - mlpipeline.utils.version_parameters.NAME
     - mlpipeline.utils.version_parameters.DATALOADER
     - mlpipeline.utils.version_parameters.BATCH_SIZE
     - mlpipeline.utils.version_parameters.EPOC_COUNT
3. Place the script(s) containing the above in a specified directory.
4. Add the directory to mlp.config.
5. Add the name of the script to experiments.config.
6. (optional) Add the name of the script to experiments_test.config.
7. (optional) Run the experiment in test mode to ensure the safety of your sanity:

   mlpipeline

8. Execute the pipeline:

   mlpipeline -r -u

Anything saved to the experiment_dir passed to mlpipeline.helper.Experiment.train_loop and mlpipeline.helper.Experiment.evaluate_loop will remain available afterwards. The logs and the output can be found in the outputs/log-<hostname> and outputs/output-<hostname> files respectively, relative to the directory in step 3 above.

Usage (Long version)

Experiment scripts

The experiment script is a Python script that contains a global variable EXPERIMENT, which holds an mlpipeline.helper.Experiment object. Ideally, one would extend the mlpipeline.helper.Experiment class and implement its methods to perform the intended tasks (refer to the documentation in mlpipeline.helper for more details).

Place experiment scripts in a separate folder. This folder can be anywhere on your system. Add the path to this folder to the mlp.config file. The recommended directory structure is as follows:

/<project>
  /experiment
    <experimentscripts>
  mlp.config
  experiments.config
  experiments_test.config

The pipeline is executed from the <project> directory.

For example, a sample experiment can be seen in examples/sample-project/experiments/sample_experiment.py. The default mlp.config file points to the experiments folder. The examples/sample-project/ directory is a sample directory structure for a project.
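To illustrate the convention of a module-level EXPERIMENT variable, here is a minimal sketch of how a runner could load a script by path and pick up that variable. This is not the pipeline's actual loading code; the function name load_experiment is hypothetical, and the dictionary in the usage part is only a stand-in for a real mlpipeline.helper.Experiment object.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_experiment(script_path):
    """Import a script by file path and return its module-level EXPERIMENT."""
    path = Path(script_path)
    spec = importlib.util.spec_from_file_location(path.stem, str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the script's top-level code
    if not hasattr(module, "EXPERIMENT"):
        raise AttributeError(f"{path.name} does not define a global EXPERIMENT")
    return module.EXPERIMENT


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "sample_experiment.py"
        # Stand-in content: a real script would assign an Experiment object.
        script.write_text("EXPERIMENT = {'name': 'demo'}\n")
        print(load_experiment(script))
```

Loading by path rather than by module name is what allows the experiments folder to live anywhere on the system.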

Versions (I should choose a better term for this)

mlpipeline.utils.version_parameters.NAME: A string used to keep track of the training and its history; this name will be appended to the logs and outputs. This parameter must be set for each version.
mlpipeline.utils.version_parameters.DATALOADER: An mlpipeline.helper.DataLoader object. Simply put, it is a wrapper for a dataset. You'll have to extend the mlpipeline.helper.DataLoader class to fit your needs. This object will be used by the pipeline to infer details about a training process, such as the number of steps (refer to the documentation in mlpipeline.helper for more details). As of the current version of the pipeline, this parameter is mandatory.
mlpipeline.utils.version_parameters.EXPERIMENT_DIR_SUFFIX: Each version of the experiment that has completed the training loop will be allocated a directory which can be used to save outputs (e.g. checkpoint files). When an experiment is being trained with a different set of versions and allow_delete_experiment_dir is set to True in the EXPERIMENT, the directory will be cleared as defined in mlpipeline.helper.Experiment.clean_experiment_dir (note that the behaviour of this function is not implemented by default, to avoid a disaster). Sometimes you may want a different directory for each version of the experiment; in such a case, pass a string to this parameter, which will be appended to the directory name.
mlpipeline.utils.version_parameters.BATCH_SIZE: The batch size used in the experiment's training loop. As of the current version of the pipeline, this parameter is mandatory.
mlpipeline.utils.version_parameters.EPOC_COUNT: The number of epochs that will be used. As of the current version of the pipeline, this parameter is mandatory.
mlpipeline.utils.version_parameters.ORDER: By default this is set so that the versions are loaded in the order they are defined. A value can be passed to a version to override this behaviour.
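The rules above (four mandatory parameters, a default order that follows definition order, an optional directory suffix) can be sketched with a small stand-in registry. Nothing here is imported from mlpipeline: the VersionRegistry class, its add and in_order methods, and the string values of the constants are all hypothetical and only mirror the parameter names described above.

```python
# Hypothetical stand-ins that mirror the parameter names described above.
# They are NOT the constants from mlpipeline.utils.version_parameters.
NAME = "name"
DATALOADER = "dataloader"
BATCH_SIZE = "batch_size"
EPOC_COUNT = "epoc_count"
ORDER = "order"
EXPERIMENT_DIR_SUFFIX = "experiment_dir_suffix"

MANDATORY = (NAME, DATALOADER, BATCH_SIZE, EPOC_COUNT)


class VersionRegistry:
    """Toy registry: checks mandatory parameters and remembers definition order."""

    def __init__(self):
        self._versions = []

    def add(self, name, **params):
        version = {NAME: name, **params}
        missing = [key for key in MANDATORY if version.get(key) is None]
        if missing:
            raise ValueError(f"version {name!r} is missing: {missing}")
        # Default ORDER is the position at which the version was defined.
        version.setdefault(ORDER, len(self._versions))
        version.setdefault(EXPERIMENT_DIR_SUFFIX, None)
        self._versions.append(version)

    def in_order(self):
        """Return the versions sorted by ORDER (stable for equal values)."""
        return sorted(self._versions, key=lambda v: v[ORDER])


if __name__ == "__main__":
    registry = VersionRegistry()
    registry.add("baseline", dataloader=object(), batch_size=32, epoc_count=10)
    # Passing an explicit order overrides the definition order.
    registry.add("large-batch", dataloader=object(), batch_size=256,
                 epoc_count=10, order=-1)
    print([v[NAME] for v in registry.in_order()])
```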

Executing experiments

You can have any number of experiments in the experiments folder. Add the names of the scripts to the experiments.config file. If use_blacklist is false, only the scripts whose names are listed under [WHITELISTED_EXPERIMENTS] will be executed. If it is set to true, all scripts except the ones under [BLACKLISTED_EXPERIMENTS] will be executed. Experiments that have not yet been executed can be added to or removed from the execution queue while the pipeline is running, because the pipeline re-loads the config file after each experiment is executed.
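The whitelist/blacklist rule can be sketched with the standard library's configparser. The section names [WHITELISTED_EXPERIMENTS] and [BLACKLISTED_EXPERIMENTS] come from the text above; the [MLP] section that holds use_blacklist, the one-script-name-per-line layout, and the function name select_experiments are assumptions made for this example, not the pipeline's documented file format.

```python
import configparser

# Assumed layout of experiments.config, for illustration only.
SAMPLE_CONFIG = """
[MLP]
use_blacklist = false

[WHITELISTED_EXPERIMENTS]
sample_experiment.py

[BLACKLISTED_EXPERIMENTS]
broken_experiment.py
"""


def _names(parser, section):
    """Return the script names listed under a section, or an empty set."""
    return set(parser[section]) if parser.has_section(section) else set()


def select_experiments(config_text, available_scripts):
    """Pick the scripts to run according to the whitelist/blacklist rule."""
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.optionxform = str  # keep script names case-sensitive
    parser.read_string(config_text)

    use_blacklist = parser.getboolean("MLP", "use_blacklist", fallback=False)
    if use_blacklist:
        blocked = _names(parser, "BLACKLISTED_EXPERIMENTS")
        return [s for s in available_scripts if s not in blocked]
    allowed = _names(parser, "WHITELISTED_EXPERIMENTS")
    return [s for s in available_scripts if s in allowed]


if __name__ == "__main__":
    scripts = ["sample_experiment.py", "broken_experiment.py", "other.py"]
    print(select_experiments(SAMPLE_CONFIG, scripts))
```

Calling such a function again after every experiment would give the re-loading behaviour described above, since each call parses the config from scratch.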

You can execute the pipeline by running the python script:

python pipeline.py

Note: this will run the pipeline in test mode (see The two modes for more information).

Outputs

The outputs and logs will be saved in files in a folder named outputs in the experiments folder. There are two files the user would want to keep track of (note that the <hostname> is the host name of the system on which the pipeline is being executed):

log-<hostname>: This file contains the logs
output-<hostname>: This file contains the output results of each "incarnation" of an experiment.

Note that the other files are used by the pipeline to keep track of training sessions previously launched.
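As a small sketch of the naming scheme above, the following helper builds the two file paths for the current host. The function name output_files is hypothetical, and using socket.gethostname() to obtain the host name is an assumption about how the pipeline determines it.

```python
import socket
from pathlib import Path


def output_files(experiments_dir, hostname=None):
    """Return the (log, output) file paths inside <experiments_dir>/outputs."""
    host = hostname or socket.gethostname()  # assumed source of <hostname>
    outputs = Path(experiments_dir) / "outputs"
    return outputs / f"log-{host}", outputs / f"output-{host}"


if __name__ == "__main__":
    log_path, output_path = output_files("experiments")
    print(log_path)
    print(output_path)
```

Including the host name in the file names keeps results from different machines apart when the same project directory is shared between them.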

The two modes

The pipeline can be executed in two modes: test mode and execution mode. When you are developing an experiment, you'd want to use the test mode. When executed without any additional arguments, the pipeline runs in test mode. Note that the test mode uses its own config file, experiments_test.config, which functions similarly to the experiments.config file. To execute in execution mode, pass -r to the above command:

python pipeline.py -r

Differences between test mode and execution mode (default behaviour):

Test mode	Execution mode
Uses `experiments_test.config`	Uses `experiments.config`
The experiment directory is a temporary directory which will be cleared each time the experiment is executed	The experiment directory is a directory defined by the name of the experiment and the version's `EXPERIMENT_DIR_SUFFIX`
If an exception is raised, the pipeline will halt its execution by raising the exception to the top level	Any exception raised will not stop the pipeline; the error will be logged and the pipeline will continue processing the other versions and experiments
No results or logs will be recorded in the output files	All logs and outputs will be recorded in the output files
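The difference in exception handling and directory choice between the two modes can be sketched as follows. This is a simplified illustration rather than the pipeline's implementation: run_all and experiment_dir are hypothetical names, and the "<name>-<suffix>" directory naming is an assumption.

```python
import logging
import tempfile
from pathlib import Path


def experiment_dir(name, suffix=None, test_mode=True, root="experiments"):
    """Choose a directory path: temporary in test mode, named otherwise."""
    if test_mode:
        # A throwaway location; a real runner would clear it on every run.
        return Path(tempfile.gettempdir()) / name
    # Assumed naming: experiment name plus the version's directory suffix.
    return Path(root) / (f"{name}-{suffix}" if suffix else name)


def run_all(jobs, test_mode=True):
    """Run (label, callable) pairs and return the labels of failed jobs."""
    failed = []
    for label, job in jobs:
        try:
            job()
        except Exception:
            if test_mode:
                raise  # test mode: surface the error immediately
            # Execution mode: record the error and carry on with the rest.
            logging.exception("experiment %s failed; continuing", label)
            failed.append(label)
    return failed


if __name__ == "__main__":
    def good():
        return None

    def bad():
        raise RuntimeError("boom")

    print(run_all([("first", good), ("second", bad), ("third", good)],
                  test_mode=False))
```

Failing fast suits development, where the first traceback is what you want to see, while logging and continuing suits long unattended runs where one broken version should not block the others.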

Extra

I use an experiment log to maintain the experiments, which kinda ties into how I use the pipeline. For more info on that: Experiment log - How I keep track of my ML experiments

The practices I have grown to follow are described in this post: Practices I follow with the machine learning pipeline

Other projects that address similar problems (I'd be trying to combine them in the future iterations of the pipeline):

Release history

2.0a7.post1 pre-release

Dec 17, 2020

2.0a7 pre-release

Dec 13, 2020

2.0a6 pre-release

Apr 12, 2020

2.0a5 pre-release

Apr 12, 2020

2.0a4.post19 pre-release

Feb 20, 2020

2.0a4.post18 pre-release

Sep 16, 2019

2.0a4.post17 pre-release

Sep 12, 2019

2.0a4.post16 pre-release

Sep 2, 2019

2.0a4.post14 pre-release

Aug 15, 2019

2.0a4.post13 pre-release

Aug 15, 2019

2.0a4.post12 pre-release

Jul 30, 2019

2.0a4.post11 pre-release

Jul 28, 2019

2.0a4.post10 pre-release

Jul 28, 2019

2.0a4.post9 pre-release

Jul 28, 2019

2.0a4.post8 pre-release

Jul 28, 2019

2.0a4.post7 pre-release

Jul 28, 2019

2.0a4.post6 pre-release

Jul 27, 2019

2.0a4.post5 pre-release

Jul 25, 2019

2.0a4.post4 pre-release

Jul 25, 2019

2.0a4.post3 pre-release

Jul 24, 2019

2.0a4.post2 pre-release

Jul 24, 2019

2.0a4.post1 pre-release

Jul 24, 2019

2.0a4 pre-release

Jul 24, 2019

2.0a3.post2 pre-release

Jul 23, 2019

2.0a3.post1 pre-release

Jul 23, 2019

2.0a3 pre-release

Jul 15, 2019

2.0a2 pre-release

Jul 8, 2019

2.0a1 pre-release

Jun 18, 2019

1.1a3.post12 pre-release

Jun 12, 2019

This version

1.1a3.post11 pre-release

Jun 9, 2019

1.1a3.post10 pre-release

Jun 9, 2019

1.1a3.post9 pre-release

Apr 24, 2019

1.1a3.post8 pre-release

Mar 4, 2019

1.1a3.post7 pre-release

Jan 8, 2019

1.1a3.post6 pre-release

Dec 3, 2018

1.1a3.post5 pre-release

Dec 3, 2018

1.1a3.post4 pre-release

Dec 3, 2018

1.1a3.post3 pre-release

Nov 29, 2018

1.1a3.post2 pre-release

Nov 17, 2018

1.1a3.post1 pre-release

Nov 15, 2018

1.1a3 pre-release

Nov 14, 2018

Download files

Source Distribution

mlpipeline-1.1a3.post11.tar.gz (22.5 kB)

Uploaded Jun 9, 2019 Source

Built Distribution

mlpipeline-1.1a3.post11-py3-none-any.whl (24.2 kB)

Uploaded Jun 9, 2019 Python 3

Hashes for mlpipeline-1.1a3.post11.tar.gz
Algorithm	Hash digest
SHA256	`4654de5c655d6d293cdf9f1107c25bb15bd053349e7ab57d30c5e6aef5c2ddcf`
MD5	`bbc63af76f7b7d79e55e68684ffcd293`
BLAKE2b-256	`aee6200d424579884b35a7aaab35377d5faf18a4f129db3923cfd6afeca8fae2`

Hashes for mlpipeline-1.1a3.post11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`52852e12258e1db8e37d482a9af66fa9b25ac3033a920a4de12720dc9d6d5fad`
MD5	`12796a88dc87c685fc1020effa5d4672`
BLAKE2b-256	`d2895ca93fae06a6dad05e8d3ea6be0ac7e49e12a008a10454b13fab9fe047c4`
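
A downloaded file can be checked against the SHA256 digests listed above with the standard library's hashlib. This is a minimal sketch: the helper names sha256_of and verify are hypothetical, and the file is assumed to have been downloaded under its original name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the tables above.
EXPECTED_SHA256 = {
    "mlpipeline-1.1a3.post11.tar.gz":
        "4654de5c655d6d293cdf9f1107c25bb15bd053349e7ab57d30c5e6aef5c2ddcf",
    "mlpipeline-1.1a3.post11-py3-none-any.whl":
        "52852e12258e1db8e37d482a9af66fa9b25ac3033a920a4de12720dc9d6d5fad",
}


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the one listed for its name."""
    return sha256_of(path) == EXPECTED_SHA256[Path(path).name]
```

For example, verify("mlpipeline-1.1a3.post11.tar.gz") would return True only if the local file matches the published digest.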