
Simple orchestration for running a set of jobs, each of which consists of a set of tasks.

Project description

# Job Orchestration

What is the point of this package?

While running long experiments for https://aclanthology.org/2021.eacl-main.219/, there were a number of pain points that felt unaddressed by the existing open-source packages.

  • I want to run a large number of experiments that differ in a small number of ways that can be controlled via config.
    • I want to be able to set up a list of experiments to run and not have to wait for experiment X to finish before I kick off experiment Y.
    • I want to be able to run these in parallel when appropriate.
    • I don't want the failure of one experiment to affect another experiment.
    • These experiments naturally broke down into a set of atomic tasks, and when the 4th task failed I wanted to be able to avoid repeating the first 3 tasks.
  • If I get a later result that is surprising, I want to be able to go back to an earlier experiment and:
    • Look at the raw output including logs from that experiment.
    • Rerun in exactly the same setup as before (including being able to revert my local version of the code base to that point in time).
    • Be able to compare what has changed in the code base between the two runs.
    • If a model was trained as part of the experiment, I wanted it to still be present.
  • While the experiments are running I want an easy, centralised place where I can see progress and an estimate of how long is left.
  • I want validation to run before the experiment starts. For example, I didn't want to train a model for X hours only to find that the result didn't save due to a missing path.

This package is an attempt to provide a basic framework to address these pain points. It was also important that the package was lightweight and didn't increase the run time significantly. While it was accepted that achieving these goals would require some level of opinionation, the aim was to keep this to a minimum for the main features of the package wherever possible. Finally, we only considered experiments that can be run on a single machine; scaling out to multiple machines can be achieved by simply partitioning the config files.

Get Started:

Install the package by running the following command: `pip install job-orchestration`

The first thing you will need to do is set the JOB_ORCHESTRATION_WORKSPACE environment variable to tell the module where to write logs / output / etc.
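For example, on Linux or macOS this could look like the following (the workspace path here is purely illustrative — use whatever location suits you):

```shell
# Point job_orchestration at a workspace directory (path is an example).
export JOB_ORCHESTRATION_WORKSPACE="$HOME/job_orchestration_workspace"
mkdir -p "$JOB_ORCHESTRATION_WORKSPACE"
```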

Interface:

To enable this module to run your code, we require that you follow the convention of having a file called Tasks.py which contains a class that implements the TaskBase class. The class needs a constructor which takes in the config dictionary and a run method that does the actual work. Optionally, you can also provide a validate method (recommended).
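A minimal sketch of what such a Tasks.py might look like. The stand-in TaskBase below only illustrates the shape described above (the real base class is provided by the package, and its import path may differ); the TrainModel class and the config keys it reads are hypothetical:

```python
# Hypothetical sketch of a Tasks.py following the convention described above.
# Stand-in base class; the real TaskBase comes from job_orchestration.
class TaskBase:
    def __init__(self, config: dict):
        self.config = config

    def run(self):
        raise NotImplementedError

    def validate(self):
        pass  # optional, but recommended


class TrainModel(TaskBase):
    def __init__(self, config: dict):
        super().__init__(config)
        self.epochs = config.get("epochs", 1)

    def validate(self):
        # Fail fast before hours of training, e.g. check output location is set.
        assert "outputDir" in self.config, "outputDir missing from config"

    def run(self):
        # Real training work would go here.
        return f"trained for {self.epochs} epochs"
```

The class name ("TrainModel") is what you would reference in the `method` field of a config file.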

Config Files:

To control the execution of your program you will need to provide one or more yaml config files. As a minimum we require that you provide the following fields:

  • githubRepository: URL of the git repo that your project is under.
  • pathToTasks: The parent directory of the Tasks.py file.
  • outputDir: Path to write the output to - this will be a sub-folder of the Output Directory.
  • tasks: A list of the tasks to be performed by the library. Each task must contain at least:
    • id - must be unique within the set of tasks in this config file.
    • method - The name of the class in the Tasks file to execute.

These fields, plus any others that are present, will be passed through as a dictionary to the task constructor and so can be used to control execution there.
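Putting this together, a config file might look like the following sketch (the repository URL, paths, class names, and the extra epochs field are illustrative only):

```yaml
githubRepository: https://github.com/example/my-experiments  # illustrative URL
pathToTasks: src                # directory containing Tasks.py
outputDir: experiment_1         # sub-folder of the Output Directory
tasks:
  - id: train
    method: TrainModel
    epochs: 3                   # extra fields are passed to the task constructor
  - id: evaluate
    method: EvaluateModel
```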

Now the next question is: where do I put this config? You must put it in the "ConfigSources" sub-directory under the path given by the JOB_ORCHESTRATION_WORKSPACE environment variable. (Pro-tip: once you have set the environment variable, just run python -m job_orchestration to create all the sub-directories.)

Next we need to ready this config file. To do this, run python -m job_orchestration -action readyConfigs - this will validate and set up the output location for any configs in the ConfigSources directory before moving them to the ConfigsToRun directory.

Finally, we need to run a worker, which will pick up a config file from the "ConfigsToRun" directory, run the tasks in its yaml file one by one, and then repeat with the next config.

For more clarity, please see the example Tasks.py and config file in the /demo folder.



Download files

Download the file for your platform.

Source Distribution

job_orchestration-0.0.7.tar.gz (15.9 kB)

Uploaded Source

Built Distribution

job_orchestration-0.0.7-py3-none-any.whl (21.1 kB)

Uploaded Python 3

File details

Details for the file job_orchestration-0.0.7.tar.gz.

File metadata

  • Download URL: job_orchestration-0.0.7.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.7.7

File hashes

Hashes for job_orchestration-0.0.7.tar.gz
  • SHA256: f83ef940f0b643a64d5e06ad06dd37bce4724050267d9cb484106f630957481e
  • MD5: d045f52c20e6fad5c086642851c32587
  • BLAKE2b-256: 236a476b40b182e35480ee4736759cfbf0ec9dc6ecf3de4420c0e318d4df1b80

See more details on using hashes here.

File details

Details for the file job_orchestration-0.0.7-py3-none-any.whl.

File metadata

File hashes

Hashes for job_orchestration-0.0.7-py3-none-any.whl
  • SHA256: c389325baafedac7ec8c14574dce40fe7b140ca203f0f1619cb4f9ed115743fc
  • MD5: 16c1f1ca5e08586e4132deaede34395a
  • BLAKE2b-256: 9e5ccca55f8762c4d0fb67c25d637691a65c6799ce5ce68afd4a166b4f2819f6

