gaia python client
running
To connect to a running Gaia server, find the host (open an ssh tunnel to it if needed) and do the following:
import gaia
config = {'gaia_host': 'localhost:24442'}
flow = gaia.Gaia(config)
Now that we have a reference to the client, we can call these methods to operate on a named workflow:
- command - see what Commands are available and add new Commands
- merge - update or add new Steps
- run - recompute dependencies and run outstanding Steps
- halt - stop a running workflow
- status - find out all information about a given workflow
- expire - recompute the given storage keys and Steps and all their dependent Steps
To just get something going, run the workflow in WCM:
commands = gaia.load_yaml('../../resources/test/wcm/wcm.commands.yaml')
wcm = gaia.load_yaml('../../resources/test/wcm/wcm.processes.yaml')
flow.command('wcm', commands)
flow.merge('wcm', wcm)
You will also need to launch some sisyphus workers. To do that:
flow.launch(['a', 'b'])
Launch more if you want : ) Give each a unique name. They will deallocate 5 minutes after finishing their last Steps.
command
Commands are the base level operations that can be run, specifically: command line programs in a given docker container image. Once defined, a Command can be invoked any number of times with a new set of vars, inputs, and outputs.
If you call this method with an empty or absent array argument, it will return all Commands in the named workflow.
flow.command('biostream')
# [{'name': 'ls', 'image': 'ubuntu', ...}, ...]
A Command is expressed as a dictionary with the following keys:
- name - name of the Command
- image - docker image to run in
- command - array of shell tokens to execute
- inputs - map of storage keys to internal paths inside the docker container where the Command's input files will be placed
- outputs - map of storage keys to internal paths inside the docker container where the Command's output files will be retrieved after the Command has run
- vars - map of var keys to string values to insert into Command tokens
A Command may also have an optional stdout key, which specifies the path where stdout output is placed (so that stdout can be used as one of the outputs of the Command).
flow.command('biostream', [...])
If flow.command() is called with an array of Command entries, it will merge the given Commands into the workflow, thus adding and/or replacing Commands and triggering the recomputation of any Steps that refer to these Commands.
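To make the Command keys concrete, here is a minimal sketch of one Command dictionary. The Command name count-lines, the input and output keys, and the container paths are all hypothetical, and missing_keys is a local sanity check written for this example, not part of the gaia client. Since the placeholder syntax for vars is not described here, the tokens below are literal and vars is left empty.

```python
# Keys every Command is described as having (stdout is optional).
REQUIRED_KEYS = ('name', 'image', 'command', 'inputs', 'outputs', 'vars')

# Hypothetical Command: count the lines of one input file in an ubuntu container.
count_lines = {
    'name': 'count-lines',
    'image': 'ubuntu',
    'command': ['wc', '-l', '/in/text.txt'],   # array of shell tokens
    'inputs': {'TEXT': '/in/text.txt'},        # input key -> path inside the container
    'outputs': {'COUNT': '/out/count.txt'},    # output key -> path inside the container
    'vars': {},
    'stdout': '/out/count.txt',                # capture stdout as the output file
}


def missing_keys(command):
    """Return the required keys that are absent from a Command dictionary."""
    return [key for key in REQUIRED_KEYS if key not in command]


print(missing_keys(count_lines))
```

With a connected client, a dictionary like this would be passed inside a list, as in flow.command('biostream', [count_lines]).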
merge
Once some Commands exist in the workflow you can start merging in Steps in order to trigger computation. Every Step names a Command and sets the Command's vars, inputs, and outputs. Inputs and outputs refer to paths in the data store while vars are strings that can be spliced into various parts of the Command's shell tokens.
Commands and Steps are kept in workflows which are entirely encapsulated from one another. Each workflow has its own data space with its own set of names and values.
To call the merge method, provide a workflow name and an array of Steps:
flow.merge('biostream', [{'name': 'ls-home', 'command': 'ls', 'inputs': {...}, ...}, ...])
Each Step is a dictionary with the following keys:
- name - name of the Step
- command - name of the Command to invoke
- inputs - map of input keys defined by the Command to keys in the data store to read the input files
- outputs - map of output keys from the Command to keys in the data store to write the output files after successfully invoking the Command
- vars - map of var keys to values. If a value is an array, a Step is created for each element of the array, with that element as the value
If a Step is merged with a name that hasn't been seen before, the Step entry is created and its outputs are computed as soon as the required inputs are available in the data store. If the name of the Step being merged already exists in the workflow, that Step will be updated and recomputed, along with all Steps that depend on outputs from the updated Step in that workflow.
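The array form of vars can be pictured with a small local sketch. expand_step is a hypothetical helper, not part of gaia: the real expansion happens on the server, and how the resulting Steps are named there is not documented here, so the name suffix below is an assumption. The Step, Command, and data names are also made up, and inputs and outputs are simply copied unchanged.

```python
import itertools


def expand_step(step):
    """Expand array-valued vars into one Step per combination of values."""
    variables = step.get('vars', {})
    array_keys = sorted(k for k, v in variables.items() if isinstance(v, list))
    if not array_keys:
        return [step]
    expanded = []
    for values in itertools.product(*(variables[k] for k in array_keys)):
        new_vars = dict(variables)
        new_vars.update(zip(array_keys, values))
        new_step = dict(step)
        new_step['vars'] = new_vars
        # Assumed naming scheme: append each chosen value to the Step name.
        new_step['name'] = '-'.join([step['name']] + [str(v) for v in values])
        expanded.append(new_step)
    return expanded


# Hypothetical Step with one array-valued var and one plain var.
step = {
    'name': 'align',
    'command': 'aligner',
    'inputs': {'GENOME': 'genomes'},
    'outputs': {'ALIGNED': 'aligned'},
    'vars': {'sample': ['a', 'b'], 'mode': 'fast'},
}

for expanded_step in expand_step(step):
    print(expanded_step['name'], expanded_step['vars'])
```

The original Step dictionary is left untouched, so the same definition can still be passed to flow.merge() as is.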
run
The run method simply triggers the computation in the provided workflow if it is not already running:
flow.run('biostream')
halt
The 'halt' method is the inverse of the 'run' method. It will immediately cancel all running tasks and stop the computation in the given workflow:
flow.halt('biostream')
status
The status method provides information about a given workflow. There is a lot of information available, and it is formatted as a dictionary with these keys:
- state - a string representing the state of the overall workflow. Possible values are 'initialized', 'running', 'complete', 'halted', and 'error'.
- flow - contains a representation of the Steps in the workflow as a bipartite graph: step and data. Each entry has a from field containing the Step or data names it depends on and a to field containing all Step or data names that depend on it.
- data - contains a map of data keys to their current status: either missing or complete
- tasks - contains information about each task run through the configured executor. This will largely be executor dependent
flow.status('biostream')
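One common use of status is to wait until a workflow has stopped. The sketch below polls any zero-argument callable that returns a status dictionary, so it can be tried without a server. wait_for_workflow and missing_data are hypothetical helpers, not part of gaia. Treating 'complete', 'halted', and 'error' as the final states, and the data statuses as the plain strings 'missing' and 'complete', are assumptions based on the descriptions above.

```python
import time

# States after which polling stops; 'initialized' and 'running' keep waiting.
TERMINAL_STATES = {'complete', 'halted', 'error'}


def wait_for_workflow(get_status, interval=5.0, timeout=600.0):
    """Poll get_status() until the workflow reaches a terminal state."""
    deadline = time.time() + timeout
    while True:
        status = get_status()
        if status['state'] in TERMINAL_STATES:
            return status
        if time.time() >= deadline:
            raise RuntimeError('workflow still %s after %s seconds'
                               % (status['state'], timeout))
        time.sleep(interval)


def missing_data(status):
    """Return the data keys whose status is 'missing', sorted by name."""
    return sorted(key for key, state in status['data'].items()
                  if state == 'missing')


# Stand-in for a server: two made-up status snapshots returned in order.
snapshots = iter([
    {'state': 'running', 'data': {'genomes': 'complete', 'aligned': 'missing'}},
    {'state': 'complete', 'data': {'genomes': 'complete', 'aligned': 'complete'}},
])
final = wait_for_workflow(lambda: next(snapshots), interval=0.0)
print(final['state'], missing_data(final))
```

Against a real server the callable would be something like lambda: flow.status('biostream').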
expire
The expire method accepts a workflow name and a list of Step names and data names (storage keys). The named Steps and data, and every Step that depends on them, will be recomputed.
flow.expire('biostream', ['ls-home', 'genomes', ...])
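To preview which Steps an expire call would touch, the dependency graph reported by status can be walked locally. This is only a sketch: downstream_steps is a hypothetical helper, the graph shape (step and data maps whose entries carry from and to lists) is an assumption based on the description of the flow key, and the server's own algorithm may differ. All Step and data names in the sample graph are made up.

```python
from collections import deque


def downstream_steps(flow_graph, names):
    """Return the named Steps plus every Step reachable through `to` edges."""
    queue = deque((kind, name) for name in names for kind in ('step', 'data')
                  if name in flow_graph.get(kind, {}))
    seen = set()
    while queue:
        kind, name = queue.popleft()
        if (kind, name) in seen:
            continue
        seen.add((kind, name))
        # In a bipartite graph, Steps point to data and data points to Steps.
        other = 'data' if kind == 'step' else 'step'
        for target in flow_graph[kind][name].get('to', []):
            if target in flow_graph.get(other, {}):
                queue.append((other, target))
    return sorted(name for kind, name in seen if kind == 'step')


# Hypothetical graph: genomes -> align -> aligned -> count -> counts,
# plus an unrelated ls-home Step that writes a listing.
flow_graph = {
    'step': {
        'align': {'from': ['genomes'], 'to': ['aligned']},
        'count': {'from': ['aligned'], 'to': ['counts']},
        'ls-home': {'from': [], 'to': ['listing']},
    },
    'data': {
        'genomes': {'from': [], 'to': ['align']},
        'aligned': {'from': ['align'], 'to': ['count']},
        'counts': {'from': ['count'], 'to': []},
        'listing': {'from': ['ls-home'], 'to': []},
    },
}

print(downstream_steps(flow_graph, ['genomes']))
print(downstream_steps(flow_graph, ['ls-home']))
```

Step and data names are tracked separately in the walk, so a Step and a data key that happen to share a name are not confused.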