Skip to main content

No project description provided

Project description

gaia python client

running

To connect to a running Gaia instance, find the host (or open an ssh tunnel to it) and do the following:

import gaia
config = {'gaia_host': 'localhost:24442'}
flow = gaia.Gaia(config)

Now that we have a reference to the client, there are several methods we can call.

  • command - see what commands are available and add new commands
  • merge - update or add new processes into the given namespace
  • trigger - recompute dependencies and launch outstanding processes in a namespace
  • halt - stop a running namespace
  • status - find out all information about a given namespace
  • expire - recompute a given key (process or data) and all of its dependent processes

All of these methods are relative to a given namespace (root) except for command, which operates globally to all namespaces.

To just get something going, run the workflow in WCM:

wcm = gaia.load_yaml('../../resources/test/wcm/wcm.processes.yaml')
flow.merge('wcm', wcm)

You will also need to launch some sisyphus workers. To do that:

flow.launch(['sisyphus-0', 'sisyphus-1'])

Launch more if you want : ) Give each a unique key. They will deallocate after 5 minutes of inactivity.

command

Commands are the base level operations that can be run, and generally map on to command line programs invoked from a given docker container. Once defined, a command can be invoked any number of times with a new set of vars, inputs and outputs.

If you call this method with an empty array or no array, it will return all commands currently registered for the named workflow.

flow.command('biostream')
# [{'key': 'ls', 'image': 'ubuntu', ...}, ...]

All commands are in the Gaia command format and contain the following keys:

  • key - name of command
  • image - docker image containing command
  • command - array containing elements of command to be run
  • inputs - map of keys to local paths in the docker image where input files will be placed
  • outputs - map of keys to local paths where output files are expected to appear once the command has been run
  • vars - map of keys to string variables that will be provided on invocation

They may also have an optional stdout key which specifies what path to place stdout output (so that stdout can be used as one of the outputs of the command).

If this method is called with an array populated with command entries it will merge this set of commands into the global set and update any commands that may already be present, triggering the recomputation of any processes that refer to the updated command.

merge

Once some commands exist in the system you can start merging in processes in order to trigger computation. Every process refers to a command registered in the system, and defines the relevant vars, inputs and outputs to pass to the command. Inputs and outputs refer to paths in the data store, while vars are strings that are passed directly as values and can be spliced into various parts of the invocation.

Processes are partitioned by namespaces which are entirely encapsulated from one another. Each namespace represents its own data space with its own set of keys and values. Every method besides command is relative to the provided namespace, while commands are available to the entire system.

To call this method, provide a namespace key and an array of process entries:

flow.merge('biostream', [{'key': 'ls-home', 'command': 'ls', 'inputs': {...}, ...}, ...])

Each process entry has the following keys:

  • key - unique identifier for process
  • command - reference to which command in the system is being invoked
  • inputs - map of input keys defined by the command to keys in the data store where the inputs will come from
  • outputs - map of output keys from the command to keys in the data store where the output will be placed after successfully completing the command
  • vars - map of var keys to values the var will take. If this is an array it will create a process for each element in the array with the given value for the var

If this is a process with a key that hasn't been seen before, it will create the process entry and trigger the computation of outputs if the required inputs are available in the data store. If the key of the process being merged already exists in the namespace, that process will be updated and recomputed, along with all processes that depend on outputs from the updated process in that namespace.

trigger

The trigger method simply triggers the computation in the provided namespace if it is not already running:

flow.trigger('biostream')

halt

The 'halt' method is the inverse of the 'trigger' method. It will immediately cancel all running tasks and stop the computation in the given namespace:

flow.halt('biostream')

status

The status method provides information about a given namespace. There is a lot of information available, and it is partitioned into four keys:

  • state - a single string representing the state of the overall namespace. Possible values are 'initialized', 'running', 'complete', 'halted' and 'error'.
  • flow - contains a representation of the defined processes in the namespace as a bipartite graph: process and data. There are two keys, process and data which represent the two halves of this bipartite graph. Each entry has a from field containing keys it is dependent on and a to field containing all keys dependent on it.
  • data - contains a map of data keys to their current status (either missing or complete)
  • tasks - contains information about each task run through the configured executor. This will largely be executor dependent
flow.status('biostream')

expire

The expire method accepts a namespace and a list of keys of either processes or data, and recomputes each key and every process that depends on any of the given keys.

flow.expire('biostream', ['ls-home', 'genomes', ...])

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gaia-0.0.5.tar.gz (4.9 kB view details)

Uploaded Source

Built Distribution

gaia-0.0.5-py2-none-any.whl (5.0 kB view details)

Uploaded Python 2

File details

Details for the file gaia-0.0.5.tar.gz.

File metadata

  • Download URL: gaia-0.0.5.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.8.0 tqdm/4.23.0 CPython/2.7.15

File hashes

Hashes for gaia-0.0.5.tar.gz
Algorithm Hash digest
SHA256 7934dfa8728c3863bef0b675fc32c694f6ecd6d8266bcd12baf934acd8f0d541
MD5 57b586dfc1a89f4e293f4240cc82a202
BLAKE2b-256 9caa54722cf2dc3c18340251bc78a528c5283999c190a95ae26530a7c104d2ac

See more details on using hashes here.

File details

Details for the file gaia-0.0.5-py2-none-any.whl.

File metadata

  • Download URL: gaia-0.0.5-py2-none-any.whl
  • Upload date:
  • Size: 5.0 kB
  • Tags: Python 2
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.8.0 tqdm/4.23.0 CPython/2.7.15

File hashes

Hashes for gaia-0.0.5-py2-none-any.whl
Algorithm Hash digest
SHA256 514cec311a4d332d12e2d9618e8578447ab7037d82cf4410808433237768ac3f
MD5 e573ecf942c460ee6d94000aa8611a85
BLAKE2b-256 3072a78cdd13991b09d2c568febf93dfcf75663838f7373177fbbf289603c1d0

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page