Project description
gaia python client
running
To connect to a running Gaia instance, find the host and do the following:
import gaia
config = {
    'gaia_host': '10.138.0.21:24442',
    'kafka_host': '10.138.0.2:9092'}
flow = gaia.Gaia(config)
Now that we have a reference to the client, there are several methods we can call.
- command - see what commands are available and add new commands
- merge - update or add new processes into the given namespace
- trigger - recompute dependencies and launch outstanding processes in a namespace
- halt - stop a running namespace
- status - find out all information about a given namespace
- expire - recompute a given key (process or data) and all of its dependent processes
All of these methods are relative to a given namespace (root) except for command, which applies globally to all namespaces.
To get something going quickly, run the WCM workflow:
wcm = gaia.load_yaml('../../resources/test/wcm/wcm.processes.yaml')
flow.merge('wcm', wcm)
You will also need to launch some sisyphus workers. To do that:
flow.launch('worker-a')
Launch more if you want : ) Give each a unique key. They will deallocate after 5 minutes of inactivity.
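For example, to bring up a few workers at once (the worker names here are arbitrary, as long as each key is unique):
for key in ['worker-a', 'worker-b', 'worker-c']:
    flow.launch(key)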
Once your workflow is running, you can listen to the logs as they appear:
flow.listen()
command
Commands are the base level operations that can be run, and generally map onto command-line programs invoked from a given docker container. Once defined, a command can be invoked any number of times with a new set of vars, inputs and outputs.
If you call this method with an empty array, it will return all commands currently registered in the system.
flow.command([])
# [{'key': 'ls', 'image': 'ubuntu', ...}, ...]
All commands are in the Gaia command format and contain the following keys:
- key - name of command
- image - docker image containing command
- command - array containing elements of command to be run
- inputs - map of keys to local paths in the docker image where input files will be placed
- outputs - map of keys to local paths where output files are expected to appear once the command has been run
- vars - map of keys to string variables that will be provided on invocation
They may also have an optional stdout key, which specifies the path where stdout output will be placed (so that stdout can be used as one of the outputs of the command).
If this method is called with an array of command entries, it will merge those commands into the global set, updating any commands that are already present and triggering the recomputation of any processes that refer to the updated commands.
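As a sketch, registering a new command might look like the following. The key, image, command and paths here are made up for illustration and are not part of any real workflow:
sort_lines = {
    'key': 'sort-lines',
    'image': 'ubuntu',
    'command': ['sort', '/tmp/lines.txt', '-o', '/tmp/sorted.txt'],
    'inputs': {'lines': '/tmp/lines.txt'},
    'outputs': {'sorted': '/tmp/sorted.txt'},
    'vars': {}}
flow.command([sort_lines])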
merge
Once some commands exist in the system you can start merging in processes in order to trigger computation. Every process refers to a command registered in the system, and defines the relevant vars, inputs and outputs to pass to the command. Inputs and outputs refer to paths in the data store, while vars are strings that are passed directly as values and can be spliced into various parts of the invocation.
Processes are partitioned by namespaces, which are entirely encapsulated from one another. Each namespace represents its own data space with its own set of keys and values. Every method besides command is relative to the provided namespace, while commands are available to the entire system.
To call this method, provide a namespace key and an array of process entries:
flow.merge('biostream', [{'key': 'ls-home', 'command': 'ls', 'inputs': {...}, ...}, ...])
Each process entry has the following keys:
- key - unique identifier for process
- command - reference to which command in the system is being invoked
- inputs - map of input keys defined by the command to keys in the data store where the inputs will come from
- outputs - map of output keys from the command to keys in the data store where the output will be placed after successfully completing the command
- vars - map of var keys to values the var will take. If this is an array it will create a process for each element in the array with the given value for the var
If this is a process with a key that hasn't been seen before, it will create the process entry and trigger the computation of outputs if the required inputs are available in the data store. If the key of the process being merged already exists in the namespace, that process will be updated and recomputed, along with all processes that depend on outputs from the updated process in that namespace.
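Continuing the hypothetical sort-lines command above, a process entry that invokes it could look like this (the data store keys genes-raw and genes-sorted are again illustrative):
flow.merge('biostream', [{
    'key': 'sort-genes',
    'command': 'sort-lines',
    'inputs': {'lines': 'genes-raw'},
    'outputs': {'sorted': 'genes-sorted'},
    'vars': {}}])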
trigger
The trigger method simply triggers the computation in the provided namespace if it is not already running:
flow.trigger('biostream')
halt
The halt method is the inverse of the trigger method. It will immediately cancel all running tasks and stop the computation in the given namespace:
flow.halt('biostream')
status
The status method provides information about a given namespace. There is a lot of information available, and it is partitioned into four keys:
- state - a single string representing the state of the overall namespace. Possible values are 'initialized', 'running', 'complete', 'halted' and 'error'.
- flow - contains a representation of the defined processes in the namespace as a bipartite graph: process and data. There are two keys, process and data, which represent the two halves of this bipartite graph. Each entry has a from field containing the keys it is dependent on and a to field containing all keys dependent on it.
- data - contains a map of data keys to their current status (either missing or complete)
- tasks - contains information about each task run through the configured executor. This will largely be executor dependent.
flow.status('biostream')
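Assuming status returns the namespace information as a map with the four keys above, the result can be inspected like this (a sketch; the exact shape of each entry depends on the workflow and executor):
status = flow.status('biostream')
print(status['state'])              # e.g. 'running' or 'complete'
print(status['data'])               # map of data keys to their current status
print(status['flow']['process'])    # process half of the bipartite graph
print(status['tasks'])              # executor-dependent task information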
expire
The expire method accepts a namespace and a list of keys of either processes or data, and recomputes each key and every process that depends on any of the given keys.
flow.expire('biostream', ['ls-home', 'genomes', ...])
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file gaia-0.0.2.tar.gz.
File metadata
- Download URL: gaia-0.0.2.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.8.0 tqdm/4.23.0 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4e1d2ba7829ac511efea24dce85919ccbdee9380b3f097061268f97b030e7094
MD5 | 1dfc68ea6a3311ce37fcc8fce55961a9
BLAKE2b-256 | bd00dcfb448ff15bb9012afd6868c82570ca716677d86159fec9bc4615c3cada
File details
Details for the file gaia-0.0.2-py2-none-any.whl.
File metadata
- Download URL: gaia-0.0.2-py2-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 2
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.8.0 tqdm/4.23.0 CPython/2.7.15
File hashes
Algorithm | Hash digest
---|---
SHA256 | 773490acb7209699d036fb125d1a4ec8ed37e83bbdcac14155ca9bbee7e377bb
MD5 | 279843c9e85ca0968ed78c2a07937001
BLAKE2b-256 | f8cad6110896803ea2417adcffcfed18d510a5ddfefae510f16410b04025da8f