Easier way to run workflows, configurable across environments

These details have not been verified by PyPI

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Developers
- Science/Research
Topic
- Scientific/Engineering

Project description

Welcome to Janis-Runner

Janis is a workflow assistant designed to make the process of building and running workflows easier.

Quick start

pip3 install janis-pipelines[runner]

You can run a workflow in CWLTool with the following command line:

janis run myWorkflow.py --engine cwltool

CLI options:

run - Run a janis workflow
watch - Watch an existing execution
abort - Issue an abort request to an existing execution
inputs - Generate an inputs file for a workflow
translate - Translate a workflow into CWL / WDL
metadata - Get the available metadata on an execution
version - Print the version of janis_runner

`run`

You can run a workflow with the run command. Here's an example that runs the hello world workflow:

janis run hello

View the help guide with janis run -h:

positional arguments:
  workflow              Run the workflow defined in this file

optional arguments:
  -h, --help            show this help message and exit
  -n NAME, --name NAME  If you have multiple workflows in your file, you may
                        want to help Janis out to select the right workflow to
                        run
  --inputs INPUTS       File of inputs (matching the workflow) to override,
                        these inputs will take precedence over inputs declared
                        in the workflow
  -o OUTPUT_DIR, --output-dir OUTPUT_DIR
                        The output directory to which tasks are saved in,
                        defaults to $HOME.
  -e ENVIRONMENT, --environment ENVIRONMENT
                        Select a preconfigured environment (takes precendence
                        over engine and filescheme). See the list of
                        environments with `janis environment list`
  --engine {cromwell,cwltool}
                        Choose an engine to start
  -f {local,ssh}, --filescheme {local,ssh}
                        Choose the filescheme required to retrieve the output
                        files where your engine is located. By selecting SSH,
                        Janis will SCP the files using the --filescheme-ssh-
                        binding SSH shortcut.
  --filescheme-ssh-binding FILESCHEME_SSH_BINDING
                        Only valid if you've selected the ssh filescheme. (eg:
                        scp cluster:/path/to/output local/output/dir)
  --cromwell-url CROMWELL_URL
                        Location to Cromwell
  --validation-reference VALIDATION_REFERENCE
                        reference file for validation
  --validation-truth-vcf VALIDATION_TRUTH_VCF
                        truthVCF for validation
  --validation-intervals VALIDATION_INTERVALS
                        intervals to validate between
  --validation-fields VALIDATION_FIELDS [VALIDATION_FIELDS ...]
                        outputs from the workflow to validate
  --dryrun              convert workflow, and do everything except submit the
                        workflow
  --no-watch            Submit the workflow and return the task id
  --max-cores MAX_CORES
                        maximum number of cores to use when generating
                        resource overrides
  --max-memory MAX_MEMORY
                        maximum GB of memory to use when generating resource
                        overrides
  --hint-captureType {targeted,exome,chromosome,30x,90x,300x}
  --hint-engine {cromwell}

Configuration

It's possible to configure a number of attributes of janis.runner. You can provide a YAML configuration file in the following ways:

CLI: --config /path/to/config.yml
Environment variable: JANIS_CONFIGPATH=/path/to/config.yml
Default: Janis will additionally look for a config at $HOME/.janis/janis.conf

Configurations aren't currently cascaded, but the intention is that they will be.
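
The lookup described above can be sketched as follows. This is an illustration only, not the janis.runner implementation: the function resolve_config_path is hypothetical, and the precedence (CLI, then environment variable, then default) is an assumption based on the order in which the options are listed. Because configurations are not cascaded, the sketch returns a single path.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_config_path(
    cli_config: Optional[str], env: Mapping[str, str], home: str
) -> Path:
    """Pick one config file location (hypothetical helper)."""
    # Assumed precedence: an explicit --config value wins.
    if cli_config:
        return Path(cli_config)
    # Next, the JANIS_CONFIGPATH environment variable named in this README.
    env_path = env.get("JANIS_CONFIGPATH")
    if env_path:
        return Path(env_path)
    # Finally, the default location; the caller should check that it exists.
    return Path(home) / ".janis" / "janis.conf"


if __name__ == "__main__":
    print(resolve_config_path(None, os.environ, str(Path.home())))
```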

Options

Defaults: janis_runner/management/configuration.py

Config / DB directory: configDir: /path/to/configdir/
- Second priority to environment variable: JANIS_CONFIGDIR
- Default: $HOME/.janis/
- Database: {configDir}/janis.db - Janis global database
Execution directory: executionDir
- Second priority to environment variable: JANIS_EXCECUTIONDIR
- Default: $HOME/janis/execution/
Search paths: searchPaths
- Will additionally add from environment variable: JANIS_SEARCHPATH
- Default: $HOME/janis/
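
The priority rules above can be sketched as below. This is a hedged illustration rather than the code in janis_runner/management/configuration.py: the function resolve_directories is hypothetical, and the handling of searchPaths (default used only when the config lists none, environment variable appended as one entry) is an assumption.

```python
import os
from typing import Any, Dict, Mapping


def resolve_directories(
    config: Mapping[str, Any], env: Mapping[str, str], home: str
) -> Dict[str, Any]:
    """Resolve configDir, executionDir and searchPaths (hypothetical helper)."""
    # Environment variables take priority over the YAML config, then defaults.
    config_dir = (
        env.get("JANIS_CONFIGDIR")
        or config.get("configDir")
        or os.path.join(home, ".janis")
    )
    # The variable name is spelled exactly as it appears in this README.
    execution_dir = (
        env.get("JANIS_EXCECUTIONDIR")
        or config.get("executionDir")
        or os.path.join(home, "janis", "execution")
    )
    # Assumption: the default applies only when the config lists no paths,
    # and the environment variable contributes one additional entry.
    search_paths = list(config.get("searchPaths") or [os.path.join(home, "janis")])
    if env.get("JANIS_SEARCHPATH"):
        search_paths.append(env["JANIS_SEARCHPATH"])
    return {
        "configDir": config_dir,
        "database": os.path.join(config_dir, "janis.db"),
        "executionDir": execution_dir,
        "searchPaths": search_paths,
    }


if __name__ == "__main__":
    print(resolve_directories({}, os.environ, os.path.expanduser("~")))
```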

Engines

There are currently 2 engines that janis.runner supports:

CWLTool
Cromwell

CWLTool (default)

Due to the way CWLTool provides metadata, support for CWLTool is very basic, and limited to submitting workflows and linking the outputs. It doesn't allow you to disconnect and reconnect later. It's enough as a proof of concept and for very basic workflows.

You should include the --logDebug parameter to see the output of CWLTool.

Cromwell

Cromwell can be run in two modes:

Connect to an existing instance (well supported) - include the --cromwell-url argument with the port to allow janis.runner to correctly connect to this instance.
Run and manage its own instance. When the task is started, the process_id of the started Cromwell instance is stored in the taskdb; when the task finishes execution, the process is manually stopped. You are able to disconnect from the task, but note that the Cromwell instance will keep running until you watch the task again, at which point Janis recognises that the task has finished and shuts the instance down.

Both of these options provide reporting and progress tracking due to Cromwell's extensive metadata endpoint. The TaskID (6 hex characters) is included as a label on the workflow. You can disconnect from a job and reconnect with this TaskID through the command:

janis watch $tid
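
As a small illustration of the TaskID format, the sketch below validates a 6-character hex identifier before building the watch command. It is an assumption-laden example: the helper names are hypothetical, lowercase hex is assumed, and the use of secrets.token_hex is only one way to produce such an identifier, not necessarily how Janis generates it.

```python
import re
import secrets
from typing import List

# Assumption: TaskIDs are exactly six lowercase hexadecimal characters.
TASK_ID_PATTERN = re.compile(r"[0-9a-f]{6}")


def generate_task_id() -> str:
    """Produce a random 6-hex-character identifier (3 random bytes)."""
    return secrets.token_hex(3)


def watch_command(tid: str) -> List[str]:
    """Build the argv list for `janis watch`, rejecting malformed IDs."""
    if not TASK_ID_PATTERN.fullmatch(tid):
        raise ValueError(f"not a 6-character hex TaskID: {tid!r}")
    return ["janis", "watch", tid]


if __name__ == "__main__":
    print(" ".join(watch_command(generate_task_id())))
```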

A screenshot of running the example whole genome germline pipeline (for a targeted sample) can be found below. All engines can support this through a generalised metadata semantic (TaskMetadata), but neither CWLTool nor Toil supports much polling of metadata.

Extra Cromwell comments:

The TaskID is bound as a label on GCP instances (as tid), allowing you to query this information.
Janis uses the development spec of WDL, requiring Cromwell-42 or higher.
If asking Janis to start its own Cromwell instance, it requires the jar to be exported as $cromwelljar.

Environments

Environments are a combination of an Engine and a Filesystem. They save you from having to constantly specify your engine (+ parameters).

Environment information is used as a template, in which the task stores its own copy of the filesystem and engine. This was chosen as it allows a task's output to be relocated without losing workflow metadata.

Adding and deleting environments is currently UNAVAILABLE.

Actions:

List: janis environment list
Create: unavailable (proposed: janis environment create 'env' --engine 'engineId' --filescheme 'fsid')
Delete: unavailable (proposed: janis environment -d 'env')

Filesystem

There is a weak concept of a filesystem for where your workflow is executed. This tool is really only developed for using the LocalFileSystem.

Supported filesystems:

LocalFileScheme
SSHFileScheme (identifier, connectionstring) - I'd recommend creating an SSH shortcut to avoid persisting personal details in the database. Janis uses the connection string like so: scp connectionstring:/path/to/output /local/persist/path
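
The way the connection string is combined with the paths can be sketched as below. This is a minimal illustration of the scp invocation shown above, not the SSHFileScheme implementation; the function name build_scp_command is hypothetical.

```python
from typing import List


def build_scp_command(
    connection_string: str, remote_path: str, local_path: str
) -> List[str]:
    """Build `scp connectionstring:/remote/path /local/path` as an argv list."""
    # The connection string is typically an SSH shortcut from ~/.ssh/config,
    # so no user name or host details need to be stored alongside it.
    return ["scp", f"{connection_string}:{remote_path}", local_path]


if __name__ == "__main__":
    print(" ".join(build_scp_command("cluster", "/path/to/output", "local/output/dir")))
```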

Databases

Janis stores a global SQLite database at {configDir}/janis.db of environments and task pointers (default: ~/.janis/janis.db). When a task is started, a database and workflow files are copied to a generated output folder (default: ~/janis/execution/{workflowName}/${yyyymmdd_hhMM}_{tid}/task.db).
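
The task database location follows the pattern given above. The sketch below shows how such a path could be composed; the function task_db_path is hypothetical, and it assumes that hhMM means 24-hour hours followed by minutes.

```python
import os
from datetime import datetime


def task_db_path(
    execution_dir: str, workflow_name: str, tid: str, started: datetime
) -> str:
    """Compose {executionDir}/{workflowName}/{yyyymmdd_hhMM}_{tid}/task.db."""
    # Assumption: "hhMM" is 24-hour hours then minutes.
    stamp = started.strftime("%Y%m%d_%H%M")
    return os.path.join(execution_dir, workflow_name, f"{stamp}_{tid}", "task.db")


if __name__ == "__main__":
    default_exec_dir = os.path.join(os.path.expanduser("~"), "janis", "execution")
    print(task_db_path(default_exec_dir, "hello", "a1b2c3", datetime.now()))
```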

v0.6.0

Version v0.6.0 brings new backwards-incompatible changes to the metadata structure, as well as significant changes to the Janis API.

Release history

0.13.0

Jul 12, 2023

0.12.1

Jun 14, 2023

0.11.9

Nov 19, 2021

0.11.8

Jul 21, 2021

0.11.7

Jun 10, 2021

0.11.6

May 3, 2021

0.11.5

Mar 31, 2021

0.11.4

Jan 22, 2021

0.11.3

Jan 22, 2021

0.11.2

Jan 20, 2021

0.11.1

Jan 8, 2021

0.11.0

Dec 21, 2020

0.10.11

Nov 10, 2020

0.10.10

Nov 10, 2020

0.10.9

Nov 6, 2020

0.10.5

Sep 9, 2020

0.10.4

Sep 8, 2020

0.10.3

Sep 2, 2020

0.10.2

Aug 31, 2020

0.10.1

Aug 6, 2020

0.10.0

Jul 16, 2020

0.9.19

Jul 15, 2020

0.9.18

Jun 19, 2020

0.9.17

May 22, 2020

0.9.16

Apr 24, 2020

0.9.15

Apr 23, 2020

0.9.14

Apr 22, 2020

0.9.13

Mar 30, 2020

0.9.12

Mar 24, 2020

0.9.11

Mar 20, 2020

0.9.10

Mar 18, 2020

0.9.9

Mar 16, 2020

0.9.8

Feb 26, 2020

0.9.7

Jan 31, 2020

0.9.6

Jan 30, 2020

0.9.5

Jan 24, 2020

0.9.4

Jan 21, 2020

0.9.3

Jan 20, 2020

0.9.2

Jan 19, 2020

0.9.1

Jan 17, 2020

0.9.0

Jan 17, 2020

0.8.1

Dec 11, 2019

0.8.0

Dec 9, 2019

0.7.16

Dec 9, 2019

0.7.15

Dec 6, 2019

0.7.13

Nov 21, 2019

0.7.12

Nov 18, 2019

0.7.11

Nov 15, 2019

0.7.10

Nov 14, 2019

0.7.9

Nov 14, 2019

0.7.8

Nov 13, 2019

0.7.7

Nov 11, 2019

0.7.6

Nov 10, 2019

0.7.5

Nov 7, 2019

0.7.4

Nov 7, 2019

This version

0.7.3

Nov 6, 2019

0.7.1

Oct 25, 2019

0.7.0

Oct 25, 2019

0.6.2

Oct 2, 2019

0.6.1

Sep 26, 2019

0.6.0

Sep 26, 2019

0.5.7

Aug 22, 2019

0.5.6

Aug 15, 2019

0.5.5

Aug 12, 2019

0.5.4

Aug 7, 2019

0.5.3

Aug 6, 2019

0.5.2

Aug 1, 2019

0.5.1

Aug 1, 2019

0.5.0

Jul 30, 2019

0.4.0

Jul 26, 2019

0.1.0

Jul 23, 2019

Download files

Source Distribution

janis-pipelines.runner-0.7.3.tar.gz (72.2 kB)

Uploaded Nov 6, 2019 Source

Built Distribution

janis_pipelines.runner-0.7.3-py3-none-any.whl (111.6 kB)

Uploaded Nov 6, 2019 Python 3

Hashes for janis-pipelines.runner-0.7.3.tar.gz

Algorithm	Hash digest
SHA256	`2522e09f8bbf3a32f620ea89d08cde36443bbc326aec8e0e05c7a8463f4d61f7`
MD5	`6e3dcd23f7787bbe817aaaf4d1e0b41a`
BLAKE2b-256	`f26850769d6749fda1cc3863d4163eb8d48725c44becd7c0c7cc6fd226136c62`

Hashes for janis_pipelines.runner-0.7.3-py3-none-any.whl

Algorithm	Hash digest
SHA256	`cb06cee543fa32f793fd8f4199f1fd6cbf1a5e0aaee181965221b9f20fbaf832`
MD5	`a0c65049cf449c6a3814acf3bfc5d048`
BLAKE2b-256	`698e363735fcd406082ed9e9fb97c7db1f9642f67c1ffeed28aa3c9548dc257b`