Caper
Caper (Cromwell Assisted Pipeline ExecutoR) is a wrapper Python package for Cromwell.
Introduction
Caper is based on Unix and cloud platform CLIs (`curl`, `gsutil` and `aws`) and provides an easier way of running Cromwell server/run modes by automatically composing the necessary input files for Cromwell. Caper also supports easy automatic file transfer between local/cloud storages (local path, `s3://`, `gs://` and `http(s)://`). You can use these URIs in an input JSON file or for the WDL file itself.
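For example (the bucket paths below are hypothetical), the WDL itself and its input JSON can each live on a different storage:
$ caper run gs://my-bucket/my_workflow.wdl -i s3://other-bucket/input.json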
Features
- Similar CLI: Caper has a CLI similar to Cromwell's.
- Built-in backends: You don't need your own backend configuration file. Caper provides built-in backends.
- Automatic transfer between local/cloud storages: You can use URIs (e.g. `gs://`, `http://` and `s3://`) instead of paths in command line arguments and in your input JSON file. Files associated with these URIs will be automatically transferred to a specified temporary directory on the target remote storage.
- Deepcopy for input JSON file: Recursively copy all data files referenced in input files (`.json`, `.tsv` and `.csv`) to a target remote storage.
- Docker/Singularity integration: You can run a WDL workflow in a specified Docker/Singularity container.
- MySQL database integration: We provide shell scripts to run a MySQL database server in a Docker/Singularity container. Using Caper with a MySQL database lets you use Cromwell's call-caching to re-use outputs from previous successful tasks. This is useful for resuming a failed workflow where it left off.
- One configuration file for all: You don't need to repeat the same command line parameters for every pipeline run. Define them once in a configuration file at `~/.caper/default.conf`.
- One server for six backends: Built-in backends allow you to submit pipelines to any local/remote backend specified with `-b` or `--backend`.
- Cluster engine support: SLURM, SGE and PBS are currently supported locally.
- Easy workflow management: Find all workflows submitted to a Cromwell server by workflow ID (UUID) or `str_label` (a special label for a workflow submitted by Caper `submit` and `run`). You can define multiple keywords with wildcards (`*` and `?`) to search for matching workflows, then abort them, release their holds or retrieve their metadata JSON.
- Automatic subworkflow packing: Caper automatically creates an archive (`imports.zip`) of all imports and sends it to Cromwell server/run.
- Special label (`str_label`): You can define a string label, specified with `-s` or `--str-label`, for your workflow so that you can search for it by this label instead of Cromwell's workflow UUID (e.g. `f12526cb-7ed8-4bfa-8e2e-a463e94a61d0`).
Installation
Make sure that you have `python3` (> 3.4.1) installed on your system. Use `pip` to install Caper.
$ pip install caper
Or `git clone` this repo and manually add `bin/` to your environment variable `PATH` in your BASH startup scripts (`~/.bashrc`).
$ git clone https://github.com/ENCODE-DCC/caper
$ echo "export PATH=\"\$PATH:$PWD/caper/bin\"" >> ~/.bashrc
Usage
There are 7 subcommands available for Caper. Except for `run`, all other subcommands work with a running Cromwell server, which can be started with the `server` subcommand. `server` does not require a positional argument. `WF_ID` (workflow ID) is a UUID generated by Cromwell to identify a workflow. `STR_LABEL` is Caper's special string label used to identify a workflow.
Subcommand | Positional args | Description |
---|---|---|
server | | Run a Cromwell server with built-in backends |
run | WDL | Run a single workflow |
submit | WDL | Submit a workflow to a Cromwell server |
abort | WF_ID or STR_LABEL | Abort submitted workflows on a Cromwell server |
unhold | WF_ID or STR_LABEL | Release hold of workflows on a Cromwell server |
list | WF_ID or STR_LABEL | List submitted workflows on a Cromwell server |
metadata | WF_ID or STR_LABEL | Retrieve metadata JSONs for workflows |
Examples:
- `run`: To run a single workflow. Add `--hold` to put a hold on a submitted workflow.
  $ caper run [WDL] -i [INPUT_JSON]
- `server`: To start a server.
  $ caper server
- `submit`: To submit a workflow to a server. `-s` is optional but useful for other subcommands to find a submitted workflow by a matching string label.
  $ caper submit [WDL] -i [INPUT_JSON] -s [STR_LABEL]
- `list`: To show a list of all workflows submitted to a Cromwell server. Wildcard search using `*` and `?` is allowed for `STR_LABEL` in this and the following subcommands.
  $ caper list [WF_ID or STR_LABEL]
- Other subcommands: These work like `list` and perform the corresponding action on matched workflows.
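For example, a typical server-mode session might look like the following (the string label my_exp1 is just a placeholder):
$ caper submit [WDL] -i [INPUT_JSON] -s my_exp1   # submit with a string label
$ caper list my_exp1*                             # find it later with a wildcard search
$ caper metadata my_exp1                          # retrieve its metadata JSON
$ caper abort my_exp1                             # abort it if needed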
Configuration file
Caper automatically creates a default configuration file at `~/.caper/default.conf`. This configuration file comes with all available parameters commented out. You can uncomment/define any parameter to activate it.
You can avoid repeatedly defining the same parameters in your command line arguments by using a configuration file. For example, you can define `out-dir` and `tmp-dir` in your configuration file instead of passing them as command line arguments.
$ caper run [WDL] --out-dir [LOCAL_OUT_DIR] --tmp-dir [LOCAL_TMP_DIR]
Equivalent settings in a configuration file.
[defaults]
out-dir=[LOCAL_OUT_DIR]
tmp-dir=[LOCAL_TMP_DIR]
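With these parameters defined in `~/.caper/default.conf`, the same workflow can then be run without repeating those flags on the command line:
$ caper run [WDL]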
Before running it
Run Caper to generate a default configuration file.
$ caper
How to run it on a local computer
Define two important parameters in your default configuration file (`~/.caper/default.conf`).
# directory to store all outputs
out-dir=[LOCAL_OUT_DIR]
# temporary directory for Caper
# lots of temporary files will be created and stored here
# e.g. backend.conf, workflow_opts.json, input.json, labels.json
# don't use /tmp
tmp-dir=[LOCAL_TMP_DIR]
Run Caper. `--deepcopy` is optional for a remote (http://, gs://, s3://, ...) `INPUT_JSON` file.
$ caper run [WDL] -i [INPUT_JSON] --deepcopy
How to run it on Google Cloud Platform (GCP)
Install gsutil, then configure gcloud and gsutil for your GCP account.
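A typical one-time setup might look like the following (the project ID is a placeholder):
$ gcloud auth login                           # authenticate your user account
$ gcloud auth application-default login       # set application default credentials
$ gcloud config set project [YOUR_PRJ_NAME]   # set your default GCP project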
Define three important parameters in your default configuration file (`~/.caper/default.conf`).
# your project name on Google Cloud platform
gcp-project=YOUR_PRJ_NAME
# directory to store all outputs
out-gcs-bucket=gs://YOUR_OUTPUT_ROOT_BUCKET/ANY/WHERE
# temporary bucket directory for Caper
tmp-gcs-bucket=gs://YOUR_TEMP_BUCKET/SOME/WHERE
Run Caper. `--deepcopy` is optional for an `INPUT_JSON` file that is not on GCS (local path, http://, s3://, ...).
$ caper run [WDL] -i [INPUT_JSON] --backend gcp --deepcopy
How to run it on AWS
Install the AWS CLI and configure it for your AWS account.
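For example, the AWS CLI can be installed with pip and configured interactively (this prompts for your access key, secret key and default region):
$ pip install awscli
$ aws configure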
Define three important parameters in your default configuration file (`~/.caper/default.conf`).
# ARN for your AWS Batch
aws-batch-arn=ARN_FOR_YOUR_AWS_BATCH
# directory to store all outputs
out-s3-bucket=s3://YOUR_OUTPUT_ROOT_BUCKET/ANY/WHERE
# temporary bucket directory for Caper
tmp-s3-bucket=s3://YOUR_TEMP_BUCKET/SOME/WHERE
Run Caper. `--deepcopy` is optional for an `INPUT_JSON` file that is not on S3 (local path, http://, gs://, ...).
$ caper run [WDL] -i [INPUT_JSON] --backend aws --deepcopy
How to run it on SLURM cluster
Define five important parameters in your default configuration file (`~/.caper/default.conf`).
# directory to store all outputs
out-dir=[LOCAL_OUT_DIR]
# temporary directory for Caper
# lots of temporary files will be created and stored here
# e.g. backend.conf, workflow_opts.json, input.json, labels.json
# don't use /tmp
tmp-dir=[LOCAL_TMP_DIR]
# SLURM partition if required (e.g. on Stanford Sherlock)
slurm-partition=YOUR_PARTITION
# SLURM account if required (e.g. on Stanford SCG4)
slurm-account=YOUR_ACCOUNT
# You may not need to specify the above two
# since most SLURM clusters have default rules for partition/account
# server mode
# port is 8000 by default, but if it's already taken
# then try other ports like 8001
port=8000
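If you are not sure which partition or account to use, your cluster's own SLURM utilities can usually list them (these commands are standard SLURM tools, not part of Caper):
$ sinfo                                    # list available partitions
$ sacctmgr show associations user=$USER    # list accounts you can submit under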
Run Caper. `--deepcopy` is optional for a remote (http://, gs://, s3://, ...) `INPUT_JSON` file.
$ caper run [WDL] -i [INPUT_JSON] --backend slurm --deepcopy
Or run a Cromwell server with Caper. Make sure to keep the server's SSH session alive. If there is a port conflict, change the port in your default configuration file.
$ caper server
On an HPC cluster with Singularity installed, run Caper with a Singularity container if one is defined inside the WDL.
$ caper run [WDL] -i [INPUT_JSON] --backend slurm --deepcopy --use-singularity
Or specify your own Singularity container.
$ caper run [WDL] -i [INPUT_JSON] --backend slurm --deepcopy --singularity [YOUR_SINGULARITY_IMAGE]
Then submit pipelines to the server.
$ caper submit [WDL] -i [INPUT_JSON] --deepcopy -p [PORT]
How to run it on SGE cluster
Define four important parameters in your default configuration file (`~/.caper/default.conf`).
# directory to store all outputs
out-dir=[LOCAL_OUT_DIR]
# temporary directory for Caper
# lots of temporary files will be created and stored here
# e.g. backend.conf, workflow_opts.json, input.json, labels.json
# don't use /tmp
tmp-dir=[LOCAL_TMP_DIR]
# SGE PE
sge-pe=YOUR_PARALLEL_ENVIRONMENT
# server mode
# port is 8000 by default, but if it's already taken
# then try other ports like 8001
port=8000
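If you are not sure which parallel environment to use, SGE's qconf can list the ones configured on your cluster (a standard SGE utility, not part of Caper):
$ qconf -spl    # list all parallel environment names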
Run Caper. `--deepcopy` is optional for a remote (http://, gs://, s3://, ...) `INPUT_JSON` file.
$ caper run [WDL] -i [INPUT_JSON] --backend sge --deepcopy
Or run a Cromwell server with Caper. Make sure to keep the server's SSH session alive. If there is a port conflict, change the port in your default configuration file.
$ caper server
Then submit pipelines to the server.
$ caper submit [WDL] -i [INPUT_JSON] --deepcopy -p [PORT]
How to resume a failed workflow
You need to set up a MySQL database server (see the MySQL server section of [DETAILS.md](DETAILS.md)) to use Cromwell's call-caching feature, which allows a failed workflow to resume where it left off. Use the same command line that you used to start the workflow, and Caper will automatically skip tasks that have already completed successfully.
Make sure you have Docker or Singularity installed on your system. Singularity does not require super-user privileges to be installed.
Configure MySQL DB settings in the default configuration file `~/.caper/default.conf`.
# MySQL DB port
# try other port if already taken
mysql-db-port=3307
`DB_DIR` is a directory to be used as DB storage; create an empty directory if this is your first run. `DB_PORT` is the MySQL DB port; if there is a conflict, use another port.
- Docker
  $ run_mysql_server_docker.sh [DB_DIR] [DB_PORT]
- Singularity
  $ run_mysql_server_singularity.sh [DB_DIR] [DB_PORT]
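For example, a typical resume flow might look like this (the DB directory is a placeholder, and the port must match `mysql-db-port` in your configuration file):
$ run_mysql_server_docker.sh ~/.caper/mysql_db 3307   # start the MySQL DB server
$ caper run [WDL] -i [INPUT_JSON]                     # re-run with the same command line as before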
Using Conda?
Just activate your Conda environment (`CONDA_ENV`) before running Caper (both for `run` and `server` modes).
$ conda activate [CONDA_ENV]
DETAILS
See [DETAILS.md](DETAILS.md) for details.