Implementation of a GA4GH workflow execution service that can easily support various workflow runners.
SAPPORO-service
SAPPORO is a standard implementation conforming to the Global Alliance for Genomics and Health (GA4GH) Workflow Execution Service (WES) API specification.
One of SAPPORO's features is the abstraction of workflow engines, which makes it easy to convert various workflow engines into WES. The following workflow engines have been confirmed to be working at present.
- cwltool
- nextflow
- toil
- cromwell
- snakemake
Another feature of SAPPORO is a mode that executes only the workflows registered by the administrator. This feature is useful when building a WES in a shared HPC environment.
Install and Run
SAPPORO supports Python 3.6 or newer.
$ pip3 install sapporo
$ sapporo
Docker
You can also launch it with Docker.
To use Docker-in-Docker (DinD), you have to mount docker.sock, /tmp, etc.
# Launch
$ docker-compose up -d --build
# Check startup
$ docker-compose logs
Usage
The help for the SAPPORO startup command is as follows:
$ sapporo --help
usage: sapporo [-h] [--host] [-p] [--debug] [-r] [--disable-get-runs]
[--disable-workflow-attachment]
[--run-only-registered-workflows] [--service-info]
[--executable-workflows] [--run-sh]
Implementation of a GA4GH workflow execution service that can easily
support various workflow runners.
optional arguments:
-h, --help show this help message and exit
--host Host address of Flask. (default: 127.0.0.1)
-p , --port Port of Flask. (default: 8080)
--debug Enable debug mode of Flask.
-r , --run-dir Specify the run dir. (default: ./run)
--disable-get-runs Disable endpoint of `GET /runs`.
--disable-workflow-attachment
Disable `workflow_attachment` on endpoint `Post
/runs`.
--run-only-registered-workflows
Run only registered workflows. Check the registered
workflows using `GET /service-info`, and specify
`workflow_name` in the `POST /run`.
--service-info Specify `service-info.json`. The
supported_wes_versions, system_state_counts and
workflows are overwritten in the application.
--executable-workflows
Specify `executable-workflows.json`.
--run-sh Specify `run.sh`.
Operating Mode
There are two startup modes for SAPPORO.
- Standard WES mode (Default)
- Mode to execute only registered workflows
These are switched with the --run-only-registered-workflows startup argument. The mode can also be switched by setting the environment variable SAPPORO_ONLY_REGISTERED_WORKFLOWS to True or False. Startup arguments take precedence over environment variables.
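The precedence rule above can be sketched as follows. This is a minimal illustration, not SAPPORO's actual code: the helper names `str_to_bool` and `resolve_registered_only` are hypothetical, and it assumes the startup argument is a plain on/off flag, so the environment variable is consulted only when the flag is absent.

```python
import argparse
import os
from typing import List, Mapping, Optional


def str_to_bool(value: str) -> bool:
    """Interpret "True"/"False" (case-insensitive) as a boolean."""
    return value.strip().lower() == "true"


def resolve_registered_only(
    argv: List[str], env: Optional[Mapping[str, str]] = None
) -> bool:
    """Return True when only registered workflows may be executed."""
    env = os.environ if env is None else env
    parser = argparse.ArgumentParser()
    parser.add_argument("--run-only-registered-workflows", action="store_true")
    # Ignore any other startup arguments in this sketch.
    args, _ = parser.parse_known_args(argv)
    if args.run_only_registered_workflows:
        # The startup argument takes precedence over the environment variable.
        return True
    # Fall back to the environment variable; default to standard WES mode.
    return str_to_bool(env.get("SAPPORO_ONLY_REGISTERED_WORKFLOWS", "False"))
```

Passing the environment as a mapping keeps the function easy to check without touching the real process environment.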
Standard WES mode
For the API specification, please check GitHub - GA4GH WES and SwaggerUI - GA4GH WES.
Unlike the standard WES API specification, SAPPORO requires workflow_engine_name in the request parameters of POST /runs. I personally consider this to be a flaw in the standard WES API specification, so I have requested a fix.
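As a rough illustration of the extra parameter, the sketch below builds the form fields for POST /runs. The field names other than workflow_engine_name are those of the standard WES API; the function name `build_run_request` and the parameter values in the usage note are hypothetical, and the sketch does not send anything over the network.

```python
import json
from typing import Any, Dict


def build_run_request(
    workflow_url: str,
    workflow_type: str,
    workflow_type_version: str,
    workflow_engine_name: str,
    workflow_params: Dict[str, Any],
) -> Dict[str, str]:
    """Build the form fields for POST /runs as plain strings."""
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": workflow_type_version,
        # Required by SAPPORO in addition to the standard WES fields.
        "workflow_engine_name": workflow_engine_name,
        # WES carries workflow_params as a JSON-encoded string.
        "workflow_params": json.dumps(workflow_params),
    }
```

For example, `build_run_request(url, "CWL", "v1.0", "cwltool", {"nthreads": 2})` returns a dictionary that can be passed as form data to an HTTP client such as requests; the `nthreads` parameter here is only a placeholder.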
Mode to execute only registered workflows
For the API specification of the mode to execute only registered workflows, please check SwaggerUI - GA4GH WES.
Basically, it conforms to the standard WES API. The changes are as follows.
- Executable workflows are returned by GET /service-info as executable_workflows.
- Specify workflow_name instead of workflow_url in POST /runs.
The following is an example of requesting GET /service-info in the mode to execute only registered workflows.
GET /service-info
{
"auth_instructions_url": "https://github.com/ddbj/SAPPORO-service",
"contact_info_url": "https://github.com/ddbj/SAPPORO-service",
"default_workflow_engine_parameters": [],
"executable_workflows": [
{
"workflow_attachment": [],
"workflow_name": "CWL_trimming_and_qc_remote",
"workflow_type": "CWL",
"workflow_type_version": "v1.0",
"workflow_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/trimming_and_qc_remote.cwl"
},
{
"workflow_attachment": [
{
"file_name": "fastqc.cwl",
"file_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/fastqc.cwl"
},
{
"file_name": "trimming_pe.cwl",
"file_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/trimming_pe.cwl"
}
],
"workflow_name": "CWL_trimming_and_qc_local",
"workflow_type": "CWL",
"workflow_type_version": "v1.0",
"workflow_url": "https://raw.githubusercontent.com/ddbj/SAPPORO-service/master/tests/resources/trimming_and_qc.cwl"
}
],
"supported_filesystem_protocols": [
"http",
"https",
"file"
],
"supported_wes_versions": [
"sapporo-wes-1.1"
],
"system_state_counts": {},
"tags": {
"debug": true,
"get_runs": true,
"registered_only_mode": true,
"run_dir": "/home/ubuntu/git/github.com/ddbj/SAPPORO-service/run",
"wes_name": "sapporo",
"workflow_attachment": true
},
"workflow_engine_versions": {
"cromwell": "50",
"cwltool": "1.0.20191225192155",
"nextflow": "20.04.1",
"snakemake": "v5.17.0",
"toil": "4.1.0"
},
"workflow_type_versions": {
"CWL": {
"workflow_type_version": [
"v1.0",
"v1.1",
"v1.1.0-dev1"
]
}
}
}
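A client would typically read this response to find out which workflow_name values it may submit. The sketch below does that on a trimmed copy of the response shown above (workflow_url and workflow_attachment are omitted for brevity); the helper names `list_workflow_names` and `is_registered_only` are hypothetical.

```python
import json
from typing import Any, Dict, List

# Trimmed copy of the GET /service-info response shown above.
SERVICE_INFO = """
{
  "executable_workflows": [
    {"workflow_name": "CWL_trimming_and_qc_remote",
     "workflow_type": "CWL", "workflow_type_version": "v1.0"},
    {"workflow_name": "CWL_trimming_and_qc_local",
     "workflow_type": "CWL", "workflow_type_version": "v1.0"}
  ],
  "tags": {"registered_only_mode": true}
}
"""


def list_workflow_names(service_info: Dict[str, Any]) -> List[str]:
    """Return the workflow_name of every executable workflow."""
    return [
        wf["workflow_name"]
        for wf in service_info.get("executable_workflows", [])
    ]


def is_registered_only(service_info: Dict[str, Any]) -> bool:
    """Return the registered_only_mode tag, treating a missing tag as False."""
    return bool(service_info.get("tags", {}).get("registered_only_mode", False))
```

Calling `list_workflow_names(json.loads(SERVICE_INFO))` yields the two names that can be used as workflow_name in POST /runs.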
The executable workflows are managed by executable_workflows.json. The schema for this definition is executable_workflows.schema.json. The default location of these files is under the application dir of SAPPORO, but it can be overridden by the startup argument --executable-workflows or the environment variable SAPPORO_EXECUTABLE_WORKFLOWS.
Run Dir
SAPPORO manages the submitted workflows, workflow parameters, output files, etc. on the file system. The location of the run dir can be overridden by the startup argument --run-dir or the environment variable SAPPORO_RUN_DIR.
The run dir structure is as follows. Each run can be initialized or deleted simply by removing its directory with rm.
$ tree run
.
├── 29
│   └── 29109b85-7935-4e13-8773-9def402c7775
│       ├── cmd.txt
│       ├── end_time.txt
│       ├── exe
│       │   └── workflow_params.json
│       ├── exit_code.txt
│       ├── outputs
│       │   ├── ERR034597_1.small.fq.trimmed.1P.fq
│       │   ├── ERR034597_1.small.fq.trimmed.1U.fq
│       │   ├── ERR034597_1.small.fq.trimmed.2P.fq
│       │   ├── ERR034597_1.small.fq.trimmed.2U.fq
│       │   ├── ERR034597_1.small_fastqc.html
│       │   └── ERR034597_2.small_fastqc.html
│       ├── outputs.json
│       ├── run.pid
│       ├── run_request.json
│       ├── start_time.txt
│       ├── state.txt
│       ├── stderr.log
│       ├── stdout.log
│       └── workflow_engine_params.txt
├── 2d
│   └── ...
└── 6b
    └── ...
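Judging from the listing above, each run lives in a directory named after its run_id, grouped under the first two characters of that id. The sketch below locates a run and reads its state.txt under that assumption; the function names `run_path` and `read_state` are hypothetical, and the layout is inferred from the tree output rather than from SAPPORO's code.

```python
from pathlib import Path
from typing import Optional


def run_path(run_dir: Path, run_id: str) -> Path:
    """Return <run_dir>/<first two characters of run_id>/<run_id>."""
    return run_dir / run_id[:2] / run_id


def read_state(run_dir: Path, run_id: str) -> Optional[str]:
    """Return the contents of state.txt, or None when the run does not exist."""
    state_file = run_path(run_dir, run_id) / "state.txt"
    if not state_file.is_file():
        return None
    return state_file.read_text().strip()
```

With the example above, `run_path(Path("run"), "29109b85-7935-4e13-8773-9def402c7775")` points at `run/29/29109b85-7935-4e13-8773-9def402c7775`.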
The execution of POST /runs is very complex. Examples using Python's requests are provided in GitHub - sapporo/tests/post_runs_examples. Please use them as a reference.
run.sh
We use run.sh to abstract the workflow engine. When POST /runs is called, SAPPORO forks the execution of run.sh after dumping the necessary files to the run dir. Therefore, you can apply various workflow engines to WES by editing run.sh.
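The fork step can be pictured with the sketch below, which starts a command in the background and records its process id in run.pid. It is only an illustration under assumptions: the function name `fork_runner` is hypothetical, and redirecting output to stdout.log and stderr.log from the parent is a simplification chosen here, since how SAPPORO and run.sh actually divide that work is not described in this document.

```python
import subprocess
from pathlib import Path
from typing import List


def fork_runner(cmd: List[str], run_dir: Path) -> subprocess.Popen:
    """Start cmd in the background with run_dir as its working directory."""
    run_dir.mkdir(parents=True, exist_ok=True)
    stdout_log = run_dir / "stdout.log"
    stderr_log = run_dir / "stderr.log"
    with open(stdout_log, "w") as out, open(stderr_log, "w") as err:
        # Popen returns immediately; the child keeps its own copies of the
        # log file handles after the parent closes them.
        proc = subprocess.Popen(cmd, cwd=str(run_dir), stdout=out, stderr=err)
    # Record the process id so the run can be inspected or cancelled later.
    (run_dir / "run.pid").write_text(str(proc.pid))
    return proc
```

A call such as `fork_runner(["/bin/bash", "/path/to/run.sh"], some_run_dir)` would correspond to the step described above; both paths are placeholders.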
The default location of run.sh is under the application dir of SAPPORO, but it can be overridden by the startup argument --run-sh or the environment variable SAPPORO_RUN_SH.
Other Startup Arguments
The startup host and port can be changed with the startup arguments --host and --port. The environment variables corresponding to these arguments are SAPPORO_HOST and SAPPORO_PORT.
The following two startup arguments and environment variables are provided to limit the WES.
- --disable-get-runs
  - SAPPORO_GET_RUNS: True or False.
  - Disable GET /runs.
  - When WES is used by an unspecified number of people, anyone who knows another person's run_id can see the contents of that run and cancel it.
  - Because run_id itself is automatically generated by uuid4, it is difficult to find by brute force.
- --disable-workflow-attachment
  - SAPPORO_WORKFLOW_ATTACHMENT: True or False.
  - Disable workflow_attachment in POST /runs.
  - The workflow_attachment is the field to attach a file required for executing a workflow.
  - There is a security concern because anything can be attached.
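To give a sense of why guessing a run_id is impractical once GET /runs is disabled, the sketch below generates an id with Python's uuid4 and estimates the search space. A version 4 UUID has 122 random bits; the function name `expected_guesses` and the run count in the usage note are illustrative.

```python
import uuid

# A run_id in the same format as the examples in this document.
run_id = str(uuid.uuid4())

# A version 4 UUID has 128 bits, 6 of which are fixed (version and variant).
RANDOM_BITS = 122
POSSIBLE_IDS = 2 ** RANDOM_BITS


def expected_guesses(existing_runs: int) -> float:
    """Average number of uniform random guesses needed to hit any existing run_id."""
    return POSSIBLE_IDS / existing_runs
```

Even with a million existing runs, `expected_guesses(1_000_000)` is on the order of 10^30 guesses.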
The contents of the response of GET /service-info are managed by service-info.json. The default location of service-info.json is under the application dir of SAPPORO, but it can be overridden by the startup argument --service-info or the environment variable SAPPORO_SERVICE_INFO.
Development
The development environment starts with the following.
$ docker-compose -f docker-compose.dev.yml up -d --build
$ docker-compose -f docker-compose.dev.yml exec app bash
We use flake8, isort, and mypy as linters.
$ bash ./tests/lint_and_style_check/flake8.sh
$ bash ./tests/lint_and_style_check/isort.sh
$ bash ./tests/lint_and_style_check/mypy.sh
We use pytest as the test tool.
$ pytest .
License
Apache-2.0. See the LICENSE.