
WDL launcher for Amazon Omics


miniwdl-omics-run

This command-line tool makes it easier to launch WDL runs on the AWS HealthOmics workflow service. It uses miniwdl locally to register WDL workflows, validate command-line inputs, and start a run.

pip3 install miniwdl-omics-run

miniwdl-omics-run \
    --role {SERVICE_ROLE_NAME} \
    --output-uri s3://{BUCKET_NAME}/{PREFIX} \
    {MAIN_WDL_FILE} input1=value1 input2=value2 ...

Quick start

Prerequisites: a Unix command line with Python and pip, plus an up-to-date AWS CLI installed locally and configured with full access to your AWS account.

First, complete a few one-time account setup steps (S3 bucket, IAM service role, ECR repository); then launch a test workflow.

S3 bucket

Create an S3 bucket with a test input file.

AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
AWS_DEFAULT_REGION=$(aws configure get region)

aws s3 mb --region "$AWS_DEFAULT_REGION" "s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics"
echo test | aws s3 cp - "s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics/test/test.txt"

Service role

Create an IAM service role for your Omics workflow runs to use (to access S3, ECR, etc.).

aws iam create-role --role-name poweromics --assume-role-policy-document '{
    "Version":"2012-10-17",
    "Statement":[{
        "Effect":"Allow",
        "Action":"sts:AssumeRole",
        "Principal":{"Service":"omics.amazonaws.com"}
    }]
}'

aws iam attach-role-policy --role-name poweromics \
    --policy-arn arn:aws:iam::aws:policy/PowerUserAccess

WARNING: PowerUserAccess, suggested here only for brevity, is far more powerful than needed. See Omics docs on service roles for the least privileges necessary, especially if you plan to use third-party WDL and/or Docker images.
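As a sketch of a narrower alternative, you could attach an inline policy scoped to the bucket and repository created in this guide instead of PowerUserAccess. The action lists below are an assumption based on typical run needs (S3 I/O, ECR pulls, CloudWatch logging); consult the Omics service-role docs for the definitive minimal set.

```shell
# Sketch only: inline policy limited to this guide's bucket, ECR repo, and logs.
# Adjust resource ARNs if you used different names.
aws iam put-role-policy --role-name poweromics --policy-name omics-least-privilege \
    --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::'"${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}"'-omics",
                "arn:aws:s3:::'"${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}"'-omics/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchGetImage",
                "ecr:BatchCheckLayerAvailability"
            ],
            "Resource": "arn:aws:ecr:'"${AWS_DEFAULT_REGION}:${AWS_ACCOUNT_ID}"':repository/omics"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams"
            ],
            "Resource": "*"
        }
    ]
}'
```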

ECR repository

Create an ECR repository suitable for Omics to pull Docker images from.

aws ecr create-repository --repository-name omics
aws ecr set-repository-policy --repository-name omics --policy-text '{
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "omics workflow",
        "Effect": "Allow",
        "Principal": {"Service": "omics.amazonaws.com"},
        "Action": [
            "ecr:GetDownloadUrlForLayer",
            "ecr:BatchGetImage",
            "ecr:BatchCheckLayerAvailability"
        ]
    }]
}'

Push a plain Ubuntu image to the repository.

ECR_ENDPT="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com"
aws ecr get-login-password | docker login --username AWS --password-stdin "$ECR_ENDPT"

docker pull --platform linux/amd64 ubuntu:22.04
docker tag ubuntu:22.04 "${ECR_ENDPT}/omics:ubuntu-22.04"
docker push "${ECR_ENDPT}/omics:ubuntu-22.04"
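Optionally, confirm the image landed in the repository (this assumes the omics repository name used above):

```shell
# List image tags currently in the repository
aws ecr describe-images --repository-name omics \
    --query 'imageDetails[].imageTags' --output text
```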

Run test workflow

pip3 install miniwdl-omics-run
wget https://raw.githubusercontent.com/miniwdl-ext/miniwdl-omics-run/main/test/TestFlow.wdl

miniwdl-omics-run TestFlow.wdl \
    input_txt_file="s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics/test/test.txt" \
    docker="${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/omics:ubuntu-22.04" \
    --role poweromics --output-uri "s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics/test/out"

This zips up the specified WDL, registers it as an Omics workflow, validates the given inputs, and starts the workflow run.

The WDL source code may be set to a local filename or a public HTTP(S) URL. The tool automatically bundles any WDL files imported by the main one. On subsequent invocations, it'll reuse the previously-registered workflow if the source code hasn't changed, or add a new workflow version if it has.
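To inspect what has been registered so far, you can list workflows with the AWS CLI (output keys follow the HealthOmics ListWorkflows API):

```shell
# Show registered workflows with their service-assigned IDs
aws omics list-workflows \
    --query 'items[].{id:id,name:name,status:status}' --output table
```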

The command-line interface accepts WDL inputs using the input_key=value syntax exactly like miniwdl run, including the option of a JSON file with --input FILE.json. Each input File must be set to an existing S3 URI accessible by the service role.
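For example, the two inputs from the test run above could equivalently come from a JSON file (inputs.json is a hypothetical filename):

```shell
# Write the same inputs as a JSON file, then pass it with --input
cat > inputs.json <<EOF
{
    "input_txt_file": "s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics/test/test.txt",
    "docker": "${AWS_ACCOUNT_ID}.dkr.ecr.${AWS_DEFAULT_REGION}.amazonaws.com/omics:ubuntu-22.04"
}
EOF

miniwdl-omics-run TestFlow.wdl --input inputs.json \
    --role poweromics --output-uri "s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics/test/out"
```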

Advice

  • Omics can use Docker images only from your ECR in the same account & region.
    • This often means pulling, re-tagging, and pushing images as illustrated above with ubuntu:22.04.
    • It also often means editing any WDL tasks that hard-code Docker image tags so they take the image as an input instead.
    • Each ECR repository must have the Omics-specific repository policy set as shown above.
    • We therefore tend to use a single ECR repository for multiple Docker images, disambiguating them using lengthier tags.
    • If you prefer to use per-image repositories, just remember to set the repository policy on each one.
  • To quickly list a workflow's inputs, try miniwdl run workflow.wdl ?
  • To use call caching, create a run cache using the console or CLI and pass --cache {NAME} or --cache-id {ID} to miniwdl-omics-run.
  • To use dynamic run storage, pass --storage-type dynamic.
  • Omics limits the number of runs and of versions per workflow; if you hit those limits, delete old runs or workflows via the Console/CLI/API.
  • Before Omics had workflow versioning, this tool created a separate Omics workflow for any change to the WDL, each named with a content digest suffix. That behavior can be restored with --legacy-workflow-name.
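To illustrate two of the points above, a run cache can be created and old runs cleared out with the AWS CLI. The cache name, S3 prefix, and IDs below are placeholders, and the create-run-cache parameter name follows the HealthOmics CreateRunCache API; verify both against your AWS CLI version.

```shell
# Create a run cache backed by an S3 prefix (name and location are examples)
aws omics create-run-cache --name mycache \
    --cache-s3-location "s3://${AWS_ACCOUNT_ID}-${AWS_DEFAULT_REGION}-omics/run-cache"

# List runs to pick IDs, then delete a finished run (placeholder ID)
aws omics list-runs --query 'items[].{id:id,name:name,status:status}' --output table
aws omics delete-run --id 1234567
```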
