Orchestrate CWL with Prefect
Prefect CWL
A lightweight adapter that bridges the Common Workflow Language (CWL) world with Prefect. It not only executes CWL but lets you orchestrate it with Prefect’s scheduling, retries, observability, and deployments. Execution is pluggable via backends, with both Docker and Kubernetes available.
In this library, the atomic unit is a single CWL step (a CommandLineTool or workflow step), not an entire workflow/flow.
Prefect orchestrates those steps according to the CWL-defined dependencies.
What this achieves
- Bridge CWL and Prefect: Parse CWL, build a dependency graph, and run steps under Prefect orchestration.
- Orchestrate, not just execute: Use Prefect’s UI, scheduling, retries, mapping, and deployments to operate CWL workloads.
- Pluggable execution backends: Run each CWL step via Docker or Kubernetes.
Key concepts
- Atomic unit = CWL step: Each CWL step is executed as a Prefect task invocation via a backend. Prefect orchestrates the order and parallelism.
- Dependency “waves”: Steps run in parallel when their dependencies are satisfied; no artificial serialization.
- Typed IR: CWL is parsed into a typed internal representation that drives orchestration and I/O wiring.
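The "waves" idea can be illustrated with a small layering routine over a dependency graph. This is only a sketch of the general technique, not the adapter's actual implementation; `compute_waves` and the step names in `example` are hypothetical.

```python
from typing import Dict, List, Set


def compute_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves; each wave depends only on earlier waves."""
    remaining = {step: set(parents) for step, parents in deps.items()}
    done: Set[str] = set()
    waves: List[List[str]] = []
    while remaining:
        # A step is ready once all of its dependencies have completed.
        ready = sorted(s for s, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among: {sorted(remaining)}")
        waves.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves


# Hypothetical workflow: two independent steps feed a merge, then a report.
example = {
    "fetch_a": set(),
    "fetch_b": set(),
    "merge": {"fetch_a", "fetch_b"},
    "report": {"merge"},
}
print(compute_waves(example))
```

Steps in the same wave have no dependency on each other, so an orchestrator is free to submit them in parallel.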
Features
- Parse a practical subset of CWL v1.2 (tools, workflows, requirements, inputs/outputs).
- Build a dependency graph and infer parallel “waves”.
- Generate a Prefect flow whose signature mirrors CWL workflow inputs.
- Execute steps via a backend that handles containers, arguments, volumes, and exit codes.
- Initial backends: Docker and Kubernetes.
Current limitations
- The adapter needs explicit working directory setup (WRK_DIR or backend-specific mount roots).
- This is still a pragmatic CWL subset, not full conformance.
- Scatter support is partial: single-input scatter is supported; multi-input dotproduct/crossproduct is not.
- Data exchange between steps is filesystem-based (host paths/PVC), not in-memory streaming.
A more in-depth list is available in the DESIGN file. See the sample_cwl folder for these limits in practice.
Backends
- Docker backend: Uses Prefect’s Docker primitives to pull images, mount volumes, and execute commands.
- Kubernetes backend: Same interface; schedules Jobs to run each CWL step.
Output collection behavior
- Output glob values are resolved at runtime (including wildcard patterns and interpolated values).
- Relative globs are required; absolute paths and .. traversal are rejected.
- Scalar outputs (File/Directory) must match exactly one artifact unless optional (?).
- Array outputs (File[]/Directory[]) collect all matches in stable sorted order.
- Collected artifacts are propagated to downstream steps as Path or List[Path] values.
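The rules above can be sketched with the standard library alone. This is an illustration under assumptions, not the library's code: `collect_output` is a hypothetical helper, the CWL type is passed as a plain string, and an optional scalar with no match is assumed to yield `None`.

```python
import tempfile
from pathlib import Path, PurePosixPath
from typing import List, Optional, Union


def collect_output(
    workdir: Path, pattern: str, cwl_type: str
) -> Union[Optional[Path], List[Path]]:
    """Resolve one output glob under workdir (hypothetical helper)."""
    pure = PurePosixPath(pattern)
    # Only relative globs are accepted; reject absolute paths and '..' traversal.
    if pure.is_absolute() or ".." in pure.parts:
        raise ValueError(f"glob must be relative without '..': {pattern!r}")
    matches = sorted(workdir.glob(pattern))  # stable sorted order
    if cwl_type.endswith("[]"):
        return matches  # array outputs collect every match
    if not matches and cwl_type.endswith("?"):
        return None  # assumption: an optional scalar may match nothing
    if len(matches) != 1:
        raise ValueError(f"{pattern!r} matched {len(matches)} artifacts, expected 1")
    return matches[0]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "b.txt").write_text("b")
    (root / "a.txt").write_text("a")
    print([p.name for p in collect_output(root, "*.txt", "File[]")])
    print(collect_output(root, "missing.log", "File?"))
```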
Quick start
After installing all the requirements, start Prefect Server first:
prefect server start
Then, create a new project:
mkdir this-is-just-the-client-calling
cd this-is-just-the-client-calling
uv init
and install the library (with the uv CLI and Docker or K8s backend or both):
uv add "prefect-cwl[docker]"
uv add "prefect-cwl[k8s]"
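Then, from a Python script:
import asyncio
from pathlib import Path
from prefect_cwl import create_flow_with_docker_backend

inputs = {}  # values for the CWL workflow inputs, keyed by input name
with open("myflow.cwl") as inp:
    runnable_flow = create_flow_with_docker_backend(
        inp.read(), Path("/tmp"), workflow_id="#flow_id"
    )
asyncio.run(runnable_flow(**inputs))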
The runnable_flow is a Prefect flow that can be scheduled, deployed, and run as any other Prefect flow.
Should you want to use the K8s backend, additional requirements apply:
- a running K8s cluster
- a PVC that is deployed and usable by Prefect
- the following environment variables set, if needed:
  - KUBECONFIG, for custom configuration
  - PREFECT_CWL_K8S_NAMESPACE, for custom namespace (default: prefect)
  - PREFECT_CWL_K8S_PVC_NAME, for custom PVC name (default: prefect-shared-pvc)
  - PREFECT_CWL_K8S_PVC_MOUNT_PATH, for custom PVC mount path (default: /data)
  - PREFECT_CWL_K8S_SERVICE_ACCOUNT_NAME, for custom service account name (default: prefect-flow-runner)
  - PREFECT_CWL_K8S_PULL_SECRETS, for custom pull secrets (default: [])
  - PREFECT_CWL_K8S_LOG_LEVEL, for step summary log level (default: INFO)
  - PREFECT_CWL_K8S_STREAM_LOG_LEVEL, for streamed job output log level (default: DEBUG)
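As an illustration of how these variables and their documented defaults fit together, the following sketch reads a few of them from the environment. `K8sSettings` and `load_k8s_settings` are hypothetical names introduced here, not part of prefect-cwl; `PREFECT_CWL_K8S_PULL_SECRETS` is left out because its value format is not stated above.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class K8sSettings:  # hypothetical container, not a prefect-cwl class
    namespace: str
    pvc_name: str
    pvc_mount_path: str
    service_account_name: str
    log_level: str
    stream_log_level: str


def load_k8s_settings(env: Mapping[str, str] = os.environ) -> K8sSettings:
    """Read the documented variables, falling back to the documented defaults."""
    return K8sSettings(
        namespace=env.get("PREFECT_CWL_K8S_NAMESPACE", "prefect"),
        pvc_name=env.get("PREFECT_CWL_K8S_PVC_NAME", "prefect-shared-pvc"),
        pvc_mount_path=env.get("PREFECT_CWL_K8S_PVC_MOUNT_PATH", "/data"),
        service_account_name=env.get(
            "PREFECT_CWL_K8S_SERVICE_ACCOUNT_NAME", "prefect-flow-runner"
        ),
        log_level=env.get("PREFECT_CWL_K8S_LOG_LEVEL", "INFO"),
        stream_log_level=env.get("PREFECT_CWL_K8S_STREAM_LOG_LEVEL", "DEBUG"),
    )


print(load_k8s_settings({"PREFECT_CWL_K8S_NAMESPACE": "cwl-jobs"}))
```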
K8s precedence note (deployed runs):
- Merge order for supported keys is:
  - Prefect base job-template defaults (including variables.properties.*.default, when a template is available)
  - runtime flow_run.job_variables
  - local/backend overrides (constructor args, PREFECT_CWL_K8S_*, and optional job_variables passed to K8sBackend)
- Important: local/backend explicit overrides always win when present; fallback defaults are used only when a value is not provided by template/runtime/explicit override.
- Supported merged fields include: namespace, service_account_name, env, volumes, volume_mounts/volumeMounts, image_pull_secrets, labels, finished_job_ttl, image_pull_policy
- PREFECT_CWL_K8S_PVC_NAME and PREFECT_CWL_K8S_PVC_MOUNT_PATH are always enforced by prefect-cwl for its required work volume/mount.
- PREFECT_CWL_K8S_PVC_MOUNT_PATH is the authoritative in-container root used by prefect-cwl to create per-run data directories in the shared PVC.
- For deployed runs, you can control prefect-cwl log verbosity at deployment level by setting those env vars in worker/work-pool job_variables.env.
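The precedence described in this note can be pictured as a layered dictionary merge. The sketch below is a simplification under stated assumptions: `merge_job_settings` is a hypothetical function, `None` is treated as "not provided", and each key is replaced wholesale, whereas the real backend may combine structured fields such as env or volumes differently.

```python
from typing import Any, Dict, Mapping


def merge_job_settings(
    template_defaults: Mapping[str, Any],
    runtime_job_variables: Mapping[str, Any],
    local_overrides: Mapping[str, Any],
    fallback_defaults: Mapping[str, Any],
) -> Dict[str, Any]:
    """Later layers win; fallbacks apply only when no layer provides a key."""
    merged: Dict[str, Any] = dict(fallback_defaults)
    for layer in (template_defaults, runtime_job_variables, local_overrides):
        # Assumption: a None value means the layer did not provide the key.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


merged = merge_job_settings(
    template_defaults={"namespace": "team-a", "labels": {"tier": "batch"}},
    runtime_job_variables={"namespace": "team-b"},
    local_overrides={"namespace": "cwl-jobs"},
    fallback_defaults={
        "namespace": "prefect",
        "service_account_name": "prefect-flow-runner",
    },
)
print(merged)
```

In this example the local override decides the namespace, the labels come from the template, and the service account falls back to its default because no layer sets it.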
For running a local K8s cluster, configured with Prefect and all the above requirements, check the prefect-k8s-demo folder.
Runtime concurrency controls
prefect-cwl supports CWL scatter with runtime guardrails:
- PREFECT_CWL_SCATTER_CONCURRENCY (default: 4): local in-flow throttling for submitted scattered runs. Set 0 or negative to disable this local gate.
- PREFECT_CWL_SCATTER_TAG (default: prefect-cwl-scatter): Prefect tag attached to run_step task submissions. Set to empty to disable tag attachment.
To enforce a hard orchestration limit, create a Prefect concurrency limit on that tag:
prefect concurrency-limit create prefect-cwl-scatter 8
prefect concurrency-limit inspect prefect-cwl-scatter
Notes:
- The Prefect tag limit is the hard control across workers/flows using the same API/server.
- The local PREFECT_CWL_SCATTER_CONCURRENCY gate is process-local and complements (does not replace) Prefect tag limits.
- With current implementation, PREFECT_CWL_SCATTER_TAG is applied to all run_step task submissions, including non-scattered runs.
Install the library locally
Prerequisite: install uv (https://github.com/astral-sh/uv). Once uv has been installed successfully, move into the project folder and run:
uv sync --all-extras --group dev
Be sure to set the PYTHONPATH variable to the prefect_cwl directory.
Alternatively, use the command export PYTHONPATH=$PWD to point it at the current folder.
Otherwise, install the package in editable mode. To run the tests, install the dev dependencies.
Start the Prefect server using the command:
uv run prefect server start
Now you can run a Python script with:
uv run <file_path>
Sample CWL (WIP)
See sample_cwl/ for ready-to-run examples you can use to test the library. These are work-in-progress and may evolve as the adapter expands CWL coverage and features.
This includes sample_cwl/nbr/, currently added as a working subset for next-iteration refinement.
Test selection
- Sample-flow end-to-end tests are marked with e2e (tests/test_sample_cwl_e2e.py).
- Heavy remote I/O/network cases are additionally marked with heavy_io.
- Run default non-E2E tests:
uv run --group dev python -m pytest -q -m "not e2e"
- Run the full suite (unit/integration + e2e):
uv run --group dev python -m pytest -q -m "e2e or not e2e"
- Enable heavy I/O e2e cases:
PREFECT_CWL_E2E_HEAVY_IO=1 uv run --group dev python -m pytest -q -m "e2e or not e2e"
(PREFECT_CWL_E2E_NETWORK=1 is also supported as an alias.)
Project status
Early-stage and evolving. Expect changes in models, supported CWL features, and backend interfaces as we harden the adapter.
Design
The package design is detailed in DESIGN.md and reflects the latest codebase, including planning vs execution for Docker and Kubernetes backends.