
Universal workflow-format converter built around a loss-preserving intermediate representation


wf2wf – Universal Workflow-Format Converter


wf2wf is a format-agnostic converter: any supported engine → Intermediate Representation (IR) → any other engine. The core library handles: 

Snakemake • Nextflow • CWL • HTCondor/DAGMan • WDL • Galaxy • bundled BioCompute Objects

graph TD;
  A[Snakemake] -->|import| IR((IR));
  B[DAGMan]   --> IR;
  C[CWL]      --> IR;
  D[Nextflow] --> IR;
  E[WDL]      --> IR;
  F[Galaxy]   --> IR;
  IR -->|export| GA2[Galaxy];
  IR --> SMK2[Snakemake];
  IR --> CWL2[CWL];
  IR --> DAG2[DAGMan];
  IR --> NF2[Nextflow];
  IR --> WDL2[WDL];
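
The hub-and-spoke layout in the diagram can be illustrated with a minimal sketch. It is not wf2wf's actual code: `ToyIR`, `IMPORTERS`, `EXPORTERS`, the `toy` input format, and `convert` are hypothetical names invented for illustration. The point is only that each engine needs one importer and one exporter, rather than a dedicated converter for every pair of engines.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ToyIR:
    """Hypothetical stand-in for the intermediate representation."""
    tasks: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)


# One importer and one exporter per engine, all meeting at the IR.
IMPORTERS: Dict[str, Callable[[str], ToyIR]] = {}
EXPORTERS: Dict[str, Callable[[ToyIR], str]] = {}


def import_toy(text: str) -> ToyIR:
    """Toy importer: each 'a -> b' line becomes two tasks and one edge."""
    ir = ToyIR()
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parent, child = (part.strip() for part in line.split("->"))
        for name in (parent, child):
            if name not in ir.tasks:
                ir.tasks.append(name)
        ir.edges.append((parent, child))
    return ir


def export_dagman_like(ir: ToyIR) -> str:
    """Toy exporter: emit DAGMan-style JOB and PARENT/CHILD lines."""
    jobs = [f"JOB {t} {t}.sub" for t in ir.tasks]
    deps = [f"PARENT {p} CHILD {c}" for p, c in ir.edges]
    return "\n".join(jobs + deps)


IMPORTERS["toy"] = import_toy
EXPORTERS["dagman"] = export_dagman_like


def convert(text: str, src: str, dst: str) -> str:
    """Every conversion goes source -> IR -> target."""
    return EXPORTERS[dst](IMPORTERS[src](text))


print(convert("align -> sort\nsort -> call", "toy", "dagman"))
```

Supporting a new engine in this sketch means registering one more entry in each dictionary; no existing importer or exporter has to change.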

wf2wf is called from the command line as:

# Convert Snakemake → DAGMan and auto-generate Markdown report
wf2wf convert -i pipeline.smk -o pipeline.dag --auto-env build --interactive --report-md

✨ Features

  • 🔄 Universal Conversion – Any supported engine → IR → any other engine with a single command.
  • 🧬 Loss-Mapping & Round-Trip Fidelity – Structured loss reports (*.loss.json) and automatic reinjection guarantee nothing disappears silently.
  • 🐳 Automated Environment Builds – Optional Conda-to-OCI pipeline (micromamba → conda-pack → buildx/buildah) with digest-pinned image references, SBOMs and Apptainer conversion.
  • ⚖ Regulatory & Provenance Support – BioCompute Object generation, schema validation and side-car provenance for FDA submissions.
  • 🧪 Aiming for Quality – high test coverage, semantic versioning, graceful degradation when optional external tools are missing.
  • 🔧 Smart Configuration Analysis – Automatic detection and warnings for missing resource requirements, containers, error handling, and file transfer modes when converting between shared filesystem and distributed computing workflows.
  • 💬 Interactive Mode – Guided prompts to help users address configuration gaps and optimize workflows for target execution environments.

Information-loss workflow

wf2wf should record every field the target engine cannot express:

⚠ Conversion losses: 2 (lost), 1 (lost again), 7 (reapplied)
  • lost – field dropped in this conversion
  • lost again – field was already lost by a previous exporter
  • reapplied – successfully restored from a side-car when converting back to a richer format

Use --fail-on-loss to abort if any lost/lost again entries remain.
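
As a rough illustration of how the summary line and the --fail-on-loss decision relate, the sketch below tallies entries by status. The side-car layout (`entries`, `path`, `status`) and the status spellings are assumptions made for this example; the real *.loss.json schema may differ.

```python
import json
from collections import Counter
from typing import Dict

# Hypothetical side-car content, invented for illustration only.
SIDE_CAR = json.loads("""
{"entries": [
  {"path": "/tasks/align/resources/gpu", "status": "lost"},
  {"path": "/tasks/align/priority",      "status": "lost_again"},
  {"path": "/tasks/sort/retry",          "status": "reapplied"}
]}
""")

STATUSES = ("lost", "lost_again", "reapplied")


def summarise(side_car: dict) -> Dict[str, int]:
    """Count entries per status, reporting zero for statuses that do not occur."""
    counts = Counter(entry["status"] for entry in side_car["entries"])
    return {status: counts.get(status, 0) for status in STATUSES}


def should_fail(summary: Dict[str, int], fail_on_loss: bool) -> bool:
    """Abort only when requested and when unrecovered entries remain."""
    return fail_on_loss and (summary["lost"] + summary["lost_again"]) > 0


summary = summarise(SIDE_CAR)
print(summary)
print("abort:", should_fail(summary, fail_on_loss=True))
```

Reapplied entries never trigger the abort in this sketch, which matches the rule that only lost and lost again entries count.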

Configuration Analysis

When converting between different workflow execution environments, wf2wf automatically detects potential issues:

## Configuration Analysis

### Potential Issues for Distributed Computing

* **Memory**: 2 tasks without explicit memory requirements
* **Containers**: 3 tasks without container/conda specifications
* **Error Handling**: 3 tasks without retry specifications
* **File Transfer**: 6 files with auto-detected transfer modes

**Recommendations:**
* Add explicit resource requirements for all tasks
* Specify container images or conda environments for environment isolation
* Configure retry policies for fault tolerance
* Review file transfer modes for distributed execution

Use --interactive to get guided prompts for addressing these issues automatically.
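
The counts in such a report could be derived along the following lines. This is a sketch under stated assumptions, not wf2wf's implementation: tasks are plain dictionaries, and the keys `memory_mb`, `container`, `conda`, and `retries` are hypothetical.

```python
from typing import Dict, List


def analyse(tasks: List[dict]) -> Dict[str, int]:
    """Count tasks that lack settings a distributed engine usually needs."""
    return {
        "memory": sum(1 for t in tasks if t.get("memory_mb") is None),
        "containers": sum(
            1 for t in tasks if not (t.get("container") or t.get("conda"))
        ),
        "error_handling": sum(1 for t in tasks if t.get("retries") is None),
    }


# Hypothetical tasks, invented for illustration.
tasks = [
    {"name": "align", "memory_mb": 4096, "conda": "envs/bwa.yaml"},
    {"name": "sort", "container": "docker://example/samtools", "retries": 2},
    {"name": "call"},
]

for issue, count in analyse(tasks).items():
    print(f"{issue}: {count} task(s) without an explicit setting")
```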


📦 Installation

# PyPI (recommended)
pip install wf2wf

# or conda-forge (once feedstock is merged)
conda install -c conda-forge wf2wf

Development install:

git clone https://github.com/csmcal/wf2wf.git && cd wf2wf
pip install -e .[dev]
pre-commit install
pytest -q

🚀 Quick CLI Tour

# Convert Snakemake → DAGMan and build digest-pinned images
wf2wf convert -i Snakefile -o pipeline.dag --auto-env build --push-registry ghcr.io/myorg --report-md --interactive

# Convert CWL → Nextflow, abort on any information loss
wf2wf convert -i analysis.cwl -o main.nf --out-format nextflow --fail-on-loss

# Validate a workflow and its loss side-car
wf2wf validate pipeline.dag

Interactive prompts (--interactive) use y/n/always/quit; loss prompts appear only for warn/error severities.


🛠 Commands Overview

| Command | Purpose |
|---|---|
| wf2wf convert | Convert workflows between formats (all conversions go via the IR) |
| wf2wf validate | Validate a workflow file or a .loss.json side-car |
| wf2wf info | Pretty-print summary statistics of a workflow |
| wf2wf bco sign | Sign a BioCompute Object and generate provenance attestation |
| wf2wf bco package | Bundle a BCO and its artefacts (e.g. eSTAR ZIP) |

Each command accepts --help for full usage details.

Auto-detection matrix

| Extension | Format |
|---|---|
| .cwl | CWL |
| .dag | DAGMan |
| .ga | Galaxy |
| .json | IR (JSON) |
| .nf | Nextflow |
| .smk | Snakemake |
| .wdl | WDL |
| .yaml, .yml | IR (YAML) |
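
The matrix amounts to a lookup on the file extension. The sketch below encodes it as a dictionary; the function name `detect_format` and the format identifiers (other than `cwl`, `dagman`, and `nextflow`, which appear as --out-format values in this README) are assumptions for illustration.

```python
from pathlib import Path
from typing import Optional

# Extension -> format, transcribed from the table above.
EXTENSION_FORMATS = {
    ".cwl": "cwl",
    ".dag": "dagman",
    ".ga": "galaxy",
    ".json": "ir-json",
    ".nf": "nextflow",
    ".smk": "snakemake",
    ".wdl": "wdl",
    ".yaml": "ir-yaml",
    ".yml": "ir-yaml",
}


def detect_format(path: str) -> Optional[str]:
    """Return the format for a path, or None when the extension is unknown."""
    return EXTENSION_FORMATS.get(Path(path).suffix.lower())


for name in ("analysis.cwl", "pipeline.dag", "main.nf", "Snakefile"):
    print(name, "->", detect_format(name))
```

A bare name such as Snakefile has no extension, so this sketch returns None for it; how wf2wf itself treats that case is not covered by the table.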

🔬 Examples

Example 1 – Snakemake → DAGMan with automatic environment build

wf2wf convert -i Snakefile \
              -o pipeline.dag \
              --out-format dagman \
              --auto-env build --push-registry ghcr.io/myorg

Example 2 – CWL → Nextflow round-trip

# CWL → IR → Nextflow
input_cwl="workflow.cwl"
wf2wf convert -i "$input_cwl" -o main.nf --out-format nextflow

# … do some edits …
# Nextflow → IR → CWL (should restore metadata)
wf2wf convert -i main.nf -o roundtrip.cwl --out-format cwl
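
The idea behind restoring metadata on the way back can be sketched as follows. This is only an illustration of side-car reinjection, not wf2wf's implementation: tasks are plain dictionaries, and `export_with_losses`, `reinject`, and the field names are hypothetical.

```python
from typing import Dict, Set, Tuple


def export_with_losses(task: dict, supported: Set[str]) -> Tuple[dict, dict]:
    """Split a task into fields the target can express and fields it cannot."""
    kept = {k: v for k, v in task.items() if k in supported}
    lost = {k: v for k, v in task.items() if k not in supported}
    return kept, lost


def reinject(task: dict, side_car: Dict[str, object],
             supported: Set[str]) -> Tuple[dict, dict]:
    """Restore side-car fields the new target supports; report the rest."""
    restored = dict(task)
    still_lost = {}
    for key, value in side_car.items():
        if key not in supported:
            still_lost[key] = value      # would be reported as lost again
        elif key not in restored:
            restored[key] = value        # would be reported as reapplied
    return restored, still_lost


original = {"name": "align", "command": "bwa mem", "doc": "Align reads", "gpu": 1}
narrow = {"name", "command"}             # fields a poorer format can express
richer = {"name", "command", "doc"}      # fields a richer format can express

kept, side_car = export_with_losses(original, narrow)
restored, still_lost = reinject(kept, side_car, richer)
print(restored)
print(still_lost)
```

A field that was edited after the first conversion is left alone in this sketch, so manual edits take precedence over the side-car.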

Example 3 – WDL → CWL with loss checking

wf2wf convert -i assembly.wdl -o assembly.cwl --out-format cwl --fail-on-loss

More recipes live in the examples/ directory.


🔄 Workflow Conversion Differences

When converting between different workflow execution environments, several key differences need to be addressed:

Shared Filesystem vs Distributed Computing

Shared Filesystem Workflows (Snakemake, CWL, Nextflow):

  • Assume all files are accessible on a shared filesystem
  • Often have minimal resource specifications
  • Rely on system-wide software or conda environments
  • Basic error handling and retry mechanisms

Distributed Computing Workflows (HTCondor/DAGMan):

  • Require explicit file transfer specifications
  • Need explicit resource allocation (CPU, memory, disk)
  • Require container specifications for environment isolation
  • Benefit from sophisticated retry policies and error handling
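
To make the distributed-side requirements concrete, here is a minimal sketch that renders an HTCondor submit description from a task dictionary. The submit commands are standard HTCondor ones; the function name, the task keys, and the fallback values of 1 CPU and 1024 MB are assumptions for illustration and not wf2wf defaults.

```python
def to_submit_description(task: dict) -> str:
    """Render a task as HTCondor submit-description text."""
    lines = [
        f"executable = {task['executable']}",
        f"request_cpus = {task.get('cpus', 1)}",
        f"request_memory = {task.get('memory_mb', 1024)}MB",
        f"request_disk = {task.get('disk_mb', 1024)}MB",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    # File lists are only emitted when the task declares them.
    if task.get("inputs"):
        lines.append("transfer_input_files = " + ", ".join(task["inputs"]))
    if task.get("outputs"):
        lines.append("transfer_output_files = " + ", ".join(task["outputs"]))
    lines.append("queue")
    return "\n".join(lines)


# Hypothetical task, invented for illustration.
task = {
    "executable": "align.sh",
    "cpus": 4,
    "memory_mb": 8192,
    "inputs": ["reads.fq", "ref.fa"],
    "outputs": ["aligned.bam"],
}
print(to_submit_description(task))
```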

Key Conversion Challenges

| Challenge | Shared → Distributed | Distributed → Shared |
|---|---|---|
| File Transfer | Add transfer_input_files/transfer_output_files | Remove transfer specifications |
| Resources | Add request_cpus, request_memory, request_disk | Convert to engine-specific resource formats |
| Containers | Specify Docker/Singularity images | Map to conda environments or system packages |
| Error Handling | Add retry policies and error strategies | Convert to engine-specific error handling |
| Scatter/Gather | Expand to explicit job definitions | Map to wildcards or engine-specific parallelization |
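
The Scatter/Gather row is perhaps the least obvious, so here is a small sketch of what expanding a scatter into explicit job definitions can look like in DAGMan syntax (JOB, VARS, PARENT/CHILD). The function name, the shared submit-file naming, and the `item` macro are hypothetical choices for this example.

```python
from typing import List


def expand_scatter(step: str, items: List[str], gather: str) -> str:
    """Emit one DAGMan node per scattered item plus a gather node."""
    jobs = [f"{step}_{i}" for i in range(len(items))]
    # Every scattered node reuses one submit file and differs only in its VARS.
    lines = [f"JOB {job} {step}.sub" for job in jobs]
    lines += [f'VARS {job} item="{item}"' for job, item in zip(jobs, items)]
    lines.append(f"JOB {gather} {gather}.sub")
    # The gather node runs only after all scattered nodes have finished.
    lines.append(f"PARENT {' '.join(jobs)} CHILD {gather}")
    return "\n".join(lines)


print(expand_scatter("align", ["s1.fq", "s2.fq"], "merge"))
```

Going the other way, a shared-filesystem engine would typically fold these nodes back into a single rule or process driven by a wildcard or channel.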

Interactive Configuration Assistance

Use --interactive mode to get guided assistance:

# Interactive conversion with configuration prompts
wf2wf convert -i Snakefile -o workflow.dag --interactive

# Example prompts you'll see:
# Found 3 tasks without explicit resource requirements. 
# Distributed systems require explicit resource allocation. 
# Add default resource specifications? (y)es/(n)o/(a)lways/(q)uit: y
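
The y/n/always/quit choices can be modelled as a small state machine. The sketch below is an assumption about how such prompts might behave, not wf2wf's actual logic: here "always" accepts the current and all remaining prompts, "quit" stops asking, and unrecognised input counts as "no". Answers are passed in as a list so the example runs without a terminal.

```python
from typing import Iterable, List, Optional

_ANSWERS = {
    "y": "yes", "yes": "yes",
    "n": "no", "no": "no",
    "a": "always", "always": "always",
    "q": "quit", "quit": "quit",
}


def parse_answer(raw: str) -> Optional[str]:
    """Normalise a raw reply; return None when it is not recognised."""
    return _ANSWERS.get(raw.strip().lower())


def run_prompts(questions: List[str], answers: Iterable[str]) -> List[str]:
    """Return the questions that were accepted, in order."""
    accepted: List[str] = []
    accept_all = False
    replies = iter(answers)
    for question in questions:
        if accept_all:
            accepted.append(question)
            continue
        # Running out of replies is treated like "quit".
        choice = parse_answer(next(replies, "q"))
        if choice == "quit":
            break
        if choice == "always":
            accept_all = True
        if choice in ("yes", "always"):
            accepted.append(question)
    return accepted


questions = ["add resources", "add containers", "add retries", "review transfers"]
print(run_prompts(questions, ["y", "n", "a"]))
```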

Automatic Configuration Analysis

The conversion report includes the Configuration Analysis section shown earlier under Configuration Analysis: counts of tasks without explicit memory, container/conda, or retry specifications, the number of files with auto-detected transfer modes, and the corresponding recommendations.

See File Transfer Handling for detailed information about file transfer modes and best practices.


🤝 Contributing

  1. Fork the repository and create a feature branch (git checkout -b feature/amazing-feature)
  2. Add tests for new functionality
  3. Ensure all tests pass (pytest -q)
  4. Open a Pull Request – GitHub Actions will run the test matrix automatically

Please read CONTRIBUTING.md for the full guidelines.

🧪 Testing

Run the comprehensive test suite:

# Run all tests
python -m pytest tests/ -v

# Run a single test
python -m pytest tests/test_conversions.py::TestConversions::test_linear_workflow_conversion -v

# Run with coverage
python -m pytest tests/ --cov=wf2wf --cov-report=html

📞 Support

  • 📖 Documentation – The docs/ folder contains a growing knowledge-base, rendered on ReadTheDocs.
  • 🐛 Issues – Found a bug or missing feature? Open an issue.
  • 💬 Discussions – General questions and ideas live in GitHub Discussions.

📄 License

wf2wf is licensed under the MIT License – see the LICENSE file for details.


🙏 Acknowledgments

  • CHTC - The Center for High Throughput Computing for testing and feedback
  • The OpenAI and Anthropic-powered coding assistants whose suggestions accelerated feature implementation
  • Cursor - Interactive IDE used for pair-programming and AI-assisted development

Bridging workflow ecosystems.
