Universal workflow-format converter built around a loss-preserving intermediate representation

wf2wf – Universal Workflow-Format Converter

wf2wf is a format-agnostic converter: any supported engine → Intermediate Representation (IR) → any other engine. The core library handles: 

Snakemake • Nextflow • CWL • HTCondor/DAGMan • WDL • Galaxy • bundled BioCompute Objects

graph TD;
  A[Snakemake] -->|import| IR((IR));
  B[DAGMan]   --> IR;
  C[CWL]      --> IR;
  D[Nextflow] --> IR;
  E[WDL]      --> IR;
  F[Galaxy]   --> IR;
  IR -->|export| GA2[Galaxy];
  IR --> SMK2[Snakemake];
  IR --> CWL2[CWL];
  IR --> DAG2[DAGMan];
  IR --> NF2[Nextflow];
  IR --> WDL2[WDL];
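
The hub-and-spoke layout in the diagram means each engine needs only one importer and one exporter, so N formats need 2N adapters rather than N×(N−1) pairwise converters. A minimal Python sketch of that idea follows. The `Workflow` dataclass, the two registries, and the toy `linear` and `arrows` formats are hypothetical illustrations and are not wf2wf's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Workflow:
    """Toy intermediate representation: task names plus dependency edges."""
    tasks: List[str] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)


# Hypothetical registries: one importer and one exporter per format.
IMPORTERS: Dict[str, Callable[[str], Workflow]] = {}
EXPORTERS: Dict[str, Callable[[Workflow], str]] = {}


def import_linear(text: str) -> Workflow:
    """Parse a toy 'linear' format: one task per line, executed in order."""
    tasks = [line.strip() for line in text.splitlines() if line.strip()]
    return Workflow(tasks=tasks, edges=list(zip(tasks, tasks[1:])))


def export_arrows(wf: Workflow) -> str:
    """Emit a toy 'arrows' format: one 'parent -> child' line per edge."""
    return "\n".join(f"{a} -> {b}" for a, b in wf.edges)


IMPORTERS["linear"] = import_linear
EXPORTERS["arrows"] = export_arrows


def convert(text: str, src: str, dst: str) -> str:
    """Every conversion goes source -> IR -> target, never source -> target."""
    ir = IMPORTERS[src](text)
    return EXPORTERS[dst](ir)


print(convert("align\nsort\ncall", "linear", "arrows"))
```

Adding a new format in this sketch means registering one importer and one exporter; existing adapters stay untouched.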

wf2wf is called from the command line as:

# Convert Snakemake → DAGMan and auto-generate Markdown report
wf2wf convert -i pipeline.smk -o pipeline.dag --auto-env build --interactive --report-md

✨ Features

  • 🔄 Universal Conversion – Any supported engine → IR → any other engine with a single command.
  • 🧬 Loss-Mapping & Round-Trip Fidelity – Structured loss reports (*.loss.json) and automatic reinjection guarantee nothing disappears silently.
  • 🐳 Automated Environment Builds – Optional Conda-to-OCI pipeline (micromamba → conda-pack → buildx/buildah) with digest-pinned image references, SBOMs and Apptainer conversion.
  • ⚖ Regulatory & Provenance Support – BioCompute Object generation, schema validation and side-car provenance for FDA submissions.
  • 🧪 Aiming for Quality – high test coverage, semantic versioning, graceful degradation when optional external tools are missing.

Information-loss workflow

wf2wf records every field the target engine cannot express and classifies each one in the conversion summary:

⚠ Conversion losses: 2 (lost), 1 (lost again), 7 (reapplied)
  • lost – field dropped in this conversion
  • lost again – it was already lost by a previous exporter
  • reapplied – successfully restored from a side-car when converting back to a richer format

Use --fail-on-loss to abort if any lost/lost again entries remain.
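
As a rough illustration of the three categories, the Python sketch below counts entries in a loss side-car and applies the same rule as --fail-on-loss. The JSON layout (an `entries` list with `path` and `status` keys) and the status spelling `lost_again` are assumptions made for this example; the real *.loss.json schema may differ.

```python
import json
from collections import Counter

# Hypothetical side-car content; the real *.loss.json schema may differ.
SIDE_CAR = json.loads("""
{
  "entries": [
    {"path": "/tasks/align/resources/gpu", "status": "lost"},
    {"path": "/tasks/align/retry", "status": "lost_again"},
    {"path": "/tasks/sort/priority", "status": "reapplied"}
  ]
}
""")


def summarise(side_car: dict) -> Counter:
    """Count loss entries by status."""
    return Counter(entry["status"] for entry in side_car.get("entries", []))


def should_fail(counts: Counter) -> bool:
    """Mirror the documented rule: fail if any lost / lost again entries remain."""
    return counts["lost"] + counts["lost_again"] > 0


counts = summarise(SIDE_CAR)
print(f"lost={counts['lost']} lost_again={counts['lost_again']} "
      f"reapplied={counts['reapplied']}")
print("abort" if should_fail(counts) else "ok")
```

Entries marked reapplied do not trigger a failure in this sketch, because they were restored rather than dropped.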


📦 Installation

# PyPI (recommended)
pip install wf2wf

# or conda-forge (once feedstock is merged)
conda install -c conda-forge wf2wf

Development install:

git clone https://github.com/your-org/wf2wf.git && cd wf2wf
pip install -e .[dev]
pre-commit install
pytest -q

🚀 Quick CLI Tour

# Convert Snakemake → DAGMan and build digest-pinned images
wf2wf convert -i Snakefile -o pipeline.dag --auto-env build --push-registry ghcr.io/myorg --report-md --interactive

# Convert CWL → Nextflow, abort on any information loss
wf2wf convert -i analysis.cwl -o main.nf --out-format nextflow --fail-on-loss

# Validate a workflow and its loss side-car
wf2wf validate pipeline.dag

Interactive prompts (--interactive) use y/n/always/quit; loss prompts appear only for warn/error severities.
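
One way to picture the prompt rule is the Python sketch below. It is a guess at the behaviour, not wf2wf's implementation: the function name, the `info` severity, and the meaning assigned to each answer (`n` rejects one entry, `always` accepts the rest without asking, `quit` aborts) are all assumptions.

```python
from typing import Iterable, List, Tuple

# Per the text, only these severities trigger a prompt.
PROMPT_SEVERITIES = {"warn", "error"}


def review_losses(losses: Iterable[Tuple[str, str]],
                  answers: Iterable[str]) -> List[str]:
    """Return the names of accepted losses; raise SystemExit on 'quit'.

    `losses` holds (name, severity) pairs; `answers` stands in for user input.
    """
    replies = iter(answers)
    accept_all = False
    accepted: List[str] = []
    for name, severity in losses:
        # Low-severity entries, or anything after 'always', skip the prompt.
        if severity not in PROMPT_SEVERITIES or accept_all:
            accepted.append(name)
            continue
        reply = next(replies)
        if reply == "quit":
            raise SystemExit("conversion aborted by user")
        if reply == "always":
            accept_all = True
        if reply in ("y", "always"):
            accepted.append(name)
    return accepted


demo = [("a", "info"), ("b", "warn"), ("c", "error"), ("d", "warn")]
print(review_losses(demo, ["n", "always"]))
```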


🛠 Commands Overview

Command            Purpose
wf2wf convert      Convert workflows between formats (all conversions go via the IR)
wf2wf validate     Validate a workflow file or a .loss.json side-car
wf2wf info         Pretty-print summary statistics of a workflow
wf2wf bco sign     Sign a BioCompute Object and generate provenance attestation
wf2wf bco package  Bundle a BCO and its artefacts (e.g. eSTAR ZIP)

Each command accepts --help for full usage details.

Auto-detection matrix

Extension     Format
.cwl          CWL
.dag          DAGMan
.ga           Galaxy
.json         IR (JSON)
.nf           Nextflow
.smk          Snakemake
.wdl          WDL
.yaml, .yml   IR (YAML)
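
The matrix amounts to a lookup on the file suffix. The Python sketch below transcribes it; the `detect_format` helper and the format labels are illustrative assumptions (only `dagman`, `nextflow` and `cwl` appear as --out-format values in this README), and the sketch does not reproduce wf2wf's own detection code.

```python
from pathlib import Path
from typing import Optional

# Extension -> format label, transcribed from the auto-detection matrix.
# Labels other than "dagman", "nextflow" and "cwl" are assumed spellings.
EXTENSION_FORMATS = {
    ".cwl": "cwl",
    ".dag": "dagman",
    ".ga": "galaxy",
    ".json": "ir-json",
    ".nf": "nextflow",
    ".smk": "snakemake",
    ".wdl": "wdl",
    ".yaml": "ir-yaml",
    ".yml": "ir-yaml",
}


def detect_format(path: str) -> Optional[str]:
    """Return the format implied by the file extension, or None if unknown."""
    return EXTENSION_FORMATS.get(Path(path).suffix.lower())


for name in ("pipeline.dag", "main.nf", "workflow.yml", "Snakefile"):
    print(name, "->", detect_format(name))
```

A file with no listed extension, such as a bare `Snakefile`, yields `None` in this sketch; the table does not say how wf2wf handles that case.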

🔬 Examples

Example 1 – Snakemake → DAGMan with automatic environment build

wf2wf convert -i Snakefile \
              -o pipeline.dag \
              --out-format dagman \
              --auto-env build --push-registry ghcr.io/myorg

Example 2 – CWL → Nextflow round-trip

# CWL → IR → Nextflow
airflow_cwl="workflow.cwl"
wf2wf convert -i "$airflow_cwl" -o main.nf --out-format nextflow

# … do some edits …
# Nextflow → IR → CWL (should restore metadata)
wf2wf convert -i main.nf -o roundtrip.cwl --out-format cwl

Example 3 – WDL → CWL with loss checking

wf2wf convert -i assembly.wdl -o assembly.cwl --out-format cwl --fail-on-loss
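
When conversions are driven from a Python script, the command line can be assembled as an argument list. The sketch below uses only flags shown in this README (-i, -o, --out-format, --fail-on-loss); the `build_convert_cmd` helper is hypothetical, and it only builds and prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def build_convert_cmd(src: str, dst: str,
                      out_format: Optional[str] = None,
                      fail_on_loss: bool = False) -> List[str]:
    """Assemble a `wf2wf convert` argument list from flags shown in this README."""
    cmd = ["wf2wf", "convert", "-i", src, "-o", dst]
    if out_format:
        cmd += ["--out-format", out_format]
    if fail_on_loss:
        cmd.append("--fail-on-loss")
    return cmd


cmd = build_convert_cmd("assembly.wdl", "assembly.cwl", "cwl", fail_on_loss=True)
print(shlex.join(cmd))

# To execute it (requires wf2wf on PATH):
#   import subprocess
#   subprocess.run(cmd, check=True)
# Assumption: an aborted conversion exits non-zero, so check=True would raise.
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.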

More recipes live in the examples/ directory.


🤝 Contributing

  1. Fork the repository and create a feature branch (git checkout -b feature/amazing-feature)
  2. Add tests for new functionality
  3. Ensure all tests pass (pytest -q)
  4. Open a Pull Request – GitHub Actions will run the test matrix automatically

Please read CONTRIBUTING.md for the full guidelines.

🧪 Testing

Run the comprehensive test suite:

# Run all tests
python -m pytest tests/ -v

# Run a single test (file::class::test)
python -m pytest tests/test_conversions.py::TestConversions::test_linear_workflow_conversion -v

# Run with coverage
python -m pytest tests/ --cov=wf2wf --cov-report=html

📞 Support

  • 📖 Documentation – The docs/ folder contains a growing knowledge-base, rendered on ReadTheDocs.
  • 🐛 Issues – Found a bug or missing feature? Open an issue.
  • 💬 Discussions – General questions and ideas live in GitHub Discussions.

📄 License

wf2wf is licensed under the MIT License – see the LICENSE file for details.


🙏 Acknowledgments

  • CHTC – The Center for High Throughput Computing, for testing and feedback
  • The OpenAI and Anthropic-powered coding assistants whose suggestions accelerated feature implementation
  • Cursor – Interactive IDE used for pair-programming and AI-assisted development

Bridging workflow ecosystems.
