Universal workflow-format converter built around a loss-preserving intermediate representation
wf2wf – Universal Workflow-Format Converter
wf2wf is a format-agnostic converter: any supported engine → Intermediate Representation (IR) → any other engine. The core library handles:
• Snakemake • Nextflow • CWL • HTCondor/DAGMan • WDL • Galaxy • bundled BioCompute Objects
```mermaid
graph TD;
A[Snakemake] -->|import| IR((IR));
B[DAGMan] --> IR;
C[CWL] --> IR;
D[Nextflow] --> IR;
E[WDL] --> IR;
F[Galaxy] --> IR;
IR -->|export| GA2[Galaxy];
IR --> SMK2[Snakemake];
IR --> CWL2[CWL];
IR --> DAG2[DAGMan];
IR --> NF2[Nextflow];
IR --> WDL2[WDL];
```
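Routing every conversion through one hub means each engine needs only an importer and an exporter, not a dedicated converter for every other engine. The toy registry below sketches that idea; the `IR` class, `register`, `convert` and the two made-up "engines" are hypothetical illustrations, not wf2wf's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class IR:
    """Toy intermediate representation: task name -> shell command (hypothetical)."""
    tasks: Dict[str, str] = field(default_factory=dict)


# One importer and one exporter per format.
IMPORTERS: Dict[str, Callable[[str], IR]] = {}
EXPORTERS: Dict[str, Callable[[IR], str]] = {}


def register(fmt: str, importer: Callable[[str], IR], exporter: Callable[[IR], str]) -> None:
    IMPORTERS[fmt] = importer
    EXPORTERS[fmt] = exporter


def convert(text: str, src: str, dst: str) -> str:
    # Every conversion is source -> IR -> target, so N formats need
    # 2N adapters instead of N*(N-1) pairwise converters.
    return EXPORTERS[dst](IMPORTERS[src](text))


def _parse(text: str, sep: str) -> IR:
    ir = IR()
    for line in text.splitlines():
        if line.strip():
            name, cmd = line.split(sep, 1)
            ir.tasks[name.strip()] = cmd.strip()
    return ir


def _render(ir: IR, sep: str) -> str:
    return "\n".join(f"{name}{sep}{cmd}" for name, cmd in ir.tasks.items())


# Two made-up "engines" that differ only in syntax.
register("colon", lambda t: _parse(t, ":"), lambda ir: _render(ir, ": "))
register("equals", lambda t: _parse(t, "="), lambda ir: _render(ir, "="))

print(convert("align: bwa mem ref.fa r.fq\nsort: samtools sort", "colon", "equals"))
```

With the six engines in the diagram, the hub design needs 12 adapters rather than 30 direct converters.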
wf2wf is called from the command line as:
```bash
# Convert Snakemake → DAGMan and auto-generate Markdown report
wf2wf convert -i pipeline.smk -o pipeline.dag --auto-env build --interactive --report-md
```
📋 Table of Contents
- Features
- Installation
- Quick CLI Tour
- Commands
- Examples
- Workflow Conversion Differences
- Contributing
- Testing
- Support
- License
- Acknowledgments
✨ Features
- 🔄 Universal Conversion – Any supported engine → IR → any other engine with a single command.
- 🧬 Loss-Mapping & Round-Trip Fidelity – Structured loss reports (*.loss.json) and automatic reinjection guarantee nothing disappears silently.
- 🐳 Automated Environment Builds – Optional Conda-to-OCI pipeline (micromamba → conda-pack → buildx/buildah) with digest-pinned image references, SBOMs and Apptainer conversion.
- ⚖ Regulatory & Provenance Support – BioCompute Object generation, schema validation and side-car provenance for FDA submissions.
- 🧪 Aiming for Quality – high test coverage, semantic versioning, graceful degradation when optional external tools are missing.
- 🔧 Smart Configuration Analysis – Automatic detection and warnings for missing resource requirements, containers, error handling, and file transfer modes when converting between shared filesystem and distributed computing workflows.
- 💬 Interactive Mode – Guided prompts to help users address configuration gaps and optimize workflows for target execution environments.
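Digest pinning matters because a tag such as :latest can later point to a different image, while a sha256 digest always names the same content. The check below is a minimal sketch of what "digest-pinned" means for an image reference; `is_digest_pinned` and `pin` are hypothetical helpers, not wf2wf's build code, and ghcr.io/myorg is the placeholder registry used in the CLI examples.

```python
import re

# A digest-pinned reference names the exact image content, e.g.
# "ghcr.io/myorg/tool@sha256:<64 hex chars>"; a tag such as ":latest" can be re-pointed.
_DIGEST_SUFFIX = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(ref: str) -> bool:
    """Return True if the image reference ends in a sha256 digest."""
    return _DIGEST_SUFFIX.search(ref) is not None


def pin(repo: str, digest: str) -> str:
    """Combine a repository name with a sha256 digest into a pinned reference."""
    if not re.fullmatch(r"sha256:[0-9a-f]{64}", digest):
        raise ValueError(f"not a sha256 digest: {digest!r}")
    return f"{repo}@{digest}"


pinned = pin("ghcr.io/myorg/tool", "sha256:" + "0" * 64)
print(is_digest_pinned(pinned), is_digest_pinned("ghcr.io/myorg/tool:latest"))  # True False
```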
Information-loss workflow
wf2wf records every field the target engine cannot express and prints a summary such as:
```
⚠ Conversion losses: 2 (lost), 1 (lost again), 7 (reapplied)
```
- lost – the field was dropped in this conversion
- lost again – the field had already been lost by a previous exporter
- reapplied – the field was restored from a side-car when converting back to a richer format

Use --fail-on-loss to abort if any lost or lost again entries remain.
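The counting rule behind the summary line and --fail-on-loss can be sketched as follows. This assumes a simplified side-car layout (a JSON object with an "entries" list whose items carry a "status" of lost, lost_again or reapplied); that schema and the function names are illustrative, not the real .loss.json format.

```python
import json
from collections import Counter


def summarize_losses(sidecar_text: str) -> Counter:
    # ASSUMPTION: a JSON object with an "entries" list whose items carry a
    # "status" of "lost", "lost_again" or "reapplied" (illustrative schema).
    data = json.loads(sidecar_text)
    return Counter(entry["status"] for entry in data.get("entries", []))


def format_summary(counts: Counter) -> str:
    # Missing keys in a Counter read as 0.
    return (f"⚠ Conversion losses: {counts['lost']} (lost), "
            f"{counts['lost_again']} (lost again), {counts['reapplied']} (reapplied)")


def fail_on_loss(counts: Counter) -> None:
    # Abort when any lost or lost-again entries remain; reapplied ones are fine.
    unresolved = counts["lost"] + counts["lost_again"]
    if unresolved:
        raise SystemExit(f"aborting: {unresolved} unresolved loss entries")


sidecar = json.dumps({"entries": [
    {"field": "/tasks/align/gpu", "status": "lost"},
    {"field": "/tasks/sort/retry", "status": "reapplied"},
]})
counts = summarize_losses(sidecar)
print(format_summary(counts))
```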
Configuration Analysis
When converting between different workflow execution environments, wf2wf automatically detects potential issues:
```markdown
## Configuration Analysis
### Potential Issues for Distributed Computing
* **Memory**: 2 tasks without explicit memory requirements
* **Containers**: 3 tasks without container/conda specifications
* **Error Handling**: 3 tasks without retry specifications
* **File Transfer**: 6 files with auto-detected transfer modes
**Recommendations:**
* Add explicit resource requirements for all tasks
* Specify container images or conda environments for environment isolation
* Configure retry policies for fault tolerance
* Review file transfer modes for distributed execution
```
Use --interactive to get guided prompts that help you address these issues during conversion.
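A check of this kind amounts to scanning each task for settings a distributed scheduler needs. The sketch below reproduces the shape of the Configuration Analysis report from a list of tasks; the `Task` fields and the `analyze`/`render` helpers are hypothetical and do not reflect wf2wf's IR schema or its exact rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Task:
    # Hypothetical task record; these field names are not wf2wf's IR schema.
    name: str
    mem_mb: Optional[int] = None
    container: Optional[str] = None
    conda: Optional[str] = None
    retries: int = 0


# category key -> (report label, what is missing)
LABELS = {
    "memory": ("Memory", "explicit memory requirements"),
    "containers": ("Containers", "container/conda specifications"),
    "error_handling": ("Error Handling", "retry specifications"),
}


def analyze(tasks: List[Task]) -> Dict[str, List[str]]:
    """Collect the names of tasks lacking settings a distributed scheduler needs."""
    return {
        "memory": [t.name for t in tasks if t.mem_mb is None],
        "containers": [t.name for t in tasks if not (t.container or t.conda)],
        "error_handling": [t.name for t in tasks if t.retries == 0],
    }


def render(issues: Dict[str, List[str]]) -> str:
    lines = ["### Potential Issues for Distributed Computing"]
    for key, names in issues.items():
        if names:
            label, missing = LABELS[key]
            noun = "task" if len(names) == 1 else "tasks"
            lines.append(f"* **{label}**: {len(names)} {noun} without {missing}")
    return "\n".join(lines)


tasks = [
    Task("align", mem_mb=8000, container="docker://example/bwa", retries=2),
    Task("sort"),
    Task("index", conda="envs/samtools.yaml"),
]
print(render(analyze(tasks)))
```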
📦 Installation
```bash
# PyPI (recommended)
pip install wf2wf
# or conda-forge (once feedstock is merged)
conda install -c conda-forge wf2wf
```
Development install:
```bash
git clone https://github.com/csmcal/wf2wf.git && cd wf2wf
pip install -e ".[dev]"   # quotes stop shells such as zsh from expanding the brackets
pre-commit install
pytest -q
```
🚀 Quick CLI Tour
```bash
# Convert Snakemake → DAGMan and build digest-pinned images
wf2wf convert -i Snakefile -o pipeline.dag --auto-env build --push-registry ghcr.io/myorg --report-md --interactive
# Convert CWL → Nextflow, abort on any information loss
wf2wf convert -i analysis.cwl -o main.nf --out-format nextflow --fail-on-loss
# Validate a workflow and its loss side-car
wf2wf validate pipeline.dag
```
Interactive prompts (--interactive) use y/n/always/quit; loss prompts appear only for warn/error severities.
🛠 Commands Overview
| Command | Purpose |
|---|---|
| wf2wf convert | Convert workflows between formats (all conversions go via the IR) |
| wf2wf validate | Validate a workflow file or a .loss.json side-car |
| wf2wf info | Pretty-print summary statistics of a workflow |
| wf2wf bco sign | Sign a BioCompute Object and generate provenance attestation |
| wf2wf bco package | Bundle a BCO and its artefacts (e.g. eSTAR ZIP) |
Each command accepts --help for full usage details.
Auto-detection matrix
| Extension | Format |
|---|---|
| .cwl | CWL |
| .dag | DAGMan |
| .ga | Galaxy |
| .json | IR (JSON) |
| .nf | Nextflow |
| .smk | Snakemake |
| .wdl | WDL |
| .yaml, .yml | IR (YAML) |
🔬 Examples
Example 1 – Snakemake → DAGMan with automatic environment build
```bash
wf2wf convert -i Snakefile \
  -o pipeline.dag \
  --out-format dagman \
  --auto-env build --push-registry ghcr.io/myorg
```
Example 2 – CWL → Nextflow round-trip
```bash
# CWL → IR → Nextflow
cwl_file="workflow.cwl"
wf2wf convert -i "$cwl_file" -o main.nf --out-format nextflow
# … do some edits …
# Nextflow → IR → CWL (metadata recorded in the loss side-car should be reapplied)
wf2wf convert -i main.nf -o roundtrip.cwl --out-format cwl
```
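The round trip works because the lossy export records what it dropped, and the reverse conversion puts those values back. A minimal sketch of that export and reinject cycle, assuming a toy IR of plain dictionaries; the entry fields (task, field, value, status) and both functions are hypothetical and not the real .loss.json schema or wf2wf's implementation.

```python
import copy
from typing import Dict, List, Set, Tuple

Workflow = Dict[str, Dict[str, object]]  # toy IR: task name -> {field: value}


def export_lossy(ir: Workflow, supported: Set[str]) -> Tuple[Workflow, List[dict]]:
    """Keep only fields the target can express; record the rest as loss entries."""
    kept: Workflow = {}
    losses: List[dict] = []
    for task, fields in ir.items():
        kept[task] = {}
        for key, value in fields.items():
            if key in supported:
                kept[task][key] = value
            else:
                losses.append({"task": task, "field": key, "value": value, "status": "lost"})
    return kept, losses


def reinject(ir: Workflow, losses: List[dict]) -> Workflow:
    """Restore recorded fields that are still absent; updates each entry's status in place."""
    restored = copy.deepcopy(ir)
    for entry in losses:
        fields = restored.setdefault(entry["task"], {})
        if entry["field"] not in fields:  # never overwrite a value set after export
            fields[entry["field"]] = entry["value"]
            entry["status"] = "reapplied"
    return restored


original = {"align": {"command": "bwa mem ref.fa r.fq", "gpu": 1}}
exported, losses = export_lossy(original, supported={"command"})
roundtrip = reinject(exported, losses)
print(exported, roundtrip == original, losses[0]["status"])
```

Skipping fields that are already present keeps manual edits made between the two conversions from being overwritten by stale side-car values.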
Example 3 – WDL → CWL with loss checking
```bash
wf2wf convert -i assembly.wdl -o assembly.cwl --out-format cwl --fail-on-loss
```
More recipes live in the examples/ directory.
🔄 Workflow Conversion Differences
When converting between different workflow execution environments, several key differences need to be addressed:
Shared Filesystem vs Distributed Computing
Shared Filesystem Workflows (Snakemake, CWL, Nextflow):
- Assume all files are accessible on a shared filesystem
- Often have minimal resource specifications
- Rely on system-wide software or conda environments
- Basic error handling and retry mechanisms
Distributed Computing Workflows (HTCondor/DAGMan):
- Require explicit file transfer specifications
- Need explicit resource allocation (CPU, memory, disk)
- Require container specifications for environment isolation
- Benefit from sophisticated retry policies and error handling
Key Conversion Challenges
| Challenge | Shared → Distributed | Distributed → Shared |
|---|---|---|
| File Transfer | Add transfer_input_files/transfer_output_files | Remove transfer specifications |
| Resources | Add request_cpus, request_memory, request_disk | Convert to engine-specific resource formats |
| Containers | Specify Docker/Singularity images | Map to conda environments or system packages |
| Error Handling | Add retry policies and error strategies | Convert to engine-specific error handling |
| Scatter/Gather | Expand to explicit job definitions | Map to wildcards or engine-specific parallelization |
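For the shared → distributed direction, the Resources, File Transfer and Error Handling rows translate into concrete HTCondor and DAGMan settings. The sketch below renders them for one task using standard submit commands (request_cpus, request_memory, request_disk, should_transfer_files, when_to_transfer_output, transfer_input_files, transfer_output_files, container_image) and the DAGMan JOB and RETRY keywords; the `CondorTask` record and both helpers are hypothetical and not the output of wf2wf's DAGMan exporter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CondorTask:
    # Hypothetical task record; resources must be supplied explicitly (no guessed defaults).
    name: str
    executable: str
    cpus: int
    memory_mb: int
    disk_mb: int
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    container: Optional[str] = None
    retries: int = 0


def submit_description(t: CondorTask) -> str:
    """Render an HTCondor submit description with explicit resources and file transfer."""
    lines = [f"executable = {t.executable}"]
    if t.container:
        lines += ["universe = container", f"container_image = {t.container}"]
    lines += [
        f"request_cpus = {t.cpus}",
        f"request_memory = {t.memory_mb}MB",
        f"request_disk = {t.disk_mb}MB",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if t.inputs:
        lines.append("transfer_input_files = " + ", ".join(t.inputs))
    if t.outputs:
        lines.append("transfer_output_files = " + ", ".join(t.outputs))
    lines.append("queue")
    return "\n".join(lines)


def dag_lines(t: CondorTask) -> str:
    """DAGMan JOB line plus an optional RETRY line for fault tolerance."""
    lines = [f"JOB {t.name} {t.name}.sub"]
    if t.retries:
        lines.append(f"RETRY {t.name} {t.retries}")
    return "\n".join(lines)


align = CondorTask("align", "run_align.sh", cpus=4, memory_mb=8192, disk_mb=10240,
                   inputs=["ref.fa", "reads.fq"], outputs=["aln.bam"],
                   container="docker://example/bwa", retries=2)
print(submit_description(align))
print(dag_lines(align))
```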
Interactive Configuration Assistance
Use --interactive mode to get guided assistance:
```bash
# Interactive conversion with configuration prompts
wf2wf convert -i Snakefile -o workflow.dag --interactive
# Example prompts you'll see:
# Found 3 tasks without explicit resource requirements.
# Distributed systems require explicit resource allocation.
# Add default resource specifications? (y)es/(n)o/(a)lways/(q)uit: y
```
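The (y)es/(n)o/(a)lways/(q)uit prompt can be modelled as a small state machine. This is a minimal sketch, assuming that "always" auto-accepts every later prompt in the session and that "quit" aborts the conversion; the `Prompter` class is hypothetical, and wf2wf may scope "always" differently.

```python
from typing import Callable


class Prompter:
    """Sketch of a (y)es/(n)o/(a)lways/(q)uit confirmation prompt."""

    def __init__(self, ask: Callable[[str], str] = input) -> None:
        self.ask = ask          # injectable so the prompt can be scripted or tested
        self.always = False     # set once the user answers "always"

    def confirm(self, question: str) -> bool:
        if self.always:
            return True         # ASSUMPTION: "always" auto-accepts every later prompt
        while True:
            answer = self.ask(f"{question} (y)es/(n)o/(a)lways/(q)uit: ").strip().lower()
            if answer in ("y", "yes"):
                return True
            if answer in ("n", "no"):
                return False
            if answer in ("a", "always"):
                self.always = True
                return True
            if answer in ("q", "quit"):
                raise SystemExit("conversion aborted by user")
            # Any other answer: ask again.


scripted = iter(["maybe", "a"])
prompter = Prompter(ask=lambda _prompt: next(scripted))
print(prompter.confirm("Add default resource specifications?"))  # True, after re-asking once
print(prompter.confirm("Add retry policies?"))  # True, without asking again
```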
Automatic Configuration Analysis
The Markdown conversion report (--report-md) includes the Configuration Analysis section shown above, listing potential issues and recommendations.
See File Transfer Handling for detailed information about file transfer modes and best practices.
🤝 Contributing
- Fork the repository and create a feature branch (git checkout -b feature/amazing-feature)
- Add tests for new functionality
- Ensure all tests pass (pytest -q)
- Open a Pull Request – GitHub Actions will run the test matrix automatically
Please read CONTRIBUTING.md for the full guidelines.
🧪 Testing
Run the test suite:
```bash
# Run all tests
python -m pytest tests/ -v
# Run a single test
python -m pytest tests/test_conversions.py::TestConversions::test_linear_workflow_conversion -v
# Run with coverage (requires the pytest-cov plugin)
python -m pytest tests/ --cov=wf2wf --cov-report=html
```
📞 Support
- 📖 Documentation – The docs/ folder contains a growing knowledge-base, rendered on ReadTheDocs.
- 🐛 Issues – Found a bug or missing feature? Open an issue.
- 💬 Discussions – General questions and ideas live in GitHub Discussions.
📄 License
wf2wf is licensed under the MIT License – see the LICENSE file for details.
🙏 Acknowledgments
- CHTC - The Center for High Throughput Computing for testing and feedback
- The OpenAI and Anthropic-powered coding assistants whose suggestions accelerated feature implementation
- Cursor - Interactive IDE used for pair-programming and AI-assisted development
Bridging workflow ecosystems.
Download files
File details for wf2wf-1.1.0.tar.gz (source distribution)

File metadata
- Download URL: wf2wf-1.1.0.tar.gz
- Size: 254.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | bc2a3293f3ebb667149642029449d9b3f5e755f0bc82b741c1a263d678ff7a90 |
| MD5 | 64f763d754dfec2a4e8688d9d60541e7 |
| BLAKE2b-256 | ffda35d538f441d6fb90a51a292b3e7e06e199a18e80a81859585f1721b462ee |
File details for wf2wf-1.1.0-py3-none-any.whl (built distribution)

File metadata
- Download URL: wf2wf-1.1.0-py3-none-any.whl
- Size: 241.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | e38563fb476d7ea93c8a1afb143f49c1c5cee8d09c607228e1c26c37a6aad01e |
| MD5 | e7b1a9540eef2d7768eadb7b2d50d754 |
| BLAKE2b-256 | a96f5ce1f6c6997abb1cae767ed12ad7c098aabfa91046787620c778632eb7a3 |
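To check a downloaded file against the published SHA256 digests, hash it locally and compare. A minimal sketch using Python's standard hashlib; the `EXPECTED_SHA256` mapping copies the two SHA256 values from the tables above, and `sha256_of` and `verify` are illustrative helpers rather than part of wf2wf.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file-hash tables.
EXPECTED_SHA256 = {
    "wf2wf-1.1.0.tar.gz": "bc2a3293f3ebb667149642029449d9b3f5e755f0bc82b741c1a263d678ff7a90",
    "wf2wf-1.1.0-py3-none-any.whl": "e38563fb476d7ea93c8a1afb143f49c1c5cee8d09c607228e1c26c37a6aad01e",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, verify(Path("wf2wf-1.1.0.tar.gz")) returns True only for an unmodified copy of the source distribution.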