AI-Powered Cloudification of Bioinformatics Workflows
Workflow Clinic
Workflow Clinic is a GSoC 2026 project focused on improving the portability, reproducibility, and cloud-readiness of scientific workflows.
The project aims to parse workflows written in languages such as Nextflow and Snakemake, convert them into a common intermediate representation called WorkflowBundle, and identify portability issues through automated validation and analysis.
By using a common workflow model inspired by the DAW (Data Analysis Workflow) metamodel, Workflow Clinic can reason about workflows independently of their original language and provide consistent diagnostics, recommendations, and future repair capabilities.
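To make the idea of a language-neutral representation concrete, here is a minimal sketch of what a WorkflowBundle-like structure could look like. The class names `Task` and `WorkflowBundle` and all of their fields are assumptions for illustration; they are not the project's actual schema and are not taken from the DAW metamodel.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """One workflow step (hypothetical shape, not the project's schema)."""
    name: str
    command: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    container: Optional[str] = None


@dataclass
class WorkflowBundle:
    """Language-neutral view of a workflow (illustrative only)."""
    name: str
    source_language: str  # e.g. "nextflow" or "snakemake"
    tasks: list[Task] = field(default_factory=list)

    def edges(self) -> list[tuple[str, str]]:
        """Derive (producer, consumer) pairs by matching outputs to inputs."""
        producers = {out: t.name for t in self.tasks for out in t.outputs}
        return [
            (producers[inp], t.name)
            for t in self.tasks
            for inp in t.inputs
            if inp in producers
        ]


bundle = WorkflowBundle(
    name="demo",
    source_language="snakemake",
    tasks=[
        Task("align", "bwa mem ref.fa reads.fq > aln.sam", ["reads.fq"], ["aln.sam"]),
        Task("sort", "samtools sort aln.sam -o aln.bam", ["aln.sam"], ["aln.bam"]),
    ],
)
print(bundle.edges())
```

Because the dependency edges are derived from the bundle rather than from Nextflow or Snakemake syntax, any check written against this structure would apply to both languages.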
Why Workflow Clinic?
Scientific workflows are often tightly coupled to specific execution environments, storage systems, schedulers, or local infrastructure.
This can make workflows difficult to:
- Share
- Reproduce
- Port across platforms
- Execute in cloud environments
- Integrate with GA4GH-compliant services
Workflow Clinic aims to help workflow authors identify and resolve these issues before deployment.
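As an illustration of the kind of coupling described above, the following sketch scans workflow text for two common signs of environment-specific assumptions: hard-coded absolute paths and direct scheduler calls. The rule identifiers, the patterns, and the function name `find_portability_issues` are hypothetical and only show the shape such a check might take.

```python
import re

# Hypothetical patterns that often indicate coupling to local infrastructure.
PATTERNS = {
    # A local absolute path; the lookbehind skips URLs such as s3://data/x.
    "absolute-path": re.compile(r"(?<![\w:/.])/(?:home|Users|scratch|mnt|data)/[\w./-]+"),
    # A direct call to a cluster scheduler submit command.
    "scheduler-call": re.compile(r"\b(?:sbatch|qsub|bsub)\b"),
}


def find_portability_issues(text):
    """Return (line number, rule id, matched text) for each suspicious line."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule_id, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                issues.append((lineno, rule_id, match.group(0)))
    return issues


sample = '''rule align:
    input: "/home/alice/reads.fq"
    shell: "sbatch run_bwa.sh {input}"
'''
for issue in find_portability_issues(sample):
    print(issue)
```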
Planned Features
Workflow Parsing
- Nextflow support
- Snakemake support
- Common WorkflowBundle representation
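The sketch below shows, under strong simplifying assumptions, how step names from both languages could be pulled into one common list. It relies on naive regular expressions over the `rule NAME:` form used by Snakemake and the `process NAME {` form used by Nextflow; a real parser would need to handle the full grammar of each language, and the function name `task_names` is hypothetical.

```python
import re

# Naive patterns; they only recognize top-level declarations at line start.
SNAKEMAKE_RULE = re.compile(r"^rule\s+(\w+)\s*:", re.MULTILINE)
NEXTFLOW_PROCESS = re.compile(r"^process\s+(\w+)\s*\{", re.MULTILINE)


def task_names(text, language):
    """Return declared step names for the given language ("snakemake" or "nextflow")."""
    pattern = {"snakemake": SNAKEMAKE_RULE, "nextflow": NEXTFLOW_PROCESS}[language]
    return pattern.findall(text)


snakefile = "rule align:\n    shell: 'bwa mem ...'\nrule sort:\n    shell: 'samtools sort ...'\n"
nextflow = "process ALIGN {\n    script:\n    'bwa mem ...'\n}\n"

print(task_names(snakefile, "snakemake"))
print(task_names(nextflow, "nextflow"))
```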
Workflow Analysis
- Portability diagnostics
- Storage validation
- Resource validation
- Metadata validation
- Workflow structure validation
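As one example of the checks listed above, resource validation could flag tasks that omit or misdeclare their resource needs. This is a minimal sketch; the required keys `cpus` and `memory_gb` and the function name `validate_resources` are assumptions, not the project's actual rules.

```python
# Hypothetical set of resource keys every task is expected to declare.
REQUIRED = ("cpus", "memory_gb")


def validate_resources(tasks):
    """Return one message per missing or non-positive resource declaration.

    `tasks` maps a task name to a dict of its declared resources.
    """
    problems = []
    for name, resources in tasks.items():
        for key in REQUIRED:
            value = resources.get(key)
            if value is None:
                problems.append(f"{name}: missing {key}")
            elif value <= 0:
                problems.append(f"{name}: {key} must be positive")
    return problems


tasks = {
    "align": {"cpus": 8, "memory_gb": 16},
    "sort": {"cpus": 0},
}
print(validate_resources(tasks))
```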
AI-Assisted Review
- Rule-based workflow checks
- AI-assisted diagnostics
- Confidence-based recommendations
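One way to combine rule-based and AI-assisted findings is to attach a confidence score to each diagnostic and surface only those above a threshold. The sketch below assumes a hypothetical `Diagnostic` record and a `recommend` helper; the 0.7 default threshold is arbitrary and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    message: str
    source: str        # "rule" or "ai" (hypothetical labels)
    confidence: float  # between 0.0 and 1.0


def recommend(diagnostics, threshold=0.7):
    """Keep diagnostics at or above the threshold, most confident first."""
    kept = [d for d in diagnostics if d.confidence >= threshold]
    return sorted(kept, key=lambda d: d.confidence, reverse=True)


findings = [
    Diagnostic("output path may be node-local", "ai", 0.55),
    Diagnostic("hard-coded absolute input path", "rule", 1.0),
    Diagnostic("container image tag looks unpinned", "ai", 0.8),
]
for d in recommend(findings):
    print(f"{d.confidence:.2f} [{d.source}] {d.message}")
```

Treating deterministic rule results as confidence 1.0 keeps both kinds of finding in one ranked list, while low-confidence AI suggestions are held back rather than shown as facts.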
Workflow Repair
- Suggested fixes
- Automated transformations
- Validation of generated fixes
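The three items above can be read as a loop: propose a fix, apply it as a transformation, and accept it only if the original check now passes. The sketch below illustrates that loop for hard-coded input paths. The rewrite to `config["data_dir"]` is a hypothetical convention chosen for the example, and the function names are assumptions rather than part of the project.

```python
import re

# Quoted absolute path under a few local roots; group 1 captures the file name.
ABS_PATH = re.compile(r'"/(?:home|scratch|mnt)/(?:[^"/]+/)*([^"/]+)"')


def check(text):
    """Return the hard-coded paths still present in the text."""
    return [m.group(0) for m in ABS_PATH.finditer(text)]


def suggest_fix(text):
    """Rewrite each hard-coded path relative to a configurable directory."""
    return ABS_PATH.sub(lambda m: f'config["data_dir"] + "/{m.group(1)}"', text)


def repair(text):
    """Apply the suggested fix only if it reduces the number of findings."""
    candidate = suggest_fix(text)
    if len(check(candidate)) < len(check(text)):
        return candidate
    return text


original = 'rule align:\n    input: "/home/alice/reads.fq"\n'
print(repair(original))
```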
Installation
Clone the Repository
git clone https://github.com/revaarathore11/ga4gh_workflow_clinic_gsoc_2026-.git
cd ga4gh_workflow_clinic_gsoc_2026-
Create a Virtual Environment
python -m venv .venv
source .venv/bin/activate
Install Dependencies
pip install -e ".[dev]"
Development
Run Tests
pytest
Run Linting
ruff check .
Run Formatting
ruff format .
Supported Workflow Languages
Current target languages:
- Nextflow
- Snakemake
Potential future support:
- CWL
- WDL
Architecture Overview
Workflow Files
↓
Parser
↓
WorkflowBundle
↓
Rule Engine
↓
AI Critic
↓
Doctor
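The flow above can be sketched as a chain of stages that each take and return a shared state. Everything in this example is a stand-in: the stage bodies are trivial placeholders, the AI Critic stage does not call any model, and the names `parse`, `rule_engine`, `ai_critic`, `doctor`, and `run_pipeline` are hypothetical.

```python
def parse(state):
    # Stand-in parser: treat each non-empty line as one task name.
    state["bundle"] = [line.strip() for line in state["source"].splitlines() if line.strip()]
    return state


def rule_engine(state):
    # Stand-in rule: flag task names that are not all lowercase.
    state["findings"] = [
        f"task '{t}' has uppercase characters" for t in state["bundle"] if t != t.lower()
    ]
    return state


def ai_critic(state):
    # Placeholder: attaches a fixed confidence instead of consulting a model.
    state["reviewed"] = [(finding, 1.0) for finding in state["findings"]]
    return state


def doctor(state):
    # Turn reviewed findings into report lines.
    state["report"] = [f"[{conf:.1f}] {finding}" for finding, conf in state["reviewed"]]
    return state


STAGES = [parse, rule_engine, ai_critic, doctor]


def run_pipeline(source):
    """Run every stage in order and return the final report."""
    state = {"source": source}
    for stage in STAGES:
        state = stage(state)
    return state["report"]


print(run_pipeline("align\nSort\n"))
```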
Standards Alignment
Workflow Clinic is being designed for future compatibility with:
- GA4GH TES
- GA4GH WES
- GA4GH TRS
- Workflow Run RO-Crate
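As a rough illustration of what alignment with GA4GH TES could involve, the sketch below builds a task document using a few fields from the TES task schema (`name`, `executors` with `image` and `command`, and `resources` with `cpu_cores` and `ram_gb`). The helper name `to_tes_task` and the container image reference are placeholders, and this is not a description of how Workflow Clinic will integrate with TES.

```python
import json


def to_tes_task(name, image, command, cpu_cores=1, ram_gb=1.0):
    """Build a minimal TES-style task document as a plain dict."""
    return {
        "name": name,
        "executors": [{"image": image, "command": command}],
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
    }


task = to_tes_task(
    name="sort",
    image="example.org/samtools:latest",  # placeholder image reference
    command=["samtools", "sort", "aln.sam", "-o", "aln.bam"],
    cpu_cores=4,
    ram_gb=8.0,
)
print(json.dumps(task, indent=2))
```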
License
This project is licensed under the Apache License 2.0.