AI-Powered Cloudification of Bioinformatics Workflows

Workflow Clinic

Workflow Clinic is a GSoC 2026 project focused on improving the portability, reproducibility, and cloud-readiness of scientific workflows.

The project aims to parse workflows written in languages such as Nextflow and Snakemake, convert them into a common intermediate representation called WorkflowBundle, and identify portability issues through automated validation and analysis.

By using a common workflow model inspired by the DAW (Data Analysis Workflow) metamodel, Workflow Clinic can reason about workflows independently of their original language and provide consistent diagnostics, recommendations, and future repair capabilities.
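To make the idea of a language-neutral representation concrete, here is a minimal sketch of what a WorkflowBundle could look like. The field names (`source_language`, `tasks`, `inputs`, `outputs`, `container`) and the `dependencies` helper are assumptions chosen for illustration; they are not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """One workflow step, e.g. a Nextflow process or a Snakemake rule (hypothetical shape)."""
    name: str
    command: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    container: Optional[str] = None  # None means no container image was declared


@dataclass
class WorkflowBundle:
    """Hypothetical language-neutral view of a parsed workflow."""
    source_language: str  # e.g. "nextflow" or "snakemake"
    tasks: list[Task] = field(default_factory=list)

    def dependencies(self) -> dict[str, set[str]]:
        """Map each task name to the tasks whose outputs it consumes."""
        producers = {out: t.name for t in self.tasks for out in t.outputs}
        return {
            t.name: {producers[i] for i in t.inputs if i in producers}
            for t in self.tasks
        }


bundle = WorkflowBundle(
    source_language="snakemake",
    tasks=[
        Task("align", "bwa mem ref.fa reads.fq > aln.sam",
             inputs=["reads.fq"], outputs=["aln.sam"]),
        Task("sort", "samtools sort aln.sam -o aln.bam",
             inputs=["aln.sam"], outputs=["aln.bam"]),
    ],
)
print(bundle.dependencies())
```

Because the task graph is derived from declared inputs and outputs rather than from language-specific syntax, the same analysis code could in principle run on a bundle produced from either language.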

Why Workflow Clinic?

Scientific workflows are often tightly coupled to specific execution environments, storage systems, schedulers, or local infrastructure.

This can make workflows difficult to:

  • Share
  • Reproduce
  • Port across platforms
  • Execute in cloud environments
  • Integrate with GA4GH-compliant services

Workflow Clinic aims to help workflow authors identify and resolve these issues before deployment.

Planned Features

Workflow Parsing

  • Nextflow support
  • Snakemake support
  • Common WorkflowBundle representation

Workflow Analysis

  • Portability diagnostics
  • Storage validation
  • Resource validation
  • Metadata validation
  • Workflow structure validation
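As an illustration of what storage and resource validation might involve, the sketch below flags three common portability problems in a single task: absolute local input paths, a missing container image, and a missing CPU request. The `Diagnostic` type, the rule identifiers, and the `check_task` signature are all hypothetical and only meant to show the shape of such checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Diagnostic:
    rule_id: str   # hypothetical identifier, e.g. "storage/absolute-path"
    task: str
    message: str


def check_task(name: str, inputs: list[str],
               container: Optional[str], cpus: Optional[int]) -> list[Diagnostic]:
    """Return portability diagnostics for one task (illustrative rules only)."""
    found = []
    for path in inputs:
        # A path starting with "/" only resolves on the author's machine or cluster;
        # object-store URIs such as s3://... do not start with "/".
        if path.startswith("/"):
            found.append(Diagnostic("storage/absolute-path", name,
                                    f"input {path!r} is an absolute local path"))
    if container is None:
        found.append(Diagnostic("env/no-container", name,
                                "no container image declared"))
    if cpus is None:
        found.append(Diagnostic("resources/no-cpus", name,
                                "no CPU request declared"))
    return found


diagnostics = check_task(
    "align",
    inputs=["/home/alice/reads.fq", "s3://bucket/ref.fa"],
    container=None,
    cpus=4,
)
for d in diagnostics:
    print(f"[{d.rule_id}] {d.task}: {d.message}")
```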

AI-Assisted Review

  • Rule-based workflow checks
  • AI-assisted diagnostics
  • Confidence-based recommendations
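One way confidence-based recommendations could work is to treat deterministic rule findings as always reportable and to show AI-generated suggestions only when their confidence passes a threshold. The sketch below assumes this policy; the `Recommendation` fields, the `triage` function, and the 0.6 default threshold are hypothetical and not taken from the project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    message: str
    source: str        # "rule" or "ai" (assumed labels)
    confidence: float  # 0.0 to 1.0


def triage(recs: list[Recommendation], threshold: float = 0.6) -> list[Recommendation]:
    """Keep all rule findings, keep AI findings at or above the threshold,
    and order the result from most to least confident."""
    kept = [r for r in recs if r.source == "rule" or r.confidence >= threshold]
    return sorted(kept, key=lambda r: r.confidence, reverse=True)


recs = [
    Recommendation("pin the container image tag", "ai", 0.45),
    Recommendation("replace absolute path with a staged input", "rule", 1.0),
    Recommendation("declare a memory request for 'sort'", "ai", 0.8),
]
for r in triage(recs):
    print(f"{r.confidence:.2f} [{r.source}] {r.message}")
```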

Workflow Repair

  • Suggested fixes
  • Automated transformations
  • Validation of generated fixes

Installation

Clone the Repository

git clone https://github.com/revaarathore11/ga4gh_workflow_clinic_gsoc_2026-.git
cd ga4gh_workflow_clinic_gsoc_2026-

Create a Virtual Environment

python -m venv .venv
source .venv/bin/activate

Install Dependencies

pip install -e ".[dev]"

Development

Run Tests

pytest

Run Linting

ruff check .

Run Formatting

ruff format .

Supported Workflow Languages

Current target languages:

  • Nextflow
  • Snakemake

Potential future support:

  • CWL
  • WDL

Architecture Overview

Workflow Files
    ↓
  Parser
    ↓
WorkflowBundle
    ↓
Rule Engine
    ↓
 AI Critic
    ↓
  Doctor
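The stages in the diagram above can be read as a simple function pipeline. The sketch below wires together stub versions of each stage to show the data flow only; every function name and the dictionary stand-in for WorkflowBundle are assumptions, and the AI Critic is a pass-through placeholder rather than a real model call.

```python
def parse(text: str) -> dict:
    """Parser stub: treat each non-empty line as one task name."""
    return {"tasks": [line.strip() for line in text.splitlines() if line.strip()]}


def rule_engine(bundle: dict) -> list[str]:
    """Rule Engine stub: report every task as lacking a container."""
    return [f"task {t!r} has no container" for t in bundle["tasks"]]


def ai_critic(bundle: dict, findings: list[str]) -> list[str]:
    """AI Critic stub: a real critic would review or extend the findings."""
    return findings


def doctor(findings: list[str]) -> str:
    """Doctor stub: render the reviewed findings as a short report."""
    if not findings:
        return "healthy"
    return "\n".join(f"- {f}" for f in findings)


def run_clinic(text: str) -> str:
    bundle = parse(text)                     # Workflow Files -> Parser -> WorkflowBundle
    findings = rule_engine(bundle)           # Rule Engine
    reviewed = ai_critic(bundle, findings)   # AI Critic
    return doctor(reviewed)                  # Doctor


print(run_clinic("align\nsort\n"))
```

Keeping each stage as a function with plain inputs and outputs would make it straightforward to test stages in isolation with `pytest`.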

Standards Alignment

Workflow Clinic is being designed for future compatibility with:

  • GA4GH TES
  • GA4GH WES
  • GA4GH TRS
  • Workflow Run RO-Crate

License

This project is licensed under the Apache License 2.0.
