
AI-Powered Cloudification of Bioinformatics Workflows


Workflow Clinic

Workflow Clinic is a GSoC 2026 project focused on improving the portability, reproducibility, and cloud-readiness of scientific workflows.

The project aims to analyze workflows written in languages such as Nextflow and Snakemake, convert them into a common intermediate representation called WorkflowBundle, and identify portability issues through automated validation and analysis.

By using a common workflow model inspired by the DAW (Data Analysis Workflow) metamodel, Workflow Clinic can reason about workflows independently of their original language and provide consistent diagnostics and recommendations, with repair capabilities planned for the future.
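To make the idea of a language-neutral representation concrete, here is a minimal sketch of what such a model could look like. The README does not show the real WorkflowBundle schema, so the `Task` class, every field name, and the `dependencies` helper are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Task:
    """One workflow step, stripped of language-specific syntax (hypothetical)."""
    name: str
    command: str
    inputs: tuple = ()
    outputs: tuple = ()
    container: Optional[str] = None  # container image reference, if any


@dataclass
class WorkflowBundle:
    """Hypothetical language-neutral view of a workflow, not the project's schema."""
    source_language: str  # e.g. "nextflow" or "snakemake"
    tasks: list = field(default_factory=list)

    def dependencies(self) -> dict:
        """Map each task name to the names of the tasks that produce its inputs."""
        producers = {out: t.name for t in self.tasks for out in t.outputs}
        return {
            t.name: {producers[i] for i in t.inputs if i in producers}
            for t in self.tasks
        }


bundle = WorkflowBundle(
    source_language="snakemake",
    tasks=[
        Task("align", "bwa mem ref.fa reads.fq > aln.sam",
             inputs=("reads.fq",), outputs=("aln.sam",)),
        Task("sort", "samtools sort aln.sam -o aln.bam",
             inputs=("aln.sam",), outputs=("aln.bam",)),
    ],
)
deps = bundle.dependencies()
print(deps)
```

Because the dependency graph is derived from declared inputs and outputs rather than from Nextflow channels or Snakemake wildcards, the same analysis code could in principle run on workflows from either language.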

Why Workflow Clinic?

Scientific workflows are often tightly coupled to specific execution environments, storage systems, schedulers, or local infrastructure.

This can make workflows difficult to:

  • Share
  • Reproduce
  • Port across platforms
  • Execute in cloud environments
  • Integrate with GA4GH-compliant services

Workflow Clinic aims to help workflow authors identify and resolve these issues before deployment.

Planned Features

Workflow Parsing

  • Nextflow support
  • Snakemake support
  • Common WorkflowBundle representation

Workflow Analysis

  • Portability diagnostics
  • Storage validation
  • Resource validation
  • Metadata validation
  • Workflow structure validation
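
As a rough illustration of the analysis categories listed above, the sketch below runs three simple checks over a task described as a plain dictionary. The `Diagnostic` class, the `check_task` function, the dictionary keys (`cpus`, `memory_gb`, `container`), and the path heuristic are all hypothetical; the project's actual rules are not described in this README.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    rule: str     # which category of check fired
    task: str     # name of the offending task
    message: str  # human-readable explanation


# Heuristic only: absolute paths under typical local or cluster directories.
# The lookbehind avoids matching inside URIs such as s3://bucket/home/...
LOCAL_PATH = re.compile(r"(?<![\w:/.])/(?:home|Users|mnt|scratch)/\S+")


def check_task(task: dict) -> list:
    """Return the diagnostics raised by one task (illustrative rules only)."""
    found = []
    name = task["name"]
    match = LOCAL_PATH.search(task.get("command", ""))
    if match:
        found.append(Diagnostic("storage", name,
                                f"hard-coded local path: {match.group(0)}"))
    if "cpus" not in task or "memory_gb" not in task:
        found.append(Diagnostic("resources", name,
                                "cpus and memory_gb should both be declared"))
    if not task.get("container"):
        found.append(Diagnostic("portability", name,
                                "no container image declared"))
    return found


risky = {"name": "align", "command": "bwa mem /home/alice/ref.fa reads.fq", "cpus": 4}
clean = {"name": "sort", "command": "samtools sort aln.sam", "cpus": 2,
         "memory_gb": 8, "container": "example/samtools:latest"}

for diagnostic in check_task(risky):
    print(diagnostic)
```

Returning structured diagnostics rather than printing inside the rule keeps each check easy to test and lets later stages sort or filter the results.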

AI-Assisted Review

  • Rule-based workflow checks
  • AI-assisted diagnostics
  • Confidence-based recommendations
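
One way confidence-based recommendations might work is to attach a score to every finding and route it by threshold. The sketch below assumes deterministic rule findings carry a confidence of 1.0 and AI-generated ones carry a lower, model-reported score. The `Recommendation` class, the `triage` function, and both threshold values are illustrative assumptions, not the project's design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recommendation:
    message: str
    source: str        # "rule" or "ai" (hypothetical labels)
    confidence: float  # 0.0 to 1.0


def triage(recs, apply_threshold=0.9, review_threshold=0.5):
    """Sort recommendations into buckets by confidence, highest first."""
    buckets = {"apply": [], "review": [], "discard": []}
    for rec in sorted(recs, key=lambda r: r.confidence, reverse=True):
        if rec.confidence >= apply_threshold:
            buckets["apply"].append(rec)
        elif rec.confidence >= review_threshold:
            buckets["review"].append(rec)
        else:
            buckets["discard"].append(rec)
    return buckets


recs = [
    Recommendation("declare a container image", "rule", 1.0),
    Recommendation("input looks like a cluster-local path", "ai", 0.7),
    Recommendation("task may be memory-bound", "ai", 0.3),
]
buckets = triage(recs)
for name, items in buckets.items():
    print(name, [r.message for r in items])
```

Keeping low-confidence suggestions out of the default report is one way to stop speculative AI output from drowning out the deterministic checks.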

Workflow Repair

  • Suggested fixes
  • Automated transformations
  • Validation of generated fixes
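
The repair features above are planned rather than documented, so the following is only a sketch of the suggest-then-validate pattern they imply: propose a transformation, then confirm that it changed something and that the original problem is gone. The function names, the `${input_dir}` placeholder, and the path pattern are hypothetical.

```python
import re

# Heuristic only: a user-specific home directory prefix such as /home/alice/
LOCAL_PREFIX = re.compile(r"(?<![\w.])/(?:home|Users)/[^/\s]+/")


def suggest_fix(command: str, placeholder: str = "${input_dir}/") -> str:
    """Replace a hard-coded home-directory prefix with a parameter placeholder."""
    # A callable replacement avoids any escape processing of the placeholder.
    return LOCAL_PREFIX.sub(lambda _match: placeholder, command)


def fix_is_valid(original: str, fixed: str) -> bool:
    """Accept a fix only if it changed the command and no local prefix remains."""
    return fixed != original and LOCAL_PREFIX.search(fixed) is None


original = "cat /home/alice/reads.fq"
fixed = suggest_fix(original)
print(fixed, fix_is_valid(original, fixed))
```

A real validator would presumably also re-run the full set of checks on the transformed workflow; this sketch only re-tests the single pattern it tried to repair.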

Installation

Clone the Repository

git clone https://github.com/revaarathore11/ga4gh_workflow_clinic_gsoc_2026-.git
cd ga4gh_workflow_clinic_gsoc_2026-

Create a Virtual Environment

python -m venv .venv
source .venv/bin/activate

Install Dependencies

pip install -e ".[dev]"

Development

Run Tests

pytest

Run Linting

ruff check .

Run Formatting

ruff format .

Supported Workflow Languages

Current target languages:

  • Nextflow
  • Snakemake

Potential future support:

  • CWL
  • WDL

Architecture Overview

Workflow Files
    ↓
  Parser
    ↓
WorkflowBundle
    ↓
Rule Engine
    ↓
 AI Critic
    ↓
  Doctor
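
The stages in the diagram can be read as a chain of functions, each consuming the previous stage's output. The sketch below wires up stand-in versions of each stage to show the data flow only. None of these functions belong to the package: the parser accepts a made-up `name: command` format rather than real Nextflow or Snakemake, the single `/tmp/` rule is illustrative, and the AI critic is a placeholder that makes no model call.

```python
def parse(source: str) -> list:
    """Stand-in parser: one 'name: command' pair per line (not a real workflow language)."""
    tasks = []
    for line in source.strip().splitlines():
        name, _, command = line.partition(":")
        tasks.append({"name": name.strip(), "command": command.strip()})
    return tasks  # plays the role of a WorkflowBundle here


def rule_engine(bundle: list) -> list:
    """Illustrative rule: flag commands that write to a node-local /tmp path."""
    return [{"task": t["name"], "issue": "hard-coded /tmp path"}
            for t in bundle if "/tmp/" in t["command"]]


def ai_critic(findings: list) -> list:
    """Placeholder for a model-backed review: only attaches a fixed confidence."""
    return [{**f, "confidence": 1.0} for f in findings]


def doctor(findings: list) -> list:
    """Render findings as report lines; a real doctor stage might also propose fixes."""
    return [f"{f['task']}: {f['issue']} (confidence {f['confidence']:.1f})"
            for f in findings]


def run_pipeline(source: str) -> list:
    return doctor(ai_critic(rule_engine(parse(source))))


report = run_pipeline("""
align: bwa mem ref.fa reads.fq > /tmp/aln.sam
sort: samtools sort aln.sam -o aln.bam
""")
print(report)
```

Because only the parser touches language-specific syntax, supporting another workflow language would in this sketch mean adding a new `parse` function while leaving the later stages unchanged.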

Standards Alignment

Workflow Clinic is being designed for future compatibility with:

  • GA4GH TES
  • GA4GH WES
  • GA4GH TRS
  • Workflow Run RO-Crate

License

This project is licensed under the Apache License 2.0.
