AI-Powered Cloudification of Bioinformatics Workflows

Workflow Clinic

Workflow Clinic is a GSoC 2026 project focused on improving the portability, reproducibility, and cloud-readiness of scientific workflows.

The project aims to parse workflows written in languages such as Nextflow and Snakemake, convert them into a common intermediate representation called WorkflowBundle, and identify portability issues through automated validation and analysis.

By using a common workflow model inspired by the DAW (Data Analysis Workflow) metamodel, Workflow Clinic can reason about workflows independently of their original language and provide consistent diagnostics and recommendations, with repair capabilities planned for the future.
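
To make the idea of a language-independent representation concrete, here is a minimal sketch of what such a bundle might look like. The class and field names (`Task`, `inputs`, `outputs`, `container`, `source_language`, `dependencies`) are illustrative assumptions, not the project's actual WorkflowBundle schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Task:
    """One workflow step (hypothetical shape, not the project's schema)."""
    name: str
    command: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    container: Optional[str] = None  # e.g. a container image reference


@dataclass
class WorkflowBundle:
    """Language-neutral view of a workflow (illustrative only)."""
    name: str
    source_language: str  # e.g. "nextflow" or "snakemake"
    tasks: list[Task] = field(default_factory=list)

    def dependencies(self) -> dict[str, set[str]]:
        """Map each task to the tasks whose outputs it consumes."""
        producers = {out: t.name for t in self.tasks for out in t.outputs}
        return {
            t.name: {producers[i] for i in t.inputs if i in producers}
            for t in self.tasks
        }


# A two-step example: "sort" consumes the file that "align" produces.
bundle = WorkflowBundle(
    name="demo",
    source_language="snakemake",
    tasks=[
        Task("align", "bwa mem ref.fa reads.fq > out.sam",
             inputs=["reads.fq"], outputs=["out.sam"]),
        Task("sort", "samtools sort out.sam -o out.bam",
             inputs=["out.sam"], outputs=["out.bam"]),
    ],
)
```

Because the dependency graph is derived from declared inputs and outputs rather than from language-specific syntax, the same analysis would apply whether the bundle came from a Nextflow or a Snakemake source.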

Why Workflow Clinic?

Scientific workflows are often tightly coupled to specific execution environments, storage systems, schedulers, or local infrastructure.

This can make workflows difficult to:

  • Share
  • Reproduce
  • Port across platforms
  • Execute in cloud environments
  • Integrate with GA4GH-compliant services

Workflow Clinic aims to help workflow authors identify and resolve these issues before deployment.

Planned Features

Workflow Parsing

  • Nextflow support
  • Snakemake support
  • Common WorkflowBundle representation
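
As a rough illustration of the first parsing step, the sketch below guesses the workflow language from the file name so that the right parser can be chosen. It relies on common naming conventions (`.nf` for Nextflow scripts, `Snakefile` or `.smk` for Snakemake); the function name `detect_language` is hypothetical and not part of the project's API.

```python
from pathlib import Path


def detect_language(path: str) -> str:
    """Guess the workflow language from conventional file names."""
    p = Path(path)
    if p.suffix == ".nf":
        return "nextflow"
    if p.name == "Snakefile" or p.suffix == ".smk":
        return "snakemake"
    raise ValueError(f"Unrecognized workflow file: {path}")
```

File names are only a heuristic; a real parser would likely also need to inspect file contents.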

Workflow Analysis

  • Portability diagnostics
  • Storage validation
  • Resource validation
  • Metadata validation
  • Workflow structure validation
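
One kind of portability diagnostic can be sketched as a check for hard-coded local paths, which tie a workflow to a single machine. The rule ID `PORT001`, the `Diagnostic` class, and the list of path roots are assumptions made for this example, not the project's actual rule set.

```python
import re
from dataclasses import dataclass

# Absolute paths under common site-specific roots (assumed list, for illustration).
# The lookbehind skips path segments inside relative paths and URLs such as s3://.
LOCAL_PATH = re.compile(r"(?<![\w:/])/(?:home|Users|scratch|mnt|data)/[^\s'\"]+")


@dataclass
class Diagnostic:
    rule_id: str
    message: str
    line: int


def check_hardcoded_paths(text: str) -> list[Diagnostic]:
    """Report every hard-coded local path, with its 1-based line number."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in LOCAL_PATH.finditer(line):
            findings.append(
                Diagnostic("PORT001", f"Hard-coded local path: {match.group()}", lineno)
            )
    return findings


snippet = "\n".join([
    'input: "/home/alice/reads.fq"',
    'output: "results/out.bam"',
    'shell: "aws s3 cp s3://bucket/data/x ."',
])
findings = check_hardcoded_paths(snippet)
```

Only the first line of `snippet` is flagged: the relative path and the object-store URL are left alone.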

AI-Assisted Review

  • Rule-based workflow checks
  • AI-assisted diagnostics
  • Confidence-based recommendations
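
The sketch below shows one way rule-based checks and AI-assisted diagnostics could be combined through confidence scores. The `Recommendation` class, the `triage` function, the 0.7 threshold, and the policy of always reporting rule-based findings are all assumptions for illustration; the project may handle this differently.

```python
from dataclasses import dataclass


@dataclass
class Recommendation:
    message: str
    source: str        # "rule" or "ai"
    confidence: float  # 0.0 to 1.0


def triage(recs, threshold=0.7):
    """Split recommendations into ones to report and ones held for manual review.

    Assumed policy: rule-based findings are always reported; AI findings are
    reported only when their confidence reaches the threshold.
    """
    report, review = [], []
    for rec in recs:
        if rec.source == "rule" or rec.confidence >= threshold:
            report.append(rec)
        else:
            review.append(rec)
    report.sort(key=lambda r: r.confidence, reverse=True)
    return report, review


recs = [
    Recommendation("Pin the container image tag", "ai", 0.9),
    Recommendation("Hard-coded local path", "rule", 1.0),
    Recommendation("Consider splitting this step", "ai", 0.4),
]
report, review = triage(recs)
```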

Workflow Repair

  • Suggested fixes
  • Automated transformations
  • Validation of generated fixes
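
A suggest-then-validate loop for repairs might look like the following sketch: a transformation rewrites the offending text, and the original check is run again to confirm the fix before it is accepted. The function names and the `${INPUT_DIR}` placeholder are hypothetical and chosen only for this example.

```python
import re

# Same idea as a hard-coded path rule; the root list is an assumption.
LOCAL_PATH = re.compile(r"/(?:home|Users|scratch)/[^\s'\"]+")


def has_issue(text: str) -> bool:
    """Return True if the text still contains a hard-coded local path."""
    return bool(LOCAL_PATH.search(text))


def suggest_fix(text: str, placeholder: str = "${INPUT_DIR}") -> str:
    """Replace the directory part of each hard-coded path with a placeholder."""
    def repl(match: re.Match) -> str:
        filename = match.group().rsplit("/", 1)[-1]
        return f"{placeholder}/{filename}"
    return LOCAL_PATH.sub(repl, text)


def repair(text: str) -> tuple[str, bool]:
    """Apply the fix, then validate it; fall back to the original on failure."""
    fixed = suggest_fix(text)
    if has_issue(fixed):
        return text, False
    return fixed, True


fixed, ok = repair("cat /home/alice/reads.fq > out.txt")
```

Re-running the detector on the transformed text is a cheap form of validation; it confirms the diagnostic no longer fires, not that the workflow still behaves the same.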

Installation

Clone the Repository

git clone https://github.com/revaarathore11/ga4gh_workflow_clinic_gsoc_2026-.git
cd ga4gh_workflow_clinic_gsoc_2026-

Create a Virtual Environment

python -m venv .venv
source .venv/bin/activate

Install Dependencies

pip install -e ".[dev]"

Development

Run Tests

pytest

Run Linting

ruff check .

Run Formatting

ruff format .

Supported Workflow Languages

Current target languages:

  • Nextflow
  • Snakemake

Potential future support:

  • CWL
  • WDL

Architecture Overview

Workflow Files
    ↓
  Parser
    ↓
WorkflowBundle
    ↓
Rule Engine
    ↓
 AI Critic
    ↓
  Doctor
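
The flow in the diagram can be sketched as a chain of small functions. The responsibilities given to each stage here are assumptions inferred from the stage names, and every function is a toy stand-in (the AI Critic, for instance, attaches a fixed confidence instead of calling a model).

```python
def parse(source: str) -> dict:
    """Parser: turn workflow text into a bundle-like dict (toy version)."""
    return {"lines": source.splitlines()}


def rule_engine(bundle: dict) -> list[str]:
    """Rule Engine: deterministic checks over the bundle."""
    return [
        f"line {number}: absolute path"
        for number, text in enumerate(bundle["lines"], start=1)
        if "/home/" in text
    ]


def ai_critic(bundle: dict, findings: list[str]) -> list[dict]:
    """AI Critic: stand-in that attaches a fixed confidence to each finding."""
    return [{"finding": f, "confidence": 1.0} for f in findings]


def doctor(reviewed: list[dict]) -> list[str]:
    """Doctor: turn reviewed findings into suggested actions."""
    return [f"Suggest fix for {r['finding']}" for r in reviewed]


def run_pipeline(source: str) -> list[str]:
    """Run the stages in the order shown in the diagram."""
    bundle = parse(source)
    findings = rule_engine(bundle)
    reviewed = ai_critic(bundle, findings)
    return doctor(reviewed)


actions = run_pipeline("echo start\ncat /home/x/reads.fq")
```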

Standards Alignment

Workflow Clinic is being designed for future compatibility with:

  • GA4GH TES
  • GA4GH WES
  • GA4GH TRS
  • Workflow Run RO-Crate

License

This project is licensed under the Apache License 2.0.

Download files

Source Distribution

workflow_clinic-0.1.0.tar.gz (16.2 kB)

Built Distribution

workflow_clinic-0.1.0-py3-none-any.whl (14.2 kB)

File details

Details for the file workflow_clinic-0.1.0.tar.gz.

File metadata

  • Download URL: workflow_clinic-0.1.0.tar.gz
  • Upload date:
  • Size: 16.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for workflow_clinic-0.1.0.tar.gz
Algorithm    Hash digest
SHA256       9169d9df27585eb43dfbf8848074f3294ae1253cb8c937ba1213b8d2fd0364ee
MD5          56c66101ff0aedcac0cb986c3d12d8a6
BLAKE2b-256  fabb4bb81ec5ca4c57d39c2564f25f8ee46880ea3a286e7b10bfffba63dff535
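
A downloaded file can be checked against its published SHA256 digest using only the Python standard library. This is a minimal sketch; the helper names `sha256_of` and `matches` are made up for the example.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring letter case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example call, assuming the sdist has been downloaded to the current directory:
# matches("workflow_clinic-0.1.0.tar.gz",
#         "9169d9df27585eb43dfbf8848074f3294ae1253cb8c937ba1213b8d2fd0364ee")
```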

Provenance

The following attestation bundles were made for workflow_clinic-0.1.0.tar.gz:

Publisher: release.yml on ga4gh/ga4gh_workflow_clinic_gsoc_2026

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file workflow_clinic-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: workflow_clinic-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for workflow_clinic-0.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       d4f9e99b557c56f8c08eac8c5950686d221888962a82fdf72fe8163cddf499a1
MD5          80f80c353ff48d3821a25d43a1f705ba
BLAKE2b-256  d39964eb895eba3d778a1767544035d13b557c3af8682d22a6a45885d936a832

Provenance

The following attestation bundles were made for workflow_clinic-0.1.0-py3-none-any.whl:

Publisher: release.yml on ga4gh/ga4gh_workflow_clinic_gsoc_2026
