A disease-agnostic framework for identifying molecular subtypes through pathway-based analysis of rare genetic variants

Pathway Subtyping Framework

A Disease-Agnostic Tool for Pathway-Based Molecular Subtype Discovery

Requires Python 3.9+ | License: MIT | Code style: black


Overview

The Pathway Subtyping Framework is an open-source computational tool for identifying molecular subtypes in genetically heterogeneous diseases. Instead of analyzing individual genes, it aggregates rare variant burden at the biological pathway level, enabling:

  • Better signal detection across genetically diverse cohorts, because rare variants in different genes of the same pathway contribute to one shared score
  • Identification of biologically coherent patient subgroups
  • Cross-cohort validation of discovered subtypes

Originally developed for autism research, this generalized version can be adapted for any disease with:

  • Genetic heterogeneity (many implicated genes)
  • Convergent pathway biology
  • Available exome/genome sequencing data

Supported Disease Areas

Disease | Status | Pathway File
--- | --- | ---
Autism Spectrum Disorder | Validated | autism_pathways.gmt
Schizophrenia | Template | schizophrenia_pathways.gmt
Epilepsy | Template | epilepsy_pathways.gmt
Intellectual Disability | Template | intellectual_disability_pathways.gmt
Your disease | Adapt a template | your_pathways.gmt

Key Features

Feature | Description
--- | ---
Pathway Scoring | Aggregate gene burdens across biological pathways
Subtype Discovery | GMM clustering with automatic model selection (BIC)
Validation Gates | Negative controls + bootstrap stability testing
Reproducibility | Deterministic execution, pinned dependencies, Docker
Config-Driven | YAML configuration for all parameters

Quick Start

Installation

# Clone the repository
git clone https://github.com/topmist-admin/pathway-subtyping-framework
cd pathway-subtyping-framework

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install the package
pip install -e .

# Verify installation
psf --version

Run with Sample Data

# Run the pipeline with synthetic test data
psf --config configs/test_synthetic.yaml

# View results
cat outputs/synthetic_test/report.md

Run with Your Data

# Copy and customize a config
cp configs/example_autism.yaml configs/my_analysis.yaml

# Edit paths in my_analysis.yaml, then run
psf --config configs/my_analysis.yaml

Try in Browser (No Installation)

Use the "Open In Colab" badge on the project page to run the example notebook in Google Colab.

Docker

# Run pipeline
docker-compose run pipeline

# Run tests
docker-compose run test

# Start Jupyter notebook
docker-compose up jupyter
# Open http://localhost:8888

Adapting for Your Disease

  1. Create a pathway GMT file with disease-relevant gene sets
  2. Copy an example config and point to your data
  3. Run the pipeline; the validation gates report whether the discovered subtypes pass the negative-control and stability checks
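Step 1 needs nothing beyond a text editor, because GMT is a simple tab-separated format: one gene set per line, giving the set name, a description, and then the member genes. The Python sketch below writes such a file from a dictionary. The pathway names and gene symbols are placeholders, and `write_gmt` is a hypothetical helper, not part of the framework's API.

```python
import io
from typing import Dict, List, TextIO


def write_gmt(pathways: Dict[str, List[str]], handle: TextIO, description: str = "NA") -> None:
    """Write gene sets to an open text handle in GMT format (hypothetical helper)."""
    for name, genes in pathways.items():
        if not genes:
            raise ValueError(f"pathway {name!r} has no genes")
        # Drop duplicate genes while keeping their first-seen order.
        unique_genes = list(dict.fromkeys(genes))
        # GMT has no header: name, description, then one gene per column.
        handle.write("\t".join([name, description, *unique_genes]) + "\n")


# Placeholder pathways; replace with curated, disease-relevant gene sets.
example_pathways = {
    "PATHWAY_A": ["GENE1", "GENE2", "GENE3"],
    "PATHWAY_B": ["GENE4", "GENE5", "GENE4"],  # duplicate GENE4 is removed
}

buffer = io.StringIO()
write_gmt(example_pathways, buffer)
print(buffer.getvalue(), end="")
```

To write a real file, pass an open handle instead, e.g. `with open("data/pathways/your_pathways.gmt", "w") as fh: write_gmt(my_pathways, fh)`.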

See the full guide: Adapting for Your Disease (docs/guides/adapting-for-your-disease.md)

How It Works

VCF Input → Variant Filter → Gene Burden → Pathway Aggregation → GMM Clustering → Validation → Report

1. Pathway Scoring

Rare damaging variants are aggregated into pathway-level disruption scores:

  • Loss-of-function variants are weighted more heavily than missense variants
  • Missense variants weighted by CADD score
  • Scores normalized across samples
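A minimal pure-Python sketch of these three steps follows. The framework's exact weights and normalization are not given here, so the sketch assumes: a fixed weight of 1.0 for loss-of-function consequences, a missense weight of CADD phred / 40 capped at 1.0, a pathway score equal to the mean gene burden over the pathway's genes, and z-score normalization of each pathway across samples. All constants and function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical weighting scheme (not taken from the framework's code).
LOF_WEIGHT = 1.0
CADD_SCALE = 40.0  # missense weight = CADD phred / CADD_SCALE, capped at 1.0
LOF_CONSEQUENCES = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}

Variant = Tuple[str, str, Optional[float]]  # (gene, consequence, CADD phred)


def variant_weight(consequence: str, cadd: Optional[float]) -> float:
    if consequence in LOF_CONSEQUENCES:
        return LOF_WEIGHT
    if consequence == "missense_variant" and cadd is not None:
        return min(cadd / CADD_SCALE, 1.0)
    return 0.0  # other consequence classes are ignored in this sketch


def gene_burdens(variants: Iterable[Variant]) -> Dict[str, float]:
    """Sum variant weights per gene for one sample."""
    burdens: Dict[str, float] = {}
    for gene, consequence, cadd in variants:
        burdens[gene] = burdens.get(gene, 0.0) + variant_weight(consequence, cadd)
    return burdens


def pathway_scores(
    sample_variants: Dict[str, List[Variant]], pathways: Dict[str, List[str]]
) -> Dict[str, Dict[str, float]]:
    """Mean gene burden per pathway, then z-scored across samples."""
    raw: Dict[str, Dict[str, float]] = {}
    for sample, variants in sample_variants.items():
        burdens = gene_burdens(variants)
        raw[sample] = {
            name: mean(burdens.get(gene, 0.0) for gene in genes)
            for name, genes in pathways.items()
        }
    scores: Dict[str, Dict[str, float]] = {sample: {} for sample in raw}
    for name in pathways:
        values = [raw[sample][name] for sample in raw]
        mu, sd = mean(values), pstdev(values)
        for sample in raw:
            # A pathway with no variation across samples carries no signal.
            scores[sample][name] = 0.0 if sd == 0 else (raw[sample][name] - mu) / sd
    return scores


# Toy input: three samples, two placeholder pathways.
pathways = {"PATHWAY_A": ["GENE1", "GENE2"], "PATHWAY_B": ["GENE3"]}
samples = {
    "S1": [("GENE1", "stop_gained", None)],
    "S2": [("GENE1", "missense_variant", 20.0), ("GENE3", "missense_variant", 40.0)],
    "S3": [],
}
scores = pathway_scores(samples, pathways)
print(scores)
```

Averaging rather than summing gene burdens keeps large pathways from dominating simply because they contain more genes; this is a design choice of the sketch, not a documented property of the framework.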

2. Subtype Discovery

Gaussian Mixture Model clustering identifies patient subgroups:

  • Automatic cluster selection via BIC
  • Configurable cluster range (default: 2-8)
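BIC trades goodness of fit against model size: BIC = p·ln(n) - 2·ln(L), where p is the number of free parameters, n the number of samples and L the maximized likelihood; the candidate with the lowest BIC is chosen. The sketch below assumes full covariance matrices and that each candidate GMM has already been fitted, for example with scikit-learn's `GaussianMixture`, which also offers a `bic(X)` method. Whether the framework uses that library is not stated here, and the log-likelihood values in the example are made up.

```python
import math
from typing import Dict


def gmm_num_params(k: int, d: int) -> int:
    """Free parameters of a k-component, d-dimensional full-covariance GMM."""
    mixing_weights = k - 1  # weights are constrained to sum to 1
    means = k * d
    covariances = k * d * (d + 1) // 2  # one symmetric d x d matrix per component
    return mixing_weights + means + covariances


def bic(log_likelihood: float, k: int, d: int, n: int) -> float:
    """Bayesian information criterion; lower is better."""
    return gmm_num_params(k, d) * math.log(n) - 2.0 * log_likelihood


def select_k(log_likelihoods: Dict[int, float], d: int, n: int) -> int:
    """Pick the component count with the lowest BIC.

    log_likelihoods maps k -> maximized total log-likelihood of a fitted GMM.
    """
    return min(log_likelihoods, key=lambda k: bic(log_likelihoods[k], k, d, n))


# Made-up log-likelihoods for 200 samples scored on 5 pathways.
fitted = {2: -1500.0, 3: -1400.0, 4: -1390.0}
best_k = select_k(fitted, d=5, n=200)
print(best_k)  # prints 3
```

Going from 3 to 4 components raises the log-likelihood by only 10, which does not pay for the 21 extra parameters (a penalty of about 21 x ln 200, roughly 111), so k = 3 wins. With the default range, `fitted` would hold one entry for each k from 2 to 8.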

3. Validation Gates

Built-in checks guard against spurious or overfit subtypes:

  • Label shuffle: Randomized labels should NOT cluster (ARI < 0.15)
  • Random genes: Fake pathways should NOT work (ARI < 0.15)
  • Bootstrap: Clusters should be stable under resampling (ARI > 0.8)
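Each gate compares two labelings of the same samples with the adjusted Rand index (ARI), which is 1.0 for identical partitions (regardless of how clusters are numbered) and close to 0 for chance-level agreement. The sketch below implements ARI from the pair-counting formula, encodes the thresholds listed above, and builds size-matched random gene sets of the kind the random-genes control describes. Which labelings the framework compares in each gate, and how it aggregates repeated runs, are not specified here; the function names and the seeded sampling scheme are assumptions.

```python
import random
from collections import Counter
from math import comb
from typing import Dict, Hashable, Iterable, List, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Pair-counting ARI between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total_pairs
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Thresholds from the list above: (threshold, True if ARI must exceed it).
GATES = {
    "label_shuffle": (0.15, False),
    "random_genes": (0.15, False),
    "bootstrap": (0.80, True),
}


def gate_passes(gate: str, ari: float) -> bool:
    threshold, must_exceed = GATES[gate]
    return ari > threshold if must_exceed else ari < threshold


def size_matched_random_pathways(
    pathways: Dict[str, List[str]], gene_universe: Iterable[str], seed: int = 0
) -> Dict[str, List[str]]:
    """Replace each pathway with random genes of the same size (seeded, so reproducible)."""
    rng = random.Random(seed)
    universe = sorted(set(gene_universe))
    return {f"RANDOM_{name}": rng.sample(universe, len(genes)) for name, genes in pathways.items()}


same_partition = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same_partition, gate_passes("bootstrap", same_partition))
print(crossed, gate_passes("label_shuffle", crossed))
```

Seeding the random generator means the random-genes control produces the same fake pathways on every run, in line with the framework's goal of deterministic execution.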

Data Requirements

Input | Format | Notes
--- | --- | ---
Variants | VCF | Annotated with gene symbols and consequences
Phenotypes | CSV | Sample IDs + clinical features
Pathways | GMT | Gene sets for your disease
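Annotation layouts differ between tools. Assuming Ensembl VEP output, gene symbols and consequences live in the `CSQ` INFO field, whose pipe-separated sub-fields are named in the `##INFO=<ID=CSQ,...>` header line. The sketch below extracts them. The annotation fields the framework actually expects are not stated here, so check the field names against your own VCF header; the helper names are hypothetical.

```python
from typing import Dict, List


def csq_fields(header_line: str) -> List[str]:
    """Field names from a VEP '##INFO=<ID=CSQ,...Format: A|B|...">' header line."""
    marker = "Format: "
    start = header_line.index(marker) + len(marker)
    end = header_line.index('"', start)
    return header_line[start:end].split("|")


def parse_csq(info: str, fields: List[str]) -> List[Dict[str, str]]:
    """One dict per comma-separated CSQ annotation in a VCF INFO column."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            return [dict(zip(fields, ann.split("|"))) for ann in entry[len("CSQ="):].split(",")]
    return []  # record carries no CSQ annotation


header = (
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations '
    'from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL">'
)
info = "AC=1;CSQ=T|stop_gained|HIGH|GENE1,T|missense_variant|MODERATE|GENE2"

for ann in parse_csq(info, csq_fields(header)):
    print(ann["SYMBOL"], ann["Consequence"])
```

A single record can carry several comma-separated annotations (one per transcript), and VEP joins multiple consequence terms for one transcript with `&` (e.g. `missense_variant&splice_region_variant`), so downstream filtering should split the `Consequence` value on `&`.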

Your data stays on your infrastructure. The framework runs locally or in your cloud environment.

Project Structure

pathway-subtyping-framework/
├── src/pathway_subtyping/     # Core Python package
├── configs/                   # Example YAML configurations
│   ├── example_autism.yaml
│   ├── example_schizophrenia.yaml
│   ├── test_synthetic.yaml    # Ready-to-run test config
│   └── example_epilepsy.yaml
├── data/
│   ├── pathways/              # Pathway GMT files
│   └── sample/                # Synthetic test data
├── docs/guides/               # Documentation
│   ├── adapting-for-your-disease.md
│   ├── pathway-curation-guide.md
│   └── validation-gates.md
├── examples/notebooks/        # Jupyter tutorials
├── tests/                     # Test suite (64 tests)
├── Dockerfile                 # Container support
└── docker-compose.yml         # Easy orchestration

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linting
black src/ tests/
isort src/ tests/
flake8 src/ tests/

# Set up pre-commit hooks
pre-commit install

Related Projects

Contributing

Contributions welcome! Areas where help is needed:

  • Additional disease pathway definitions
  • Performance optimization for large cohorts
  • Documentation and tutorials

See CONTRIBUTING.md for guidelines.

Citation

If you use this framework, please cite:

Chauhan R. Pathway Subtyping Framework. GitHub. 2026.
https://github.com/topmist-admin/pathway-subtyping-framework

For autism-specific work, also cite:

Chauhan R. Autism Pathway Framework. Zenodo. 2026.
DOI: 10.5281/zenodo.18403844

License

MIT License — see LICENSE for details.

Contact

Rohit Chauhan


RESEARCH USE ONLY — This framework is for hypothesis generation. Not for clinical diagnosis or treatment decisions.

Download files

Source Distribution

pathway_subtyping-0.1.0.tar.gz (120.1 kB)

Built Distribution

pathway_subtyping-0.1.0-py3-none-any.whl (28.5 kB, Python 3)

File details

Details for the file pathway_subtyping-0.1.0.tar.gz.

File metadata

  • Download URL: pathway_subtyping-0.1.0.tar.gz
  • Upload date:
  • Size: 120.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for pathway_subtyping-0.1.0.tar.gz
Algorithm | Hash digest
--- | ---
SHA256 | 048c7d35ddcaf9e1fa019c9b34110bbbeea4dbd9533ae68677a92a19c332cac0
MD5 | 804c0150fa9134b81f18c95bd8197525
BLAKE2b-256 | 2d22f50c25bb33499c6460f0d8c3bc1cd153a3b15718527e57eaeaf541f60367

File details

Details for the file pathway_subtyping-0.1.0-py3-none-any.whl.

File hashes

Hashes for pathway_subtyping-0.1.0-py3-none-any.whl
Algorithm | Hash digest
--- | ---
SHA256 | 3317eff741fe6d8b2bcee43ac29b225afe0c4f96b0c11484a8e207e74aa75ff6
MD5 | 51f579695232bfc8eba38b26075cee2f
BLAKE2b-256 | 375b84b0de8e45879677a1a32e1a890edf23c615e95e107753040d4c25a00f0f
