A disease-agnostic framework for identifying molecular subtypes through pathway-based analysis of rare genetic variants

Pathway Subtyping Framework

A Disease-Agnostic Tool for Pathway-Based Molecular Subtype Discovery

Overview

The Pathway Subtyping Framework is an open-source computational tool for identifying molecular subtypes in genetically heterogeneous diseases. Instead of analyzing individual genes, it aggregates rare variant burden at the biological pathway level, enabling:

Better signal detection across genetically diverse cohorts
Identification of biologically coherent patient subgroups
Cross-cohort validation of discovered subtypes

Originally developed for autism research, this generalized version can be adapted for any disease with:

Genetic heterogeneity (many implicated genes)
Convergent pathway biology
Available exome/genome sequencing data

Supported Disease Areas

Disease	Status	Pathway File
Autism Spectrum Disorder	Validated	`autism_pathways.gmt`
Schizophrenia	Template	`schizophrenia_pathways.gmt`
Epilepsy	Template	`epilepsy_pathways.gmt`
Intellectual Disability	Template	`intellectual_disability_pathways.gmt`
Parkinson's Disease	Template	`parkinsons_pathways.gmt`
Bipolar Disorder	Template	`bipolar_pathways.gmt`
Your disease	Adapt it →	`your_pathways.gmt`

Key Features

Feature	Description
Pathway Scoring	Aggregate gene burdens across biological pathways
Multiple Clustering	GMM, K-means, Hierarchical, Spectral with cross-validation
Ancestry Correction	PCA-based population stratification correction with independence testing
Batch Correction	ComBat-style batch effect detection and correction
Sensitivity Analysis	Parameter robustness testing across algorithms, features, normalization
Validation Gates	Negative controls + bootstrap stability + ancestry independence testing
Statistical Rigor	FDR correction, effect sizes, confidence intervals
Power Analysis	Sample size recommendations, Type I error estimation
Simulation	Synthetic data generation with ground truth for validation
Reproducibility	Deterministic execution, pinned dependencies, Docker
Config-Driven	YAML configuration for all parameters

Quick Start

Installation

# Clone the repository
git clone https://github.com/topmist-admin/pathway-subtyping-framework
cd pathway-subtyping-framework

# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install the package
pip install -e .

# Verify installation
psf --version

Run with Sample Data

# Run the pipeline with synthetic test data
psf --config configs/test_synthetic.yaml

# View results
cat outputs/synthetic_test/report.md

Run with Your Data

# Copy and customize a config
cp configs/example_autism.yaml configs/my_analysis.yaml

# Edit paths in my_analysis.yaml, then run
psf --config configs/my_analysis.yaml

Docker

# Run pipeline
docker-compose run pipeline

# Run tests
docker-compose run test

# Start Jupyter notebook
docker-compose up jupyter
# Open http://localhost:8888

Adapting for Your Disease

1. Create a pathway GMT file with disease-relevant gene sets.
2. Copy an example config and point it to your data.
3. Run the pipeline. The validation gates indicate whether the resulting subtypes are meaningful.

See the full guide: Adapting for Your Disease
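
As a rough illustration of the first step, the sketch below reads pathway definitions in the standard GMT layout (tab-separated: set name, description, then one gene symbol per field). The function name `parse_gmt` and the two example pathways are hypothetical and are not taken from the framework. The gene lists are illustrative only.

```python
from typing import Dict, List, Set


def parse_gmt(lines: List[str]) -> Dict[str, Set[str]]:
    """Parse GMT-formatted lines into {pathway_name: set_of_gene_symbols}.

    Each GMT row is tab-separated: name, description, then one gene per field.
    """
    pathways: Dict[str, Set[str]] = {}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"GMT row needs name, description and >=1 gene: {line!r}")
        name, _description, *genes = fields
        # Drop empty fields left by trailing tabs
        pathways[name] = {g.strip() for g in genes if g.strip()}
    return pathways


# Hypothetical two-pathway example (gene lists are illustrative only)
example = [
    "SYNAPTIC_SIGNALING\tillustrative set\tSHANK3\tNRXN1\tSYNGAP1\n",
    "CHROMATIN_REMODELING\tillustrative set\tCHD8\tARID1B\n",
]
pathways = parse_gmt(example)
for name, genes in pathways.items():
    print(name, sorted(genes))
```

To read a real file, pass the open file object's lines, for example `parse_gmt(open("your_pathways.gmt").readlines())`.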

How It Works

VCF Input → Variant Filter → Gene Burden → Pathway Aggregation → [Ancestry Correction] → [Batch Correction] → GMM Clustering → [Sensitivity Analysis] → Validation → Report

1. Pathway Scoring

Rare damaging variants are aggregated into pathway-level disruption scores:

Loss-of-function variants weighted higher
Missense variants weighted by CADD score
Scores normalized across samples
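
The scoring idea above can be sketched in a few lines of plain Python. This is a minimal illustration, not the framework's implementation: the weights (`LOF_WEIGHT`, `MISSENSE_MAX`), the CADD divisor (`CADD_SCALE`), the consequence labels, and the choice of z-score normalization are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Set, Tuple

# Assumed weights for illustration; the framework's real values may differ.
LOF_WEIGHT = 1.0      # loss-of-function variants get the highest weight
MISSENSE_MAX = 0.5    # missense variants are capped below the LoF weight
CADD_SCALE = 40.0     # assumed divisor mapping a PHRED-scaled CADD score to 0..1

# (sample, gene, consequence, cadd_phred)
Variant = Tuple[str, str, str, float]


def gene_burden(variants: List[Variant]) -> Dict[str, Dict[str, float]]:
    """Sum per-variant weights into a {sample: {gene: burden}} table."""
    burden: Dict[str, Dict[str, float]] = {}
    for sample, gene, consequence, cadd in variants:
        if consequence == "lof":
            weight = LOF_WEIGHT
        elif consequence == "missense":
            weight = MISSENSE_MAX * min(cadd / CADD_SCALE, 1.0)
        else:
            continue  # other consequence classes are ignored in this sketch
        genes = burden.setdefault(sample, {})
        genes[gene] = genes.get(gene, 0.0) + weight
    return burden


def pathway_scores(
    burden: Dict[str, Dict[str, float]],
    pathways: Dict[str, Set[str]],
    samples: List[str],
) -> Dict[str, Dict[str, float]]:
    """Sum gene burdens per pathway, then z-score each pathway across samples."""
    scores: Dict[str, Dict[str, float]] = {s: {} for s in samples}
    for name, genes in pathways.items():
        raw = [sum(burden.get(s, {}).get(g, 0.0) for g in genes) for s in samples]
        mu, sd = mean(raw), pstdev(raw)
        for s, value in zip(samples, raw):
            scores[s][name] = (value - mu) / sd if sd > 0 else 0.0
    return scores


# Toy input: three samples, two hypothetical pathways
variants = [
    ("S1", "SHANK3", "lof", 0.0),
    ("S1", "NRXN1", "missense", 20.0),
    ("S2", "CHD8", "missense", 30.0),
    ("S3", "SYNGAP1", "synonymous", 5.0),
]
pathways = {"SYNAPTIC": {"SHANK3", "NRXN1", "SYNGAP1"}, "CHROMATIN": {"CHD8"}}
samples = ["S1", "S2", "S3"]

burden = gene_burden(variants)
scores = pathway_scores(burden, pathways, samples)
```

Samples with no qualifying variants (here `S3`) still receive a score, since normalization runs over the full sample list rather than only the samples present in the burden table.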

2. Subtype Discovery

Multiple clustering algorithms identify patient subgroups:

GMM (default): Soft assignments, automatic selection via BIC
K-means: Fast, spherical clusters
Hierarchical: Dendrogram-based, no K required
Spectral: Nonlinear boundaries
Cross-validation for stability assessment
Algorithm comparison with pairwise ARI
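
For reference, BIC-based selection of the number of clusters follows the standard definition below, where n is the number of samples, d the number of pathway features, and L-hat the maximized likelihood of a K-component mixture. The parameter count shown assumes full covariance matrices. Whether the framework uses full, diagonal, or tied covariances is not stated here, so treat that term as an assumption.

```latex
\mathrm{BIC}(K) = p_K \ln n - 2 \ln \hat{L}_K,
\qquad
p_K = (K - 1) + K d + K \, \frac{d(d+1)}{2},
\qquad
\hat{K} = \arg\min_{K} \mathrm{BIC}(K)
```

The three terms of p_K count the free mixing weights, the component means, and the covariance entries. The penalty grows with K, so a larger model is preferred only when it raises the likelihood enough to offset it.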

3. Validation Gates

Built-in tests prevent overfitting:

Label shuffle: Randomized labels should NOT cluster (ARI < 0.15)
Random genes: Pathways built from randomly chosen genes should NOT reproduce the subtypes (ARI < 0.15)
Bootstrap: Clusters should be stable under resampling (ARI > 0.8)
Ancestry independence: Clusters should not correlate with ancestry PCs (when provided)
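
The gates above are expressed as adjusted Rand index (ARI) thresholds. The sketch below computes ARI from scratch and applies the two thresholds quoted in the list. The function names and the way the three ARI values are combined are hypothetical. The framework's own gate logic may differ, and the shuffle at the end only compares a labeling against a permuted copy of itself rather than re-running any clustering.

```python
import random
from collections import Counter
from math import comb
from typing import Sequence

NEGATIVE_CONTROL_MAX_ARI = 0.15  # label-shuffle and random-gene gates
BOOTSTRAP_MIN_ARI = 0.8          # bootstrap stability gate


def adjusted_rand_index(a: Sequence[int], b: Sequence[int]) -> float:
    """Adjusted Rand index between two labelings of the same samples."""
    if len(a) != len(b):
        raise ValueError("labelings must have equal length")
    total = comb(len(a), 2)
    if total == 0:
        return 1.0  # fewer than two samples: nothing to disagree on
    # Pair counts within contingency cells, rows, and columns
    sum_cells = sum(comb(n, 2) for n in Counter(zip(a, b)).values())
    sum_a = sum(comb(n, 2) for n in Counter(a).values())
    sum_b = sum(comb(n, 2) for n in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial
    return (sum_cells - expected) / (max_index - expected)


def passes_validation_gates(
    label_shuffle_ari: float, random_gene_ari: float, bootstrap_ari: float
) -> bool:
    """Return True only if both negative controls fail and bootstrap is stable."""
    return (
        label_shuffle_ari < NEGATIVE_CONTROL_MAX_ARI
        and random_gene_ari < NEGATIVE_CONTROL_MAX_ARI
        and bootstrap_ari > BOOTSTRAP_MIN_ARI
    )


# Compare a labeling with a shuffled copy of itself (seeded for repeatability)
truth = [0] * 20 + [1] * 20
shuffled = truth[:]
random.Random(0).shuffle(shuffled)
print(f"label-shuffle ARI: {adjusted_rand_index(truth, shuffled):.3f}")
```

ARI is invariant to relabeling, so two runs that find the same groups under different cluster numbers still score 1.0. The same function can be used for the pairwise algorithm comparison mentioned under Subtype Discovery.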

4. Statistical Rigor

Publication-quality statistics:

FDR correction: Benjamini-Hochberg for multiple testing
Effect sizes: Cohen's d with 95% bootstrap confidence intervals
Power analysis: Sample size recommendations for target effect sizes
Type I error: Estimation via null simulations
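
Two of the statistics listed above are simple enough to sketch with the standard library alone: Benjamini-Hochberg adjusted p-values, and Cohen's d with a percentile bootstrap confidence interval. The function names, the pooled-standard-deviation form of d, the percentile method, and the toy data are assumptions for illustration. docs/METHODS.md remains the authoritative description of what the framework does.

```python
import random
from statistics import mean, variance
from typing import List, Sequence, Tuple


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def cohens_d(x: Sequence[float], y: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / pooled_var ** 0.5


def bootstrap_ci(
    x: Sequence[float], y: Sequence[float],
    n_boot: int = 1000, alpha: float = 0.05, seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for Cohen's d, resampling each group separately."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        bx = rng.choices(x, k=len(x))
        by = rng.choices(y, k=len(y))
        try:
            stats.append(cohens_d(bx, by))
        except ZeroDivisionError:
            continue  # skip resamples with zero pooled variance
    stats.sort()
    lo_idx = int((alpha / 2) * len(stats))
    hi_idx = max(int((1 - alpha / 2) * len(stats)) - 1, lo_idx)
    return stats[lo_idx], stats[hi_idx]


# Toy pathway scores for two hypothetical subtypes
subtype_a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.0, 5.2]
subtype_b = [4.0, 4.3, 3.9, 4.1, 4.4, 3.8, 4.2, 4.05]
d = cohens_d(subtype_a, subtype_b)
ci_low, ci_high = bootstrap_ci(subtype_a, subtype_b)
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With one test per pathway, the list passed to `benjamini_hochberg` would hold one p-value per pathway, and pathways whose adjusted value falls below the chosen FDR level would be reported.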

See docs/METHODS.md for full statistical methodology.

Data Requirements

Input	Format	Notes
Variants	VCF	Annotated with gene symbols, consequences
Phenotypes	CSV	Sample IDs + clinical features
Pathways	GMT	Gene sets for your disease

Your data stays on your infrastructure. The framework runs locally or in your cloud environment.

Project Structure

pathway-subtyping-framework/
├── src/pathway_subtyping/     # Core Python package
│   ├── pipeline.py            # Main pipeline
│   ├── clustering.py          # Multiple clustering algorithms
│   ├── statistical_rigor.py   # FDR, effect sizes, burden weights
│   ├── simulation.py          # Synthetic data & power analysis
│   ├── validation.py          # Validation gates
│   ├── ancestry.py            # Population stratification correction
│   ├── batch_correction.py    # Batch effect detection & correction
│   ├── sensitivity.py         # Parameter sensitivity analysis
│   └── data_quality.py        # VCF quality checks
├── configs/                   # Example YAML configurations
├── data/
│   ├── pathways/              # Pathway GMT files (6 diseases)
│   └── sample/                # Synthetic test data
├── docs/
│   ├── METHODS.md             # Statistical methods documentation
│   └── guides/                # User guides
├── examples/notebooks/        # Jupyter tutorials
├── tests/                     # Test suite (347 tests)
├── Dockerfile                 # Container support
└── docker-compose.yml         # Easy orchestration

Development

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run linting
black src/ tests/
isort src/ tests/
flake8 src/ tests/

# Set up pre-commit hooks
pre-commit install

Related Projects

Autism Pathway Framework — The original autism-focused implementation with SFARI cohort validation

Contributing

Contributions welcome! Areas where help is needed:

Additional disease pathway definitions
Performance optimization for large cohorts
Documentation and tutorials

See CONTRIBUTING.md for guidelines.

Citation

If you use this framework, please cite:

Chauhan R. Pathway Subtyping Framework. GitHub. 2026.
https://github.com/topmist-admin/pathway-subtyping-framework

For autism-specific work, also cite:

Chauhan R. Autism Pathway Framework. Zenodo. 2026.
DOI: 10.5281/zenodo.18403844

License

MIT License — see LICENSE for details.

Contact

Rohit Chauhan

Email: info@topmist.com
GitHub: @topmist-admin

RESEARCH USE ONLY — This framework is for hypothesis generation. Not for clinical diagnosis or treatment decisions.

Release history

Version	Released	Notes
0.6.3	Apr 19, 2026	
0.6.2	Apr 19, 2026	
0.5.0	Apr 13, 2026	
0.4.0	Mar 4, 2026	
0.3.1	Feb 23, 2026	
0.3.0	Feb 17, 2026	
0.2.3	Feb 14, 2026	
0.2.2	Feb 9, 2026	
0.2.1	Feb 9, 2026	Yanked: "Breaks Colab/NumPy 2.x — use 0.2.2"
0.2.0	Feb 9, 2026	Yanked: "Breaks Colab/NumPy 2.x — use 0.2.2" (this version)
0.1.0	Jan 31, 2026	

Download files

Source Distribution

pathway_subtyping-0.2.0.tar.gz (191.6 kB view details)

Uploaded Feb 9, 2026 Source

Built Distribution

pathway_subtyping-0.2.0-py3-none-any.whl (81.2 kB view details)

Uploaded Feb 9, 2026 Python 3

File details

Details for the file pathway_subtyping-0.2.0.tar.gz.

File metadata

Download URL: pathway_subtyping-0.2.0.tar.gz
Upload date: Feb 9, 2026
Size: 191.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for pathway_subtyping-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`8bfdb2f2bbed513717c52b9f5f80c350dbc4685905d984d75d69a3b01e008cd1`
MD5	`79b1045f02a1e031e8ac583b8d0da564`
BLAKE2b-256	`190a5b6207749fa5d3b74bf6bab0c4377fb2eba1ebda534cbe2022a05c94e005`

File details

Details for the file pathway_subtyping-0.2.0-py3-none-any.whl.

File metadata

Download URL: pathway_subtyping-0.2.0-py3-none-any.whl
Upload date: Feb 9, 2026
Size: 81.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for pathway_subtyping-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9409cc98406541239eb6cbe0ffa154620b2675e3cded27b609faeafca36151b7`
MD5	`7de3840b3f4702f8ae5cf9db674903d8`
BLAKE2b-256	`dfba413f1d34b07ea66ca0f3528284dd1299201ffc5608a519eada066fefd257`

pathway-subtyping 0.2.0
