PaRMMoSaHN - Pangenome Reference-based Metabolic Modelling by Saving Homolog Networks

Maintainers

lucasdv robbedewin

Overview

PaRMMoSaHN is a Python-based pipeline that bridges partitioned pangenome graphs with high-fidelity metabolic reconstruction.

Instead of building genome-scale metabolic models (GEMs) for hundreds of strains from scratch -- which is computationally expensive and produces inconsistently gap-filled models -- PaRMMoSaHN builds a single high-quality pan-model from the pangenome and rapidly derives strain-specific models by mapping individual strains to this reference via homology networks.

Why this approach?

Existing tools either reconstruct each strain independently (gapseq, CarveMe) or require a manually curated species-specific reference (Bactabolize, KpSC pan). PaRMMoSaHN is the first species-agnostic, automated pipeline that leverages PPanGGOLiN pangenome partitioning to build a shared metabolic reference, then projects it onto strains using DIAMOND sequence homology. This produces models that are:

Consistent -- all strains are derived from the same metabolic reference, eliminating artificial variation from independent gap-filling
Fast -- strain projection takes minutes instead of hours per genome
Scalable -- .done checkpointing and isolated evaluation processes handle hundreds of strains
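
The `.done` checkpointing mentioned in the last point can be pictured with a minimal sketch. It assumes one marker file per completed unit of work; the `run_once` helper and the `<name>.done` layout are hypothetical illustrations, not PaRMMoSaHN's actual internals.

```python
from pathlib import Path
from typing import Callable


def run_once(workdir: Path, name: str, task: Callable[[], None]) -> bool:
    """Run `task` unless a `<name>.done` marker already exists in `workdir`.

    Returns True if the task ran and False if it was skipped.
    The marker name and location are assumptions made for illustration.
    """
    marker = workdir / f"{name}.done"
    if marker.exists():
        return False  # finished in an earlier run, so skip it
    task()  # an exception here leaves no marker, so the step is retried next time
    workdir.mkdir(parents=True, exist_ok=True)
    marker.touch()
    return True
```

Writing the marker only after the task returns means an interrupted strain is simply recomputed on the next run, which is what makes restarts over hundreds of strains cheap.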

Pipeline

   Genomes (.gbff)
        |
        v
  [Step 1] PPanGGOLiN pangenome construction
        |   -> soft-core protein families FASTA
        v
  [Step 2] gapseq metabolic pathway prediction on pangenome
        |   -> pan-model reaction table
        v
  [Step 3] Per-strain model derivation
        |   DIAMOND blastp -> filter pan-model -> gapseq draft + gapfill
        |   -> one SBML model per strain
        v
  [Step 4] Automated curation (memote evaluation + duplicate/imbalance fixes)
        |   Optional: pause for manual curation via Excel spreadsheet
        v
  [Step 5b] ModelPolisher (optional side-step, --polish; Docker/Podman/Apptainer)
        |
  [Step 5] Annotation enrichment (NCBI protein, BRENDA, RHEA)
        |
        v
  [Step 6] Gather & convert (SBML, JSON, MATLAB)
        |
        v
  [Step 7] Final memote FAIR-compliance evaluation

Output structure

output/
├── 01-pangenome/                       # PPanGGOLiN pangenome + soft-core FASTA
│   └── pangenome_meta.json             # n_genomes, soft-core threshold, cluster params
├── 02-panmodel/                        # gapseq reaction/pathway tables
├── 03-strain_models/                   # Per-strain DIAMOND matches, proteomes, draft models
├── 04-curated_models/                  # Models after automated curation
│   └── curation_application_report.tsv # Per-row hit counts across strains (v0.2.1)
├── 05-annotated_models/                # SBML enriched with NCBI/BRENDA/RHEA annotations (Step 5)
├── 05b-polished_models/                # ModelPolisher output (optional side-step, --polish)
├── 06-final_models/                    # Final models in XML, JSON, and MATLAB formats
├── 07-memote_reports/                  # draft/ and final/ memote HTML+JSON reports
├── curation_template.xlsx              # Memote-derived spreadsheet for manual curation
├── pipeline_summary.json               # Run metadata, parameters, provenance, model statistics
├── run.log                             # Full INFO-level run log
└── errors.log                          # WARNING+ messages (only if errors occurred)

The pipeline_summary.json file includes a provenance block (SHA-256 of the medium CSV and curation database, soft-core threshold actually used, external tool versions) and a run_environment block (host CPU/RAM/OS), so a reviewer can verify a run is reproducible without re-running anything.
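
A reviewer could check the recorded medium hash along these lines. This is a sketch only: the `provenance` block is described above, but the field name `medium_sha256` is a hypothetical placeholder, so inspect your own `pipeline_summary.json` for the real key.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def medium_matches(summary_json: Path, medium_csv: Path,
                   key: str = "medium_sha256") -> bool:
    """Compare a medium CSV against the hash stored in the summary file.

    `key` is an assumed field name inside the `provenance` block.
    """
    summary = json.loads(summary_json.read_text())
    return summary["provenance"][key] == sha256_of(medium_csv)
```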

Installation

PaRMMoSaHN orchestrates several bioinformatics tools that cannot be installed via pip alone. Use Conda/Mamba to set up the environment.

1. Create the Conda environment

# mamba is recommended for faster dependency resolution
mamba env create -f environment.yml
conda activate parmmosahn_env

2. Install PaRMMoSaHN

From PyPI (recommended):

pip install parmmosahn

This installs the Python orchestrator and its Python dependencies only. The external tools (PPanGGOLiN, gapseq, DIAMOND) come from the Conda environment created in step 1; verify them with parmmosahn doctor.

For development, from a clone:

pip install -e ".[dev]"

Or directly from GitHub:

pip install "git+https://github.com/robbedewin/PaRMMoSaHN.git"

3. Verify the installation

parmmosahn doctor

This checks that all required external tools (PPanGGOLiN, gapseq, DIAMOND) and optional container engines (Docker, Podman, Apptainer) are available. It also reports host CPU/RAM and recommends a memote-worker count for the evaluate step. The recommendation helps on memory-constrained hosts such as default WSL2 installations, where the default worker count can trigger BrokenProcessPool errors.

Quick Start

Workflow A: Full automated pipeline

parmmosahn run \
  -g /path/to/genomes/ \
  -o ./results/ \
  -l clostridiales \
  -m medium.csv \
  -t 14 --parallel-strains 2

Required inputs:

Option	Description
`-g, --genomes`	Directory containing annotated genomes in GenBank format (`.gbff`)
`-o, --output`	Output base directory
`-l, --label`	Label for the pangenome (used in filenames)
`-m, --medium`	Growth medium CSV for gap-filling (gapseq format)

Optional parameters:

Option	Default	Description
`-t, --threads`	75% of CPUs	Total CPU threads
`--parallel-strains`	auto (`threads // 4`)	Strains processed in parallel in Step 3; each worker gets `threads / N` CPU threads (peaks ~1.5 GB RAM each)
`--diamond-bits`	150	DIAMOND blastp bitscore threshold
`--gapseq-bits`	150	gapseq pathway search bitscore threshold
`--biomass`	`pos`	Biomass reaction type (`pos` or `neg`, matches gapseq's gram-stain templates)
`--add-unique`	off	Include unique/cloud genes (singletons) in the pan-model
`--soft-core`	2/N	Override the soft-core frequency threshold (fraction in 0–1)
`--polish`	off	Enable ModelPolisher (requires Docker/Podman/Apptainer; see ModelPolisher below)
`-e, --engine`	`docker`	Container engine when `--polish` is enabled
`-c, --curation-db`	none	Path to an existing curation spreadsheet (skips auto-template generation)
`--pause-for-curation`	off	Pause after Step 3 for manual curation; resume with `parmmosahn project --resume`

Workflow B: Human-in-the-loop curation

For maximum model quality, pause the pipeline after draft model evaluation, manually review the curation spreadsheet, then resume:

# Step 1: Run pipeline and pause for manual curation
parmmosahn run \
  -g ./genomes/ -o ./results/ -l my_species -m medium.csv \
  --pause-for-curation

# -> Edit results/curation_template.xlsx in Excel
#    Fill the 'duplicate_reactions' and 'curated_imbalances' sheets

# Step 2: Resume pipeline with your curation decisions
parmmosahn project --resume -o ./results/

The curation template has three sheets:

duplicate_reactions -- pairs of duplicate reactions with decision options (keep 1, keep 2, drop both, keep both)
curated_imbalances -- mass/charge-imbalanced reactions with a column for corrected formulas
ignored_imbalances -- reactions to leave intact despite imbalance (with justification)

Workflow C: Rapid projection of new isolates

If you already have a pan-model and have since sequenced new strains, skip pangenome construction and project the new genomes directly:

parmmosahn project \
  -g ./new_strains/ \
  -f ./results/01-pangenome/my_species.faa \
  -r ./results/02-panmodel/my_species-all-Reactions.tbl \
  -m medium.csv \
  -o ./projection/

Analysis

Once you have strain-specific models, PaRMMoSaHN provides built-in analysis commands under parmmosahn analyze to explore metabolic diversity, validate predictions, and compare strains. These analyses are what make the models scientifically useful -- raw SBML files only become insights when you interrogate them.

Pan-reactome characterization

The pan-reactome is the union of all metabolic reactions across your strains, analogous to the pangenome but at the metabolic level. Characterizing it reveals which metabolic capabilities are universally conserved (core), which are shared by subsets of strains (accessory), and which are strain-specific (unique). This is the central scientific output of a pangenome-scale metabolic study.

parmmosahn analyze panreactome \
  -M ./results/06-final_models/ \
  -o ./analysis/panreactome.tsv \
  --plot

Outputs:

panreactome.tsv -- per-reaction classification (core/accessory/unique) with strain presence
panreactome_summary.tsv -- pan-reactome size, core/accessory/unique counts, model size statistics
panreactome_jaccard.tsv -- pairwise Jaccard similarity matrix between strains
panreactome_accumulation.tsv -- reaction accumulation curve (pan-reactome growth with added genomes)
panreactome_dendrogram.png -- hierarchical clustering of strains by metabolic similarity (with --plot)
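
For reference, the pairwise Jaccard similarity behind `panreactome_jaccard.tsv` is the size of the shared reaction set divided by the size of the combined set. A minimal sketch, assuming each model has already been reduced to a set of reaction IDs (the function names are illustrative):

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """|A & B| / |A | B|; two empty sets are treated as identical (1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jaccard_pairs(models: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Similarity for every unordered pair of strains, keyed by (strain, strain)."""
    return {(s, t): jaccard(models[s], models[t])
            for s, t in combinations(sorted(models), 2)}
```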

The accumulation curve shows whether the pan-reactome is "open" (still growing) or "closed" (saturated), which has implications for how representative your strain collection is.
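
One common way to compute such a curve is to add strains in random order and record the union size after each addition, averaged over several orderings. The sketch below does that; averaging over shuffled orders is an assumption here, not a description of how `panreactome_accumulation.tsv` is actually produced.

```python
import random
from typing import Dict, List, Set


def accumulation_curve(models: Dict[str, Set[str]], permutations: int = 100,
                       seed: int = 0) -> List[float]:
    """Mean pan-reactome size after adding 1..N strains in random order."""
    if permutations < 1:
        raise ValueError("permutations must be >= 1")
    rng = random.Random(seed)  # seeded so repeated calls give the same curve
    names = sorted(models)
    sizes: List[List[int]] = [[] for _ in names]
    for _ in range(permutations):
        rng.shuffle(names)
        seen: Set[str] = set()
        for i, name in enumerate(names):
            seen |= models[name]
            sizes[i].append(len(seen))
    return [sum(column) / len(column) for column in sizes]
```

A curve that flattens towards its last points suggests a closed pan-reactome; one that is still rising suggests that more strains would add new reactions.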

By default a reaction is classified as core if it occurs in more than 99% of strains and as unique (cloud) if it occurs in fewer than 5%; everything in between is accessory. Adjust these cutoffs with --core-threshold and --cloud-threshold.
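
The default cutoffs can be illustrated with a short sketch that labels each reaction by the fraction of strains carrying it. The strict comparisons follow the wording above ("more than", "fewer than"); the function name and the set-based input are assumptions for illustration.

```python
from typing import Dict, Set


def classify_reactions(models: Dict[str, Set[str]],
                       core_threshold: float = 0.99,
                       cloud_threshold: float = 0.05) -> Dict[str, str]:
    """Label each reaction 'core', 'accessory', or 'unique' by strain frequency."""
    n_strains = len(models)
    counts: Dict[str, int] = {}
    for reactions in models.values():
        for rxn in reactions:
            counts[rxn] = counts.get(rxn, 0) + 1
    labels: Dict[str, str] = {}
    for rxn, count in counts.items():
        frequency = count / n_strains
        if frequency > core_threshold:
            labels[rxn] = "core"
        elif frequency < cloud_threshold:
            labels[rxn] = "unique"
        else:
            labels[rxn] = "accessory"
    return labels
```

Under this reading of the defaults, a reaction found in a single strain only falls below 5% when the collection has more than 20 strains, so in smaller collections it would be labelled accessory unless `--cloud-threshold` is raised.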

Phenotype validation

Model predictions are only as trustworthy as their agreement with experimental data. The validate command compares in silico growth predictions against experimental growth phenotypes (e.g., Biolog plates, carbon source utilization assays) and computes standard classification metrics.

parmmosahn analyze validate \
  -M ./results/06-final_models/ \
  -p phenotypes.csv \
  -o ./analysis/validation.tsv

The phenotype file can be in matrix format (rows = carbon sources, columns = strains, values = 1/0) or long format (columns: strain, carbon_source, growth). Carbon sources should be specified as exchange reaction IDs (e.g., EX_glc__D_e0).

Output metrics: accuracy, sensitivity, specificity, and Matthews Correlation Coefficient (MCC). Per-strain per-substrate results are written to the TSV for detailed inspection.
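
For readers who want to recompute the metrics from the per-strain, per-substrate TSV, the standard definitions in terms of true/false positives and negatives are sketched here. Returning 0.0 when a denominator is zero is a convention chosen for this sketch, not necessarily what the validate command does.

```python
from math import sqrt
from typing import Dict


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and MCC from confusion counts."""

    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

MCC is the most informative of the four when growth and no-growth observations are unbalanced, because it uses all four cells of the confusion matrix.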

FBA summary

Run Flux Balance Analysis on all models to compare baseline growth rates and active exchange reactions:

parmmosahn analyze fba \
  -M ./results/06-final_models/ \
  -o ./analysis/fba_summary.tsv \
  -m medium.csv

Auxotrophy screening

Systematically knock out each medium component to identify predicted auxotrophies -- strains that cannot grow without a specific nutrient. This reveals metabolic dependencies that may reflect ecological niche adaptation or gene loss events.

parmmosahn analyze auxotrophy \
  -M ./results/06-final_models/ \
  -o ./analysis/auxotrophy.tsv \
  -m medium.csv
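
The knock-out loop itself is simple and can be sketched independently of any solver. Here `grows` is a hypothetical callable standing in for an FBA run that returns True when the model grows on the given set of medium components; neither the name nor the signature comes from PaRMMoSaHN.

```python
from typing import Callable, FrozenSet, Iterable, List


def screen_auxotrophies(medium: Iterable[str],
                        grows: Callable[[FrozenSet[str]], bool]) -> List[str]:
    """Return the medium components whose removal abolishes predicted growth.

    `grows` is a placeholder for an FBA call with a growth-rate cutoff.
    """
    full = frozenset(medium)
    if not grows(full):
        raise ValueError("model does not grow on the complete medium")
    # Remove one component at a time and keep those the model cannot do without.
    return sorted(c for c in full if not grows(full - {c}))
```

Note that the sole carbon or nitrogen source will also show up in such a list, so the result usually needs filtering before it is read as a set of true auxotrophies.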

Gene essentiality prediction

Perform single-gene deletion FBA to predict which genes are essential for growth. This can be validated against experimental transposon library (Tn-seq) data and highlights potential drug targets in pathogens.

parmmosahn analyze essentiality \
  -M ./results/06-final_models/ \
  -o ./analysis/essentiality.tsv

Reaction heatmap

Build a binary presence/absence matrix of all reactions across strains, optionally with a clustered heatmap visualization:

parmmosahn analyze heatmap \
  -M ./results/06-final_models/ \
  -o ./analysis/heatmap.tsv \
  --plot
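
The underlying matrix is easy to reproduce from sets of reaction IDs. In this sketch rows are reactions and columns are strains; that orientation and the `reaction` header are assumptions, so check the actual `heatmap.tsv` before relying on them.

```python
import csv
import io
from typing import Dict, List, Set, Tuple


def presence_matrix(models: Dict[str, Set[str]]) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (strains, reactions, rows) with rows[i][j] = 1 if reaction i is in strain j."""
    strains = sorted(models)
    reactions = sorted(set().union(*models.values()))
    rows = [[int(rxn in models[s]) for s in strains] for rxn in reactions]
    return strains, reactions, rows


def to_tsv(strains: List[str], reactions: List[str], rows: List[List[int]]) -> str:
    """Serialise the matrix as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["reaction", *strains])
    for rxn, row in zip(reactions, rows):
        writer.writerow([rxn, *row])
    return buffer.getvalue()
```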

Prune dry-run (diagnostic)

Read-only diagnostic: no models are modified. The command only reports what could be pruned; the destructive prune step it prepares for is still on the roadmap.

Report dead-end metabolites (those with zero producers or zero consumers) and reactions whose participants are all dead-end, per model. Dead-end metabolites are one structural source of the cohort-wide blocked-reaction baseline, so this gives a quick, non-destructive estimate of how many reactions are unambiguously prunable before committing to a clean-up pass.

parmmosahn analyze prune-report \
  -M ./results/06-final_models/ \
  -o ./analysis/prune_report.tsv

Output: one row per model with dead-end metabolite counts (dead_end_metabolites, dead_end_fraction), unambiguously-prunable reaction counts (reactions_all_dead_end, prunable_fraction), plus an FBA growth_rate_baseline and fba_status so you can confirm the models still grow.
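
The two structural definitions used above can be sketched on a toy stoichiometry. Reactions are given as `{metabolite: coefficient}` maps (negative = consumed, positive = produced); treating a reversible reaction as both producer and consumer is an assumption of this sketch, and the single pass does not propagate dead ends iteratively.

```python
from typing import Dict, FrozenSet, Set, Tuple


def dead_end_report(reactions: Dict[str, Dict[str, float]],
                    reversible: FrozenSet[str] = frozenset()) -> Tuple[Set[str], Set[str]]:
    """Return (dead_end_metabolites, reactions_all_dead_end)."""
    producers: Dict[str, Set[str]] = {}
    consumers: Dict[str, Set[str]] = {}
    for rxn, stoich in reactions.items():
        for met, coeff in stoich.items():
            producers.setdefault(met, set())
            consumers.setdefault(met, set())
            if coeff > 0 or rxn in reversible:
                producers[met].add(rxn)
            if coeff < 0 or rxn in reversible:
                consumers[met].add(rxn)
    # A metabolite is a dead end if nothing produces it or nothing consumes it.
    dead = {m for m in producers if not producers[m] or not consumers[m]}
    # A reaction is unambiguously prunable if every participant is a dead end.
    prunable = {r for r, s in reactions.items() if s and all(m in dead for m in s)}
    return dead, prunable
```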

Modular usage

Each pipeline step can be run independently:

parmmosahn pan -g ./genomes/ -l my_label -t 8     # Steps 1-2 only
parmmosahn strains -g ./genomes/ -f soft.faa \
  -r rxns.tbl -m medium.csv                        # Step 3 only
parmmosahn evaluate -M ./models/ -o ./reports/     # Memote evaluation
parmmosahn annotate -M ./models/ -o ./annotated/   # FAIR annotation
parmmosahn gather -M ./models/ -o ./final/         # Format conversion
parmmosahn curate -s model.xml -o out/ -d curation.xlsx  # Apply curations
parmmosahn polish -M ./models/ -o ./polished/      # ModelPolisher

Run parmmosahn --help for a full list of commands and options.

Configuration

PaRMMoSaHN supports YAML configuration files as an alternative to CLI flags for the run, pan, and project commands. Generate a template:

parmmosahn init-config -o my_config.yml

The generated template documents every config-loadable key. Then use it:

# Full pipeline from a config file (all required args may come from config)
parmmosahn run --config my_config.yml

# Or mix CLI overrides with config defaults
parmmosahn run --config my_config.yml -o ./results/ -t 24

CLI arguments always override config file values. All options can also be set via environment variables prefixed with PARMMOSAHN_ (e.g., PARMMOSAHN_THREADS=32).
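
The precedence rule can be sketched as a small lookup. Only "CLI overrides config" is stated above; placing environment variables between the two, and the mapping from option name to variable name, are assumptions of this sketch. `resolve_option` is a hypothetical helper.

```python
import os
from typing import Any, Mapping, Optional


def resolve_option(name: str, cli: Mapping[str, Any], config: Mapping[str, Any],
                   default: Any = None,
                   environ: Optional[Mapping[str, str]] = None) -> Any:
    """Resolve one option: CLI, then PARMMOSAHN_* variable, then config, then default.

    The position of environment variables in this order is an assumption.
    Values taken from the environment are returned as strings.
    """
    env = os.environ if environ is None else environ
    if cli.get(name) is not None:
        return cli[name]
    env_key = "PARMMOSAHN_" + name.upper().replace("-", "_")
    if env_key in env:
        return env[env_key]
    if name in config:
        return config[name]
    return default
```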

ModelPolisher

ModelPolisher (v2.1-beta) enriches SBML models with BiGG database annotations and standardised identifiers. It runs inside a container, so it requires Docker, Podman, or Apptainer on the host.

ModelPolisher is OFF by default, both because of the container dependency and because the bundled beta version of ModelPolisher uses a fragile network fetch at startup. Enable it explicitly with --polish, optionally picking an engine with -e:

# Default container engine (docker)
parmmosahn run ... --polish

# Podman (rootless, HPC-friendly)
parmmosahn run ... --polish -e podman

# Apptainer/Singularity (HPC clusters)
parmmosahn run ... --polish -e apptainer

If no container engine is detected at runtime, Step 5b is silently skipped and the pipeline continues with the pre-polish models.
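
To check ahead of time whether Step 5b would run, you can look for an engine on `PATH` yourself. The sketch below uses `shutil.which`; trying the other engines when the preferred one is missing is an assumption of this example, not documented PaRMMoSaHN behaviour, and `parmmosahn doctor` remains the supported check.

```python
import shutil
from typing import Callable, Optional, Sequence


def find_engine(preferred: str = "docker",
                candidates: Sequence[str] = ("docker", "podman", "apptainer"),
                which: Callable[[str], Optional[str]] = shutil.which) -> Optional[str]:
    """Return the first engine found on PATH, trying `preferred` first, else None."""
    order = [preferred] + [c for c in candidates if c != preferred]
    for engine in order:
        if which(engine):
            return engine
    return None
```

A return value of `None` corresponds to the case described above, in which polishing is skipped and the pre-polish models are used.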

Note: the stable ModelPolisher v2.1 release crashes at startup because of a broken URL-pattern regex in the DataONE namespace. PaRMMoSaHN therefore bundles v2.1-beta, which does not have this bug. SBML headers are temporarily downgraded from L3V2 to L3V1 for compatibility with the beta.

Known Limitations

Reaction pre-filtering does not evaluate GPR rules. If a reaction requires multiple gene subunits (AND rule), it may be included even if only one subunit has a homolog. The downstream gapseq draft step partially mitigates this.
Single medium for all strains. Gap-filling uses one medium specification. Strains from different niches may need different media. Use parmmosahn project to re-derive models with alternative media.
Generic biomass composition. Biomass reactions use gapseq's default gram-positive or gram-negative templates rather than species-specific composition. This is consistent with other automated tools (CarveMe, Bactabolize).
Soft-core threshold (2/N). The default threshold excludes genes present in only one genome. For very small collections (N < 5), consider using --add-unique or adjusting --soft-core.
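
The effect of the 2/N default on small collections is easiest to see in code. This sketch assumes a family is kept when its genome frequency is at least the threshold, which matches the statement that single-genome families are excluded; the function name and input layout are illustrative.

```python
from typing import Dict, Optional, Set


def soft_core_families(presence: Dict[str, Set[str]], n_genomes: int,
                       threshold: Optional[float] = None) -> Set[str]:
    """Return the families whose genome frequency is >= threshold (default 2/N)."""
    if n_genomes < 1:
        raise ValueError("n_genomes must be >= 1")
    cutoff = 2 / n_genomes if threshold is None else threshold
    return {family for family, genomes in presence.items()
            if len(genomes) / n_genomes >= cutoff}
```

With N = 4 the default works out to 0.5, so a family must be present in at least half of the genomes, whereas with N = 100 it is only 2%. That is why the default behaves much more strictly in very small collections.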

Citation

If you use PaRMMoSaHN in your research, please cite:

@software{dewin2026parmmosahn,
  author = {De Win, Robbe and De Vrieze, Lucas},
  title = {PaRMMoSaHN: Pangenome Reference-based Metabolic Modelling by Saving Homolog Networks},
  version = {0.3.0},
  year = {2026},
  url = {https://github.com/robbedewin/PaRMMoSaHN}
}

License

This project is licensed under the MIT License -- see the LICENSE file for details.

Acknowledgments

This work was developed as part of a Master's thesis project at KU Leuven in the laboratory of Prof. Masschelein, in collaboration with the VIB-KU Leuven Center for Microbiology.

Version 0.3.0, released May 26, 2026.
