
ConSite: conserved-domain alignment and conserved-site visualization from protein FASTA


ConSite

ConSite takes a protein FASTA sequence, finds conserved domains via local Pfam/HMMER, aligns your sequence to each domain, scores per-position conservation, and renders publication-quality figures and structured outputs.

Features

  • FASTA → Pfam domain search (local HMMER)
  • Automatic alignment of the query to each hit, using an HMM built from the family's Pfam SEED alignment
  • Per-position conservation: entropy, Jensen–Shannon divergence (JSD), consensus frequency, coverage
  • Conserved site calling (top-X% by JSD, per domain)
  • Publication-quality visualization
    • Linear domain map with hollow-red conserved sites
    • MSA panels (from Pfam SEED): pyMSAviz colored MSA with optional conservation markers; legacy grayscale gradient renderer still available
    • Per-domain alignment panels for the query with optional conservation background scale
  • Reproducible outputs (JSON, TSV, PNG, Stockholm) and clear CLI logging
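As a rough illustration of what the per-position tracks mean, here is a minimal Python sketch for a single alignment column. It assumes gaps are written as "-" or ".", that entropy uses log base 2 over the 20 standard amino acids with gaps excluded, and that coverage is the fraction of non-gap rows. ConSite's own conserve.py may define these differently (for example with pseudocounts or sequence weighting), and the function name column_stats is hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
GAP_CHARS = set("-.")


def column_stats(column: str) -> dict:
    """Entropy, consensus frequency and coverage for one column (one char per sequence).

    Assumed definitions, not ConSite's code: gaps are excluded from the
    residue distribution, entropy is in bits, coverage = non-gap fraction.
    """
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    coverage = len(residues) / len(column) if column else 0.0
    counts = Counter(r for r in residues if r in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # All-gap (or all non-standard) column: no distribution to score.
        return {"entropy": 0.0, "consensus_freq": 0.0, "coverage": coverage}
    freqs = [n / total for n in counts.values()]
    entropy = -sum(p * math.log2(p) for p in freqs)
    return {
        "entropy": entropy,               # 0 = fully conserved
        "consensus_freq": max(freqs),     # frequency of the most common residue
        "coverage": coverage,
    }
```

Under these definitions a fully conserved column such as "AAAA" has entropy 0 and consensus frequency 1.0, while "AC--" has coverage 0.5 and entropy 1 bit.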

Installation

Option A: PyPI (recommended)

python -m pip install consite

Prerequisites

  • Python ≥ 3.10
  • HMMER 3.x in your PATH
  • Pfam database files (see Quick Start)

Install HMMER:

  • macOS (Homebrew): brew install hmmer
  • Linux (APT): sudo apt-get update && sudo apt-get install hmmer
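  • Windows: HMMER 3 has no native Windows build; install it inside WSL (for example with the APT command above). On Linux or macOS, conda users can run conda install -c bioconda hmmer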

Verify: hmmsearch -h (the first lines of the help output report the installed HMMER version)
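To check every HMMER program ConSite relies on in one go (hmmsearch, hmmbuild and hmmalign during a run, plus hmmpress during setup), a small Python sketch using shutil.which is enough. The helper name missing_hmmer_tools is hypothetical and not part of ConSite.

```python
import shutil

# Programs named in this README: hmmpress for setup, the rest during a run.
HMMER_TOOLS = ("hmmsearch", "hmmbuild", "hmmalign", "hmmpress")


def missing_hmmer_tools(tools=HMMER_TOOLS) -> list[str]:
    """Return the executables that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_hmmer_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All HMMER tools found.")
```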

Quick Start

1) Get the Pfam database

From a clone of the repository (see Manual Setup below), use the helper script:

chmod +x scripts/*.sh
./scripts/get_pfam.sh

This downloads Pfam-A.hmm and Pfam-A.seed, uncompresses them, and runs hmmpress on Pfam-A.hmm.

(Manual alternative is in "Manual Setup" below.)

2) Run ConSite (demo)

consite \
  --fasta examples/P05362.fasta \
  --pfam-hmm pfam_db/Pfam-A.hmm \
  --pfam-seed pfam_db/Pfam-A.seed \
  --out results \
  --id P05362

Manual Setup (from source)

git clone https://github.com/yangli-evo/ConSite.git
cd ConSite
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .

# Then run ./scripts/get_pfam.sh, or download the Pfam files manually and run hmmpress (see below).

Manual Pfam download:

mkdir -p pfam_db
curl -L -o pfam_db/Pfam-A.hmm.gz https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
curl -L -o pfam_db/Pfam-A.seed.gz https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
gunzip -f pfam_db/Pfam-A.hmm.gz pfam_db/Pfam-A.seed.gz
hmmpress pfam_db/Pfam-A.hmm
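hmmpress writes four binary index files next to the HMM library (Pfam-A.hmm.h3m, .h3i, .h3f and .h3p). A hypothetical pre-flight check, not part of ConSite, could confirm that these and the SEED file exist before a run:

```python
from pathlib import Path

# Index files created by hmmpress alongside the HMM library.
PRESSED_SUFFIXES = (".h3m", ".h3i", ".h3f", ".h3p")


def check_pfam_db(hmm_path: str, seed_path: str) -> list[str]:
    """Return a list of problems; an empty list means the files look ready."""
    problems = []
    hmm = Path(hmm_path)
    if not hmm.is_file():
        problems.append(f"missing HMM library: {hmm}")
    for suffix in PRESSED_SUFFIXES:
        index = hmm.with_name(hmm.name + suffix)  # e.g. Pfam-A.hmm.h3m
        if not index.is_file():
            problems.append(f"missing hmmpress index: {index}")
    if not Path(seed_path).is_file():
        problems.append(f"missing SEED file: {seed_path}")
    return problems
```

For example, check_pfam_db("pfam_db/Pfam-A.hmm", "pfam_db/Pfam-A.seed") should return an empty list once the download and hmmpress steps have finished.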

Usage

Basic:

consite \
  --fasta myprotein.fasta \
  --pfam-hmm pfam_db/Pfam-A.hmm \
  --pfam-seed pfam_db/Pfam-A.seed \
  --out results \
  --topn 5 \
  --cpu 8 \
  --jsd-top-percent 15 \
  --log results/run.log
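To process several FASTA files, one option is a thin Python wrapper around the CLI. This sketch only uses flags documented in the Command-line Options table; the function names, the pfam_db/ layout and the choice of each file's stem as --id are assumptions.

```python
import subprocess
from pathlib import Path


def consite_command(fasta: Path, out: str = "results", pfam_dir: str = "pfam_db",
                    topn: int = 5, cpu: int = 8) -> list[str]:
    """Build a consite argument list (hypothetical helper)."""
    return [
        "consite",
        "--fasta", str(fasta),
        "--pfam-hmm", f"{pfam_dir}/Pfam-A.hmm",
        "--pfam-seed", f"{pfam_dir}/Pfam-A.seed",
        "--out", out,
        "--id", fasta.stem,        # one results/<id>/ folder per input file
        "--topn", str(topn),
        "--cpu", str(cpu),
    ]


def run_batch(fasta_dir: str) -> None:
    """Run consite once per *.fasta file; stop on the first failure."""
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        subprocess.run(consite_command(fasta), check=True)
```

Passing an argument list (rather than a shell string) to subprocess.run avoids quoting problems with unusual file names.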

MSA panel tuned example (legacy gradient renderer shaded by JSD, species+id labels, query row included, brightness floors that keep letters legible):

consite \
  --fasta examples/GS2.fasta \
  --pfam-hmm pfam_db/Pfam-A.hmm \
  --pfam-seed pfam_db/Pfam-A.seed \
  --out results \
  --id GS2 \
  --msa-viz gradient \
  --msa-panel-nseq 8 \
  --msa-panel-metric jsd \
  --msa-labels species+id \
  --msa-include-query \
  --msa-min-brightness 0.28 \
  --panel-min-brightness 0.22

Outputs

For each run you'll get a folder results/<id>/ containing:

  • query.fasta – input sequence
  • hits.json – Pfam hits (family, coords, scores)
  • scores.tsv – per-position tracks (columns: pos, in_domain, jsd, entropy, is_conserved)
  • domain_map.png – linear domain map with conserved sites
  • *_panel.png – per-domain query panels with conserved sites (hollow red)
  • *_msa.png – Pfam SEED MSA panels (pyMSAviz colored by default, legacy grayscale gradient optional)
  • *_sim.png – pairwise % identity heatmap among panel sequences (RF-masked columns)
  • *_sim.tsv – pairwise % identity matrix (TSV)
  • *_aligned.sto – Stockholm alignment of query to each family HMM
  • hmmsearch.domtblout – raw HMMER domain table
  • run.log – external tool logs
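For downstream analysis you can read scores.tsv with the standard library. The sketch below assumes a tab-separated file with a header row containing the columns listed above, and that is_conserved is written as 1/0 or True/False; the exact encoding is not documented here, so check your own file. The function name is hypothetical.

```python
import csv
import io

# Assumed spellings of a "conserved" flag in the is_conserved column.
TRUTHY = {"1", "true", "yes"}


def conserved_positions(tsv_text: str) -> list[int]:
    """Return query positions flagged as conserved in a scores.tsv table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        int(row["pos"])
        for row in reader
        if str(row["is_conserved"]).strip().lower() in TRUTHY
    ]
```

For the demo run, conserved_positions(open("results/P05362/scores.tsv").read()) would list the positions drawn as hollow red sites.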

Command-line Options

| Option | Description | Default |
|---|---|---|
| --fasta | Input protein FASTA | Required |
| --pfam-hmm | Path to Pfam-A.hmm (pressed with hmmpress) | Required |
| --pfam-seed | Path to Pfam-A.seed | Required |
| --out | Output directory | Required |
| --id | Custom run ID (subfolder name) | Derived from the FASTA header |
| --topn | Number of top Pfam hits to analyze | 2 |
| --cpu | HMMER threads | 4 |
| --jsd-top-percent | Top % of positions (by JSD) called conserved within each domain | 10.0 |
| --log | Append external tool output to this file | results/<id>/run.log |
| --quiet | Suppress console output | False |
| --keep | Keep the existing output folder (don't overwrite) | False |

MSA panel (SEED) controls

| Option | Description | Default |
|---|---|---|
| --msa-viz | Renderer for MSA panels: pymsaviz or gradient | pymsaviz |
| --msa-color-scheme | pyMSAviz color scheme name | Identity |
| --msa-wrap-length | pyMSAviz wrap length (0 = no wrap) | 80 |
| --msa-show-grid | Show grid lines in pyMSAviz | off |
| --msa-show-count | Show per-row non-gap counts in pyMSAviz | off |
| --msa-show-consensus | Show consensus identity bar in pyMSAviz | off |
| --msa-sort | Sort rows by NJ tree in pyMSAviz | off |
| --msa-marker-top-percent | Mark the top X% most conserved columns (pyMSAviz) | 10.0 |
| --msa-dpi | DPI for MSA panel PNGs | 200 |
| --msa-panel-nseq | Rows to display from the SEED alignment | 8 |
| --msa-panel-metric | Gradient metric: entropy (plotted as 1-entropy) or jsd | entropy |
| --msa-labels | Row labels: id, species, or species+id | species+id |
| --msa-include-query | Prepend the query row to the MSA panel | off |
| --msa-min-brightness | Floor for background brightness (keeps letters legible) | 0.26 |
| --msa-min-coverage | Mask columns below this coverage in panels (0–1) | 0.30 |
| --cons-weight-coverage | Weight conservation by coverage (alpha) | 1.0 |
| --mask-inserts | Use the RF line to mask insert columns | True |
| --gap-glyph | Gap rendering in MSA: dash, dot, or none | dash |
| --gap-cell-brightness | Brightness used in gap cells | 0.90 |
| --write-sim-matrix / --no-write-sim-matrix | Write pairwise % identity matrices for MSA panels | True |
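The *_sim.tsv matrices report pairwise % identity over RF-masked columns. The exact formula ConSite uses is not stated here; the sketch below assumes that any RF character other than "." or "-" marks a match (consensus) column, and that identity is counted only over match columns where neither sequence has a gap. All function names are hypothetical.

```python
GAPS = set(".-")


def match_columns(rf: str) -> list[int]:
    """Indices of match columns; '.' or '-' in the RF line marks an insert column."""
    return [i for i, c in enumerate(rf) if c not in GAPS]


def percent_identity(a: str, b: str, cols: list[int]) -> float:
    """% identity over the given columns, skipping positions gapped in either row."""
    compared = identical = 0
    for i in cols:
        x, y = a[i].upper(), b[i].upper()
        if x in GAPS or y in GAPS:
            continue
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


def identity_matrix(seqs: dict[str, str], rf: str) -> dict[tuple[str, str], float]:
    """All-vs-all % identity for aligned rows of equal length."""
    cols = match_columns(rf)
    names = list(seqs)
    return {(p, q): percent_identity(seqs[p], seqs[q], cols) for p in names for q in names}
```

Restricting the comparison to match columns keeps lineage-specific insertions, which are poorly aligned by definition, from diluting the identity values.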

Per-domain panel (query) controls

| Option | Description | Default |
|---|---|---|
| --panel-min-brightness | Brightness floor for query panels | 0.18 |
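The brightness and coverage options can be read roughly as follows. This is an assumed mapping, not ConSite's code: the conservation score (JSD or 1-entropy, scaled to [0, 1]) is multiplied by coverage raised to the alpha of --cons-weight-coverage, columns below --msa-min-coverage are drawn as plain background, and the result is mapped linearly so that even a fully conserved column never gets darker than the brightness floor. The function name is hypothetical; the defaults are taken from the MSA panel table.

```python
def cell_brightness(score: float, coverage: float = 1.0, *,
                    min_brightness: float = 0.26,
                    min_coverage: float = 0.30,
                    alpha: float = 1.0,
                    masked: float = 1.0) -> float:
    """Map a conservation score in [0, 1] to a gray level in [min_brightness, 1].

    Assumptions: coverage weighting is score * coverage**alpha, and masked
    (low-coverage) columns use the plain background value `masked`.
    """
    if coverage < min_coverage:
        return masked
    weighted = max(0.0, min(1.0, score)) * coverage ** alpha
    # 0 -> white (1.0); 1 -> the floor, so letters stay legible on dark cells.
    return 1.0 - weighted * (1.0 - min_brightness)
```

With these defaults a perfectly conserved, fully covered column maps to 0.26, and the same column at 50% coverage maps to 0.63.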

How It Works

  1. Domain detection: hmmsearch scans the query against the Pfam-A HMM library using Pfam gathering (GA) thresholds.
  2. SEED extraction: the matched family's block is pulled from Pfam-A.seed.
  3. Model building: hmmbuild builds a family HMM from the extracted SEED alignment.
  4. Query alignment: hmmalign aligns your sequence to the family model.
  5. Scoring:
    • MSA panels use SEED-based per-column scores (JSD or 1-entropy), with optional coverage masking/weighting.
    • Conserved sites are called on the query alignment per domain (top-X% by JSD).
  6. Visualization: domain map, query panels, and SEED MSA panels (pyMSAviz or legacy gradient).
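The JSD score in step 5 compares a column's amino-acid distribution with a background distribution. A minimal sketch, assuming a uniform background over the 20 standard amino acids and base-2 logarithms (so scores fall in [0, 1]); ConSite may use a different background (BLOSUM62-derived frequencies are a common choice in conservation scoring), pseudocounts, or gap penalties. Function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background; ConSite's actual background may differ.
UNIFORM_BG = {aa: 1 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def column_distribution(column: str) -> dict[str, float]:
    """Residue frequencies in one column, ignoring gaps and non-standard letters."""
    counts = Counter(c.upper() for c in column if c.upper() in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def jsd(p: dict[str, float], q: dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits: 0.5*KL(p||m) + 0.5*KL(q||m), m = (p+q)/2."""
    div = 0.0
    for aa in AMINO_ACIDS:
        pa, qa = p.get(aa, 0.0), q.get(aa, 0.0)
        m = 0.5 * (pa + qa)
        if pa > 0:
            div += 0.5 * pa * math.log2(pa / m)
        if qa > 0:
            div += 0.5 * qa * math.log2(qa / m)
    return div


def column_jsd(column: str, background=UNIFORM_BG) -> float:
    """Higher = further from background = more conserved; all-gap columns score 0."""
    p = column_distribution(column)
    return jsd(p, background) if p else 0.0
```

A fully conserved column such as "AAAA" scores about 0.86 against the uniform background, while a column containing each amino acid once scores 0.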

Example (ICAM1)

For examples/P05362.fasta, ConSite typically finds:

  • PF03921 (~25–115): Ig-like domain
  • PF21146 (~219–308): Ig-like domain

You'll see two *_panel.png, two *_msa.png, a domain_map.png, plus hits.json and scores.tsv.

Troubleshooting

  • command not found: hmmsearch: install HMMER and make sure it is on your PATH (check with which hmmsearch).
  • No such file or directory: pfam_db/Pfam-A.hmm: run ./scripts/get_pfam.sh, or download the files manually and run hmmpress.
  • Large/verbose logs: use --quiet and/or inspect run.log.

Development

ConSite/
├── src/consite/
│   ├── cli.py            # CLI
│   ├── hmmer_local.py    # HMMER wrappers
│   ├── parse_domtbl.py   # HMMER output parsing
│   ├── pfam.py           # Pfam SEED extraction
│   ├── msa_io.py         # Stockholm I/O (+RF)
│   ├── conserve.py       # Scoring (entropy/JSD/coverage)
│   └── viz.py            # Plots (domain map, panels, MSA panels)
├── scripts/              # Helper scripts (get_pfam, quickstart)
├── examples/             # Example FASTA files
└── pfam_db/              # Pfam files (user-provided)

Dependencies: biopython ≥1.81, numpy ≥2.0, pandas ≥2.0, scipy ≥1.16, matplotlib ≥3.7, Python ≥3.10.

Notes & Roadmap

  • Remote CDD mode is stubbed (local Pfam/HMMER path is fully supported).
  • Current conserved-site calls are relative (top-X%); absolute thresholds are planned.
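Because the current calls are relative, the number of conserved sites scales with domain length rather than with how conserved the domain actually is. A sketch of a per-domain top-X% rule, assuming rounding up and at least one site per domain (ConSite's exact rounding and tie-breaking are not documented here); the function name is hypothetical.

```python
import math


def call_conserved(jsd_scores: dict[int, float], top_percent: float = 10.0) -> set[int]:
    """Flag the top `top_percent` % of positions in one domain, ranked by JSD.

    Assumptions: the count is rounded up, at least one site is called,
    and ties are broken by the dictionary's iteration order.
    """
    if not jsd_scores or top_percent <= 0:
        return set()
    n_keep = max(1, math.ceil(len(jsd_scores) * top_percent / 100))
    ranked = sorted(jsd_scores, key=lambda pos: jsd_scores[pos], reverse=True)
    return set(ranked[:n_keep])
```

Under this rule, --jsd-top-percent 10 on a 90-residue domain always yields 9 sites, whether the domain is highly or weakly conserved; that is the limitation an absolute threshold would address.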

Citation

Joey Wagner, Yang Li. ConSite: conserved-domain alignment and conserved-site visualization from protein FASTA.

License

MIT (see LICENSE).


Download files


Source Distribution

consite-0.1.6.tar.gz (33.7 kB)

Built Distribution


consite-0.1.6-py3-none-any.whl (33.0 kB)

File details

Details for the file consite-0.1.6.tar.gz.

File metadata

  • Download URL: consite-0.1.6.tar.gz
  • Size: 33.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for consite-0.1.6.tar.gz
Algorithm Hash digest
SHA256 07ee53f713140eb587f4303535d00ca217cacbcdde8d42defd17c9275d8431ae
MD5 a876157a2d0778fe2606733a0a65833c
BLAKE2b-256 0ebd839760b499fa0b4c1660724ba22dbcb93a1eda106fbac0f9999953d6d716


Provenance

The following attestation bundles were made for consite-0.1.6.tar.gz:

Publisher: publish.yml on liyang-lab/ConSite

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file consite-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: consite-0.1.6-py3-none-any.whl
  • Size: 33.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for consite-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 cece0f0dae8e0dbbd840e15f86ba8945e5b713a6c9e2c900e66c85d3cd7d6278
MD5 23aa6bf80ca0a9a300aa81caec581486
BLAKE2b-256 4a1d47f6db7d0b1b07e34116703462f0ae12d5a352c451bd50ca798617da7954


Provenance

The following attestation bundles were made for consite-0.1.6-py3-none-any.whl:

Publisher: publish.yml on liyang-lab/ConSite

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
