ConSite: conserved-domain alignment and conserved-site visualization from protein FASTA
ConSite
ConSite takes a protein FASTA sequence, finds conserved domains via local Pfam/HMMER, aligns your sequence to each domain, scores per-position conservation, and renders publication-quality figures and structured outputs.
Features
- FASTA → Pfam domain search (local HMMER)
- Automatic alignment to each hit using the family's Pfam SEED HMM
- Per-position conservation: entropy, Jensen–Shannon divergence (JSD), consensus frequency, coverage
- Conserved site calling (top-X% by JSD, per domain)
- Publication-quality visualization:
  - Linear domain map with hollow-red conserved sites
  - MSA gradient panels (from Pfam SEED): brightness-floored grayscale so letters remain legible; supports species labels, optional query row at top, gap glyphs (dash/dot/none), coverage masking/weighting
  - Per-domain alignment panels for the query with optional conservation background scale
- Reproducible outputs (JSON, TSV, PNG, Stockholm) and clear CLI logging
Installation
Option A: PyPI (recommended)
python -m pip install consite
Prerequisites
- Python ≥ 3.10
- HMMER 3.x in your PATH
- Pfam database files (see Quick Start)
Install HMMER:
- macOS (Homebrew): brew install hmmer
- Linux (APT): sudo apt-get update && sudo apt-get install hmmer
- Windows (conda): conda install -c conda-forge hmmer
Verify: hmmsearch --version
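The same check can be scripted before a batch run. The sketch below is not part of ConSite; it only looks up the HMMER programs named in this README on your PATH using the standard library. The helper names (find_hmmer_tools, missing_tools) are hypothetical.

```python
import shutil

# HMMER programs named in this README; adjust the tuple if your workflow differs.
HMMER_TOOLS = ("hmmsearch", "hmmbuild", "hmmalign", "hmmpress")


def find_hmmer_tools(tools=HMMER_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


def missing_tools(found):
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_hmmer_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

If any tool prints NOT FOUND, install HMMER or fix your PATH before running consite.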
Quick Start
1) Get the Pfam database
Use the helper script:
chmod +x scripts/*.sh
./scripts/get_pfam.sh
This downloads Pfam-A.hmm and Pfam-A.seed, uncompresses, and runs hmmpress.
(Manual alternative is in "Manual Setup" below.)
2) Run ConSite (demo)
consite \
--fasta examples/P05362.fasta \
--pfam-hmm pfam_db/Pfam-A.hmm \
--pfam-seed pfam_db/Pfam-A.seed \
--out results \
--id P05362
Manual Setup (from source)
git clone https://github.com/yangli-evo/ConSite.git
cd ConSite
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e .
# (then run ./scripts/get_pfam.sh, or manually download + hmmpress)
Manual Pfam download:
mkdir -p pfam_db
curl -L -o pfam_db/Pfam-A.hmm.gz https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
curl -L -o pfam_db/Pfam-A.seed.gz https://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
gunzip -f pfam_db/Pfam-A.hmm.gz pfam_db/Pfam-A.seed.gz
hmmpress pfam_db/Pfam-A.hmm
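To confirm the manual download and hmmpress step completed, you can check for the expected files. This is a minimal sketch, assuming the pfam_db/ layout used above; hmmpress writes four index files (.h3f, .h3i, .h3m, .h3p) next to the HMM file. The function name missing_pfam_files is hypothetical and not part of ConSite.

```python
from pathlib import Path

# hmmpress writes four binary index files next to the HMM file.
PRESSED_SUFFIXES = (".h3f", ".h3i", ".h3m", ".h3p")


def missing_pfam_files(db_dir="pfam_db"):
    """Return the expected Pfam files that are absent from db_dir."""
    db = Path(db_dir)
    expected = [db / "Pfam-A.hmm", db / "Pfam-A.seed"]
    expected += [db / f"Pfam-A.hmm{suffix}" for suffix in PRESSED_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]


if __name__ == "__main__":
    missing = missing_pfam_files()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Pfam database looks complete.")
```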
Usage
Basic:
consite \
--fasta myprotein.fasta \
--pfam-hmm pfam_db/Pfam-A.hmm \
--pfam-seed pfam_db/Pfam-A.seed \
--out results \
--topn 5 \
--cpu 8 \
--jsd-top-percent 15 \
--log results/run.log
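If you drive ConSite from a Python script (for example, to loop over many FASTA files), one option is to assemble the same command line and pass it to subprocess. The sketch below uses only flags listed in this README; the helper names build_consite_command and run_consite are hypothetical, and ConSite may well offer a better programmatic entry point that is not documented here.

```python
import shlex
import subprocess


def build_consite_command(fasta, pfam_hmm, pfam_seed, out_dir, run_id=None,
                          topn=None, cpu=None, jsd_top_percent=None):
    """Assemble the consite argument list; optional flags are added only if set."""
    cmd = ["consite", "--fasta", str(fasta), "--pfam-hmm", str(pfam_hmm),
           "--pfam-seed", str(pfam_seed), "--out", str(out_dir)]
    optional = {"--id": run_id, "--topn": topn, "--cpu": cpu,
                "--jsd-top-percent": jsd_top_percent}
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


def run_consite(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_consite_command("myprotein.fasta", "pfam_db/Pfam-A.hmm",
                                "pfam_db/Pfam-A.seed", "results",
                                topn=5, cpu=8, jsd_top_percent=15)
    print(shlex.join(cmd))  # inspect the command before calling run_consite(cmd)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems with paths that contain spaces.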
MSA panel tuned example (SEED gradient = JSD, labels with species+id, include query row, safe brightness):
consite \
--fasta examples/GS2.fasta \
--pfam-hmm pfam_db/Pfam-A.hmm \
--pfam-seed pfam_db/Pfam-A.seed \
--out results \
--id GS2 \
--msa-panel-nseq 8 \
--msa-panel-metric jsd \
--msa-labels species+id \
--msa-include-query \
--msa-min-brightness 0.28 \
--panel-min-brightness 0.22
Outputs
For each run you'll get a folder results/<id>/ containing:
- query.fasta – input sequence
- hits.json – Pfam hits (family, coords, scores)
- scores.tsv – per-position tracks (columns: pos, in_domain, jsd, entropy, is_conserved)
- domain_map.png – linear domain map with conserved sites
- *_panel.png – per-domain query panels with conserved sites (hollow red)
- *_msa.png – Pfam SEED MSA panels with grayscale conservation gradient, labels, and optional query row
- *_sim.png – pairwise % identity heatmap among panel sequences (RF-masked columns)
- *_sim.tsv – pairwise % identity matrix (TSV)
- *_aligned.sto – Stockholm alignment of query to each family HMM
- hmmsearch.domtblout – raw HMMER domain table
- run.log – external tool logs
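For downstream analysis, scores.tsv can be read with the standard csv module. The sketch below relies only on the column names listed above. How is_conserved is encoded (1/0, True/False, or something else) is an assumption here, and the example rows are synthetic, not real ConSite output.

```python
import csv
import io

# Assumed encodings of a "true" flag; check your own scores.tsv before relying on this.
TRUTHY = {"1", "true", "yes"}


def conserved_positions(handle):
    """Return the positions whose is_conserved column holds a truthy value."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [int(row["pos"]) for row in reader
            if row["is_conserved"].strip().lower() in TRUTHY]


# Synthetic rows for illustration only; not real ConSite output.
EXAMPLE = (
    "pos\tin_domain\tjsd\tentropy\tis_conserved\n"
    "1\t0\t0.00\t0.00\t0\n"
    "2\t1\t0.71\t0.40\t1\n"
    "3\t1\t0.35\t1.90\t0\n"
    "4\t1\t0.80\t0.10\t1\n"
)

if __name__ == "__main__":
    print(conserved_positions(io.StringIO(EXAMPLE)))
    # For a real run, open the file instead:
    # with open("results/<id>/scores.tsv", newline="") as fh:
    #     print(conserved_positions(fh))
```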
Command-line Options
| Option | Description | Default |
|---|---|---|
| --fasta | Input protein FASTA | Required |
| --pfam-hmm | Path to Pfam-A.hmm (pressed) | Required |
| --pfam-seed | Path to Pfam-A.seed | Required |
| --out | Output directory | Required |
| --id | Custom run ID (subfolder name) | FASTA header |
| --topn | Number of top Pfam hits to analyze | 2 |
| --cpu | HMMER threads | 4 |
| --jsd-top-percent | Top % (by JSD) called conserved within each domain | 10.0 |
| --log | Append external tool output to this file | results/<id>/run.log |
| --quiet | Suppress console output | False |
| --keep | Keep existing output folder (don't overwrite) | False |
MSA panel (SEED) controls
| Option | Description | Default |
|---|---|---|
| --msa-panel-nseq | Rows to display from the SEED alignment | 8 |
| --msa-panel-metric | Gradient metric: entropy (uses 1-entropy) or jsd | entropy |
| --msa-labels | Row labels: id, species, species+id | species+id |
| --msa-include-query | Prepend query row to the MSA panel | off |
| --msa-min-brightness | Floor for background brightness (keeps letters legible) | 0.26 |
| --msa-min-coverage | Mask columns below this coverage in panels (0–1) | 0.30 |
| --cons-weight-coverage | Weight conservation by coverage (alpha) | 1.0 |
| --mask-inserts | Use RF to mask insert columns | True |
| --gap-glyph | Gap rendering in MSA: dash, dot, or none | dash |
| --gap-cell-brightness | Brightness used in gap cells | 0.90 |
| --write-sim-matrix / --no-write-sim-matrix | Write pairwise % identity matrices for MSA panels | True |
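To build intuition for how the brightness floor, coverage mask, and coverage weight might interact, here is a purely illustrative sketch. The formula is an assumption, not ConSite's actual rendering code: it assumes scores in [0, 1], weights them by coverage raised to alpha, leaves masked columns white, and maps the result linearly so that the darkest cell never drops below the floor. The function name cell_brightness is hypothetical.

```python
def cell_brightness(score, coverage, min_brightness=0.26,
                    min_coverage=0.30, alpha=1.0):
    """Illustrative mapping from a column score to a grayscale background.

    Assumed model (not taken from ConSite's source):
      - columns with coverage below min_coverage are masked (white, 1.0);
      - otherwise the score is weighted by coverage ** alpha;
      - weighted score 0 maps to white (1.0), 1 maps to min_brightness.
    """
    if not 0.0 <= score <= 1.0 or not 0.0 <= coverage <= 1.0:
        raise ValueError("score and coverage must lie in [0, 1]")
    if coverage < min_coverage:
        return 1.0  # masked column: plain white background (assumption)
    weighted = score * coverage ** alpha
    return 1.0 - (1.0 - min_brightness) * weighted


if __name__ == "__main__":
    for score, cov in [(1.0, 1.0), (0.5, 1.0), (1.0, 0.5), (0.9, 0.1)]:
        print(score, cov, round(cell_brightness(score, cov), 3))
```

Under this model, raising --msa-min-brightness lightens the darkest cells, which is what keeps black letters readable on highly conserved columns.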
Per-domain panel (query) controls
| Option | Description | Default |
|---|---|---|
| --panel-min-brightness | Brightness floor for query panels | 0.18 |
How It Works
- Domain detection: hmmsearch against the Pfam-A HMM library (GA thresholds).
- SEED extraction: the matched family's block is pulled from Pfam-A.seed.
- Model building: hmmbuild produces a family HMM.
- Query alignment: hmmalign aligns your sequence to the family model.
- Scoring:
  - MSA panels use SEED-based per-column scores (JSD or 1-entropy), with optional coverage masking/weighting.
  - Conserved sites are called on the query alignment per domain (top-X% by JSD).
- Visualization: domain map, query panels, and SEED MSA gradient panels.
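The per-column metrics named in the scoring step can be sketched in a few lines. This is a minimal illustration, not ConSite's implementation: it assumes gaps and non-standard residues are ignored, uses base-2 logarithms, and compares each column against a uniform background over the 20 amino acids (the real tool may use a different background, pseudocounts, or sequence weighting). All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed background distribution: uniform over the 20 standard residues.
UNIFORM = {aa: 1 / 20 for aa in AMINO_ACIDS}


def column_frequencies(column):
    """Residue frequencies for one alignment column; None if it has no residues."""
    counts = Counter(c for c in column.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return None
    return {aa: counts.get(aa, 0) / total for aa in AMINO_ACIDS}


def shannon_entropy(p):
    """Shannon entropy in bits; 0 for a fully conserved column."""
    return sum(v * math.log2(1.0 / v) for v in p.values() if v > 0)


def jsd(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    m = {aa: 0.5 * (p[aa] + q[aa]) for aa in p}

    def kl(a):
        return sum(a[aa] * math.log2(a[aa] / m[aa]) for aa in a if a[aa] > 0)

    return 0.5 * kl(p) + 0.5 * kl(q)


def coverage(column):
    """Fraction of rows in the column that hold a standard residue."""
    return sum(c in AMINO_ACIDS for c in column.upper()) / len(column)


if __name__ == "__main__":
    for col in ["AAAA", "AAGA", "ACDE", "A--A"]:
        freqs = column_frequencies(col)
        print(col, round(shannon_entropy(freqs), 3),
              round(jsd(freqs, UNIFORM), 3), coverage(col))
```

In this sketch a fully conserved column has entropy 0 and the highest JSD against the background, while a column with mixed residues scores lower on JSD and higher on entropy.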
Example (ICAM1)
For examples/P05362.fasta, ConSite typically finds:
- PF03921 (~25–115): Ig-like domain
- PF21146 (~219–308): Ig-like domain
You'll see two *_panel.png, two *_msa.png, a domain_map.png, plus hits.json and scores.tsv.
Troubleshooting
- command not found: hmmsearch
  Install HMMER and ensure it's on your PATH (which hmmsearch).
- No such file or directory: pfam_db/Pfam-A.hmm
  Run ./scripts/get_pfam.sh, or download manually and run hmmpress.
- Large/verbose logs
  Use --quiet and/or inspect run.log.
Development
ConSite/
├── src/consite/
│ ├── cli.py # CLI
│ ├── hmmer_local.py # HMMER wrappers
│ ├── parse_domtbl.py # HMMER output parsing
│ ├── pfam.py # Pfam SEED extraction
│ ├── msa_io.py # Stockholm I/O (+RF)
│ ├── conserve.py # Scoring (entropy/JSD/coverage)
│ └── viz.py # Plots (domain map, panels, MSA gradient)
├── scripts/ # Helper scripts (get_pfam, quickstart)
├── examples/ # Example FASTA files
└── pfam_db/ # Pfam files (user-provided)
Dependencies: biopython ≥1.81, numpy ≥2.0, pandas ≥2.0, scipy ≥1.16, matplotlib ≥3.7, Python ≥3.10.
Notes & Roadmap
- Remote CDD mode is stubbed (local Pfam/HMMER path is fully supported).
- Current conserved-site calls are relative (top-X%); absolute thresholds are planned.
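To make "relative (top-X%)" concrete, the sketch below selects the highest-scoring positions within a single domain. It is an illustration under stated assumptions, not ConSite's code: the number of sites is rounded up, at least one site is always returned for a non-empty domain, and ties are broken in favor of the lower position. The function name call_top_percent is hypothetical.

```python
import math


def call_top_percent(scores, top_percent=10.0):
    """Return the sorted positions in the top `top_percent` of JSD scores.

    scores maps position -> JSD for the positions of one domain.
    Rounding up, the one-site minimum, and the tie-break rule are assumptions.
    """
    if not 0 < top_percent <= 100:
        raise ValueError("top_percent must be in (0, 100]")
    if not scores:
        return []
    n_keep = max(1, math.ceil(len(scores) * top_percent / 100.0))
    # Highest score first; equal scores fall back to the lower position.
    ranked = sorted(scores, key=lambda pos: (-scores[pos], pos))
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    domain_scores = {10: 0.9, 11: 0.1, 12: 0.5, 13: 0.8, 14: 0.2}
    print(call_top_percent(domain_scores, top_percent=40))
```

Because the cutoff is relative, every domain yields some conserved sites regardless of how conserved it is overall, which is the limitation absolute thresholds would address.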
Citation
Joey Wagner, Yang Li. ConSite: conserved-domain alignment and conserved-site visualization from protein FASTA.
License
MIT (see LICENSE).