
aProfiler: MSA Statistics & Visualization Toolkit


aProfiler profiles Multiple Sequence Alignments (MSAs) and produces summary statistics, publication-grade plots, codon-aware metrics (RSCU and essential amino acid summaries), embeddings, and CSV tables from a single command-line call.



Installation

pip install aprofiler

CLI Usage

aprofiler --input {alignment.fasta} --mode auto --report

CLI Help

aprofiler --help

usage: aprofiler [-h] --input INPUT [--mode {nt,aa,codon,auto}] [--no-plots]
                 [--report] [--report-format {md,html}] [--seed SEED]

aProfiler: MSA statistics & visualization.

options:
  -h, --help            show this help message and exit
  --input INPUT         Input MSA file (FASTA/A3M/ALN/etc.)
  --mode {nt,aa,codon,auto}
                        Sequence type mode (nt, aa, codon, or auto-detect).
  --no-plots            Disable plot generation (CSV only).
  --report              Generate a summary report (HTML by default).
  --report-format {md,html}
                        Report format if --report is set (default: html).
  --seed SEED           Random seed for any subsampling and UMAP embeddings.

Examples on test data

aprofiler --input .\data\test-TP53-nt.fasta
aprofiler --input .\data\test-TP53-nt.fasta --report
aprofiler --input .\data\test-TP53-nt.fasta --report --mode codon
aprofiler --input .\data\test-TP53-aa.fasta --report --report-format md
aprofiler --input .\data\test-TP53-aa.fasta --no-plots --mode aa

Modes

Mode    Description
nt      Nucleotide MSA (DNA/RNA, IUPAC codes tolerated)
aa      Amino acid MSA (protein residues)
codon   Coding-sequence MSA analyzed at the codon level (standard genetic code by default)
auto    Auto-detect between NT and AA (codon is never auto-selected and must be set explicitly)
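The README does not say how auto mode chooses between NT and AA. A common heuristic, sketched below purely as an assumption, is to measure what fraction of non-gap symbols fall in the nucleotide alphabet. The function name `guess_mode` and the 0.9 threshold are hypothetical and are not part of aProfiler's API.

```python
NT_SYMBOLS = set("ACGTUN")
GAP_SYMBOLS = set("-.")


def guess_mode(seqs, threshold=0.9):
    """Guess 'nt' or 'aa' from residue composition; never returns 'codon'."""
    # Pool every non-gap symbol across the whole alignment.
    residues = [c for seq in seqs for c in seq.upper() if c not in GAP_SYMBOLS]
    if not residues:
        raise ValueError("alignment contains no residues")
    # Fraction of symbols that look like nucleotides (N treated as an unknown base).
    nt_fraction = sum(c in NT_SYMBOLS for c in residues) / len(residues)
    return "nt" if nt_fraction >= threshold else "aa"


print(guess_mode(["ACGT-ACGU", "ACGTNACGT"]))  # nt
print(guess_mode(["MKVLA-", "MKILAE"]))        # aa
```

A composition check like this cannot distinguish a codon alignment from any other nucleotide alignment, because both use the same alphabet, which is consistent with codon mode having to be requested explicitly.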

Common Flags

Flag             Purpose
--input          Input MSA file (FASTA MSA, A3M, or fixed-column alignment)
--mode           nt, aa, codon, or auto
--report         Generate a summary report (.md or .html)
--report-format  md or html (default: html)
--no-plots       Skip plots and output CSV tables only
--seed           Random seed for subsampling and PCA/UMAP embeddings (reproducibility)

Outputs (Saved Automatically)

All results are saved under:

./results/{alignment_name}/

CSV Tables

Output                                               File
Global NT or AA frequencies                          *_global_freqs.csv
Per-site NT stats + entropy + GC%                    *_nt_per_site.csv
PCA embeddings                                       *_pca_embedding.csv
UMAP embeddings                                      *_umap_embedding.csv
Codon usage table                                    *_codon_global.csv
Relative Synonymous Codon Usage (RSCU)               *_codon_rscu.csv
Amino acid usage derived from codons                 *_aa_from_codons.csv
Essential vs non-essential AA summary (from codons)  *_aa_essential_summary.csv
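RSCU compares each codon's observed count with the count expected if all synonymous codons for its amino acid were used equally: RSCU = count × (number of synonymous codons) / (total count for that amino acid). The sketch below derives codon counts, RSCU, and an essential vs. non-essential split from aligned coding sequences. It assumes the standard genetic code (NCBI table 1), skips codons containing gaps or ambiguous bases, excludes stop codons from RSCU, and treats the nine human essential amino acids (H, I, L, K, M, F, T, W, V) as "essential". aProfiler's exact conventions may differ, and all function names here are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG x TCAG x TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: the nine amino acids considered essential in humans.
ESSENTIAL = set("HILKMFTWV")


def count_codons(seqs):
    """Count in-frame codons, skipping triplets with gaps or non-ACGT symbols."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(codon_counts):
    """RSCU per codon for every amino acid that occurs at least once (stops excluded)."""
    synonyms = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            synonyms.setdefault(aa, []).append(codon)
    result = {}
    for aa, codons in synonyms.items():
        total = sum(codon_counts[c] for c in codons)
        if total == 0:
            continue  # RSCU is undefined for an amino acid that never occurs
        for c in codons:
            result[c] = codon_counts[c] * len(codons) / total
    return result


def essential_summary(codon_counts):
    """Split codon-derived amino acid counts into essential vs. non-essential totals."""
    aa_counts = Counter()
    for codon, n in codon_counts.items():
        aa = CODON_TABLE[codon]
        if aa != "*":
            aa_counts[aa] += n
    essential = sum(n for aa, n in aa_counts.items() if aa in ESSENTIAL)
    return {"essential": essential, "non_essential": sum(aa_counts.values()) - essential}


counts = count_codons(["ATGCTGCTG---", "ATGTTAGCTTAA"])
print(rscu(counts)["CTG"])        # 4.0
print(essential_summary(counts))  # {'essential': 5, 'non_essential': 1}
```

Here leucine occurs three times across its six codons, so CTG (seen twice) gets 2 × 6 / 3 = 4.0, while unused synonyms such as TTG get 0.0.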

Plots (generated by default; disable with --no-plots)

Plot                         Purpose
Nucleotide logo plot         Position-wise base enrichment
GC% per-site                 GC landscape across the MSA
Entropy per-site             Conservation skyline
AA/NT per-site heatmaps      Residue/base prevalence
PCA scatter                  Sequence-space clustering
UMAP scatter                 Similarity-space embedding
Pairwise identity histogram  Sequence similarity distribution
Gap fraction histogram       Alignment completeness QC
Codon usage barplot          Most frequent codons
RSCU heatmap                 Synonymous codon bias by AA
Essential AA barplot         Essential vs non-essential AA trends (codon mode only)
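The entropy and GC% plots are per-column statistics. A minimal sketch, assuming gap characters ('-' and '.') are excluded before computing Shannon entropy in bits, and that GC% is taken over unambiguous A/C/G/T/U bases only; aProfiler may treat gaps and IUPAC codes differently, and the function names are hypothetical.

```python
import math
from collections import Counter

GAP_CHARS = set("-.")
GC_BASES = set("GC")
ACGTU = set("ACGTU")


def site_entropy(column):
    """Shannon entropy in bits of one column, ignoring gap characters."""
    residues = [c.upper() for c in column if c not in GAP_CHARS]
    n = len(residues)
    if n == 0:
        return 0.0
    # H = sum over symbols of p * log2(1/p); 0.0 for a fully conserved column.
    return sum((k / n) * math.log2(n / k) for k in Counter(residues).values())


def site_gc_percent(column):
    """Percent G+C among unambiguous bases in one column, or None if there are none."""
    bases = [c.upper() for c in column if c.upper() in ACGTU]
    if not bases:
        return None
    return 100.0 * sum(b in GC_BASES for b in bases) / len(bases)


def per_site_profile(seqs):
    """One record per alignment column, using 1-based site indices."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must have equal aligned length")
    return [
        {"site": i, "entropy": site_entropy(col), "gc_percent": site_gc_percent(col)}
        for i, col in enumerate(zip(*seqs), start=1)
    ]


for record in per_site_profile(["ACGT", "ACGA", "AGG-", "ACGT"]):
    print(record)
```

Low entropy marks conserved columns, which is why the entropy track reads as a "conservation skyline" when plotted against site index.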

Alignment Format Compatibility and Constraints

Input alignment type   Supported?  Notes
FASTA MSA              Yes         Sequences must be of equal length
A3M                    Yes         Lowercase letters denote insertions (relative to the query)
ALN/Clustal            Yes         Must be in fixed columns or converted first
Codon FASTA            Yes         Requires explicit --mode codon
Mixed NT+AA alphabets  No          Alphabet must be uniform per file

All alignments are treated as fixed-width, rectangular matrices: every sequence must have the same aligned length.
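For A3M input, one standard way to obtain a rectangular matrix is to drop insertion states (lowercase letters, plus '.' as used in A2M) so that each row keeps only match columns, then confirm that all rows share a single length. The sketch below assumes aProfiler behaves this way, which the README does not confirm; `a3m_to_fixed` and `check_rectangular` are hypothetical names.

```python
def a3m_to_fixed(row):
    """Keep only match columns: drop lowercase insertions and '.' padding."""
    return "".join(c for c in row if not (c.islower() or c == "."))


def check_rectangular(rows):
    """Return the shared aligned length, or raise if rows differ in length."""
    lengths = {len(r) for r in rows}
    if len(lengths) != 1:
        raise ValueError(f"unequal aligned lengths: {sorted(lengths)}")
    return lengths.pop()


rows = [a3m_to_fixed(r) for r in ["MKV-LA", "MKaaVRLA", "MK-V.LA"]]
print(rows)                     # ['MKV-LA', 'MKVRLA', 'MK-VLA']
print(check_rectangular(rows))  # 6
```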


Output Guarantees

The following hold for every run:

  • All profiling artifacts are written into results/ without overwriting unrelated files.
  • Ambiguous input characters are tolerated and tracked, never silently discarded.
  • Codon-mode metrics are computed only when explicitly requested (--mode codon).
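Tracking ambiguous characters instead of discarding them can be as simple as counting them per sequence, alongside the gap fraction used for alignment QC. The sketch below is hypothetical: it assumes a nucleotide alphabet with IUPAC ambiguity codes and reports any symbol outside that alphabet as "unknown" rather than dropping it.

```python
from collections import Counter

UNAMBIGUOUS = set("ACGTU")
IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")
GAPS = set("-.")
KNOWN = UNAMBIGUOUS | IUPAC_AMBIGUOUS | GAPS


def track_symbols(seqs):
    """Per-sequence gap fraction plus counts of ambiguous and unrecognized symbols."""
    report = []
    for index, seq in enumerate(seqs):
        seq = seq.upper()
        gaps = sum(c in GAPS for c in seq)
        report.append({
            "index": index,
            "gap_fraction": gaps / len(seq) if seq else 0.0,
            # IUPAC ambiguity codes are kept and counted, not removed.
            "ambiguous": dict(Counter(c for c in seq if c in IUPAC_AMBIGUOUS)),
            # Anything outside the known alphabet is reported for inspection.
            "unknown": dict(Counter(c for c in seq if c not in KNOWN)),
        })
    return report


for row in track_symbols(["ACGN-", "AC?TT"]):
    print(row)
```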


Example Output Directory Tree

results/
  TP53_alignment/
    TP53_global_freqs.csv             # Tables
    TP53_nt_per_site.csv
    TP53_pca_embedding.csv
    TP53_umap_embedding.csv
    TP53_report.html                  # Report in HTML format
    TP53_nt_logo.png                  # Plots
    TP53_entropy.png                 
    TP53_gc.png
    TP53_pca.png
    TP53_umap.png
    TP53_pairwise_identity.png
    TP53_gap_fraction.png
    TP53_rscu_heatmap.png
    TP53_aa_essential_bar.png

Scalability Notes

Optimized for MSAs of up to ~20k sequences × 10k columns on standard hardware. Larger inputs may require downsampling for logos and heatmaps.
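All-pairs identity needs N(N-1)/2 comparisons, roughly 2 × 10^8 pairs for 20k sequences, so subsampling with a fixed seed keeps the identity histogram tractable and reproducible. A minimal sketch, assuming identity is measured only over columns where neither sequence has a gap; `max_seqs`, the function names, and the gap convention are illustrative assumptions rather than aProfiler options.

```python
import random
from itertools import combinations

GAP_CHARS = set("-.")


def pairwise_identity(a, b):
    """Fraction of identical residues over columns where neither sequence is gapped."""
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue
        compared += 1
        matches += x == y
    return matches / compared if compared else None


def sampled_identities(seqs, max_seqs=500, seed=0):
    """Identity for every pair in a seeded random subsample of at most max_seqs sequences."""
    rng = random.Random(seed)  # same seed -> same subsample -> same histogram
    if len(seqs) > max_seqs:
        seqs = rng.sample(seqs, max_seqs)
    values = (pairwise_identity(a, b) for a, b in combinations(seqs, 2))
    return [v for v in values if v is not None]


print(pairwise_identity("AC-T", "ACGA"))  # 2 matches over 3 ungapped columns
```

Capping the sample at `max_seqs` bounds the work at max_seqs × (max_seqs - 1) / 2 comparisons regardless of input size.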


Testing

pip install pytest
pytest -q

Tests validate:

  • equal-length enforcement
  • stable fallback embedding behavior
  • non-empty CSV outputs
  • plot and artifact creation without crashes

Python API Example

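from aprofiler.profiler import AlignmentProfiler

# Point the profiler at an MSA, choose a mode, and set the output root directory
prof = AlignmentProfiler("alignment.fasta", mode="auto", out_dir="results")
prof.load_alignment()                                  # read the MSA from disk
outputs = prof.run_full_profile()                      # compute tables and plots
report_path = prof.generate_report(outputs, fmt="md")  # write a Markdown report

print("Outputs generated:", outputs)
print("Report saved to:", report_path)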

Citation

If you use aProfiler in a publication, please cite:

TBD

Contributing

Issues, discussions, and pull requests are welcome. Ensure contributions are:

  • statistically useful
  • plot-rich by design
  • free of silent failures
  • non-destructive to unrelated files
  • aligned with package philosophy and constraints



Download files


Source Distribution

aprofiler-0.1.3.tar.gz (17.6 kB)

Uploaded Source

Built Distribution


aprofiler-0.1.3-py3-none-any.whl (17.6 kB)

Uploaded Python 3

File details

Details for the file aprofiler-0.1.3.tar.gz.

File metadata

  • Download URL: aprofiler-0.1.3.tar.gz
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for aprofiler-0.1.3.tar.gz
Algorithm Hash digest
SHA256 ac283e62dfe3b3aea7231984fbcd12cc1b2cd0c555f49d7c9e2375bb401dc385
MD5 abd62637dfb5017945caab92c2d91037
BLAKE2b-256 f648b5b39c35a0c4f32c51e7972c20fddfb66c197fe5b578dc45a2cfa1a7b7d1


File details

Details for the file aprofiler-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: aprofiler-0.1.3-py3-none-any.whl
  • Size: 17.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for aprofiler-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 0d17793b44ebddb4d53b9f242a51d0f8162297e76fba5042a1015d0c17eef49e
MD5 852540cf41a50e48ec4d2215d46681c0
BLAKE2b-256 4c618f6d3bf357dec237b603254ea791059d6b546ea04c59317b21baca407aff
