
aProfiler: MSA Statistics & Visualization Toolkit

aProfiler examines multiple sequence alignments (MSAs) and, from a single command-line call, produces per-site and global statistics, publication-grade plots, codon-aware metrics (RSCU and essential amino acid summaries), PCA/UMAP sequence embeddings, and CSV tables.

Installation

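aProfiler is currently published on TestPyPI. The --extra-index-url option lets pip fetch dependencies that TestPyPI does not host from the main PyPI index:

python -m pip install --index-url https://test.pypi.org/simple --extra-index-url https://pypi.org/simple aprofiler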

CLI Usage

aprofiler --input {alignment.fasta} --mode auto --report

Examples on test data

aprofiler --input ./data/test-TP53-nt.fasta
aprofiler --input ./data/test-TP53-nt.fasta --report
aprofiler --input ./data/test-TP53-nt.fasta --report --mode codon
aprofiler --input ./data/test-TP53-aa.fasta --report --report-format md
aprofiler --input ./data/test-TP53-aa.fasta --no-plots --mode aa

Modes

Mode     Description
nt       Nucleotide MSA (DNA/RNA; IUPAC ambiguity codes tolerated)
aa       Amino acid MSA (protein residues)
codon    Coding-sequence MSA analyzed at the codon level (standard genetic code by default)
auto     Auto-detects NT vs. AA; codon mode is never auto-selected and must be requested explicitly
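How auto mode distinguishes nucleotides from amino acids is not spelled out here. One common approach is a composition heuristic; the sketch below assumes that rule, and guess_mode, NT_CHARS, and the 0.9 threshold are hypothetical names and values, not aProfiler internals.

```python
# Hypothetical composition heuristic for NT vs. AA detection (not aProfiler's code).
NT_CHARS = set("ACGTUN")  # unambiguous bases plus N
GAP_CHARS = set("-.")     # assumed gap symbols


def guess_mode(sequences, threshold=0.9):
    """Return "nt" if at least `threshold` of non-gap characters look like bases, else "aa".

    Like the documented auto mode, this never returns "codon".
    """
    residues = [ch for seq in sequences for ch in seq.upper() if ch not in GAP_CHARS]
    if not residues:
        raise ValueError("alignment contains no residues")
    nt_fraction = sum(ch in NT_CHARS for ch in residues) / len(residues)
    return "nt" if nt_fraction >= threshold else "aa"


print(guess_mode(["ACGT-ACGT", "ACGTTACGA"]))   # nt
print(guess_mode(["MKTAYIAKQR", "MKT-YIAKQR"]))  # aa
```

A threshold below 1.0 tolerates scattered IUPAC codes, but a protein alignment rich in A, C, G, T, and N residues could still be misclassified, which is one reason to pass --mode explicitly when in doubt.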

Common Flags

Flag             Purpose
--input          Input MSA file (FASTA MSA, A3M, or fixed-column alignment)
--mode           nt, aa, codon, or auto
--report         Generate a summary report (.md or .html)
--report-format  Report format: md or html (default: html)
--no-plots       Skip plots and write CSV tables only
--seed           Random seed for reproducible PCA/UMAP embeddings
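When profiling several alignments, it can help to build command lines programmatically. The sketch below uses only the flags listed above; build_command and profile_directory are hypothetical helpers, and it assumes the aprofiler executable is on PATH and that --seed accepts an integer.

```python
# Hypothetical helpers that assemble and run aprofiler command lines.
import shlex
import subprocess
from pathlib import Path


def build_command(input_path, mode="auto", report=False, report_format=None,
                  no_plots=False, seed=None):
    """Return an argv list for one aprofiler run, using only documented flags."""
    cmd = ["aprofiler", "--input", str(input_path), "--mode", mode]
    if report:
        cmd.append("--report")
    if report_format is not None:  # "md" or "html"
        cmd += ["--report-format", report_format]
    if no_plots:
        cmd.append("--no-plots")
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def profile_directory(data_dir, pattern="*.fasta", **options):
    """Run aprofiler once per matching file, stopping at the first failure."""
    for path in sorted(Path(data_dir).glob(pattern)):
        cmd = build_command(path, **options)
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

For example, profile_directory("data", pattern="test-TP53-*.fasta", report=True, report_format="md") would profile each bundled TP53 test alignment with a Markdown report.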

Outputs (Saved Automatically)

All results are saved under:

./results/{alignment_name}/

CSV Tables

Output                                              File
Global NT or AA frequencies                         *_global_freqs.csv
Per-site NT statistics, entropy, and GC%            *_nt_per_site.csv
PCA embedding                                       *_pca_embedding.csv
UMAP embedding                                      *_umap_embedding.csv
Codon usage table                                   *_codon_global.csv
Relative Synonymous Codon Usage (RSCU)              *_codon_rscu.csv
Amino acid usage derived from codons                *_aa_from_codons.csv
Essential vs. non-essential AA summary (codons)     *_aa_essential_summary.csv
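RSCU divides each codon's observed count by the count expected if all synonymous codons for that amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) * sum_k X_ik), where n_i is the number of codons encoding amino acid i. The sketch below computes it, together with a simple essential vs. non-essential tally, from in-frame coding sequences. It is a minimal illustration, not aProfiler's implementation: the function names are hypothetical, stop codons and codons containing gaps or ambiguity codes are skipped, and the essential set is assumed to be the nine amino acids conventionally classed as essential for humans (H, I, L, K, M, F, T, W, V).

```python
# Minimal RSCU and essential-AA sketch (hypothetical helpers, not aProfiler's code).
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code (NCBI table 1): codons enumerated in TCAG order.
_BASES = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AAS)}

# Assumed essential set: the nine amino acids essential for humans.
ESSENTIAL = set("HILKMFTWV")


def codon_counts(sequences):
    """Count in-frame codons, skipping codons that contain gaps or ambiguity codes."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(counts):
    """RSCU = observed count / mean count over the amino acid's synonymous codons."""
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # stop codons are excluded
            synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid never observed: RSCU is undefined
        expected = total / len(codons)
        for c in codons:
            values[c] = counts.get(c, 0) / expected
    return values


def essential_summary(counts):
    """Translate codon counts and split amino acid usage into essential vs. non-essential."""
    aa_counts = Counter()
    for codon, n in counts.items():
        if CODON_TABLE[codon] != "*":
            aa_counts[CODON_TABLE[codon]] += n
    total = sum(aa_counts.values())
    essential = sum(n for aa, n in aa_counts.items() if aa in ESSENTIAL)
    return {"essential": essential, "non_essential": total - essential,
            "essential_fraction": essential / total if total else 0.0}


counts = codon_counts(["ATGGCTGCTGCCGCAGCGTAA"])
values = rscu(counts)
print(round(values["GCT"], 2), round(values["GCC"], 2), values["ATG"])  # 1.6 0.8 1.0
print(essential_summary(counts))
```

Single-codon amino acids (Met and Trp) always have RSCU 1.0 by construction, so some analyses drop them from RSCU heatmaps.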

Plots (generated by default; disable with --no-plots)

Plot                          Purpose
Nucleotide logo plot          Position-wise base enrichment
Per-site GC%                  GC landscape across the MSA
Per-site entropy              Conservation profile along the alignment
Per-site AA/NT heatmaps       Residue/base prevalence at each position
PCA scatter                   Sequence-space clustering
UMAP scatter                  Similarity-space embedding
Pairwise identity histogram   Distribution of sequence similarity
Gap fraction histogram        Alignment completeness QC
Codon usage barplot           Most frequent codons
RSCU heatmap                  Synonymous codon bias per amino acid
Essential AA barplot          Essential vs. non-essential AA trends (codon mode only)
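The per-site entropy and GC% tracks are standard column statistics. A minimal sketch, assuming Shannon entropy in bits over non-gap characters and GC% computed from unambiguous G and C only; aProfiler may treat gaps and IUPAC codes such as S differently, and per_site_stats is a hypothetical name.

```python
# Hypothetical per-column statistics: Shannon entropy (bits) and GC%.
import math
from collections import Counter

GAP_CHARS = set("-.")  # assumed gap symbols, excluded from both statistics


def per_site_stats(alignment):
    """Return one dict per alignment column with entropy and GC% over non-gap residues."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must have equal aligned length")
    stats = []
    for j in range(length):
        column = [seq[j].upper() for seq in alignment if seq[j] not in GAP_CHARS]
        counts = Counter(column)
        n = len(column)
        # H = -sum(p * log2 p) over observed symbols; 0 for a fully conserved column
        entropy = sum(-(c / n) * math.log2(c / n) for c in counts.values()) if n else 0.0
        gc = 100.0 * (counts["G"] + counts["C"]) / n if n else 0.0
        stats.append({"site": j + 1, "entropy": entropy, "gc_percent": gc})
    return stats


for row in per_site_stats(["ACGT", "ACGA", "AC-T"]):
    print(row["site"], round(row["entropy"], 3), round(row["gc_percent"], 1))
```

Low entropy marks conserved columns; the maximum is 2 bits for four equally frequent bases.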

Alignment Format Compatibility and Constraints

Input alignment type    Supported?   Notes
FASTA MSA               Yes          Sequences must be of equal length
A3M                     Yes          Lowercase letters denote inserted columns
ALN/Clustal             Yes          Must be in fixed columns or converted first
Codon FASTA             Yes          Requires explicit --mode codon
Mixed NT+AA alphabets   No           Alphabet must be uniform within a file

All alignments are treated as fixed, rectangular matrices; sequences must have equal alignment length.
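Treating the alignment as a rectangular matrix makes the equal-length check, and the numbers behind the gap fraction and pairwise identity histograms, straightforward to compute. The sketch below is a minimal illustration under stated assumptions: a plain FASTA MSA (no A3M insert handling), "-" and "." as gap symbols, and pairwise identity measured over columns where neither sequence has a gap, which is one common convention among several. All function names are hypothetical.

```python
# Hypothetical helpers: FASTA parsing, equal-length enforcement, gap fraction, identity.
from itertools import combinations

GAP_CHARS = set("-.")


def parse_fasta(text):
    """Minimal FASTA reader returning (name, sequence) tuples."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_rectangular(records):
    """Return the shared aligned length, or raise if sequences differ in length."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"unequal aligned lengths: {sorted(lengths)}")
    return lengths.pop()


def gap_fraction(seq):
    """Fraction of alignment columns that are gaps in this sequence."""
    return sum(ch in GAP_CHARS for ch in seq) / len(seq)


def pairwise_identity(a, b):
    """Identical residues divided by columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x not in GAP_CHARS and y not in GAP_CHARS]
    if not pairs:
        return float("nan")
    return sum(x == y for x, y in pairs) / len(pairs)


records = parse_fasta(">s1\nACGT-A\n>s2\nACGTTA\n>s3\nAC-TGA\n")
print("aligned length:", check_rectangular(records))
for name, seq in records:
    print(name, "gap fraction:", round(gap_fraction(seq), 3))
for (n1, s1), (n2, s2) in combinations(records, 2):
    print(n1, n2, "identity:", round(pairwise_identity(s1, s2), 3))
```

The all-pairs loop grows quadratically with the number of sequences, so large alignments are usually sampled before computing the full identity distribution.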


Output Guarantees

  • All profiling artifacts are written into results/ without overwriting unrelated files.
  • Ambiguous input characters are tolerated and tracked rather than silently discarded.
  • Codon-mode metrics are computed only when --mode codon is requested explicitly.
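Tracking ambiguous characters rather than dropping them can be as simple as tallying them per sequence. A minimal sketch for nucleotide input, assuming anything other than A, C, G, T, U and the gap symbols "-" and "." counts as ambiguous; ambiguity_report is a hypothetical name, and how aProfiler itself records these characters is not documented here.

```python
# Hypothetical per-sequence tally of ambiguous nucleotide characters.
from collections import Counter

UNAMBIGUOUS_NT = set("ACGTU")
GAP_CHARS = set("-.")


def ambiguity_report(records):
    """Map sequence name -> {ambiguous character: count}; clean sequences are omitted."""
    report = {}
    for name, seq in records:
        counts = Counter(ch for ch in seq.upper()
                         if ch not in UNAMBIGUOUS_NT and ch not in GAP_CHARS)
        if counts:
            report[name] = dict(counts)
    return report


print(ambiguity_report([("s1", "ACGN-R"), ("s2", "ACGT-A"), ("s3", "nnACGT")]))
# {'s1': {'N': 1, 'R': 1}, 's3': {'N': 2}}
```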


Example Output Directory Tree

results/
  TP53_alignment/
    TP53_global_freqs.csv
    TP53_nt_per_site.csv
    TP53_pca_embedding.csv
    TP53_umap_embedding.csv
    TP53_report.html
    TP53_nt_logo.png
    TP53_entropy.png
    TP53_gc.png
    TP53_pca.png
    TP53_umap.png
    TP53_pairwise_identity.png
    TP53_gap_fraction.png
    TP53_rscu_heatmap.png
    TP53_aa_essential_bar.png

Scalability Notes

Optimized for MSAs of up to roughly 20k sequences × 10k columns on standard hardware. Larger inputs may require downsampling before generating logos and heatmaps.
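One way to downsample is to draw a seeded random subset of sequences and profile that instead. A minimal sketch, assuming records are (name, sequence) tuples; downsample and write_fasta are hypothetical helpers, not part of aProfiler, and uniform random sampling is only one possible strategy.

```python
# Hypothetical seeded downsampling of an MSA before profiling.
import random
from pathlib import Path


def downsample(records, max_sequences, seed=0):
    """Return at most max_sequences records, sampled uniformly at random, in input order."""
    if max_sequences >= len(records):
        return list(records)
    rng = random.Random(seed)  # local RNG so the subset is reproducible for a given seed
    keep = sorted(rng.sample(range(len(records)), max_sequences))
    return [records[i] for i in keep]


def write_fasta(records, path, width=60):
    """Write (name, sequence) records to a FASTA file, wrapping sequence lines."""
    with open(path, "w") as fh:
        for name, seq in records:
            fh.write(f">{name}\n")
            for i in range(0, len(seq), width):
                fh.write(seq[i:i + width] + "\n")
    return Path(path)
```

For example, write_fasta(downsample(records, 5000, seed=42), "subset.fasta") followed by aprofiler --input subset.fasta. Only rows are dropped and columns are untouched, so the subset is still a valid rectangular alignment.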


Testing

pip install pytest
pytest -q

Tests validate:

  • equal-length enforcement
  • stable fallback embedding behavior
  • non-empty CSV outputs
  • plot and artifact creation without crashes

Python API Example

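from aprofiler.profiler import AlignmentProfiler

# Configure the profiler: input MSA, alphabet mode, and output directory
prof = AlignmentProfiler("alignment.fasta", mode="auto", out_dir="results")
prof.load_alignment()                                  # parse the MSA before profiling
outputs = prof.run_full_profile()                      # compute statistics, CSV tables, and plots
report_path = prof.generate_report(outputs, fmt="md")  # write a Markdown summary report

print("Outputs generated:", outputs)
print("Report saved to:", report_path)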

Citation

If you use aProfiler in a publication, please cite:

TBD

Contributing

Issues, discussions, and pull requests are welcome. Ensure contributions are:

  • statistically useful
  • plot-rich by design
  • free of silent failures
  • non-destructive to unrelated files
  • aligned with package philosophy and constraints


Download files


Source Distribution

aprofiler-0.1.2.tar.gz (17.4 kB)

Built Distribution


aprofiler-0.1.2-py3-none-any.whl (17.3 kB)

File details

Details for the file aprofiler-0.1.2.tar.gz.

File metadata

  • Download URL: aprofiler-0.1.2.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for aprofiler-0.1.2.tar.gz
Algorithm Hash digest
SHA256 83b1e60e63fbe32835adb07bf6b866c66862ca3d72d26ed1d0f59aaec12091e9
MD5 00f0d87f1002daaa1b9c82b458d5e8fc
BLAKE2b-256 95be405fb4faed022f545cf43906820b91506a9e5ef9772055d5720cd585cba1


File details

Details for the file aprofiler-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: aprofiler-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 17.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for aprofiler-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2f52705cea9cbd9e3fcd25df016ecb50cc8a159e2ac8375d433f3a435cccdb0a
MD5 449bb58d35baeba9f23a11f4017fe4a7
BLAKE2b-256 6a1901da0beb572eaa3a5efa27dd152237c4a52caed75bb32d93cf772c42f906
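To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal sketch using Python's hashlib; sha256_of and verify are illustrative names, and the expected digests are copied from the hash tables above. pip can also enforce digests itself through its hash-checking mode (--require-hashes).

```python
import hashlib
from pathlib import Path

# SHA256 digests published for aprofiler 0.1.2 (copied from the file details above).
EXPECTED_SHA256 = {
    "aprofiler-0.1.2.tar.gz": "83b1e60e63fbe32835adb07bf6b866c66862ca3d72d26ed1d0f59aaec12091e9",
    "aprofiler-0.1.2-py3-none-any.whl": "2f52705cea9cbd9e3fcd25df016ecb50cc8a159e2ac8375d433f3a435cccdb0a",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's SHA256 matches the published digest for its file name."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

For example, verify("aprofiler-0.1.2-py3-none-any.whl") returns True only for an untampered download of that wheel.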

