Lossless evolutionary-aware multiple sequence alignment compressor
Evolution-informed lossless compression of multiple-sequence alignments (MSAs).
Installation
From PyPI (recommended for users):
pip install ecomp
From source:
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install ".[dev]"  # quoted so shells such as zsh do not expand the brackets
Offline? Pre-install biopython, numpy, bitarray, and the dev tools (pytest, ruff, black, mypy, …) inside your environment.
CLI Quickstart
All commands are exposed through the ecomp entry point.
# Compress an alignment (produces example.ecomp, optional JSON sidecar)
ecomp zip example.fasta --metadata example.json
# Decompress (writes FASTA by default)
ecomp unzip example.ecomp --alignment-output restored.fasta
# Inspect metadata (summary or JSON)
ecomp inspect example.ecomp --summary
# Diagnostics (PhyKIT-style aliases shown in the trailing comments)
ecomp consensus_sequence example.ecomp # con_seq
ecomp column_base_counts example.ecomp # col_counts
ecomp gap_fraction example.ecomp # gap_frac
ecomp shannon_entropy example.ecomp # entropy
ecomp parsimony_informative_sites example.ecomp # parsimony
ecomp constant_columns example.ecomp # const_cols
ecomp pairwise_identity example.ecomp # pid
ecomp alignment_length_excluding_gaps example.ecomp # len_no_gaps
ecomp alignment_compressed_length example.ecomp # compressed_len
ecomp variable_sites example.ecomp # var_sites
ecomp percentage_identity example.ecomp # pct_id
ecomp relative_composition_variability example.ecomp # rcv
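For readers unfamiliar with the diagnostics listed above, the sketch below shows what per-column base counts, gap fraction, Shannon entropy, and parsimony-informative sites conventionally mean, using plain Python on a toy alignment. It is not ecomp's implementation: the helper names (`columns`, `column_counts`, and so on) are hypothetical, `-` is assumed to be the only gap symbol, and ecomp may handle gaps and ambiguity codes differently.

```python
import math
from collections import Counter

GAP = "-"  # assumption: a single gap symbol

def columns(sequences):
    """Return each alignment column as a string."""
    return ["".join(col) for col in zip(*sequences)]

def column_counts(column):
    """Count the non-gap symbols in one column."""
    return Counter(ch for ch in column if ch != GAP)

def gap_fraction(sequences):
    """Fraction of all alignment cells that are gaps."""
    total = sum(len(s) for s in sequences)
    gaps = sum(s.count(GAP) for s in sequences)
    return gaps / total if total else 0.0

def shannon_entropy(column):
    """Shannon entropy (in bits) of the non-gap symbols in one column."""
    counts = column_counts(column)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def is_parsimony_informative(column):
    """True if at least two symbols each occur at least twice."""
    counts = column_counts(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2

toy = ["ACGT-", "ACGA-", "ATGAC", "ATG-C"]
for i, col in enumerate(columns(toy), start=1):
    print(i, col, dict(column_counts(col)),
          round(shannon_entropy(col), 3), is_parsimony_informative(col))
print("gap fraction:", gap_fraction(toy))
```

Only the second column of the toy alignment (`CCTT`) is parsimony-informative under this definition, because it is the only one with two symbols that each appear at least twice.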
To benchmark against general-purpose codecs, time ecomp alongside gzip and bzip2 on the same input:
/usr/bin/time -p ecomp zip data/fixtures/small_phylo.fasta --output out.ecomp
/usr/bin/time -p gzip -k data/fixtures/small_phylo.fasta
/usr/bin/time -p bzip2 -k data/fixtures/small_phylo.fasta
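The timing commands above compare speed; to compare output size as well, a baseline for the general-purpose codecs can be computed with the Python standard library alone. This is a minimal sketch: `baseline_sizes` is a hypothetical helper, the toy FASTA bytes are made up for illustration, and no ecomp call is involved, so the size of the `.ecomp` archive would need to be read separately (for example with `os.path.getsize`).

```python
import bz2
import gzip

def baseline_sizes(data: bytes) -> dict:
    """Return raw and compressed sizes in bytes for gzip and bzip2."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        "bzip2": len(bz2.compress(data, compresslevel=9)),
    }

# Toy FASTA content; for a real alignment use open(path, "rb").read()
toy = b"".join(b">s%d\n%s\n" % (i, b"ACGT" * 50) for i in range(20))

for name, size in baseline_sizes(toy).items():
    print(f"{name:>6}: {size} bytes ({size / len(toy):.1%} of raw)")
```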
Python API
Everything the CLI does is also importable from the ecomp package.
# Note: importing zip this way shadows Python's built-in zip() in this module
from ecomp import zip, unzip, read_alignment, percentage_identity, column_base_counts
# File-based workflow
archive_path, metadata_path = zip(
    "data/example.fasta",
    metadata_path="data/example.json",  # optional JSON copy
)
restored_path = unzip(archive_path, output_path="data/restored.fasta")
# Diagnostics on an AlignmentFrame
frame = read_alignment("data/example.fasta")
pct_identity = percentage_identity(frame)
base_counts = column_base_counts(frame)
print(f"Mean pairwise identity: {pct_identity:.2f}%")
print("Column 1 counts:", base_counts[0])
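As a point of reference for the identity figure printed above, mean pairwise identity is commonly computed as the average, over all sequence pairs, of the percentage of matching positions. The sketch below is plain Python and does not call ecomp; the function names are hypothetical, and skipping any column where either sequence has a `-` is an assumption, so ecomp's `percentage_identity` may give different numbers on gapped alignments.

```python
from itertools import combinations

def pairwise_identity(a, b, gap="-"):
    """Percent of matching positions among columns where neither sequence has a gap."""
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # assumption: gapped positions are excluded
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0

def mean_pairwise_identity(sequences):
    """Average pairwise identity over all unordered sequence pairs."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)

print(mean_pairwise_identity(["ACGT", "ACGA"]))  # 75.0
```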
In-memory usage (no intermediate files):
from ecomp import AlignmentFrame, compress_alignment, decompress_alignment
frame = AlignmentFrame(
    ids=["s1", "s2"],
    sequences=["ACGT", "ACGA"],
    alphabet=["A", "C", "G", "T"],
)
compressed = compress_alignment(frame)
restored = decompress_alignment(compressed.payload, compressed.metadata)
assert restored.sequences == frame.sequences
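The assertion above checks the in-memory round trip. For the file-based workflow, a similar check can compare the original and restored FASTA files at the record level. The sketch below uses only plain Python; `read_fasta` and `same_alignment` are hypothetical helpers, and comparing parsed records rather than raw bytes is a deliberate assumption, since the source does not say whether line wrapping is preserved on decompression.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def same_alignment(original_text, restored_text):
    """True if both texts hold the same records in the same order."""
    return read_fasta(original_text) == read_fasta(restored_text)

original = ">s1\nACGT\nACGA\n>s2\nACGAACGT\n"
rewrapped = ">s1\nACGTACGA\n>s2\nACGA\nACGT\n"
print(same_alignment(original, rewrapped))  # True
```

With real files, the two arguments would be the contents of the input alignment and of the restored file, for example read with `pathlib.Path(...).read_text()`.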
File details
Details for the file ecomp-0.0.2.tar.gz.
File metadata
- Download URL: ecomp-0.0.2.tar.gz
- Upload date:
- Size: 33.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a96bf191da09f3ccabda6a2d772efc8e35b1bc59ddcd7463dfeb39734af2b94 |
| MD5 | 30108c2de77f84f34ac19a54c4127a0a |
| BLAKE2b-256 | 83b1e8cf800b9d6de0bc663bb54fa726f42442f5affdedf2355d544da56f562c |
File details
Details for the file ecomp-0.0.2-py3-none-any.whl.
File metadata
- Download URL: ecomp-0.0.2-py3-none-any.whl
- Upload date:
- Size: 35.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 83b756dd9be4ab3d180431e3c08b7ab3dce83b86e624dd7080f8542e1d40e544 |
| MD5 | b6bbbb7257addfcbab58051ff30acdc9 |
| BLAKE2b-256 | 01904664044ab5e60c645d5d181f5be44ce806d12f051858954ee08dc1d489b7 |