
Lossless evolutionary-aware multiple sequence alignment compressor


Evolution-informed lossless compression of multiple-sequence alignments (MSAs).
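The description above does not detail ecomp's model. As rough intuition only: in an alignment most columns are identical or nearly so across sequences, so storing one reference row plus each sequence's differences can already be far smaller than the raw text while staying exactly reversible. The toy below is a generic consensus-plus-differences scheme, not ecomp's evolution-aware algorithm, and `consensus`, `encode`, and `decode` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter

# Illustrative only: NOT ecomp's encoding. Shows why aligned columns are redundant.

def consensus(seqs: list[str]) -> str:
    """Most frequent character per column (ties go to the first one seen)."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))

def encode(seqs: list[str]) -> tuple[str, list[list[tuple[int, str]]]]:
    """Store one consensus row plus each sequence's (column, char) differences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned (non-empty, equal length)")
    cons = consensus(seqs)
    diffs = [[(i, c) for i, (c, k) in enumerate(zip(s, cons)) if c != k] for s in seqs]
    return cons, diffs

def decode(cons: str, diffs: list[list[tuple[int, str]]]) -> list[str]:
    """Rebuild every sequence exactly by patching the consensus row."""
    rows = []
    for diff in diffs:
        chars = list(cons)
        for i, c in diff:
            chars[i] = c
        rows.append("".join(chars))
    return rows
```

For three six-column sequences that differ in only two cells, this stores one row plus two patches instead of three full rows.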


Installation

From PyPI (recommended for users):

pip install ecomp

From source:

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install ".[dev]"   # quotes keep zsh from globbing the brackets

Working offline? Pre-install biopython, numpy, and bitarray, plus the dev tools (pytest, ruff, black, mypy, …), into your environment first.
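Before going offline, a quick way to confirm that the runtime dependencies listed above are importable in the active environment. This is a minimal sketch: the distribution-to-module mapping is our assumption (Biopython is imported as `Bio`), the dev tools are left out, and `missing_packages` is a hypothetical helper.

```python
import importlib.util

# PyPI distribution name -> top-level import name (assumed for these three packages)
REQUIRED = {"biopython": "Bio", "numpy": "numpy", "bitarray": "bitarray"}

def missing_packages(required=REQUIRED):
    """Return the distribution names whose top-level module cannot be found."""
    return [dist for dist, module in required.items()
            if importlib.util.find_spec(module) is None]

if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```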


CLI Quickstart

All commands are exposed through the ecomp entry point.

# Compress an alignment (produces example.ecomp, optional JSON sidecar)
ecomp zip example.fasta --metadata example.json

# Decompress (writes FASTA by default)
ecomp unzip example.ecomp --alignment-output restored.fasta

# Inspect metadata (summary or JSON)
ecomp inspect example.ecomp --summary

# Diagnostics (PhyKIT-style short aliases shown in the trailing comments)
ecomp consensus_sequence example.ecomp                # con_seq
ecomp column_base_counts example.ecomp                # col_counts
ecomp gap_fraction example.ecomp                      # gap_frac
ecomp shannon_entropy example.ecomp                   # entropy
ecomp parsimony_informative_sites example.ecomp       # parsimony
ecomp constant_columns example.ecomp                  # const_cols
ecomp pairwise_identity example.ecomp                 # pid
ecomp alignment_length_excluding_gaps example.ecomp   # len_no_gaps
ecomp alignment_compressed_length example.ecomp       # compressed_len
ecomp variable_sites example.ecomp                    # var_sites
ecomp percentage_identity example.ecomp               # pct_id
ecomp relative_composition_variability example.ecomp  # rcv
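For intuition about what several column diagnostics report, here is a pure-Python sketch of their textbook definitions. The helper names mirror the CLI commands but are local, hypothetical functions, not imports from ecomp. Assumptions: sequences are equal-length strings, only `-` counts as a gap, and gaps are ignored when counting residues; ecomp's handling of ambiguity codes, all-gap columns, and output format may differ.

```python
import math
from collections import Counter

GAP_CHARS = frozenset("-")  # assumption: ecomp may also treat '?', 'N', etc. specially

def columns(seqs):
    """Transpose equal-length aligned sequences into column strings."""
    return ["".join(col) for col in zip(*seqs)]

def residue_counts(col):
    """Counts of non-gap characters in one column."""
    return Counter(ch for ch in col if ch not in GAP_CHARS)

def gap_fraction(seqs):
    """Share of all alignment cells that are gaps."""
    total = sum(len(s) for s in seqs)
    gaps = sum(ch in GAP_CHARS for s in seqs for ch in s)
    return gaps / total if total else 0.0

def shannon_entropy(col):
    """Entropy in bits of the residue distribution in one column (gaps excluded)."""
    counts = residue_counts(col)
    n = sum(counts.values())
    return sum((-(c / n) * math.log2(c / n) for c in counts.values()), 0.0)

def is_constant(col):
    """Exactly one residue type (gaps ignored)."""
    return len(residue_counts(col)) == 1

def is_variable(col):
    """Two or more residue types."""
    return len(residue_counts(col)) >= 2

def is_parsimony_informative(col):
    """At least two residue types that each occur in at least two sequences."""
    return sum(1 for c in residue_counts(col).values() if c >= 2) >= 2
```

For example, `sum(map(is_parsimony_informative, columns(seqs)))` counts parsimony-informative sites under these definitions.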

To benchmark ecomp against general-purpose compressors, time each tool on the same input:

/usr/bin/time -p ecomp zip data/fixtures/small_phylo.fasta --output out.ecomp
/usr/bin/time -p gzip  -k data/fixtures/small_phylo.fasta
/usr/bin/time -p bzip2 -k data/fixtures/small_phylo.fasta
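The timings above measure speed; to compare output sizes as well, a small sketch that reports original-to-compressed size ratios. It recompresses the FASTA in memory with Python's `gzip` and `bz2` modules at the command-line tools' default levels (6 and 9) rather than reading the `.gz`/`.bz2` files, so sizes can differ by a few header bytes. The ecomp archive is assumed to be the `out.ecomp` written by the first command; `compression_ratios` is a hypothetical helper.

```python
from __future__ import annotations

import bz2
import gzip
from pathlib import Path

def compression_ratios(fasta: Path, ecomp_archive: Path | None = None) -> dict[str, float]:
    """Original size divided by compressed size for each codec (higher is better)."""
    raw = Path(fasta).read_bytes()
    ratios = {
        # Levels mirror the command-line defaults: gzip -6, bzip2 -9
        "gzip": len(raw) / len(gzip.compress(raw, compresslevel=6)),
        "bzip2": len(raw) / len(bz2.compress(raw, compresslevel=9)),
    }
    if ecomp_archive is not None:
        # Assumes the archive already exists, e.g. from `ecomp zip ... --output out.ecomp`
        ratios["ecomp"] = len(raw) / Path(ecomp_archive).stat().st_size
    return ratios
```

Usage: `compression_ratios(Path("data/fixtures/small_phylo.fasta"), Path("out.ecomp"))`.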

Python API

Everything the CLI does is also available as Python functions re-exported from the top-level ecomp package.

# Note: importing ecomp's zip shadows Python's built-in zip() in this module
from ecomp import zip, unzip, read_alignment, percentage_identity, column_base_counts

# File-based workflow
archive_path, metadata_path = zip(
    "data/example.fasta",
    metadata_path="data/example.json",  # optional JSON copy
)
restored_path = unzip(archive_path, output_path="data/restored.fasta")

# Diagnostics on an AlignmentFrame
frame = read_alignment("data/example.fasta")
pct_identity = percentage_identity(frame)
base_counts = column_base_counts(frame)

print(f"Mean pairwise identity: {pct_identity:.2f}%")
print("Column 1 counts:", base_counts[0])

In-memory usage (no intermediate files):

from ecomp import AlignmentFrame, compress_alignment, decompress_alignment

frame = AlignmentFrame(
    ids=["s1", "s2"],
    sequences=["ACGT", "ACGA"],
    alphabet=["A", "C", "G", "T"],
)
compressed = compress_alignment(frame)
restored = decompress_alignment(compressed.payload, compressed.metadata)
assert restored.sequences == frame.sequences
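The `assert` above checks the in-memory round trip. For the file-based workflow, one way to compare `data/example.fasta` with `data/restored.fasta` is to parse records instead of comparing bytes, so any difference in line wrapping or blank lines is ignored while headers, record order, and residues must match exactly. A minimal sketch with a hand-rolled reader (`read_fasta` and `same_alignment` are hypothetical; Biopython's `SeqIO.parse` would serve equally well):

```python
from __future__ import annotations

from pathlib import Path

def read_fasta(path: str | Path) -> list[tuple[str, str]]:
    """Parse FASTA into (header, sequence) pairs, joining wrapped sequence lines."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def same_alignment(original: str | Path, restored: str | Path) -> bool:
    """True if both files hold the same headers, in the same order, with the same residues."""
    return read_fasta(original) == read_fasta(restored)
```

Usage: `assert same_alignment("data/example.fasta", "data/restored.fasta")`. The check is strict, so it also flags any change in letter case.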



Download files

Download the file for your platform.

Source Distribution

ecomp-0.0.2.tar.gz (33.0 kB)

Built Distribution

ecomp-0.0.2-py3-none-any.whl (35.3 kB)

File details

Details for the file ecomp-0.0.2.tar.gz.

File metadata

  • Download URL: ecomp-0.0.2.tar.gz
  • Upload date:
  • Size: 33.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for ecomp-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       0a96bf191da09f3ccabda6a2d772efc8e35b1bc59ddcd7463dfeb39734af2b94
MD5          30108c2de77f84f34ac19a54c4127a0a
BLAKE2b-256  83b1e8cf800b9d6de0bc663bb54fa726f42442f5affdedf2355d544da56f562c
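To check a downloaded file against these digests, a short sketch using Python's standard `hashlib`. The expected value is the SHA256 for ecomp-0.0.2.tar.gz copied from the table above; the same function works for the wheel with its own digest. `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib

# SHA256 for ecomp-0.0.2.tar.gz, copied from the hash table above
EXPECTED_SDIST_SHA256 = "0a96bf191da09f3ccabda6a2d772efc8e35b1bc59ddcd7463dfeb39734af2b94"

def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_hex: str) -> bool:
    """Compare a file against a published digest (hex, case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage: `verify("ecomp-0.0.2.tar.gz", EXPECTED_SDIST_SHA256)`. pip can also enforce this itself in hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in a requirements file).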


File details

Details for the file ecomp-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: ecomp-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 35.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for ecomp-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       83b756dd9be4ab3d180431e3c08b7ab3dce83b86e624dd7080f8542e1d40e544
MD5          b6bbbb7257addfcbab58051ff30acdc9
BLAKE2b-256  01904664044ab5e60c645d5d181f5be44ce806d12f051858954ee08dc1d489b7

