Pure Python Clustal Omega Multiple Sequence Alignment implementation

BunyaminMSA

Pure Python implementation of the Clustal Omega Multiple Sequence Alignment (MSA) algorithm.

Bioinformatics Final Project — Bunyamin Arpc


Installation

pip install BunyaminMSA

Or from source:

git clone https://github.com/bunyaminarpc/BunyaminMSA.git
cd BunyaminMSA
pip install -e .

Quick Start

from bunyaminmsa import ClustalOmega

msa = ClustalOmega()

sequences = ["ACGTACGT", "ACGGACGT", "TTTTACGT"]
names     = ["Human", "Mouse", "Zebrafish"]

result = msa.align(sequences, names=names)
print(result["alignment_str"])

FASTA Input

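from bunyaminmsa import ClustalOmega

msa = ClustalOmega()

# FASTA text: each record is a '>' header line followed by its sequence
fasta = """
>seq1
ACGTACGTACGT
>seq2
ACGGACGTACGG
>seq3
TTTTACGTATTT
"""
result = msa.align_from_fasta(fasta)
print(result["alignment_str"])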
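
To make the expected input shape concrete, here is a minimal sketch of how FASTA text maps onto parallel lists of names and sequences. The helper `parse_fasta` is hypothetical and written only for illustration; the package's own parser may treat headers, case, or malformed input differently.

```python
def parse_fasta(text):
    """Parse FASTA text into (names, sequences).

    Hypothetical helper for illustration; not the package's own parser.
    """
    names, sequences = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines, e.g. around a triple-quoted string
        if line.startswith(">"):
            # Header: use the first whitespace-delimited token after '>' as the name
            header = line[1:].split()
            names.append(header[0] if header else f"seq{len(names) + 1}")
            sequences.append("")
        elif names:
            # Sequence data may span several lines; concatenate them
            sequences[-1] += line.upper()
        else:
            raise ValueError("sequence data found before the first '>' header")
    return names, sequences


fasta = """
>seq1
ACGTACGT
ACGT
>seq2
ACGGACGTACGG
"""
names, sequences = parse_fasta(fasta)
print(names)
print(sequences)
```

The two lists have the same shape as the `names` and `sequences` arguments that `align` accepts in the Quick Start.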

Command Line

bunyaminmsa --fasta input.fasta
bunyaminmsa --seqs ACGT ACGG TTTT --names s1 s2 s3
bunyaminmsa --fasta input.fasta --output alignment.aln

Algorithm Overview

BunyaminMSA performs MSA in three main stages, following the overall structure of Clustal Omega:

1. Pairwise Distance Calculation (k-mer based)

All sequence pairs are compared using k-mer frequency profiles and cosine distance. This avoids running a full dynamic-programming alignment for every pair, so it is faster and scales better to long sequences.
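
As a rough illustration of this stage, the sketch below counts overlapping k-mers per sequence and takes one minus the cosine similarity of the count vectors. The function names and the choice `k=3` are assumptions made for the example; the package's actual k, normalization, and handling of short sequences are not stated here.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count every overlapping k-mer in seq (k=3 is an assumed default)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two sparse k-mer count vectors."""
    dot = sum(count * q[kmer] for kmer, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        # A sequence shorter than k has no k-mers; treat it as maximally distant
        return 1.0
    return 1.0 - dot / (norm_p * norm_q)


def distance_matrix(sequences, k=3):
    """Build the symmetric n x n matrix of pairwise k-mer cosine distances."""
    profiles = [kmer_profile(s, k) for s in sequences]
    n = len(profiles)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = cosine_distance(profiles[i], profiles[j])
    return matrix


d = distance_matrix(["ACGTACGT", "ACGGACGT", "TTTTACGT"])
for row in d:
    print([round(x, 3) for x in row])
```

Each profile is built in one linear pass over its sequence, so the cost per pair depends on the number of distinct k-mers rather than on the product of the two sequence lengths.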

2. Guide Tree Construction (UPGMA)

The pairwise distance matrix is used to build a binary guide tree using UPGMA (Unweighted Pair Group Method with Arithmetic mean). Closely related sequences are merged first.
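
The merging rule can be sketched as follows. This is a minimal UPGMA under stated assumptions: the tree is returned as nested tuples, branch lengths are not tracked, and ties are broken by insertion order. The function name `upgma` and the small distance matrix are made up for illustration and do not reflect the package's internal tree representation.

```python
def upgma(dist, names):
    """Build a UPGMA guide tree as nested tuples from a symmetric distance matrix."""
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (names[i], 1) for i in range(len(names))}
    # Distances between active clusters, keyed by (smaller_id, larger_id)
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters first
        a, b = min(d, key=d.get)
        (tree_a, size_a), (tree_b, size_b) = clusters.pop(a), clusters.pop(b)
        for c in clusters:
            # Size-weighted mean = arithmetic mean over all leaf pairs
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        del d[(a, b)]
        clusters[next_id] = ((tree_a, tree_b), size_a + size_b)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hand-written distances for illustration only
dist = [
    [0.0, 0.3, 0.4],
    [0.3, 0.0, 0.6],
    [0.4, 0.6, 0.0],
]
tree = upgma(dist, ["Human", "Mouse", "Zebrafish"])
print(tree)  # ('Zebrafish', ('Human', 'Mouse'))
```

Human and Mouse have the smallest distance, so they are joined first; Zebrafish then joins that cluster at the root.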

3. Progressive Alignment

Sequences are aligned following the guide tree (post-order traversal):

  • Leaf–Leaf: Needleman-Wunsch global alignment with affine gap penalties
  • Profile–Profile: Frequency profiles are built for each aligned group; alignment proceeds between profiles column-by-column
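
For the leaf-leaf case, the sketch below computes the optimal global alignment score with affine gap penalties using the standard three-matrix (Gotoh) recurrence. It returns only the score and omits the traceback that would recover the gapped sequences. The function name and the scoring values (`match=2`, `mismatch=-1`, `gap_open=5`, `gap_extend=1`) are assumptions chosen for the example, not the package's documented parameters.

```python
NEG_INF = float("-inf")


def affine_global_score(a, b, match=2, mismatch=-1, gap_open=5, gap_extend=1):
    """Optimal global alignment score; a gap of length L costs gap_open + (L - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = -(gap_open + (i - 1) * gap_extend)  # prefix of a against gaps only
    for j in range(1, m + 1):
        Y[0][j] = -(gap_open + (j - 1) * gap_extend)  # prefix of b against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = s + max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            # Extending an open gap is cheaper than opening a new one
            X[i][j] = max(M[i - 1][j] - gap_open,
                          X[i - 1][j] - gap_extend,
                          Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open,
                          Y[i][j - 1] - gap_extend,
                          X[i][j - 1] - gap_open)
    return max(M[n][m], X[n][m], Y[n][m])


print(affine_global_score("ACGT", "ACGT"))            # 8
print(affine_global_score("ACGTTTACGT", "ACGTACGT"))  # 10
```

In the second call, eight matches score 16 and the single two-column gap costs 5 + 1 = 6. Splitting the same two gap columns into two separate gaps would cost 10, which is why affine penalties favor fewer, longer gaps.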

API Reference

ClustalOmega

Method                        Description
align(sequences, names=None)  Align list of sequences
align_from_fasta(fasta_text)  Parse FASTA string and align
get_distance_matrix()         Return last computed distance matrix
get_guide_tree()              Return last computed guide tree

Result Dictionary

Key              Type               Description
names            list[str]          Sequence names
aligned          list[str]          Aligned sequences (with gaps)
alignment_str    str                CLUSTAL-format alignment
distance_matrix  list[list[float]]  n×n pairwise distances
sequence_type    str                'dna' or 'protein'
guide_tree       str                String representation of UPGMA tree
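
Because `names` and `aligned` are parallel lists, the result is easy to post-process. The sketch below writes them out as gapped FASTA. The helper `result_to_fasta` is hypothetical, and the dictionary is a hand-written stand-in with the documented keys, not real output from the package.

```python
def result_to_fasta(result, width=60):
    """Render the 'names' and 'aligned' entries of a result dictionary as gapped FASTA."""
    lines = []
    for name, seq in zip(result["names"], result["aligned"]):
        lines.append(f">{name}")
        # Wrap long rows at a fixed width, as is conventional for FASTA files
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Hand-written stand-in for a result dictionary (not real package output)
example = {
    "names": ["s1", "s2"],
    "aligned": ["ACGT-ACGT", "ACGTTACGT"],
}
print(result_to_fasta(example), end="")
```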

Running Tests

python tests/test_clustal_omega.py
# or
pytest tests/

License

MIT License — Bunyamin Arpc
