Multiple Sequence Alignment using Dynamic Programming

These details have not been verified by PyPI

Project links

Homepage

Project description

huseyinmsa

A Python library for Multiple Sequence Alignment (MSA) implemented from scratch using Dynamic Programming.

Installation

pip install huseyinmsa

Quick start

from huseyinmsa import MSA, print_alignment

sequences = {
    "Seq1": "ACGTACGT",
    "Seq2": "ACGACGT",
    "Seq3": "ACGT",
}

msa = MSA()
alignment = msa.align(sequences)
print_alignment(alignment)
print("SP Score:", msa.sp_score(alignment))

Reading / writing FASTA files

from huseyinmsa import MSA, read_fasta, write_fasta, print_alignment

sequences = read_fasta("sequences.fasta")
msa = MSA(match=1, mismatch=-1, gap=-2)
alignment = msa.align(sequences)

print_alignment(alignment)
write_fasta(alignment, "aligned.fasta")

Algorithm

The library implements progressive multiple sequence alignment driven entirely by dynamic programming.

Stage 1 — Pairwise distances (Needleman-Wunsch)

Every pair of input sequences is globally aligned using the classic Needleman-Wunsch O(n·m) DP algorithm.

dp[0][j] = j × gap
dp[i][0] = i × gap

dp[i][j] = max(
    dp[i-1][j-1] + sub(seq1[i], seq2[j]),   # match or mismatch
    dp[i-1][j]   + gap,                      # delete from seq1
    dp[i][j-1]   + gap,                      # insert into seq1
)

After alignment, sequence identity (identical columns / total columns) is used to convert the score into a distance: distance = 1 − identity.

Stage 2 — Guide tree (UPGMA)

The n×n distance matrix is clustered with UPGMA:

Each sequence starts as its own cluster.
The two closest clusters are merged; their distance to every other cluster is recalculated as the size-weighted average.
Repeat until a single root cluster remains.

The result is a binary tree (e.g. (('A','B'),'C')) that tells the aligner which sequences to merge first — always the most similar ones.

Stage 3 — Progressive alignment (profile–profile DP)

Sequences are merged bottom-up along the guide tree. At each merge step, two blocks of already-aligned sequences (profiles) are aligned with a second DP pass. The score between two columns is the average substitution score over all non-gap character pairs; gap penalties are scaled by the fraction of real characters in a column.

Parameters

Parameter	Default	Meaning
`match`	`1`	Score for identical characters
`mismatch`	`-1`	Penalty for mismatched characters
`gap`	`-2`	Linear gap penalty

Scoring

After alignment, the Sum-of-Pairs (SP) score is available:

score = msa.sp_score(alignment)

It sums the pairwise substitution / gap score across every aligned column for every pair of sequences. Higher values indicate a better alignment.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.1.0

Jun 2, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

huseyin_cakir-0.1.0.tar.gz (9.9 kB view details)

Uploaded Jun 2, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

huseyin_cakir-0.1.0-py3-none-any.whl (10.5 kB view details)

Uploaded Jun 2, 2026 Python 3

File details

Details for the file huseyin_cakir-0.1.0.tar.gz.

File metadata

Download URL: huseyin_cakir-0.1.0.tar.gz
Upload date: Jun 2, 2026
Size: 9.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for huseyin_cakir-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e71c6fd5dbc137f4c9209f725e10c6109a522fff04239a478009aab26957093e`
MD5	`abb518d026039ab2262fbae05e7d945d`
BLAKE2b-256	`9017e2ae39db29748dfc629484405a7819b179e92d72c94cb877a70c33246006`

See more details on using hashes here.

File details

Details for the file huseyin_cakir-0.1.0-py3-none-any.whl.

File metadata

Download URL: huseyin_cakir-0.1.0-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 10.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.5

File hashes

Hashes for huseyin_cakir-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8935e82451d7bffbe34e080ae5aceaa660c1b11b6e0d040bb4c8f616916c43c`
MD5	`89196b00aa73dfef3c8bcd51825b4476`
BLAKE2b-256	`b1eed83dc37dae0fe4800021ec0c33eae0f90cc296843a8c8164ddd0fca3b672`

See more details on using hashes here.

Huseyin-Cakir 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

huseyinmsa

Installation

Quick start

Reading / writing FASTA files

Algorithm

Stage 1 — Pairwise distances (Needleman-Wunsch)

Stage 2 — Guide tree (UPGMA)

Stage 3 — Progressive alignment (profile–profile DP)

Parameters

Scoring

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes