Simple Multiple Sequence Alignment using Needleman-Wunsch Dynamic Programming

arman-bio-msa

Simple Multiple Sequence Alignment using Needleman-Wunsch Dynamic Programming.

Student Number: 221201931
Algorithm Assignment: 221201931 % 4 = 3 → Dynamic Programming


Why Dynamic Programming?

Each student is assigned an algorithm based on student_number % 4. Since 221201931 % 4 = 3, the assigned algorithm is Dynamic Programming.

This package implements the Needleman-Wunsch algorithm, the best-known dynamic programming method for global sequence alignment in bioinformatics.

What is Multiple Sequence Alignment?

Multiple Sequence Alignment (MSA) means lining up three or more biological sequences (DNA, RNA, or protein) so that similar characters appear in the same column. Gaps (-) are inserted where needed. This helps scientists find evolutionary relationships and important regions across species.
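To make the column idea concrete, the short sketch below takes the three aligned rows from the Example Output section and reports which columns hold the same character in every row. The helper name conserved_columns is hypothetical and is not part of this package.

```python
def conserved_columns(rows):
    """Return indices of columns where every row has the same non-gap character."""
    return [
        index
        for index, column in enumerate(zip(*rows))  # zip(*rows) walks column by column
        if "-" not in column and len(set(column)) == 1
    ]


# Aligned rows taken from the Example Output section of this README.
alignment = ["AGCTG", "ACGTG", "AG-TC"]
print(conserved_columns(alignment))  # [0, 3]
```

Columns 0 (A) and 3 (T) are identical across all three rows; the remaining columns contain a mismatch or a gap.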

What is Needleman-Wunsch?

Needleman-Wunsch is a dynamic programming algorithm that finds the optimal global alignment between two sequences. It works in three steps:

  1. Initialize a score matrix with gap penalties
  2. Fill each cell by choosing the best of three options (diagonal, up, left)
  3. Trace back from the bottom-right corner to build the aligned sequences
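The three steps can be sketched in plain Python as follows. This is a minimal illustration, not the package's own code: the function name nw_align and its return shape (score, aligned_a, aligned_b) are assumptions made for this example, and ties in the traceback are broken in favor of the diagonal move.

```python
def nw_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1

    # Step 1: first row and column hold cumulative gap penalties.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: each cell is the best of diagonal, up, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # diagonal: align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # up: gap in b
                score[i][j - 1] + gap,       # left: gap in a
            )

    # Step 3: trace back from the bottom-right corner.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


print(nw_align("AGCTG", "ACGTG"))  # (1, 'AGCTG', 'ACGTG')
print(nw_align("ATCG", "ACG"))     # (1, 'ATCG', 'A-CG')
```

Both the time and memory cost are proportional to len(a) * len(b), since every cell of the matrix is filled once.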

For multiple sequences, this package uses progressive alignment: it aligns the first two sequences, then adds each remaining sequence to the growing alignment one at a time.
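One way to sketch progressive alignment is shown below. How this package merges a new sequence into the existing alignment is not described here, so the sketch makes its own assumptions: each new sequence is aligned against the first row of the current alignment, gap characters already in that row are scored as ordinary mismatching characters, and any gap opened in that row is copied into every existing row. The names nw_columns and progressive_align are hypothetical.

```python
def nw_columns(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch; return columns as (index_in_a, index_in_b), None = gap."""
    n, m = len(a), len(b)

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + sub(i, j),
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)

    # Traceback, recording which input positions share each column.
    cols, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and s[i][j] == s[i - 1][j - 1] + sub(i, j):
            cols.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            cols.append((i - 1, None))
            i -= 1
        else:
            cols.append((None, j - 1))
            j -= 1
    return cols[::-1]


def progressive_align(seqs, **scoring):
    """Add sequences one by one, using the first aligned row as the guide."""
    rows = [seqs[0]]
    for seq in seqs[1:]:
        cols = nw_columns(rows[0], seq, **scoring)
        # Re-space every existing row with the guide's columns, then append the new row.
        rows = [
            "".join("-" if i is None else row[i] for i, _ in cols)
            for row in rows
        ]
        rows.append("".join("-" if j is None else seq[j] for _, j in cols))
    return rows


for row in progressive_align(["AGCTG", "ACGTG", "AGTC"]):
    print(row)
# AGCTG
# ACGTG
# AG-TC
```

Because each sequence is fitted to the alignment built so far, the result depends on input order and is not guaranteed to be the optimal multiple alignment.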

Scoring System

Event      Score
Match      +1
Mismatch   -1
Gap        -2

These are the default scores; they can be changed through the match, mismatch, and gap function parameters.
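As a worked example of the scoring system, the sketch below sums the score of an already aligned pair column by column. The helper alignment_score is hypothetical and only illustrates the arithmetic; it is not part of this package.

```python
def alignment_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum the per-column scores of two aligned rows of equal length."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row_a, row_b):
        if x == "-" or y == "-":
            total += gap        # one side is a gap
        elif x == y:
            total += match      # identical characters
        else:
            total += mismatch   # different characters
    return total


# A/A, G/G, T/T match; C/- is a gap; G/C is a mismatch: 3 - 2 - 1 = 0
print(alignment_score("AGCTG", "AG-TC"))  # 0

# Same alignment with custom scores: 3*2 - 3 - 1 = 2
print(alignment_score("AGCTG", "AG-TC", match=2, mismatch=-1, gap=-3))  # 2
```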


Installation

Install from PyPI (after publishing):

pip install arman-bio-msa

Install from TestPyPI:

pip install -i https://test.pypi.org/simple/ arman-bio-msa

Install locally for development:

git clone https://github.com/armanshafiee/arman-bio-msa.git
cd arman-bio-msa
pip install -e .

Usage

Run the demo:

python main.py

Use in your own code:

from bio_msa import needleman_wunsch, multiple_alignment

# Pairwise alignment
matrix, aligned1, aligned2 = needleman_wunsch("AGCTG", "ACGTG")
print(aligned1)  # AGCTG
print(aligned2)  # ACGTG

# Multiple alignment
sequences = ["AGCTG", "ACGTG", "AGTC"]
result = multiple_alignment(sequences)
for seq in result:
    print(seq)

Custom scoring:

matrix, a1, a2 = needleman_wunsch("ATCG", "ACG", match=2, mismatch=-1, gap=-3)

Example Output

Input Sequences:
  Seq1: AGCTG
  Seq2: ACGTG
  Seq3: AGTC

FINAL MULTIPLE ALIGNMENT
  Seq1: AGCTG
  Seq2: ACGTG
  Seq3: AG-TC

Building and Publishing to PyPI

Step 1: Install build tools

python -m pip install --upgrade build twine

Step 2: Build the package

python -m build

This creates files in the dist/ folder.

Step 3: Upload to TestPyPI (for testing)

python -m twine upload --repository testpypi dist/*

Step 4: Test installation from TestPyPI

pip install -i https://test.pypi.org/simple/ arman-bio-msa

Step 5: Upload to the production PyPI

python -m twine upload dist/*

Step 6: Install from PyPI

pip install arman-bio-msa

Project Structure

arman-bio-msa/
├── bio_msa/
│   ├── __init__.py        # Package exports
│   └── aligner.py         # Needleman-Wunsch algorithm
├── examples/
│   └── example_usage.py   # Usage examples
├── main.py                # Run the demo
├── pyproject.toml         # PyPI package config
├── README.md              # This file
├── report_draft.md        # Project report
├── LICENSE                # MIT License
└── .gitignore

License

MIT License - see LICENSE for details.
