Simple Multiple Sequence Alignment using Needleman-Wunsch Dynamic Programming
arman-bio-msa
Student Number: 221201931
Algorithm Assignment: 221201931 % 4 = 3 → Dynamic Programming
Why Dynamic Programming?
Each student is assigned an algorithm based on student_number % 4.
Since 221201931 % 4 = 3, the assigned algorithm is Dynamic Programming.
This package implements the Needleman-Wunsch algorithm, a classic dynamic programming method for global pairwise sequence alignment in bioinformatics.
What is Multiple Sequence Alignment?
Multiple Sequence Alignment (MSA) means lining up three or more biological
sequences (DNA, RNA, or protein) so that similar characters appear in the same
column. Gaps (-) are inserted where needed so that every aligned sequence has
the same length. This helps scientists find evolutionary relationships and
conserved regions across species.
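To make the idea of "same column" concrete, here is a minimal sketch that marks fully conserved columns in an alignment. It uses the three-sequence alignment from this README's example output; the helper name `conserved_columns` is hypothetical and not part of the package.

```python
def conserved_columns(aligned):
    """Return a marker string with '*' under each fully conserved column."""
    if len({len(row) for row in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    markers = []
    for column in zip(*aligned):
        # Conserved: every row holds the same character and it is not a gap.
        same = len(set(column)) == 1 and column[0] != "-"
        markers.append("*" if same else " ")
    return "".join(markers)


alignment = ["AGCTG", "ACGTG", "AG-TC"]
for row in alignment:
    print(row)
print(conserved_columns(alignment))  # '*' appears under columns 1 and 4
```

Columns 1 (A) and 4 (T) are identical in all three rows, so those are the only positions marked.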
What is Needleman-Wunsch?
Needleman-Wunsch is a dynamic programming algorithm that finds the optimal global alignment between two sequences. It works in three steps:
- Initialize the first row and column of the score matrix with cumulative gap penalties
- Fill each remaining cell with the best of three options: diagonal (match or mismatch), up (gap), or left (gap)
- Trace back from the bottom-right corner to build the aligned sequences
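The three steps above can be sketched in a few lines of Python. This is an illustrative version, not the package's own code: the name `nw_sketch` is hypothetical, it returns the final score rather than the full matrix, and ties in the traceback are broken in favour of the diagonal move, which is an assumption.

```python
def nw_sketch(seq1, seq2, match=1, mismatch=-1, gap=-2):
    """Illustrative global alignment; returns (score, aligned1, aligned2)."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]

    # Step 1: first row and column hold cumulative gap penalties.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    # Step 2: each cell takes the best of diagonal, up, and left.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if seq1[i - 1] == seq2[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # diagonal: align two characters
                score[i - 1][j] + gap,       # up: gap in seq2
                score[i][j - 1] + gap,       # left: gap in seq1
            )

    # Step 3: trace back from the bottom-right corner.
    out1, out2 = [], []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        both = i > 0 and j > 0
        pair = match if both and seq1[i - 1] == seq2[j - 1] else mismatch
        if both and score[i][j] == score[i - 1][j - 1] + pair:
            out1.append(seq1[i - 1])
            out2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out1.append(seq1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(seq2[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


print(nw_sketch("ATCG", "ACG"))  # (1, 'ATCG', 'A-CG')
```

The matrix has (len(seq1) + 1) x (len(seq2) + 1) cells, so time and memory both grow with the product of the two sequence lengths.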
For multiple sequences, this package uses progressive alignment: align the first two sequences, then add each remaining sequence one by one.
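One possible way to implement this progressive scheme is sketched below. It is an assumption about the approach, not the package's actual code: each new sequence is aligned against the first row of the alignment built so far, and any gap that step opens in that row is copied into every existing row. The names `pairwise` and `progressive_sketch` are hypothetical, and gap characters already present in the first row are scored like ordinary mismatching characters, which is a simplification.

```python
def pairwise(a, b, match=1, mismatch=-1, gap=-2):
    """Compact Needleman-Wunsch; returns the two aligned strings."""
    n, m = len(a), len(b)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            s[i][j] = max(s[i - 1][j - 1] + pair,
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        both = i > 0 and j > 0
        pair = match if both and a[i - 1] == b[j - 1] else mismatch
        if both and s[i][j] == s[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def progressive_sketch(sequences):
    """Align the first two sequences, then fold in the rest one by one."""
    if len(sequences) < 2:
        return list(sequences)
    rows = list(pairwise(sequences[0], sequences[1]))
    for new_seq in sequences[2:]:
        ref, new_aligned = pairwise(rows[0], new_seq)
        widened = [[] for _ in rows]
        pos = 0  # next unconsumed column of the existing alignment
        for ch in ref:
            if pos < len(rows[0]) and ch == rows[0][pos]:
                # Existing column: copy it from every row.
                for k, row in enumerate(rows):
                    widened[k].append(row[pos])
                pos += 1
            else:
                # Gap newly opened in the reference: pad every existing row.
                for col in widened:
                    col.append("-")
        rows = ["".join(col) for col in widened] + [new_aligned]
    return rows


for row in progressive_sketch(["ACG", "ACG", "ATCG"]):
    print(row)
```

Because each sequence is only ever compared with the first row, the result depends on input order and is not guaranteed to be the optimal multiple alignment; that trade-off is typical of simple progressive methods.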
Scoring System
| Event | Score |
|---|---|
| Match | +1 |
| Mismatch | -1 |
| Gap | -2 |
These are the defaults. Scores are customizable via the match, mismatch, and gap parameters.
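As a worked example of the table, the sketch below adds up the per-column scores of two already aligned rows. The function `score_alignment` is hypothetical and not part of the package; it counts any column containing a gap as one gap penalty.

```python
def score_alignment(row1, row2, match=1, mismatch=-1, gap=-2):
    """Sum per-column scores for two aligned rows of equal length."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(row1, row2):
        if x == "-" or y == "-":
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("AGCTG", "ACGTG"))  # 3 matches, 2 mismatches -> 1
print(score_alignment("ATCG", "A-CG"))    # 3 matches, 1 gap -> 1
print(score_alignment("ATCG", "A-CG", match=2, mismatch=-1, gap=-3))  # 6 - 3 -> 3
```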
Installation
Install from PyPI (after publishing):
pip install arman-bio-msa
Install from TestPyPI:
pip install -i https://test.pypi.org/simple/ arman-bio-msa
Install locally for development:
git clone https://github.com/armanshafiee/arman-bio-msa.git
cd arman-bio-msa
pip install -e .
Usage
Run the demo:
python main.py
Use in your own code:
from bio_msa import needleman_wunsch, multiple_alignment
# Pairwise alignment
matrix, aligned1, aligned2 = needleman_wunsch("AGCTG", "ACGTG")
print(aligned1) # AGCTG
print(aligned2) # ACGTG
# Multiple alignment
sequences = ["AGCTG", "ACGTG", "AGTC"]
result = multiple_alignment(sequences)
for seq in result:
    print(seq)
Custom scoring:
matrix, a1, a2 = needleman_wunsch("ATCG", "ACG", match=2, mismatch=-1, gap=-3)
Example Output
Input Sequences:
Seq1: AGCTG
Seq2: ACGTG
Seq3: AGTC
FINAL MULTIPLE ALIGNMENT
Seq1: AGCTG
Seq2: ACGTG
Seq3: AG-TC
Building and Publishing to PyPI
Step 1: Install build tools
python -m pip install --upgrade build twine
Step 2: Build the package
python -m build
This creates a source distribution (.tar.gz) and a wheel (.whl) in the dist/ folder.
Step 3: Upload to TestPyPI (for testing)
python -m twine upload --repository testpypi dist/*
Step 4: Test installation from TestPyPI
pip install -i https://test.pypi.org/simple/ arman-bio-msa
Step 5: Upload to real PyPI
python -m twine upload dist/*
Step 6: Install from PyPI
pip install arman-bio-msa
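After the install step, one way to confirm which version ended up in the current environment is to query the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version if the package is installed in this environment, else None.
print(installed_version("arman-bio-msa"))
```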
Project Structure
arman-bio-msa/
├── bio_msa/
│   ├── __init__.py          # Package exports
│   └── aligner.py           # Needleman-Wunsch algorithm
├── examples/
│   └── example_usage.py     # Usage examples
├── main.py                  # Run the demo
├── pyproject.toml           # PyPI package config
├── README.md                # This file
├── report_draft.md          # Project report
├── LICENSE                  # MIT License
└── .gitignore
License
MIT License - see LICENSE for details.