A lightweight Python implementation of the Burrows–Wheeler Aligner (BWA) with FM-index and multithreaded alignment.
# 🧬 PyBWA_lite: A Lightweight Burrows–Wheeler Aligner in Python
### 🧠 Overview
PyBWA_lite is a pure-Python implementation of the Burrows–Wheeler Aligner (BWA) algorithm for fast, memory-efficient sequence alignment. It supports FM-index construction, banded Smith–Waterman extension, multithreaded alignment, and FASTQ I/O with quality scores — making it a compact, research-friendly alternative to native BWA for teaching, prototyping, and custom pipelines.
### ✨ Features
✅ FM-Index Construction & Querying
Efficient Burrows–Wheeler Transform (BWT)–based indexing with rank/select operations for fast substring search and backward extension.
✅ Banded Smith–Waterman Extension
Local alignment restricted to a diagonal band of the dynamic-programming matrix, which improves speed and preserves accuracy as long as the optimal alignment stays inside the band.
✅ Parallel Read Alignment
Multithreaded alignment engine supporting FASTA and FASTQ reads for high-throughput analysis.
✅ Quality-Aware FASTQ Parsing
Fully supports read qualities, enabling accurate scoring and filtering during alignment.
✅ SAM/BAM Output
Interoperable with the pysam library — easily write, parse, and manipulate alignment outputs.
✅ Command-Line Interface (CLI)
Use pybwa-lite directly from your terminal for quick indexing and read alignment tasks:
```bash
pybwa-lite index reference.fasta
pybwa-lite align reference.fmi reads.fastq -o output.sam
```
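The quality-aware FASTQ parsing listed among the features can be illustrated with a small standalone sketch. It is not taken from the package: `FastqRecord`, `parse_fastq`, and `mean_quality` are hypothetical names, and the code assumes four-line records with Phred+33 encoded qualities.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ read (hypothetical container, not the package's type)."""
    name: str
    sequence: str
    qualities: List[int]  # decoded Phred scores, one per base


def parse_fastq(handle: TextIO, phred_offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline()
        if header == "":
            return  # end of stream
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read name is the first whitespace-delimited token after '@'.
        name = (header[1:].split() or [""])[0]
        # Each quality character encodes a Phred score as ord(char) - offset.
        scores = [ord(char) - phred_offset for char in quality]
        yield FastqRecord(name, sequence, scores)


def mean_quality(record: FastqRecord) -> float:
    """Average Phred score of a read, usable as a simple filter criterion."""
    if not record.qualities:
        return 0.0
    return sum(record.qualities) / len(record.qualities)


example = io.StringIO("@read1 sample\nACGT\n+\nII5!\n@read2\nGG\n+\n#5\n")
records = list(parse_fastq(example))
kept = [r.name for r in records if mean_quality(r) >= 20]
print(kept)
```

A mean-quality threshold is only one possible filter; per-base scores could equally be used to weight mismatches during scoring.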
---
### 🧩 Installation
#### From PyPI (once released)
```bash
pip install pybwa-lite
```
#### From Source (Development)
```bash
git clone https://github.com/soumyapriyagoswami/pybwa.git
cd pybwa
pip install -e .
```
### ⚙️ Quick Start
#### 1️⃣ Build an Index
```python
from pybwa_lite.index import build_index

# Build the FM-index for the reference and save it to disk
fm = build_index("reference.fasta", out_path="ref_index.pkl")
```
#### 2️⃣ Align Reads (Single or FASTQ)
```python
from pybwa_lite.align import align_reads

# Align all reads in the FASTQ file against the saved index using 4 threads
results = align_reads("reads.fastq", "ref_index.pkl", threads=4)
for r in results:
    print(r)
```
#### 3️⃣ Command-Line Usage
```bash
# Build index
pybwa_lite index reference.fasta -o ref_index.pkl
# Align reads
pybwa_lite align reads.fastq -x ref_index.pkl -t 4 -o output.sam
```
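How a `threads` setting like the one above might distribute reads over workers can be sketched with the standard library. This is an illustration under stated assumptions, not the package's internals: `align_one` is a toy stand-in that only finds exact matches, and `align_parallel` is a hypothetical helper name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Read = Tuple[str, str]  # (read name, sequence)


def align_one(read: Read, reference: str) -> Tuple[str, int]:
    """Toy aligner (assumption): exact-match position of the read, or -1."""
    name, sequence = read
    return name, reference.find(sequence)


def align_parallel(reads: Sequence[Read], reference: str,
                   threads: int = 4) -> List[Tuple[str, int]]:
    """Align reads on a thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map preserves input order, so the output does not depend
        # on which worker finishes first.
        return list(pool.map(lambda read: align_one(read, reference), reads))


reference = "ACGTACGTTAGC"
reads = [("r1", "CGTA"), ("r2", "TTAG"), ("r3", "GGGG")]
print(align_parallel(reads, reference, threads=4))
```

Because results come back in input order, a single-threaded and a multithreaded run can be compared directly. In standard CPython, threads running pure-Python code are serialized by the global interpreter lock, so the achievable speedup depends on how much of the work releases it.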
### 🧠 Architecture Overview
| Module | Description |
|---|---|
| `index.py` | Builds and loads FM-index structures for references |
| `fmidx.py` | Implements FM-index, suffix array, and BWT |
| `align.py` | Seeding, banded Smith–Waterman extension, multithreaded alignment |
| `samtools.py` | Utilities for SAM/BAM file handling using pysam |
| `cli.py` | Command-line interface for index building and alignment |
| `tests/` | Pytest-based unit tests for FM-index and alignment verification |
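To make the suffix array, BWT, and FM-index roles in the table concrete, here is a minimal teaching sketch of backward search. It is not the code in `fmidx.py`: the function names are hypothetical, the suffix array is built by naive sorting, and the occurrence table is stored in full rather than sampled.

```python
from typing import Dict, List, Tuple


def build_fm_index(reference: str) -> Tuple[List[int], str, Dict[str, int], Dict[str, List[int]]]:
    """Return (suffix array, BWT, C table, Occ table) for reference + '$'."""
    text = reference + "$"  # '$' sorts before A, C, G, T in ASCII
    # Naive suffix array: sort suffix start positions by the suffix text.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps to '$' for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # C[c]: number of characters in the text strictly smaller than c.
    counts: Dict[str, int] = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    c_table: Dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]
    # Occ[c][i]: occurrences of c in bwt[:i] (the rank operation).
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if ch == b else 0)
    return sa, bwt, c_table, occ


def backward_search(pattern: str, sa: List[int], c_table: Dict[str, int],
                    occ: Dict[str, List[int]]) -> List[int]:
    """Return sorted start positions of exact matches of pattern."""
    if not pattern:
        return []
    lo, hi = 0, len(sa)  # half-open interval of matching suffix-array rows
    for ch in reversed(pattern):  # extend the match one character to the left
        if ch not in c_table:
            return []
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, bwt, c_table, occ = build_fm_index("GATTACATTA")
print(backward_search("TTA", sa, c_table, occ))
```

Each loop iteration narrows the suffix-array interval using two rank lookups, so the search cost grows with the pattern length rather than the reference length. Production indexes typically replace the naive sort and the full occurrence table with more compact structures.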
### 🧪 Running Tests
```bash
pytest -v
```
Expected output:
```
tests/test_alignment.py::test_single_read_alignment PASSED
tests/test_alignment.py::test_fastq_alignment_with_qualities PASSED
tests/test_alignment.py::test_multithreaded_alignment_consistency PASSED
tests/test_fmindex.py::test_suffix_array_sorted PASSED
tests/test_fmindex.py::test_backward_search_basic PASSED
```
### ⚡ Performance Highlights
| Feature | Description | Speedup |
|---|---|---|
| FM-index lookup | Optimized with prefix caching | ~3× faster |
| Multithreading | Parallel read alignment | up to ~N× with N threads |
| Banded SW | Reduced dynamic programming table | ~5× faster on long reads |
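The "reduced dynamic programming table" behind banded Smith–Waterman can be sketched as follows. This is a score-only illustration with linear gap penalties, not the package's implementation; the function name, the scoring parameters, and the `diagonal` argument (the expected offset between target and query positions, for example from a seed hit) are all assumptions.

```python
def banded_smith_waterman(query: str, target: str, band: int = 8,
                          match: int = 2, mismatch: int = -1, gap: int = -2,
                          diagonal: int = 0) -> int:
    """Best local alignment score using only cells with |(j - i) - diagonal| <= band."""
    best = 0
    prev = {}  # scores of the previous row, keyed by column, band cells only
    for i in range(1, len(query) + 1):
        lo = max(1, i + diagonal - band)
        hi = min(len(target), i + diagonal + band)
        curr = {}
        for j in range(lo, hi + 1):
            step = match if query[i - 1] == target[j - 1] else mismatch
            # Cells outside the band are read as 0, which is the same as
            # starting a fresh local alignment at the band edge.
            score = max(
                0,
                prev.get(j - 1, 0) + step,  # match or mismatch
                prev.get(j, 0) + gap,       # gap in the target
                curr.get(j - 1, 0) + gap,   # gap in the query
            )
            curr[j] = score
            best = max(best, score)
        prev = curr
    return best


# One inserted base in the target: a band of 1 is wide enough to follow it.
print(banded_smith_waterman("ACGTACGT", "ACGTTACGT", band=1))
```

Each row visits at most `2 * band + 1` cells instead of the full target length, which is where the savings on long reads come from. If the true alignment leaves the band, the banded score falls below the unbanded one, so the band width trades speed against sensitivity.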
### 🧰 Dependencies
- numpy
- pysam
- tqdm
- pytest (for testing only)
### 📦 Project Structure
```
pybwa/
├── __init__.py
├── align.py
├── cli.py
├── fmidx.py
├── index.py
└── samtools.py
tests/
├── test_alignment.py
└── test_fmindex.py
```
### 🤝 Contributing
Contributions are welcome! If you have ideas for improving FM-index efficiency, integrating GPU kernels, or extending to RNA-seq workflows:
- Fork this repo
- Create a feature branch
- Submit a pull request 🚀
### 🧾 License
MIT License © 2025 Soumyapriya Goswami
### 🌐 Links
- 🔗 GitHub Repository: soumyapriyagoswami/pybwa
- 📘 Documentation: Coming Soon
- 🐍 PyPI Package: (after release)