
A lightweight Python implementation of the Burrows–Wheeler Aligner (BWA) with FM-index and multithreaded alignment.


🧬 PyBWA_lite — A Lightweight Burrows–Wheeler Aligner in Python



🧠 Overview

PyBWA_lite is a pure-Python implementation of the Burrows–Wheeler Aligner (BWA) algorithm for fast, memory-efficient sequence alignment. It supports FM-index construction, banded Smith–Waterman extension, multithreaded alignment, and FASTQ I/O with quality scores — making it a compact, research-friendly alternative to native BWA for teaching, prototyping, and custom pipelines.


🚀 Features

✅ FM-Index Construction & Querying: efficient backward search and rank/select
✅ Banded Smith–Waterman Extension: optimized local alignment with reduced search space
✅ Parallel Read Alignment: multithreaded support for FASTA/FASTQ reads
✅ Quality-Aware FASTQ Parsing: handles read qualities natively
✅ SAM/BAM Output: integrates with pysam for interoperability
✅ Command-Line Interface (CLI): align reads or build indices directly from the terminal
✅ Lightweight & Extensible: no compiled C/C++ backend required
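To make the backward-search feature concrete, here is a minimal, self-contained sketch of an FM-index over a short string. It is not PyBWA_lite's code: the function names (`build_fm`, `backward_search`) are hypothetical, the suffix array is built naively, and the `$` sentinel is assumed not to occur in the text.

```python
def build_fm(text):
    """Build a toy FM-index: suffix array, C table, and Occ prefix counts."""
    text += "$"  # sentinel; assumed absent from the text and smaller than every symbol
    # Naive suffix array: sort suffix start positions by the suffix itself.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # C[ch]: number of characters in the text strictly smaller than ch.
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]

    # occ[ch][i]: occurrences of ch in bwt[:i] (the "rank" query).
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return sa, C, occ


def backward_search(pattern, sa, C, occ):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(sa)  # half-open interval of suffix-array rows
    for ch in reversed(pattern):  # consume the pattern right to left
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


sa, C, occ = build_fm("banana")
print(backward_search("ana", sa, C, occ))  # [1, 3]
```

Each pattern character narrows the suffix-array interval with two rank lookups, so the search cost depends on the pattern length rather than the reference length. A real index would replace the naive sort and the dense `occ` table with sampled structures to save memory.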


🧩 Installation

From PyPI

pip install pybwa_lite

From Source (Development)

git clone https://github.com/soumyapriyagoswami/pybwa.git
cd pybwa
pip install -e .

⚙️ Quick Start

1️⃣ Build an Index

from pybwa.index import build_index
fm = build_index("reference.fasta", out_path="ref_index.pkl")

2️⃣ Align Reads (Single or FASTQ)

from pybwa.align import align_reads
results = align_reads("reads.fastq", "ref_index.pkl", threads=4)
for r in results:
    print(r)
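For readers unfamiliar with the FASTQ input used in step 2, the sketch below shows how a four-line FASTQ record can be parsed and its quality string decoded. It is an illustration only, not the library's parser: `parse_fastq` is a hypothetical name, Phred+33 encoding is assumed, and multi-line (wrapped) records are not handled.

```python
import io


def parse_fastq(handle, offset=33):
    """Yield (name, sequence, qualities) from a 4-line-per-record FASTQ stream.

    offset=33 assumes Phred+33 quality encoding.
    """
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        header = header.strip()
        if not header:  # tolerate blank lines between records
            continue
        seq = handle.readline().strip()
        plus = handle.readline().strip()
        qual = handle.readline().strip()
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        parts = header[1:].split()
        name = parts[0] if parts else ""  # read name is the first token after '@'
        yield name, seq, [ord(c) - offset for c in qual]


demo = io.StringIO("@r1 sample read\nACGT\n+\nII5!\n")
for name, seq, quals in parse_fastq(demo):
    print(name, seq, quals)  # r1 ACGT [40, 40, 20, 0]
```

Decoded qualities are plain integers, so an aligner could, for example, down-weight mismatches at low-quality positions.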

3️⃣ Command-Line Usage

# Build index
pybwa index reference.fasta -o ref_index.pkl

# Align reads
pybwa align reads.fastq -x ref_index.pkl -t 4 -o output.sam
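The `output.sam` file written by the align command is a tab-separated text format. As a rough illustration of what one alignment line contains, the sketch below assembles the eleven mandatory SAM fields by hand. The helper name `sam_record` is hypothetical and the values are made up; the project itself states that it uses pysam for SAM/BAM output.

```python
def sam_record(qname, flag, rname, pos, mapq, cigar, seq, qual,
               rnext="*", pnext=0, tlen=0):
    """Format one SAM alignment line from the 11 mandatory fields.

    pos is 1-based, as required by the SAM format.
    """
    fields = [qname, flag, rname, pos, mapq, cigar,
              rnext, pnext, tlen, seq, qual]
    return "\t".join(str(f) for f in fields)


# Forward-strand hit (flag 0) of a 4-base read at 1-based position 101.
print(sam_record("r1", 0, "chr1", 101, 60, "4M", "ACGT", "IIII"))

# Unmapped read: flag 4, with placeholder reference, position, and CIGAR.
print(sam_record("r2", 4, "*", 0, 0, "*", "TTTT", "IIII"))
```

A complete SAM file also needs header lines such as `@SQ` entries naming each reference sequence and its length, which this sketch omits.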

🧠 Architecture Overview

| Module | Description |
|---|---|
| index.py | Builds and loads FM-index structures for references |
| fmidx.py | Implements FM-index, suffix array, and BWT |
| align.py | Seeding, banded Smith–Waterman extension, multithreaded alignment |
| samtools.py | Utilities for SAM/BAM file handling using pysam |
| cli.py | Command-line interface for index building and alignment |
| tests/ | Pytest-based unit tests for FM-index and alignment verification |
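The multithreaded alignment step listed for align.py can be pictured as a pool of workers that each align one read. The sketch below is an assumption about the general pattern, not the module's actual code: `align_one` is a hypothetical stand-in that does an exact-match lookup with `str.find`, and `align_all` is a hypothetical driver.

```python
from concurrent.futures import ThreadPoolExecutor


def align_one(read, reference):
    """Hypothetical per-read aligner: exact-match lookup, -1 if not found."""
    name, seq = read
    return name, reference.find(seq)


def align_all(reads, reference, threads=4):
    """Align reads in a thread pool; results come back in input order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map preserves the order of the input iterable.
        return list(pool.map(lambda r: align_one(r, reference), reads))


reads = [("a", "CGT"), ("b", "TTT"), ("c", "ACG")]
print(align_all(reads, "ACGTACGT", threads=2))
# [('a', 1), ('b', -1), ('c', 0)]
```

Because `Executor.map` returns results in input order, the output is the same regardless of thread count, which is the property a consistency test would check. Note that in standard CPython the global interpreter lock limits the speedup threads can give CPU-bound pure-Python code, so a process pool may scale better for heavy workloads.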

🧪 Running Tests

pytest -v

Expected output:

tests/test_alignment.py::test_single_read_alignment PASSED
tests/test_alignment.py::test_fastq_alignment_with_qualities PASSED
tests/test_alignment.py::test_multithreaded_alignment_consistency PASSED
tests/test_fmindex.py::test_suffix_array_sorted PASSED
tests/test_fmindex.py::test_backward_search_basic PASSED

⚡ Performance Highlights

| Feature | Description | Reported speedup |
|---|---|---|
| FM-index lookup | Optimized with prefix caching | ~3× faster |
| Multithreading | Parallel read alignment | Up to ~N× with N threads |
| Banded SW | Reduced dynamic programming table | ~5× faster on long reads |
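The "reduced dynamic programming table" behind banded Smith–Waterman can be sketched as follows: only cells within a fixed distance of the main diagonal are filled, so the work drops from roughly len(a) × len(b) cells to about (2 × band + 1) × len(a). This is a minimal score-only illustration with a linear gap penalty; the function name, the scoring values, and the default band width are assumptions, not the library's settings.

```python
def banded_sw_score(a, b, band=3, match=2, mismatch=-1, gap=-2):
    """Best local alignment score using only cells with |i - j| <= band.

    gap must be negative: out-of-band neighbours are read as 0, and
    0 + gap can then never beat the local-alignment floor of 0.
    """
    best = 0
    prev = {}  # previous row: column index -> score (in-band cells only)
    for i in range(1, len(a) + 1):
        cur = {}
        lo = max(1, i - band)
        hi = min(len(b), i + band)
        for j in range(lo, hi + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(
                0,                       # start a new local alignment
                prev.get(j - 1, 0) + s,  # diagonal: match or mismatch
                prev.get(j, 0) + gap,    # gap in b
                cur.get(j - 1, 0) + gap, # gap in a
            )
            cur[j] = score
            best = max(best, score)
        prev = cur
    return best


print(banded_sw_score("ACGT", "ACGT"))            # 8
print(banded_sw_score("AAAACGT", "CGT", band=4))  # 6
print(banded_sw_score("AAAACGT", "CGT", band=3))  # 0
```

The last two calls show the trade-off: the match lies on the diagonal with offset 4, so a band of 3 misses it entirely. In a seed-and-extend aligner the band is usually centred on the seed's diagonal, which keeps the true alignment inside a narrow band.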

🧰 Dependencies

  • numpy
  • pysam
  • tqdm
  • pytest (for testing only)

📦 Project Structure

pybwa/
├── __init__.py
├── align.py
├── cli.py
├── fmidx.py
├── index.py
└── samtools.py
tests/
├── test_alignment.py
└── test_fmindex.py

🤝 Contributing

Contributions are welcome! If you have ideas for improving FM-index efficiency, integrating GPU kernels, or extending to RNA-seq workflows:

  1. Fork this repo
  2. Create a feature branch
  3. Submit a pull request 🚀

🧾 License

MIT License © 2025 Soumyapriya Goswami


🌐 Links

🔗 GitHub Repository: soumyapriyagoswami/pybwa
📘 Documentation: Coming Soon
🐍 PyPI Package: (after release)
