
A lightweight Python implementation of the Burrows–Wheeler Aligner (BWA) with FM-index and multithreaded alignment.


🧬 PyBWA_lite — A Lightweight Burrows–Wheeler Aligner in Python



🧠 Overview

PyBWA_lite is a pure-Python implementation of the Burrows–Wheeler Aligner (BWA) algorithm for fast, memory-efficient sequence alignment. It supports FM-index construction, banded Smith–Waterman extension, multithreaded alignment, and FASTQ I/O with quality scores — making it a compact, research-friendly alternative to native BWA for teaching, prototyping, and custom pipelines.


✨ Features

FM-Index Construction & Querying
Efficient Burrows–Wheeler Transform (BWT)–based indexing with rank/select operations for fast substring search and backward extension.
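To make the rank-based backward extension concrete, here is a minimal toy FM-index sketch. It is not the code in `fmidx.py`: `build_fm_index` and `backward_search` are hypothetical helpers, construction uses a naive sorted suffix array, and the rank operation is a fully precomputed `occ` table rather than a sampled one.

```python
# Toy FM-index: naive suffix array, BWT, C table, rank (occ) table, and backward search.
# Naive construction is O(n^2 log n); suitable only for small illustrative references.

def build_fm_index(text):
    """Return (bwt, sa, C, occ) for text, which must end with the sentinel '$'."""
    assert text.endswith("$"), "text must be terminated by '$'"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT[i] is the character preceding suffix sa[i]; text[-1] ('$') when sa[i] == 0.
    bwt = "".join(text[i - 1] for i in sa)
    # C[c]: number of characters in text that sort strictly before c.
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i] (the rank operation).
    occ = {c: [0] * (len(bwt) + 1) for c in C}
    for i, ch in enumerate(bwt):
        for c in C:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return bwt, sa, C, occ


def backward_search(pattern, C, occ, n):
    """Return the half-open SA interval [lo, hi) of suffixes starting with pattern."""
    lo, hi = 0, n
    for c in reversed(pattern):  # extend the match one character to the left
        if c not in C:
            return 0, 0
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
        if lo >= hi:
            return 0, 0
    return lo, hi


text = "ACGTACGTTACG$"
bwt, sa, C, occ = build_fm_index(text)
lo, hi = backward_search("ACG", C, occ, len(bwt))
print(hi - lo, sorted(sa[lo:hi]))  # 3 [0, 4, 9]
```

The interval size `hi - lo` is the occurrence count; the suffix array entries inside it give the match positions.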

Banded Smith–Waterman Extension
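Local alignment restricted to a diagonal band of the dynamic-programming matrix, which cuts the work from O(n·m) to roughly O(n·w) for band width w. The result is exact whenever the optimal alignment stays inside the band; indels larger than the band width can be missed.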
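A minimal score-only sketch of banded Smith–Waterman with a linear gap penalty, assuming a band centered on the main diagonal (i = j); in a seed-and-extend aligner the band would usually be centered on the seed's diagonal instead. The function name and scoring defaults are illustrative assumptions, not the values used in `align.py`.

```python
def banded_smith_waterman(query, ref, band=3, match=2, mismatch=-1, gap=-2):
    """Best local-alignment score, computing only DP cells with |i - j| <= band.

    Linear gap penalty, score only (no traceback). The boundary row and column
    are 0 as in standard Smith-Waterman; interior cells outside the band are
    never computed and stay at -inf, so no alignment path can pass through them.
    """
    NEG = float("-inf")
    prev = [0] * (len(ref) + 1)  # row 0: a local alignment may start anywhere
    best = 0
    for i in range(1, len(query) + 1):
        curr = [NEG] * (len(ref) + 1)
        curr[0] = 0
        # Only columns j with |i - j| <= band are filled in this row.
        for j in range(max(1, i - band), min(len(ref), i + band) + 1):
            diag = prev[j - 1] + (match if query[i - 1] == ref[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(banded_smith_waterman("ACGTACGT", "ACGTACGT"))          # 16
print(banded_smith_waterman("AAAA", "TTTTTTTTAAAA", band=20))  # 8: band covers the match
print(banded_smith_waterman("AAAA", "TTTTTTTTAAAA", band=2))   # 0: match lies outside the band
```

The last two calls show the trade-off stated above: a narrow band is faster but cannot see an alignment that sits far off the diagonal it is centered on.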

Parallel Read Alignment
Multithreaded alignment engine supporting FASTA and FASTQ reads for high-throughput analysis.
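One common way to structure parallel read alignment in Python is `concurrent.futures`, sketched below under the assumption that each read can be aligned independently. `align_one` is a hypothetical stand-in (exact substring search), not the package's aligner. Note that in standard CPython builds the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time, so CPU-bound pure-Python alignment may not speed up with threads; `ProcessPoolExecutor` offers the same `map` interface using separate processes.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def align_one(read, reference):
    """Hypothetical toy aligner: 0-based position of the first exact match, or -1."""
    return reference.find(read)


def align_parallel(reads, reference, threads=4):
    """Align reads concurrently; Executor.map yields results in input order."""
    worker = partial(align_one, reference=reference)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(worker, reads))


reference = "ACGTACGTTACGGATTACA"
reads = ["GATTACA", "CGTT", "TTTT", "ACG"]
print(align_parallel(reads, reference))  # [12, 5, -1, 0]
```

Because `map` preserves input order, the output is identical for any thread count, which is the kind of property a multithreaded-consistency test can check.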

Quality-Aware FASTQ Parsing
Fully supports read qualities, enabling accurate scoring and filtering during alignment.
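A minimal FASTQ reader that decodes Phred+33 quality strings, followed by a simple mean-quality filter. This is a sketch under stated assumptions: one sequence line per record (multi-line FASTQ is not handled), Phred+33 encoding, and hypothetical names (`FastqRecord`, `parse_fastq`, `mean_quality`) that may not match the package's parser.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    quals: List[int]  # Phred scores


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with one sequence line per record (Phred+33)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at {header!r}")
        name = header[1:].split(" ")[0]  # drop the optional description
        # Phred+33 (Sanger / Illumina 1.8+): score = ASCII code - 33.
        yield FastqRecord(name, seq, [ord(ch) - 33 for ch in qual])


def mean_quality(rec: FastqRecord) -> float:
    return sum(rec.quals) / len(rec.quals) if rec.quals else 0.0


data = "@r1 sample=A\nACGT\n+\nIIII\n@r2\nGGCA\n+\n!!5I\n"
records = list(parse_fastq(io.StringIO(data)))
kept = [r.name for r in records if mean_quality(r) >= 20]
print(records[1].quals, kept)  # [0, 0, 20, 40] ['r1']
```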

SAM/BAM Output
Interoperable with the pysam library — easily write, parse, and manipulate alignment outputs.
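SAM is a tab-delimited text format with 11 mandatory columns, so a record can be written with the standard library alone and then read back with `pysam.AlignmentFile` or samtools. The helpers below (`sam_header`, `sam_record`) are hypothetical illustrations, not functions from `samtools.py`; they assume unpaired reads and a single reference sequence.

```python
def sam_header(ref_name, ref_len):
    """Minimal SAM header: format version plus one @SQ reference line."""
    return f"@HD\tVN:1.6\tSO:unsorted\n@SQ\tSN:{ref_name}\tLN:{ref_len}\n"


def sam_record(qname, flag, rname, pos0, mapq, cigar, seq, quals):
    """One SAM alignment line; pos0 is 0-based, while SAM's POS column is 1-based."""
    qual = "".join(chr(q + 33) for q in quals) if quals else "*"  # Phred+33
    return "\t".join([
        qname, str(flag), rname, str(pos0 + 1), str(mapq), cigar,
        "*", "0", "0",  # RNEXT, PNEXT, TLEN: unused for unpaired reads
        seq, qual,
    ])


sam_text = (
    sam_header("chr1", 19)
    + sam_record("r1", 0, "chr1", 12, 60, "7M", "GATTACA", [40] * 7)
    + "\n"
)
print(sam_text, end="")
```

The off-by-one conversion matters: aligners usually work with 0-based offsets internally, while SAM text stores 1-based positions.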

Command-Line Interface (CLI)
Use pybwa-lite directly from your terminal for quick indexing and read alignment tasks:

```bash
pybwa-lite index reference.fasta
pybwa-lite align reference.fmi reads.fastq -o output.sam
```


---

### 🧩 Installation

#### From PyPI

```bash
pip install pybwa-lite
```

#### From Source (Development)

```bash
git clone https://github.com/soumyapriyagoswami/pybwa.git
cd pybwa
pip install -e .
```

⚙️ Quick Start

1️⃣ Build an Index

```python
from pybwa_lite.index import build_index

fm = build_index("reference.fasta", out_path="ref_index.pkl")
```
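The `.pkl` extension suggests the index is persisted with Python's `pickle`. The sketch below shows what building and saving a heavily simplified index could look like; `read_fasta`, `build_toy_index`, and `load_toy_index` are hypothetical helpers, not the package's `build_index`, and the on-disk layout (a dict holding the suffix array and BWT) is an assumption. Because unpickling can execute arbitrary code, only load index files you created or trust.

```python
import os
import pickle
import tempfile


def read_fasta(path):
    """Concatenate the sequence lines of a single-record FASTA file, uppercased."""
    with open(path) as fh:
        return "".join(line.strip() for line in fh if not line.startswith(">")).upper()


def build_toy_index(fasta_path, out_path):
    """Hypothetical: build a naive suffix array + BWT and pickle them to out_path."""
    text = read_fasta(fasta_path) + "$"  # '$' sentinel sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    index = {"text_len": len(text), "sa": sa, "bwt": "".join(text[i - 1] for i in sa)}
    with open(out_path, "wb") as fh:
        pickle.dump(index, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return index


def load_toy_index(path):
    # Only load index files you trust: unpickling can execute arbitrary code.
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "reference.fasta")
    with open(fasta, "w") as fh:
        fh.write(">chr1\nACGTAC\nGTTACG\n")
    built = build_toy_index(fasta, os.path.join(tmp, "ref_index.pkl"))
    loaded = load_toy_index(os.path.join(tmp, "ref_index.pkl"))

print(loaded == built, loaded["text_len"])  # True 13
```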

2️⃣ Align Reads (Single or FASTQ)

```python
from pybwa_lite.align import align_reads

results = align_reads("reads.fastq", "ref_index.pkl", threads=4)
for r in results:
    print(r)
```

3️⃣ Command-Line Usage

```bash
# Build index
pybwa_lite index reference.fasta -o ref_index.pkl

# Align reads
pybwa_lite align reads.fastq -x ref_index.pkl -t 4 -o output.sam
```
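How `cli.py` parses these arguments is not shown here. A hypothetical `argparse` layout for the subcommands and short flags used in the commands above (`-o`, `-x`, `-t`) might look like the sketch below; the destination names and defaults are assumptions, and the real CLI may differ.

```python
import argparse


def make_parser():
    """Hypothetical parser with 'index' and 'align' subcommands."""
    parser = argparse.ArgumentParser(prog="pybwa_lite")
    sub = parser.add_subparsers(dest="command", required=True)

    p_index = sub.add_parser("index", help="build an FM-index from a FASTA reference")
    p_index.add_argument("reference")
    p_index.add_argument("-o", dest="output", default=None)  # assumed: optional output path

    p_align = sub.add_parser("align", help="align reads against a prebuilt index")
    p_align.add_argument("reads")
    p_align.add_argument("-x", dest="index", required=True)
    p_align.add_argument("-t", dest="threads", type=int, default=1)  # assumed default
    p_align.add_argument("-o", dest="output", default=None)
    return parser


args = make_parser().parse_args(
    ["align", "reads.fastq", "-x", "ref_index.pkl", "-t", "4", "-o", "output.sam"]
)
print(args.command, args.reads, args.index, args.threads, args.output)
```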

🧠 Architecture Overview

| Module | Description |
|---|---|
| `index.py` | Builds and loads FM-index structures for references |
| `fmidx.py` | Implements the FM-index, suffix array, and BWT |
| `align.py` | Seeding, banded Smith–Waterman extension, and multithreaded alignment |
| `samtools.py` | Utilities for SAM/BAM file handling using pysam |
| `cli.py` | Command-line interface for index building and alignment |
| `tests/` | Pytest-based unit tests for FM-index and alignment verification |
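The seeding step in `align.py` finds exact matches that anchor a read before extension. The sketch below illustrates the idea with a hash table of reference k-mers (swapped in for FM-index lookups to keep it short) and diagonal voting: each seed hit votes for `ref_pos - read_pos`, and the winning diagonal is where a banded extension would be centered. `kmer_table` and `best_diagonal` are hypothetical names, and this is not the package's exact seeding strategy.

```python
from collections import Counter, defaultdict


def kmer_table(reference, k):
    """Map every length-k substring of reference to its 0-based start positions."""
    table = defaultdict(list)
    for i in range(len(reference) - k + 1):
        table[reference[i:i + k]].append(i)
    return table


def best_diagonal(read, table, k):
    """Return (diagonal, votes) for the diagonal with most exact k-mer seeds, or None."""
    votes = Counter()
    for j in range(len(read) - k + 1):
        for ref_pos in table.get(read[j:j + k], ()):
            votes[ref_pos - j] += 1  # diagonal = implied start of the read on the reference
    return votes.most_common(1)[0] if votes else None


reference = "TTTTGATTACAGGCATTTT"
table = kmer_table(reference, k=4)
print(best_diagonal("GATTACAGGC", table, k=4))  # (4, 7)
```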

🧪 Running Tests

```bash
pytest -v
```

Expected output:

```
tests/test_alignment.py::test_single_read_alignment PASSED
tests/test_alignment.py::test_fastq_alignment_with_qualities PASSED
tests/test_alignment.py::test_multithreaded_alignment_consistency PASSED
tests/test_fmindex.py::test_suffix_array_sorted PASSED
tests/test_fmindex.py::test_backward_search_basic PASSED
```

⚡ Performance Highlights

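| Feature | Description | Speedup |
|---|---|---|
| FM-index lookup | Optimized with prefix caching | ~3× faster |
| Multithreading | Parallel read alignment | Up to ~N× with N threads (ideal case; CPython's GIL limits thread scaling for CPU-bound Python code) |
| Banded SW | Reduced dynamic-programming table | ~5× faster on long reads |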
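The table does not say how the FM-index cache works. One plausible form, sketched here purely as an assumption, precomputes the suffix-array interval of every length-k string over the DNA alphabet (4^k entries) so that each query skips its first k backward-search steps. Because backward search consumes a pattern right to left, the cache key in this sketch is the pattern's last k characters. All function names are hypothetical.

```python
from itertools import product


def build_fm(text):
    """Naive suffix array, C table, and occ (rank) table; text must end with '$'."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    C, total = {}, 0
    for c in sorted(set(text)):
        C[c] = total
        total += text.count(c)
    occ = {c: [0] for c in C}  # occ[c][i] = count of c in bwt[:i]
    for ch in bwt:
        for c in C:
            occ[c].append(occ[c][-1] + (ch == c))
    return sa, C, occ


def extend(interval, pattern, C, occ):
    """Backward-extend an SA interval [lo, hi) by pattern's characters, right to left."""
    lo, hi = interval
    for c in reversed(pattern):
        if c not in C or lo >= hi:
            return (0, 0)
        lo, hi = C[c] + occ[c][lo], C[c] + occ[c][hi]
    return (lo, hi) if lo < hi else (0, 0)


def build_kmer_cache(C, occ, n, k, alphabet="ACGT"):
    """Precompute the SA interval of every length-k string over alphabet."""
    return {"".join(p): extend((0, n), "".join(p), C, occ) for p in product(alphabet, repeat=k)}


def cached_search(pattern, cache, k, C, occ, n):
    key = pattern[-k:]
    if len(pattern) < k or key not in cache:
        return extend((0, n), pattern, C, occ)  # fall back to plain backward search
    return extend(cache[key], pattern[:-k], C, occ)


text = "ACGTACGTTACG$"
sa, C, occ = build_fm(text)
n = len(text)
cache = build_kmer_cache(C, occ, n, k=2)
lo, hi = cached_search("TACG", cache, 2, C, occ, n)
print(len(cache), sorted(sa[lo:hi]))  # 16 [3, 8]
```

The trade-off is memory: the cache grows as 4^k, so k is typically kept small.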

🧰 Dependencies

  • numpy
  • pysam
  • tqdm
  • pytest (for testing only)

📦 Project Structure

```
pybwa/
├── __init__.py
├── align.py
├── cli.py
├── fmidx.py
├── index.py
└── samtools.py
tests/
├── test_alignment.py
└── test_fmindex.py
```

🤝 Contributing

Contributions are welcome! If you have ideas for improving FM-index efficiency, integrating GPU kernels, or extending to RNA-seq workflows:

  1. Fork this repo
  2. Create a feature branch
  3. Submit a pull request 🚀

🧾 License

MIT License © 2025 Soumyapriya Goswami


🌐 Links

🔗 GitHub Repository: soumyapriyagoswami/pybwa
📘 Documentation: Coming Soon
🐍 PyPI Package: pybwa-lite



Download files


Source Distribution

pybwa_lite-0.1.3.tar.gz (12.2 kB)

Uploaded Source

Built Distribution


pybwa_lite-0.1.3-py3-none-any.whl (9.6 kB)

Uploaded Python 3

File details

Details for the file pybwa_lite-0.1.3.tar.gz.

File metadata

  • Download URL: pybwa_lite-0.1.3.tar.gz
  • Upload date:
  • Size: 12.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for pybwa_lite-0.1.3.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `833839e842e9daaa7a813c4140e7511729ff3a8e33d529a8aea28a906adb8828` |
| MD5 | `8de1e1b1dd3aa5a92e952046b0959904` |
| BLAKE2b-256 | `d747d9e4a30842b8252b48b222fe2598991e32bc489628f8e80294cf611c32c7` |


File details

Details for the file pybwa_lite-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pybwa_lite-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 9.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.4

File hashes

Hashes for pybwa_lite-0.1.3-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f4976e1cd2f15eaf3fcfa929be1a24aeddf733d1c4665cd2eec91bf851812c67` |
| MD5 | `dc5ff98cc4e5fb7d640f0ce79f5c37eb` |
| BLAKE2b-256 | `f95061e14677d1431897bef0368cb1970dd47bb5681690b292b7a0d9487c7e70` |

