Bindings to bwa aligner

bwamem

Python bindings to the bwa mem aligner, sufficient to build and load an index and to align sequences against it to obtain basic alignment statistics.

These Python bindings are licensed under the Mozilla Public License 2.0; bwa is licensed under the GNU General Public License v3.0.

Documentation can be found at https://y9c.github.io/bwamem/.

Installation

The git source repository contains bwa as a submodule, so it should be cloned with the --recursive option.

The package's setup.py script requires libbwa.a to have been built in the submodule directory before it runs. The bwa/libbwa.a make target does this, first making some amendments to bwa/Makefile. To build and install the package, run:

git clone --recursive https://github.com/y9c/bwamem.git
cd bwamem
make bwa/libbwa.a
python setup.py install

Building BWA Indexes

The BwaIndexer class provides a Pythonic interface for building BWA indexes from FASTA files. It supports several BWT construction algorithms:

from bwamem import BwaIndexer

# Create indexer with default settings (auto algorithm)
indexer = BwaIndexer()

# Build index from FASTA file
index_path = indexer.build_index('reference.fa')
print(f'Index built at: {index_path}')

# Use specific algorithm
indexer = BwaIndexer(algorithm='is')  # or 'rb2', 'bwtsw', 'auto'
index_path = indexer.build_index('reference.fa', prefix='my_index')

Available algorithms:

  • auto: Automatically choose algorithm based on genome size
  • rb2: RB2 algorithm (good for medium genomes)
  • bwtsw: BWT-SW algorithm (good for large genomes)
  • is: IS algorithm (good for small genomes)
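
The 'auto' entry above picks an algorithm from the genome size. As a rough sketch of that idea, and not the package's own implementation, the helper below sums the sequence lengths in a FASTA file and returns an algorithm name. The names fasta_total_length and choose_algorithm and the 50 Mbp cutoff are assumptions made for illustration; the real decision is made inside bwa.

```python
import os
import tempfile

# Hypothetical cutoff, chosen only for illustration.
LARGE_GENOME_BP = 50_000_000


def fasta_total_length(path):
    """Return the total number of sequence characters in a FASTA file."""
    total = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Header lines start with '>' and carry no sequence.
            if line and not line.startswith('>'):
                total += len(line)
    return total


def choose_algorithm(genome_bp, cutoff=LARGE_GENOME_BP):
    """Pick 'bwtsw' for large genomes and 'is' otherwise (illustrative rule)."""
    return 'bwtsw' if genome_bp > cutoff else 'is'


# Demo on a tiny temporary FASTA file.
with tempfile.NamedTemporaryFile('w', suffix='.fa', delete=False) as tmp:
    tmp.write('>chr1\nACGT\nACGT\n>chr2\nGG\n')
demo_length = fasta_total_length(tmp.name)
os.remove(tmp.name)
print(demo_length, choose_algorithm(demo_length))
```

The returned name could then be passed as the algorithm argument of BwaIndexer when an explicit choice is preferred over 'auto'.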

Performing Alignments

The BwaAligner class provides a Pythonic interface to the bwa mem aligner. It is constructed from the path prefix of a bwa index fileset and can then align sequences given as strings.
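
Because the aligner is constructed from an index fileset, it can help to confirm that the files exist before creating it. The sketch below checks for the five files that the bwa index command writes next to the prefix. The helper name missing_index_files is hypothetical, and it is an assumption that indexes built through BwaIndexer use the same file suffixes.

```python
import os
import tempfile

# File suffixes written by `bwa index` next to the index prefix.
BWA_INDEX_SUFFIXES = ('.amb', '.ann', '.bwt', '.pac', '.sa')


def missing_index_files(prefix):
    """Return the expected index files for `prefix` that do not exist."""
    return [prefix + suffix for suffix in BWA_INDEX_SUFFIXES
            if not os.path.isfile(prefix + suffix)]


# Demo: create only three of the five files in a scratch directory.
with tempfile.TemporaryDirectory() as workdir:
    prefix = os.path.join(workdir, 'reference.fa')
    for suffix in ('.amb', '.ann', '.bwt'):
        open(prefix + suffix, 'w').close()
    missing = [os.path.basename(p) for p in missing_index_files(prefix)]

print(missing)  # ['reference.fa.pac', 'reference.fa.sa']
```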

Single-End Alignment

For single-end reads, use the align() method with one sequence:

from bwamem import BwaAligner, Alignment
index = 'path/to/index' # the path given to bwa index
seq = 'ACGATCGCGATCGA'

aligner = BwaAligner(index)
alignments = aligner.align(seq)  # Returns tuple of Alignment objects
print(f'Found {len(alignments)} alignments.')
for aln in alignments:
    print(f'  {aln.rname}:{aln.pos} {aln.orient} {aln.cigar} (mapq={aln.mapq}, score={aln.score})')

Paired-End Alignment

For paired-end reads, use the align() method with two sequences:

from bwamem import BwaAligner, PairedAlignment
index = 'path/to/index'
read1 = 'ACGATCGCGATCGA'
read2 = 'TTCGATCGATCGAT'

aligner = BwaAligner(index)
paired_alignments = aligner.align(read1, read2)  # Returns tuple of PairedAlignment objects
print(f'Found {len(paired_alignments)} paired alignments.')
for pe_aln in paired_alignments:
    print(f'  Read1: {pe_aln.read1.rname}:{pe_aln.read1.pos} {pe_aln.read1.orient}')
    print(f'  Read2: {pe_aln.read2.rname}:{pe_aln.read2.pos} {pe_aln.read2.orient}')
    print(f'  Proper pair: {pe_aln.is_proper_pair}, Insert size: {pe_aln.insert_size}')

Custom Insert Size Distribution

For paired-end reads, you can specify the expected insert size distribution:

# With custom insert size parameters
paired_alignments = aligner.align(read1, read2, insert_size=500, insert_std=50)

Data Structures

Alignment (for single-end reads):

Alignment(rname='chr1', orient='+', pos=1000, mapq=60, cigar='100M', NM=0, score=100, is_primary=True)

PairedAlignment (for paired-end reads):

PairedAlignment(read1=Alignment(...), read2=Alignment(...), is_proper_pair=True, insert_size=500)
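
The cigar field can be used to work out how much of the reference an alignment covers. The following is a minimal sketch that assumes cigar holds a standard SAM-style CIGAR string. The Alignment namedtuple is a stand-in that mirrors the fields shown above rather than the class exported by bwamem, and reference_span is a hypothetical helper.

```python
import re
from collections import namedtuple

# Stand-in mirroring the fields shown above; not imported from bwamem.
Alignment = namedtuple(
    'Alignment', 'rname orient pos mapq cigar NM score is_primary')

CIGAR_OP = re.compile(r'(\d+)([MIDNSHP=X])')
# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set('MDN=X')


def reference_span(cigar):
    """Number of reference bases covered by a SAM-style CIGAR string."""
    return sum(int(length) for length, op in CIGAR_OP.findall(cigar)
               if op in REF_CONSUMING)


aln = Alignment('chr1', '+', 1000, 60, '5S90M2D5M', 2, 88, True)
span = reference_span(aln.cigar)
# 90M + 2D + 5M consume the reference; the 5S soft clip does not.
print(f'{aln.rname}: start={aln.pos}, reference bases covered={span}')
```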

Alignment Parameters

Alignment parameters can be given as they are on the bwa mem command line:

from bwamem import BwaAligner
index = 'path/to/index'
options = '-x ont2d -A 1 -B 0'
aligner = BwaAligner(index, options=options)

The package supports all bwa mem options, including paired-end-specific parameters such as the insert size distribution (the -I option).
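
Option strings like the one above can be assembled programmatically. The sketch below composes a string from a preset and an insert size mean and standard deviation, using the bwa mem -x and -I flags. The build_options helper and its parameter names are hypothetical, and how BwaAligner parses the resulting string is assumed to match the command line.

```python
import shlex


def build_options(preset=None, insert_mean=None, insert_std=None, extra=()):
    """Compose a bwa mem style option string from a few common settings."""
    args = []
    if preset is not None:
        args += ['-x', preset]
    if insert_mean is not None:
        # bwa mem -I takes the mean, optionally followed by ",std".
        value = f'{insert_mean:g}'
        if insert_std is not None:
            value += f',{insert_std:g}'
        args += ['-I', value]
    args += list(extra)
    return shlex.join(args)


options = build_options(insert_mean=500, insert_std=50,
                        extra=['-A', '1', '-B', '4'])
print(options)  # -I 500,50 -A 1 -B 4
```

The result could then be passed as the options argument when constructing the aligner.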

Complete Workflow Example

Here's a complete example showing how to build an index and perform both single-end and paired-end alignments:

from bwamem import BwaIndexer, BwaAligner, Alignment, PairedAlignment

# Step 1: Build index from FASTA file
indexer = BwaIndexer(algorithm='auto')
index_path = indexer.build_index('reference.fa')
print(f'Index built at: {index_path}')

# Step 2: Create aligner with the index
aligner = BwaAligner(index_path)

# Step 3a: Single-end alignment
print("Single-end alignments:")
se_sequences = ['ACGATCGCGATCGA', 'GCTAGCTAGCTAG']
for i, seq in enumerate(se_sequences, 1):
    alignments = aligner.align(seq)
    print(f'Found {len(alignments)} alignments for sequence {i}')
    for aln in alignments:
        print(f'  {aln.rname}:{aln.pos} {aln.orient} {aln.cigar} (mapq={aln.mapq})')

# Step 3b: Paired-end alignment
print("\nPaired-end alignments:")
pe_reads = [
    ('ACGATCGCGATCGA', 'TTCGATCGATCGAT'),
    ('GCTAGCTAGCTAG', 'CGATCGATCGATC')
]
for i, (read1, read2) in enumerate(pe_reads, 1):
    paired_alignments = aligner.align(read1, read2)
    print(f'Found {len(paired_alignments)} paired alignments for read pair {i}')
    for pe_aln in paired_alignments:
        print(f'  Read1: {pe_aln.read1.rname}:{pe_aln.read1.pos} {pe_aln.read1.orient}')
        print(f'  Read2: {pe_aln.read2.rname}:{pe_aln.read2.pos} {pe_aln.read2.orient}')
        print(f'  Proper pair: {pe_aln.is_proper_pair}, Insert size: {pe_aln.insert_size}')

# Step 3c: Paired-end with custom insert size
print("\nPaired-end with custom insert size:")
pe_alignments = aligner.align('ACGATCGCGATCGA', 'TTCGATCGATCGAT',
                              insert_size=500, insert_std=50)
print(f'Found {len(pe_alignments)} alignments with custom insert size')

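Advanced Paired-End Features

The PairedAlignment objects returned by align() can be filtered and unpacked further. This snippet continues from the workflow example above and reuses its pe_alignments variable:
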
# Filter for proper pairs only
proper_pairs = [pe for pe in pe_alignments if pe.is_proper_pair]
print(f'Found {len(proper_pairs)} proper pairs')

# Access individual read alignments
for pe_aln in pe_alignments:
    read1_aln = pe_aln.read1
    read2_aln = pe_aln.read2
    if read1_aln.is_primary and read2_aln.is_primary:
        print(f'Primary alignment: {read1_aln.rname}:{read1_aln.pos}-{read2_aln.pos}')
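
The same filtering can feed a quick summary of observed insert sizes. The sketch below uses stand-in namedtuples that mirror the documented fields, so it runs without an index. insert_size_summary and its min_mapq threshold are hypothetical, and abs() is applied because it is not stated whether insert_size is signed.

```python
from collections import namedtuple
from statistics import mean, pstdev

# Stand-ins mirroring the documented fields; not imported from bwamem.
Alignment = namedtuple(
    'Alignment', 'rname orient pos mapq cigar NM score is_primary')
PairedAlignment = namedtuple(
    'PairedAlignment', 'read1 read2 is_proper_pair insert_size')


def insert_size_summary(pairs, min_mapq=20):
    """Summarise insert sizes over proper pairs with confident mappings."""
    sizes = [abs(p.insert_size) for p in pairs
             if p.is_proper_pair
             and min(p.read1.mapq, p.read2.mapq) >= min_mapq]
    if not sizes:
        return None
    return {'n': len(sizes), 'mean': mean(sizes), 'std': pstdev(sizes)}


def make_pair(mapq, proper, size):
    """Build a toy pair on chr1 for the demo."""
    r1 = Alignment('chr1', '+', 1000, mapq, '100M', 0, 100, True)
    r2 = Alignment('chr1', '-', 1000 + size - 100, mapq, '100M', 0, 100, True)
    return PairedAlignment(r1, r2, proper, size)


demo_pairs = [
    make_pair(60, True, 480),
    make_pair(60, True, 520),
    make_pair(60, False, 3000),  # dropped: not a proper pair
    make_pair(5, True, 510),     # dropped: mapq below threshold
]
summary = insert_size_summary(demo_pairs)
print(summary)
```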

Download files

Source Distribution

  • bwamem-0.0.3.tar.gz (1.1 MB)

Built Distributions

  • bwamem-0.0.3-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (373.5 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • bwamem-0.0.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (373.5 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • bwamem-0.0.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (373.5 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • bwamem-0.0.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (373.5 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64

