Bindings to bwa aligner

bwamem

Python bindings to the bwa mem aligner, sufficient to build and load an index, align sequences against it, and obtain basic alignment statistics.

These Python bindings are licensed under the Mozilla Public License 2.0; bwa is licensed under the GNU General Public License v3.0.

Documentation can be found at https://y9c.github.io/bwamem/.

Installation

The git source repository contains bwa as a submodule. The repository should therefore be cloned using the recursive option.

The package's setup.py script requires libbwa.a to have been built in the submodule directory before it runs. The bwa/libbwa.a make target does this, after first making some amendments to bwa/Makefile. To build and install the package, run:

git clone --recursive https://github.com/y9c/bwamem.git
cd bwamem
make bwa/libbwa.a 
python setup.py install
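
Before running setup.py it can help to confirm that both prerequisites described above are in place. The following is a minimal sketch, not part of the package: the helper name missing_build_prerequisites is hypothetical, and it only checks for the two paths named in the instructions (bwa/Makefile and bwa/libbwa.a).

```python
from pathlib import Path


def missing_build_prerequisites(repo_dir):
    """Return human-readable problems that would prevent setup.py from building.

    Hypothetical helper: it only looks for the files mentioned in the
    installation instructions, relative to the cloned repository root.
    """
    repo = Path(repo_dir)
    problems = []
    # The bwa submodule is only populated by a recursive clone.
    if not (repo / "bwa" / "Makefile").is_file():
        problems.append("bwa submodule not checked out (clone with --recursive)")
    # The static library must exist before setup.py is run.
    if not (repo / "bwa" / "libbwa.a").is_file():
        problems.append("bwa/libbwa.a not built (run: make bwa/libbwa.a)")
    return problems
```

Calling missing_build_prerequisites('.') from the repository root returns an empty list when the build can proceed.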

Building BWA Indexes

The BwaIndexer class provides a pythonic interface for building BWA indexes from FASTA files. It supports several BWT construction algorithms:

from bwamem import BwaIndexer

# Create indexer with default settings (auto algorithm)
indexer = BwaIndexer()

# Build index from FASTA file
index_path = indexer.build_index('reference.fa')
print(f'Index built at: {index_path}')

# Use specific algorithm
indexer = BwaIndexer(algorithm='is')  # or 'rb2', 'bwtsw', 'auto'
index_path = indexer.build_index('reference.fa', prefix='my_index')

Available algorithms:

  • auto: Automatically choose algorithm based on genome size
  • rb2: RB2 algorithm (good for medium genomes)
  • bwtsw: BWT-SW algorithm (good for large genomes)
  • is: IS algorithm (good for small genomes)
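
To illustrate what choosing "based on genome size" could look like, here is a minimal sketch that measures a FASTA file and suggests an algorithm. Both function names are hypothetical, and the 50 million base cutoff is an assumption modelled on the upstream bwa index command; it is not taken from this package's source, so the real 'auto' behaviour may differ.

```python
def reference_length(fasta_path):
    """Count sequence characters in a FASTA file, ignoring header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def suggest_algorithm(n_bases, large_genome_threshold=50_000_000):
    """Suggest 'bwtsw' for large references and 'is' otherwise.

    The threshold is an assumed value, not one documented by bwamem.
    """
    return "bwtsw" if n_bases > large_genome_threshold else "is"
```

The suggestion could then be passed on explicitly, for example BwaIndexer(algorithm=suggest_algorithm(reference_length('reference.fa'))).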

Performing Alignments

The BwaAligner class provides a pythonic interface to the bwa mem aligner. It takes a bwa index fileset as input on construction and can then be used to find alignments of sequences given as strings.
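
Because the aligner is constructed from an index fileset rather than a single file, it can be useful to check that the fileset is complete first. This is a minimal sketch under the assumption that the index uses the five suffixes written by the standard bwa index command; the helper name missing_index_files is hypothetical and not part of bwamem.

```python
from pathlib import Path

# Suffixes written by the standard `bwa index` command (assumed to apply here).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(prefix):
    """Return the expected index files that do not exist for this prefix."""
    prefix = str(prefix)
    return [
        prefix + suffix
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]
```

An empty return value means every expected file was found for the given prefix.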

Single-End Alignment

For single-end reads, use the align() method with one sequence:

from bwamem import BwaAligner, Alignment
index = 'path/to/index' # the path given to bwa index
seq = 'ACGATCGCGATCGA'

aligner = BwaAligner(index)
alignments = aligner.align(seq)  # Returns tuple of Alignment objects
print(f'Found {len(alignments)} alignments.')
for aln in alignments:
    print(f'  {aln.rname}:{aln.pos} {aln.orient} {aln.cigar} (mapq={aln.mapq}, score={aln.score})')
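
A common next step is to keep only confidently placed primary hits. The sketch below uses a namedtuple stand-in so that it runs without an index; its fields mirror those shown for Alignment in this document, but the stand-in itself, the helper name confident_primary, and the default mapq cutoff of 30 are assumptions made for illustration.

```python
from collections import namedtuple

# Stand-in with the same field names as bwamem's Alignment (illustration only).
Alignment = namedtuple(
    "Alignment", "rname orient pos mapq cigar NM score is_primary"
)


def confident_primary(alignments, min_mapq=30):
    """Keep primary alignments whose mapping quality is at least min_mapq."""
    return [a for a in alignments if a.is_primary and a.mapq >= min_mapq]


example_hits = [
    Alignment("chr1", "+", 1000, 60, "14M", 0, 14, True),
    Alignment("chr2", "-", 50, 0, "14M", 1, 9, False),
]
kept = confident_primary(example_hits)  # only the chr1 hit passes both checks
```

The same function should work unchanged on the tuple returned by aligner.align(seq), provided the real objects expose mapq and is_primary as attributes.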

Paired-End Alignment

For paired-end reads, use the align() method with two sequences:

from bwamem import BwaAligner, PairedAlignment
index = 'path/to/index'
read1 = 'ACGATCGCGATCGA'
read2 = 'TTCGATCGATCGAT'

aligner = BwaAligner(index)
paired_alignments = aligner.align(read1, read2)  # Returns tuple of PairedAlignment objects
print(f'Found {len(paired_alignments)} paired alignments.')
for pe_aln in paired_alignments:
    print(f'  Read1: {pe_aln.read1.rname}:{pe_aln.read1.pos} {pe_aln.read1.orient}')
    print(f'  Read2: {pe_aln.read2.rname}:{pe_aln.read2.pos} {pe_aln.read2.orient}')
    print(f'  Proper pair: {pe_aln.is_proper_pair}, Insert size: {pe_aln.insert_size}')

Custom Insert Size Distribution

For paired-end reads, you can specify the expected insert size distribution:

# With custom insert size parameters
paired_alignments = aligner.align(read1, read2, insert_size=500, insert_std=50)

Data Structures

Alignment (for single-end reads):

Alignment(rname='chr1', orient='+', pos=1000, mapq=60, cigar='100M', NM=0, score=100, is_primary=True)

PairedAlignment (for paired-end reads):

PairedAlignment(read1=Alignment(...), read2=Alignment(...), is_proper_pair=True, insert_size=500)
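
The pos and cigar fields together determine how much of the reference an alignment covers. The following minimal sketch parses a CIGAR string using the operation codes from the SAM specification and sums the operations that consume reference bases. The function names are hypothetical, and because the coordinate convention of pos is not stated here, reference_end simply adds the span to pos.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    ops = CIGAR_OP.findall(cigar)
    # Reject strings that contain anything other than valid length/op pairs.
    if "".join(length + op for length, op in ops) != cigar:
        raise ValueError(f"unparseable CIGAR: {cigar!r}")
    return sum(int(length) for length, op in ops if op in REF_CONSUMING)


def reference_end(pos, cigar):
    """Position just past the last covered reference base (pos + span)."""
    return pos + reference_span(cigar)
```

For the Alignment shown above, reference_span('100M') is 100, so reference_end(1000, '100M') is 1100.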

Alignment Parameters

Alignment parameters can be given as they are on the bwa mem command line:

from bwamem import BwaAligner
index = 'path/to/index'
options = '-x ont2d -A 1 -B 0'
aligner = BwaAligner(index, options=options)

The package supports all bwa mem options, including paired-end-specific parameters such as the insert size distribution (-I).
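
Since options are passed as a single command-line style string, a small builder can keep them readable. This is a minimal sketch: build_mem_options is a hypothetical helper, not part of bwamem, and it covers only the four bwa mem flags mentioned in this document (-x, -A, -B and -I).

```python
def build_mem_options(preset=None, match=None, mismatch=None,
                      insert_size=None, insert_std=None):
    """Assemble a bwa mem style option string from keyword arguments."""
    if insert_std is not None and insert_size is None:
        raise ValueError("insert_std requires insert_size")
    parts = []
    if preset is not None:
        parts += ["-x", str(preset)]       # read-type preset, e.g. ont2d
    if match is not None:
        parts += ["-A", str(match)]        # matching score
    if mismatch is not None:
        parts += ["-B", str(mismatch)]     # mismatch penalty
    if insert_size is not None:
        # -I takes the mean, optionally followed by the standard deviation.
        value = str(insert_size)
        if insert_std is not None:
            value += f",{insert_std}"
        parts += ["-I", value]
    return " ".join(parts)
```

For example, build_mem_options(preset='ont2d', match=1, mismatch=0) reproduces the string '-x ont2d -A 1 -B 0' used above, which can then be passed as BwaAligner(index, options=...).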

Complete Workflow Example

Here's a complete example showing how to build an index and perform both single-end and paired-end alignments:

from bwamem import BwaIndexer, BwaAligner, Alignment, PairedAlignment

# Step 1: Build index from FASTA file
indexer = BwaIndexer(algorithm='auto')
index_path = indexer.build_index('reference.fa')
print(f'Index built at: {index_path}')

# Step 2: Create aligner with the index
aligner = BwaAligner(index_path)

# Step 3a: Single-end alignment
print("Single-end alignments:")
se_sequences = ['ACGATCGCGATCGA', 'GCTAGCTAGCTAG']
for i, seq in enumerate(se_sequences, 1):
    alignments = aligner.align(seq)
    print(f'Found {len(alignments)} alignments for sequence {i}')
    for aln in alignments:
        print(f'  {aln.rname}:{aln.pos} {aln.orient} {aln.cigar} (mapq={aln.mapq})')

# Step 3b: Paired-end alignment
print("\nPaired-end alignments:")
pe_reads = [
    ('ACGATCGCGATCGA', 'TTCGATCGATCGAT'),
    ('GCTAGCTAGCTAG', 'CGATCGATCGATC')
]
for i, (read1, read2) in enumerate(pe_reads, 1):
    paired_alignments = aligner.align(read1, read2)
    print(f'Found {len(paired_alignments)} paired alignments for read pair {i}')
    for pe_aln in paired_alignments:
        print(f'  Read1: {pe_aln.read1.rname}:{pe_aln.read1.pos} {pe_aln.read1.orient}')
        print(f'  Read2: {pe_aln.read2.rname}:{pe_aln.read2.pos} {pe_aln.read2.orient}')
        print(f'  Proper pair: {pe_aln.is_proper_pair}, Insert size: {pe_aln.insert_size}')

# Step 3c: Paired-end with custom insert size
print("\nPaired-end with custom insert size:")
pe_alignments = aligner.align('ACGATCGCGATCGA', 'TTCGATCGATCGAT', 
                             insert_size=500, insert_std=50)
print(f'Found {len(pe_alignments)} alignments with custom insert size')

Advanced Paired-End Features

# Continuing from the workflow above (pe_alignments): keep proper pairs only
proper_pairs = [pe for pe in pe_alignments if pe.is_proper_pair]
print(f'Found {len(proper_pairs)} proper pairs')

# Access individual read alignments
for pe_aln in pe_alignments:
    read1_aln = pe_aln.read1
    read2_aln = pe_aln.read2
    if read1_aln.is_primary and read2_aln.is_primary:
        print(f'Primary alignment: {read1_aln.rname}:{read1_aln.pos}-{read2_aln.pos}')
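
The same fields can be used to summarise a batch of pairs, for instance to check the observed insert sizes against the values passed to align(). The sketch below uses a namedtuple stand-in with the field names shown for PairedAlignment so that it runs without an index. The helper summarize_pairs is hypothetical, and taking the absolute value of insert_size assumes the field may be signed, which this document does not state.

```python
from collections import namedtuple
from statistics import mean, pstdev

# Stand-in with the same field names as bwamem's PairedAlignment (illustration only).
PairedAlignment = namedtuple(
    "PairedAlignment", "read1 read2 is_proper_pair insert_size"
)


def summarize_pairs(pairs):
    """Proper-pair fraction and insert size statistics for a list of pairs."""
    proper = [p for p in pairs if p.is_proper_pair]
    sizes = [abs(p.insert_size) for p in proper]
    return {
        "n_pairs": len(pairs),
        "proper_fraction": len(proper) / len(pairs) if pairs else 0.0,
        "mean_insert": mean(sizes) if sizes else None,
        "sd_insert": pstdev(sizes) if sizes else None,
    }
```

Only proper pairs contribute to the insert size statistics, since the insert size of an improper pair is usually not meaningful.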

Download files

Source Distribution

  • bwamem-0.0.1.tar.gz (1.1 MB): source

Built Distributions

  • bwamem-0.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.3 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • bwamem-0.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.3 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • bwamem-0.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.3 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • bwamem-0.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (291.3 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
