Bindings to bwa aligner
bwamem
Python bindings to the bwa mem aligner, sufficient to load an index and align
sequences against it to obtain basic alignment statistics.
These Python bindings are licensed under the Mozilla Public License 2.0; bwa itself is licensed under the GNU General Public License v3.0.
Documentation can be found at https://y9c.github.io/bwamem/.
Installation
The git source repository contains bwa as a submodule, so it should be cloned using the --recursive option.
The package's setup.py script requires libbwa.a to have been built in the submodule
directory before it is run. This is done via the bwa/libbwa.a make target, which first
applies some amendments to bwa/Makefile. To build and install the package, run:
```shell
git clone --recursive https://github.com/y9c/bwamem.git
cd bwamem
make bwa/libbwa.a
python setup.py install
```
Building BWA Indexes
The BwaIndexer class provides a Pythonic interface for building BWA indexes from
FASTA files. It supports several BWT construction algorithms:
```python
from bwamem import BwaIndexer

# Create indexer with default settings (auto algorithm)
indexer = BwaIndexer()

# Build index from FASTA file
index_path = indexer.build_index('reference.fa')
print(f'Index built at: {index_path}')

# Use a specific algorithm
indexer = BwaIndexer(algorithm='is')  # or 'rb2', 'bwtsw', 'auto'
index_path = indexer.build_index('reference.fa', prefix='my_index')
```
Available algorithms:
- auto: automatically choose an algorithm based on genome size
- rb2: RB2 algorithm (good for medium genomes)
- bwtsw: BWT-SW algorithm (good for large genomes)
- is: IS algorithm (good for small genomes)
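To make the `auto` option more concrete, here is a minimal sketch of a length-based choice. bwa's own index command makes a broadly similar decision when no algorithm is given (`is` for smaller references, a large-genome algorithm above a cutoff), but the 50,000,000 bp threshold and the hypothetical `choose_algorithm` helper below are assumptions for illustration, not necessarily what `BwaIndexer` does internally. It reads plain (uncompressed) FASTA only.

```python
def reference_length(fasta_path):
    """Count sequence characters in a FASTA file, skipping '>' header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            if not line.startswith('>'):
                total += len(line.strip())
    return total


# Assumed cutoff in base pairs; bwamem's own 'auto' rule may differ.
LARGE_GENOME_BP = 50_000_000


def choose_algorithm(fasta_path, threshold=LARGE_GENOME_BP, large_algorithm='bwtsw'):
    """Hypothetical 'auto' rule: 'is' for small references, a large-genome algorithm otherwise."""
    if reference_length(fasta_path) > threshold:
        return large_algorithm
    return 'is'
```

The result could then be passed straight through, e.g. `BwaIndexer(algorithm=choose_algorithm('reference.fa'))`.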
Performing Alignments
The BwaAligner class provides a Pythonic interface to the bwa mem aligner. It
is constructed from a bwa index fileset and can then be used to find
alignments of sequences given as strings.
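`bwa index` writes a set of files that share the index prefix (`.amb`, `.ann`, `.bwt`, `.pac` and `.sa`), and that prefix is what gets passed to `BwaAligner`. A small pre-flight check can give a clearer error than a failure inside the C library. The `missing_index_files` and `require_index` helpers below are hypothetical, use only the standard library, and do not call bwamem.

```python
from pathlib import Path

# Files written by `bwa index` for a given prefix.
BWA_INDEX_SUFFIXES = ('.amb', '.ann', '.bwt', '.pac', '.sa')


def missing_index_files(prefix):
    """Return the expected index files that are absent for this prefix."""
    candidates = [f'{prefix}{suffix}' for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in candidates if not Path(path).is_file()]


def require_index(prefix):
    """Raise FileNotFoundError naming any missing index files."""
    missing = missing_index_files(prefix)
    if missing:
        raise FileNotFoundError(f'incomplete bwa index, missing: {", ".join(missing)}')
```

Calling `require_index(index)` just before `BwaAligner(index)` turns a half-built or mistyped index path into an explicit error.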
Single-End Alignment
For single-end reads, use the align() method with one sequence:
```python
from bwamem import BwaAligner, Alignment

index = 'path/to/index'  # the path given to bwa index
seq = 'ACGATCGCGATCGA'
aligner = BwaAligner(index)
alignments = aligner.align(seq)  # Returns tuple of Alignment objects
print('Found {} alignments.'.format(len(alignments)))
for aln in alignments:
    print(f' {aln.rname}:{aln.pos} {aln.orient} {aln.cigar} (mapq={aln.mapq}, score={aln.score})')
```
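Each alignment carries a CIGAR string, so the reference interval it covers can be derived from `pos` and `cigar`. The sketch below uses the standard SAM meaning of CIGAR operations, where M, D, N, = and X consume reference bases. It assumes `pos` is the leftmost reference coordinate of the alignment; whether bwamem reports it 0- or 1-based is not stated here, and the computed end simply inherits that convention. `reference_span` and `reference_end` are hypothetical helpers.

```python
import re

# CIGAR operations that consume reference bases (SAM specification).
REF_CONSUMING = set('MDN=X')
CIGAR_RE = re.compile(r'(\d+)([MIDNSHP=X])')


def reference_span(cigar):
    """Number of reference bases covered by a CIGAR string."""
    ops = CIGAR_RE.findall(cigar)
    # Reject strings that the regex could only partially parse.
    if ''.join(n + op for n, op in ops) != cigar:
        raise ValueError(f'malformed CIGAR: {cigar!r}')
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def reference_end(pos, cigar):
    """Exclusive end coordinate, in the same convention as pos."""
    return pos + reference_span(cigar)
```

For example, `reference_end(aln.pos, aln.cigar)` inside the loop above gives the end of each hit.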
Paired-End Alignment
For paired-end reads, use the align() method with two sequences:
```python
from bwamem import BwaAligner, PairedAlignment

index = 'path/to/index'
read1 = 'ACGATCGCGATCGA'
read2 = 'TTCGATCGATCGAT'
aligner = BwaAligner(index)
paired_alignments = aligner.align(read1, read2)  # Returns tuple of PairedAlignment objects
print('Found {} paired alignments.'.format(len(paired_alignments)))
for pe_aln in paired_alignments:
    print(f' Read1: {pe_aln.read1.rname}:{pe_aln.read1.pos} {pe_aln.read1.orient}')
    print(f' Read2: {pe_aln.read2.rname}:{pe_aln.read2.pos} {pe_aln.read2.orient}')
    print(f' Proper pair: {pe_aln.is_proper_pair}, Insert size: {pe_aln.insert_size}')
```
Custom Insert Size Distribution
For paired-end reads, you can specify the expected insert size distribution:
```python
# With custom insert size parameters
paired_alignments = aligner.align(read1, read2, insert_size=500, insert_std=50)
```
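How `insert_size` and `insert_std` are handed to bwa is not spelled out here; one natural reading is that they correspond to the mean and standard deviation of bwa mem's `-I` option. The bwa mem help text describes `-I` as mean, standard deviation (10% of the mean if absent), max (4 sigma from the mean if absent) and min. The hypothetical `insert_window` helper below turns those defaults into an acceptance window; the assumption that the minimum also defaults to 4 sigma below the mean, clipped at 1, is labelled in the code.

```python
def insert_window(mean, std=None, max_insert=None, min_insert=None):
    """Hypothetical helper mirroring the defaults described for bwa mem -I."""
    if std is None:
        std = 0.1 * mean  # documented default: 10% of the mean
    if max_insert is None:
        max_insert = round(mean + 4 * std)  # documented default: 4 sigma above the mean
    if min_insert is None:
        # Assumption: minimum defaults to 4 sigma below the mean, clipped at 1.
        min_insert = max(1, round(mean - 4 * std))
    return min_insert, max_insert


def within_window(insert_size, window):
    """Check an insert size against (min, max); the sign is ignored,
    since insert sizes may be reported signed, as SAM TLEN is."""
    low, high = window
    return low <= abs(insert_size) <= high
```

With `insert_size=500, insert_std=50` this gives a window of 300 to 700.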
Data Structures
Alignment (for single-end reads):
```python
Alignment(rname='chr1', orient='+', pos=1000, mapq=60, cigar='100M', NM=0, score=100, is_primary=True)
```
PairedAlignment (for paired-end reads):
```python
PairedAlignment(read1=Alignment(...), read2=Alignment(...), is_proper_pair=True, insert_size=500)
```
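For unit tests of downstream code that should not depend on a built index, lightweight stand-ins with the same field names as the reprs above can be handy. The classes below are hypothetical mirrors of those fields, not bwamem's own classes; the field types are inferred from the example values, and reading `NM` as the SAM edit-distance tag is an assumption.

```python
from typing import NamedTuple


class FakeAlignment(NamedTuple):
    """Stand-in mirroring the fields shown in the Alignment repr."""
    rname: str
    orient: str       # '+' or '-'
    pos: int
    mapq: int
    cigar: str
    NM: int           # presumably the edit distance (SAM NM tag)
    score: int
    is_primary: bool


class FakePairedAlignment(NamedTuple):
    """Stand-in mirroring the fields shown in the PairedAlignment repr."""
    read1: FakeAlignment
    read2: FakeAlignment
    is_proper_pair: bool
    insert_size: int
```

Any code that only reads these attributes can then be exercised without calling the aligner.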
Alignment Parameters
Alignment parameters can be given as they are on the bwa mem command line:
```python
from bwamem import BwaAligner

index = 'path/to/index'
options = '-x ont2d -A 1 -B 0'
aligner = BwaAligner(index, options=options)
```
All bwa mem options are supported, including paired-end specific parameters such as the insert size distribution (-I option).
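Because options are passed as a single command-line-style string, it can help to assemble that string from named values. `build_options` below is a hypothetical helper; the flags it emits (`-x`, `-A`, `-B` and `-I mean,std`) are the bwa mem flags already mentioned above, and the result is intended for `BwaAligner(index, options=...)`.

```python
import shlex


def build_options(preset=None, match_score=None, mismatch_penalty=None,
                  insert_mean=None, insert_std=None):
    """Assemble a bwa mem option string from keyword arguments (hypothetical helper)."""
    args = []
    if preset is not None:
        args += ['-x', preset]
    if match_score is not None:
        args += ['-A', str(match_score)]
    if mismatch_penalty is not None:
        args += ['-B', str(mismatch_penalty)]
    if insert_mean is not None:
        # -I takes comma-separated values: mean[,std[,max[,min]]]
        value = str(insert_mean)
        if insert_std is not None:
            value += f',{insert_std}'
        args += ['-I', value]
    # shlex.join quotes any argument that would need it on a shell command line.
    return shlex.join(args)
```

`build_options(preset='ont2d', match_score=1, mismatch_penalty=0)` reproduces the options string used above.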
Complete Workflow Example
Here's a complete example showing how to build an index and perform both single-end and paired-end alignments:
```python
from bwamem import BwaIndexer, BwaAligner, Alignment, PairedAlignment

# Step 1: Build index from FASTA file
indexer = BwaIndexer(algorithm='auto')
index_path = indexer.build_index('reference.fa')
print(f'Index built at: {index_path}')

# Step 2: Create aligner with the index
aligner = BwaAligner(index_path)

# Step 3a: Single-end alignment
print("Single-end alignments:")
se_sequences = ['ACGATCGCGATCGA', 'GCTAGCTAGCTAG']
for i, seq in enumerate(se_sequences, 1):
    alignments = aligner.align(seq)
    print(f'Found {len(alignments)} alignments for sequence {i}')
    for aln in alignments:
        print(f' {aln.rname}:{aln.pos} {aln.orient} {aln.cigar} (mapq={aln.mapq})')

# Step 3b: Paired-end alignment
print("\nPaired-end alignments:")
pe_reads = [
    ('ACGATCGCGATCGA', 'TTCGATCGATCGAT'),
    ('GCTAGCTAGCTAG', 'CGATCGATCGATC'),
]
for i, (read1, read2) in enumerate(pe_reads, 1):
    paired_alignments = aligner.align(read1, read2)
    print(f'Found {len(paired_alignments)} paired alignments for read pair {i}')
    for pe_aln in paired_alignments:
        print(f' Read1: {pe_aln.read1.rname}:{pe_aln.read1.pos} {pe_aln.read1.orient}')
        print(f' Read2: {pe_aln.read2.rname}:{pe_aln.read2.pos} {pe_aln.read2.orient}')
        print(f' Proper pair: {pe_aln.is_proper_pair}, Insert size: {pe_aln.insert_size}')

# Step 3c: Paired-end with custom insert size
print("\nPaired-end with custom insert size:")
pe_alignments = aligner.align('ACGATCGCGATCGA', 'TTCGATCGATCGAT',
                              insert_size=500, insert_std=50)
print(f'Found {len(pe_alignments)} alignments with custom insert size')
```
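When aligning many reads in a loop like Step 3a, a per-batch summary is often more useful than printing every hit. The sketch below takes the alignment function as a parameter (for example `aligner.align`), so it can be exercised with a stub. `summarise_mapping` and the MAPQ cutoff of 20 are illustrative assumptions; the helper relies only on `align()` returning a tuple of objects with a `mapq` attribute, as described above.

```python
from collections import Counter


def summarise_mapping(align, sequences, min_mapq=20):
    """Count reads that are unmapped, mapped, or mapped with a high MAPQ.

    `align` is any callable shaped like BwaAligner.align for single reads.
    """
    counts = Counter()
    for seq in sequences:
        alignments = align(seq)
        if not alignments:
            counts['unmapped'] += 1
            continue
        counts['mapped'] += 1
        # Use the best MAPQ among the reported hits for this read.
        if max(aln.mapq for aln in alignments) >= min_mapq:
            counts['high_mapq'] += 1
    return counts
```

In the workflow above this would be called as `summarise_mapping(aligner.align, se_sequences)`.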
Advanced Paired-End Features
```python
# pe_alignments as returned in Step 3c above

# Filter for proper pairs only
proper_pairs = [pe for pe in pe_alignments if pe.is_proper_pair]
print(f'Found {len(proper_pairs)} proper pairs')

# Access individual read alignments
for pe_aln in pe_alignments:
    read1_aln = pe_aln.read1
    read2_aln = pe_aln.read2
    if read1_aln.is_primary and read2_aln.is_primary:
        print(f'Primary alignment: {read1_aln.rname}:{read1_aln.pos}-{read2_aln.pos}')
```
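The two filters above can be combined into a single predicate. The sketch below keeps pairs that are proper, primary on both mates and at or above a MAPQ cutoff. It only reads attributes documented for `PairedAlignment` and `Alignment`; the `confident_pairs` name and the default cutoff of 20 are arbitrary choices for illustration.

```python
def is_confident_pair(pe_aln, min_mapq=20):
    """True if the pair is proper and both mates are primary with mapq >= min_mapq."""
    mates = (pe_aln.read1, pe_aln.read2)
    return (pe_aln.is_proper_pair
            and all(m.is_primary for m in mates)
            and all(m.mapq >= min_mapq for m in mates))


def confident_pairs(paired_alignments, min_mapq=20):
    """Filter a sequence of paired alignments down to the confident ones."""
    return [pe for pe in paired_alignments if is_confident_pair(pe, min_mapq)]
```

For example, `confident_pairs(pe_alignments)` returns the subset worth carrying into downstream analysis.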
Download files
File details
Details for the file bwamem-0.0.1.tar.gz.
File metadata
- Download URL: bwamem-0.0.1.tar.gz
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 58b39770699145ae59095e1612bad03c7aa15db5a221f0b186c88fc3ba9495af |
| MD5 | fd24ab355651821e835377d4429cb1b9 |
| BLAKE2b-256 | 42ebd9b7adb07c3922dc275f643352f2a2262d6fc13ad851273f1054084ff696 |
Provenance
The following attestation bundles were made for bwamem-0.0.1.tar.gz:
- Publisher: publish.yml on y9c/bwamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bwamem-0.0.1.tar.gz
- Subject digest: 58b39770699145ae59095e1612bad03c7aa15db5a221f0b186c88fc3ba9495af
- Sigstore transparency entry: 646515345
- Permalink: y9c/bwamem@00c6142a0e777be4ec066c59076606e14d298128
- Branch / Tag: refs/heads/master
- Owner: https://github.com/y9c
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@00c6142a0e777be4ec066c59076606e14d298128
- Trigger Event: push
File details
Details for the file bwamem-0.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: bwamem-0.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 291.3 kB
- Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb3e9fb4eeb5c31dbbac232645f09cc4d51542c55f3daf49d2d33fef605d4cc3 |
| MD5 | 3168b6b1bc8769e24da34a0b7a1b83b5 |
| BLAKE2b-256 | 81f84a5c3fd4d150ec2e99c91b0a14c0998ba4e21d87f677ba504766a40c6fcb |
Provenance
The following attestation bundles were made for bwamem-0.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish.yml on y9c/bwamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bwamem-0.0.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: eb3e9fb4eeb5c31dbbac232645f09cc4d51542c55f3daf49d2d33fef605d4cc3
- Sigstore transparency entry: 646515427
- Permalink: y9c/bwamem@00c6142a0e777be4ec066c59076606e14d298128
- Branch / Tag: refs/heads/master
- Owner: https://github.com/y9c
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@00c6142a0e777be4ec066c59076606e14d298128
- Trigger Event: push
File details
Details for the file bwamem-0.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: bwamem-0.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 291.3 kB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32e0a778547c2a64c48f919aa54f6791a757c5ec3a3396b63a933302ecd7d972 |
| MD5 | d0f30c8d70088208b1fde89d1e4dabbd |
| BLAKE2b-256 | 69c42e45cc3fd2fcf7f1ed601694317b2eb86c38730ff0b918d7727edcf10267 |
Provenance
The following attestation bundles were made for bwamem-0.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish.yml on y9c/bwamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bwamem-0.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 32e0a778547c2a64c48f919aa54f6791a757c5ec3a3396b63a933302ecd7d972
- Sigstore transparency entry: 646515374
- Permalink: y9c/bwamem@00c6142a0e777be4ec066c59076606e14d298128
- Branch / Tag: refs/heads/master
- Owner: https://github.com/y9c
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@00c6142a0e777be4ec066c59076606e14d298128
- Trigger Event: push
File details
Details for the file bwamem-0.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: bwamem-0.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 291.3 kB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64a0dc23c78760637fa3142132818cb9e83dd5fe7b734154fb3bd0f666515314 |
| MD5 | 4f8b2200198eb99cd69f97a12546edc5 |
| BLAKE2b-256 | c4b462bbcb577d39af207e6ea6508bc0a4a6b082ad2dd0e3ebda5451ed92ae05 |
Provenance
The following attestation bundles were made for bwamem-0.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish.yml on y9c/bwamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bwamem-0.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 64a0dc23c78760637fa3142132818cb9e83dd5fe7b734154fb3bd0f666515314
- Sigstore transparency entry: 646515398
- Permalink: y9c/bwamem@00c6142a0e777be4ec066c59076606e14d298128
- Branch / Tag: refs/heads/master
- Owner: https://github.com/y9c
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@00c6142a0e777be4ec066c59076606e14d298128
- Trigger Event: push
File details
Details for the file bwamem-0.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: bwamem-0.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Size: 291.3 kB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fc51a143bf70dd139a4936fbab41fab3469fc136bd1387478d8153d657d75ef2 |
| MD5 | d909d8901fc81a8bff07c56615f2933c |
| BLAKE2b-256 | c716b0a6234be3d6bf7ebf83ad28b7741bc2be4dcce15b4fe7d9e3412a84d280 |
Provenance
The following attestation bundles were made for bwamem-0.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:
- Publisher: publish.yml on y9c/bwamem
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bwamem-0.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: fc51a143bf70dd139a4936fbab41fab3469fc136bd1387478d8153d657d75ef2
- Sigstore transparency entry: 646515453
- Permalink: y9c/bwamem@00c6142a0e777be4ec066c59076606e14d298128
- Branch / Tag: refs/heads/master
- Owner: https://github.com/y9c
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@00c6142a0e777be4ec066c59076606e14d298128
- Trigger Event: push