Pure-Python subprocess wrapper for the minibwa aligner

minibwa-py

A pure-Python, stdlib-only subprocess wrapper for the minibwa sequence aligner. It shells out to the minibwa engine binary on your PATH, parses its SAM/PAF output into lightweight typed records, and gives you a clean, Pythonic API.

Licensing in one line: this wrapper is MIT. The minibwa engine it drives is a separate program, licensed GPL-2.0-or-later for the default build, installed by you, and not bundled with this package. See Licensing.

Install

pip install minibwa-py

minibwa-py does not ship the aligner. Install the engine separately:

conda install -c bioconda minibwa

If the binary cannot be found you get a MinibwaNotFoundError whose message is:

minibwa binary not found; install with "conda install -c bioconda minibwa", set MINIBWA_BIN, or pass binary=

Quickstart

import minibwa

idx = minibwa.index("ref.fa")                  # runs the 'index' command
for aln in minibwa.map(idx, "reads.fq", preset="sr", threads=8):
    print(aln.qname, aln.flag, aln.rname, aln.pos, aln.mapq)

minibwa.version()                              # -> "0.1-r363"

The loop prints one line per alignment:

read1 0 chr1 51 60
read2 0 chr1 201 60
read3 0 chr1 401 60

index() returns an Index handle that you pass straight back to map(). You can also pass any path-like index prefix directly.

Paired-end

Supply the second FASTQ as the third positional argument:

for aln in minibwa.map(idx, "R1.fq", "R2.fq", preset="sr", threads=8):
    ...

PAF output

Pass paf=True to stream PafRecord objects instead of Alignment:

for rec in minibwa.map(idx, "reads.fq", paf=True):
    print(rec.qname, rec.tname, rec.tstart, rec.tend, rec.strand, rec.identity)

Writing to a file

Pass output= to let the engine write the file itself. The call runs to completion (no iteration) and returns a pathlib.Path:

out = minibwa.map(idx, "reads.fq", output="out.sam")
print(repr(out))  # PosixPath('out.sam')

Context-manager usage

The streaming iterators own a live subprocess. Use a with block to guarantee the child is terminated and the stderr temp file is removed even if you break out early:

with minibwa.map(idx, "reads.fq") as alns:
    for aln in alns:
        if aln.is_secondary:
            continue
        do_something(aln)

Abandoning the iterator (breaking out, then letting it be garbage-collected) also cleans up, but the context manager makes the cleanup explicit.
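
To illustrate the pattern described above, here is a minimal stdlib-only sketch of an iterator that owns a child process and terminates it when the with block exits. LineStream and the toy child command are hypothetical stand-ins, not the wrapper's actual classes, and the sketch omits the stderr temp file handling:

```python
import subprocess
import sys


class LineStream:
    """Hypothetical stand-in: iterate a child's stdout, terminate it on exit."""

    def __init__(self, argv):
        self._proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)

    def __iter__(self):
        for line in self._proc.stdout:
            yield line.rstrip("\n")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # never suppress exceptions from the with body

    def close(self):
        # Stop the child if it is still running, then reap it.
        if self._proc.poll() is None:
            self._proc.terminate()
        self._proc.wait()
        self._proc.stdout.close()

    @property
    def returncode(self):
        return self._proc.returncode


# A toy child that would keep printing for a while if left alone.
child = [
    sys.executable,
    "-c",
    "import time\nfor i in range(1000):\n    print(i, flush=True)\n    time.sleep(0.01)",
]

seen = []
with LineStream(child) as lines:
    for line in lines:
        seen.append(line)
        if line == "2":
            break  # leave early; __exit__ still terminates the child

stream_returncode = lines.returncode  # set, because the child was reaped
```

Breaking out after the third line leaves the child running, and it is the with block's exit that stops and reaps it.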

Reference lengths and the SAM header

In SAM mode the iterator consumes the @ header lines for you (they are never yielded as records) and keeps them. @SQ lines are parsed into a name-to-length mapping, available once iteration has passed the header:

alns = minibwa.map(idx, "reads.fq")
records = list(alns)
print(alns.reference_lengths)   # {'chr1': 600}
print(alns.header)              # the raw '@HD' / '@SQ' / '@PG' lines
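
As a rough illustration of what the name-to-length mapping involves, the sketch below parses @SQ lines using the SN and LN tags from the SAM specification. parse_reference_lengths is a hypothetical helper and the header lines are made-up sample data, so this is not necessarily how the wrapper does it internally:

```python
def parse_reference_lengths(header_lines):
    """Map each @SQ reference name (SN) to its length (LN)."""
    lengths = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # After the record type, header fields are tab-separated TAG:VALUE pairs.
        fields = dict(
            field.split(":", 1) for field in line.rstrip("\n").split("\t")[1:]
        )
        lengths[fields["SN"]] = int(fields["LN"])
    return lengths


# Illustrative header lines (sample data, not real engine output).
header = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:600",
    "@PG\tID:example\tPN:example",
]
reference_lengths = parse_reference_lengths(header)
print(reference_lengths)  # {'chr1': 600}
```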

Reusing a pre-built index

If the index already exists on disk -- built earlier, or by the minibwa index CLI -- wrap it with Index.from_prefix instead of rebuilding:

idx = minibwa.Index.from_prefix("ref.fa")   # no rebuild; just a handle
for aln in minibwa.map(idx, "reads.fq"):
    ...

map() also accepts a bare prefix string or any os.PathLike as its first argument, so minibwa.map("ref.fa", "reads.fq") works without a handle at all.

Records

Alignment exposes the 11 mandatory SAM fields with correct types (qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual), a pos0 0-based helper, flag-decoding boolean properties (is_mapped, is_reverse, is_secondary, is_supplementary, ...), and a lazily-parsed, immutable tags mapping (e.g. aln.tags["NM"]).
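
For readers unfamiliar with SAM flags, the boolean properties correspond to individual bits of the FLAG field. The sketch below decodes them with the bit values from the SAM specification; decode_flag is a hypothetical helper written for illustration, not part of the wrapper's API:

```python
# Bit values defined by the SAM specification.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def decode_flag(flag):
    """Return the flag bits that the boolean properties are named after."""
    return {
        "is_mapped": not (flag & FLAG_UNMAPPED),
        "is_reverse": bool(flag & FLAG_REVERSE),
        "is_secondary": bool(flag & FLAG_SECONDARY),
        "is_supplementary": bool(flag & FLAG_SUPPLEMENTARY),
    }


forward_primary = decode_flag(0)
reverse_secondary = decode_flag(FLAG_REVERSE | FLAG_SECONDARY)  # flag 272
unmapped = decode_flag(FLAG_UNMAPPED)
```

A flag of 0, as in the quickstart output, therefore means a mapped, forward-strand, primary alignment.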

PafRecord exposes the 12 mandatory PAF columns (0-based half-open coordinates), is_reverse, an identity property, and the same lazy tags mapping.

SAM POS is 1-based; PAF coordinates are 0-based half-open. Each record stays faithful to its own format; nothing is silently normalized. The same read that aligns to the 51st base of chr1 reports pos == 51 as an Alignment but tstart == 50 as a PafRecord -- one locus, two conventions. Reach for Alignment.pos0 when you need the 0-based start.
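
The two conventions can be checked by hand. The sketch below takes one made-up SAM line and one made-up PAF line for the same 100-base alignment and reads the start coordinate from each using plain string splitting, so it makes no assumptions about the wrapper's parser:

```python
# Sample lines for one alignment starting at the 51st base of chr1.
sam_line = "read1\t0\tchr1\t51\t60\t100M\t*\t0\t0\t" + "A" * 100 + "\t" + "I" * 100
paf_line = "read1\t100\t0\t100\t+\tchr1\t600\t50\t150\t100\t100\t60"

# SAM column 4 (POS) is 1-based.
sam_pos = int(sam_line.split("\t")[3])

# PAF columns 8 and 9 (target start and end) are 0-based, half-open.
paf_fields = paf_line.split("\t")
paf_tstart = int(paf_fields[7])
paf_tend = int(paf_fields[8])

pos0 = sam_pos - 1                      # what a 0-based helper would return
aligned_length = paf_tend - paf_tstart  # half-open interval: no "+ 1" needed

print(sam_pos, pos0, paf_tstart, aligned_length)  # 51 50 50 100
```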

Binary discovery

The engine is located in this order:

  1. an explicit binary= argument,
  2. the MINIBWA_BIN environment variable,
  3. shutil.which("minibwa").

minibwa.version(binary="/opt/minibwa/bin/minibwa")
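
The lookup order above can be sketched in a few lines of stdlib code. resolve_binary is a hypothetical helper written for illustration. It returns None when nothing is found, whereas the wrapper is documented to raise MinibwaNotFoundError:

```python
import os
import shutil


def resolve_binary(binary=None, environ=None):
    """Return the engine path using the documented precedence, or None."""
    environ = os.environ if environ is None else environ
    # 1. An explicit binary= argument wins.
    if binary:
        return os.fspath(binary)
    # 2. Then the MINIBWA_BIN environment variable.
    from_env = environ.get("MINIBWA_BIN")
    if from_env:
        return from_env
    # 3. Finally, whatever "minibwa" resolves to on PATH (None if absent).
    return shutil.which("minibwa")


explicit = resolve_binary("/opt/minibwa/bin/minibwa", {"MINIBWA_BIN": "/env/minibwa"})
via_env = resolve_binary(None, {"MINIBWA_BIN": "/env/minibwa"})
via_path = resolve_binary(None, {})
```

Passing the environment as a parameter is a choice made for this sketch only. It keeps the precedence easy to test without touching the real process environment.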

Error model

  • MinibwaNotFoundError -- the engine binary could not be located.
  • MinibwaRunError -- the engine exited nonzero; carries .argv, .returncode, and the captured .stderr (diagnostics are never swallowed).
  • MinibwaParseError (also a ValueError) -- a SAM/PAF line could not be parsed; carries the offending .line and (when from a stream) .lineno.

All three inherit from MinibwaError, so you can catch one specifically or the whole family at once. Because MinibwaParseError is also a ValueError, existing except ValueError handlers still catch malformed-line errors. For the streaming path the engine's exit status is checked at end-of-stream, so the try block must enclose the iteration itself, not just the map() call:

try:
    alignments = list(minibwa.map(idx, "reads.fq"))
except minibwa.MinibwaRunError as exc:
    print("exit", exc.returncode)   # e.g. -6 (SIGABRT)
    print(exc.stderr)               # the engine's own diagnostics, verbatim
    raise
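
To make the hierarchy concrete, the sketch below defines stand-in classes that mirror the description above and shows which handlers catch them. The constructor signatures are assumptions made for illustration and are not taken from the wrapper's source:

```python
class MinibwaError(Exception):
    """Stand-in base class for the whole family."""


class MinibwaNotFoundError(MinibwaError):
    """Stand-in: the engine binary could not be located."""


class MinibwaRunError(MinibwaError):
    """Stand-in: the engine exited nonzero."""

    def __init__(self, argv, returncode, stderr):
        super().__init__(f"{argv[0]} exited with status {returncode}")
        self.argv = argv
        self.returncode = returncode
        self.stderr = stderr


class MinibwaParseError(MinibwaError, ValueError):
    """Stand-in: a SAM/PAF line could not be parsed."""

    def __init__(self, message, line, lineno=None):
        super().__init__(message)
        self.line = line
        self.lineno = lineno


caught_by = []

try:
    raise MinibwaParseError("bad SAM line", line="read1\t0", lineno=7)
except ValueError as exc:  # a pre-existing ValueError handler still works
    caught_by.append(("ValueError", exc.lineno))

try:
    raise MinibwaRunError(["minibwa", "index", "ref.fa"], -6, "engine diagnostics")
except MinibwaError as exc:  # the family base catches every member
    caught_by.append(("MinibwaError", exc.returncode))
```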

Timeouts

index(), map(), and version() all accept timeout= (seconds). For output= and version() it bounds the run-to-completion call. For the streaming path it is a deadline checked while iterating (plus a bounded wait at finalize), so a stalled engine raises MinibwaRunError instead of hanging the caller forever:

for aln in minibwa.map(idx, "reads.fq", timeout=300):
    ...
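
One simple way to picture a deadline that is checked while iterating is the generator below. It is only a sketch under stated assumptions: iter_with_deadline and DeadlineExceeded are hypothetical names, the check happens between records (so it cannot interrupt a read that is already blocked), and the wrapper's real mechanism may differ:

```python
import time


class DeadlineExceeded(RuntimeError):
    """Hypothetical error for this sketch; the wrapper raises MinibwaRunError."""


def iter_with_deadline(records, timeout, clock=time.monotonic):
    """Yield records until the overall deadline has passed."""
    deadline = clock() + timeout
    for record in records:
        if clock() > deadline:
            raise DeadlineExceeded(f"no completion within {timeout} s")
        yield record


# A fake clock makes the behavior reproducible: the third check is "late".
ticks = iter([0.0, 1.0, 2.0, 100.0])
seen = []
timed_out = False
try:
    for rec in iter_with_deadline(["r1", "r2", "r3"], timeout=5.0,
                                  clock=lambda: next(ticks)):
        seen.append(rec)
except DeadlineExceeded:
    timed_out = True
```

With the fake clock, the first two records arrive in time and the third check trips the deadline.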

Escape hatch

Any option not modeled as a keyword can be appended verbatim:

minibwa.map(idx, "reads.fq", extra_args=["--some-future-flag", "value"])

Logging

The library logs through logging.getLogger("minibwa") and installs a NullHandler, so it stays silent unless you configure logging. The full argv is logged at DEBUG.
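
To see those DEBUG messages, configure the "minibwa" logger yourself. The snippet below uses only the standard logging module. The handler and format string are one possible choice, not something the library requires:

```python
import logging

logger = logging.getLogger("minibwa")

# Send the library's records to stderr with a compact format.
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

# DEBUG is the level at which the full argv is documented to be logged.
logger.setLevel(logging.DEBUG)
```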

Platform

Linux and macOS only (the engine is POSIX/conda-only). Windows is out of scope.

Licensing

The Python wrapper code in this repository is licensed under the MIT License (see LICENSE).

The minibwa engine is a separate work with its own license (GPL-2.0-or-later for the default build). This package does not include, bundle, or statically link any engine code -- it only invokes the engine binary that you install yourself. Your use of the engine is governed by the engine's own license.
