
High-performance FASTQ parsing with Mojo-backed Python bindings

blazeseq (Python)

Python bindings for BlazeSeq — high-performance FASTQ parsing.

Wheels only: install blazeseq from PyPI. Building the extension from a source distribution is not supported.

# Install (uv recommended)
uv pip install blazeseq

# Or with pip
pip install blazeseq

Quick start

import blazeseq

# quality_schema defaults to "generic"; use keyword args for clarity
parser = blazeseq.parser("file.fastq", quality_schema="sanger")
while parser.has_more():
    rec = parser.next_record()
    print(rec.id, rec.sequence)

Or use the iterator over records:

for rec in parser.records:
    print(rec.id, rec.sequence)
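
As a small illustration of the record iterator, the sketch below tallies reads and bases. The helper name `summarize` is hypothetical (not part of blazeseq), and it assumes `rec.sequence` behaves like a Python string. Stand-in records are used so the snippet runs without a FASTQ file.

```python
from types import SimpleNamespace


def summarize(records):
    """Count records and total bases from any iterable of FastqRecord-like objects."""
    n_reads = 0
    n_bases = 0
    for rec in records:
        n_reads += 1
        n_bases += len(rec.sequence)  # assumes sequence is a str
    return n_reads, n_bases


# Stand-in records for demonstration; with blazeseq you would pass parser.records.
demo = [
    SimpleNamespace(id="r1", sequence="ACGT"),
    SimpleNamespace(id="r2", sequence="GG"),
]
print(summarize(demo))
```

With a real file, the call would look like `summarize(parser.records)`.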

For batched iteration (default 100 records per batch):

for batch in parser.batches:
    for rec in batch:
        print(rec.id, rec.sequence)

Custom batch size:

for batch in parser.batches_with_size(50):
    for rec in batch:
        print(rec.id, rec.sequence)
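
Batches are convenient when a statistic is computed per chunk rather than per read. The following is a minimal sketch that computes the GC fraction of each batch; `gc_fraction_per_batch` is a hypothetical helper, and plain lists of stand-in records take the place of real batches so that it runs without blazeseq installed.

```python
from types import SimpleNamespace


def gc_fraction_per_batch(batches):
    """Return the GC fraction of each batch (0.0 for a batch with no bases)."""
    fractions = []
    for batch in batches:
        gc = 0
        total = 0
        for rec in batch:
            seq = rec.sequence.upper()  # assumes sequence is a str
            gc += seq.count("G") + seq.count("C")
            total += len(seq)
        fractions.append(gc / total if total else 0.0)
    return fractions


# Stand-in batches; with blazeseq you would pass parser.batches_with_size(50).
demo_batches = [
    [SimpleNamespace(sequence="GGCC"), SimpleNamespace(sequence="ATAT")],
    [SimpleNamespace(sequence="GC")],
]
print(gc_fraction_per_batch(demo_batches))
```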

Gzip files: pass a path ending in .fastq.gz or .fq.gz, and set parallelism to the number of decompression threads (default 4):

parser = blazeseq.parser("reads.fastq.gz", quality_schema="sanger", parallelism=8)
for rec in parser.records:
    print(rec.id, rec.sequence)
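
One possible way to choose a thread count is to derive it from the machine's CPU count and only pass parallelism for gzip inputs. The helper `parser_kwargs` and the cap of 8 threads below are assumptions made for illustration, not part of blazeseq; whether more threads actually help depends on the workload.

```python
import os


def parser_kwargs(path, quality_schema="sanger", max_threads=8):
    """Build keyword arguments for blazeseq.parser based on the file extension."""
    kwargs = {"quality_schema": quality_schema}
    if path.endswith((".fastq.gz", ".fq.gz")):
        # os.cpu_count() can return None, so fall back to 1 thread.
        kwargs["parallelism"] = max(1, min(max_threads, os.cpu_count() or 1))
    return kwargs


# Usage (requires blazeseq and a real file):
#   parser = blazeseq.parser("reads.fastq.gz", **parser_kwargs("reads.fastq.gz"))
print(parser_kwargs("reads.fastq"))
```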

API reference

Module-level

Function Description
parser(path, quality_schema="generic", parallelism=4) Create a FASTQ parser.
    path: file to read. Supports .fastq, .fq, .fastq.gz, .fq.gz.
    quality_schema: one of "generic", "sanger", "solexa", "illumina_1.3", "illumina_1.5", "illumina_1.8".
    parallelism: decompression threads for gzip (default 4).
    Returns: a parser supporting records, batches, batches_with_size(n), has_more(), next_record(), next_batch(n).
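
For background on why the quality schema matters: FASTQ quality characters are ASCII-encoded scores, and the conventional offset is 33 for Sanger and Illumina 1.8+ and 64 for Illumina 1.3 to 1.5. The sketch below shows that decoding step in plain Python. It is not BlazeSeq's implementation, `decode_phred` is a hypothetical helper, and how BlazeSeq treats "generic" is not covered here. Solexa also uses offset 64 but on a different score scale, so this simple subtraction does not yield Phred scores for it.

```python
def decode_phred(quality, offset=33):
    """Convert an ASCII quality string to integer scores by subtracting the offset."""
    return [ord(ch) - offset for ch in quality]


print(decode_phred("II5!"))      # interpreted as Phred+33
print(decode_phred("hhT@", 64))  # interpreted as Phred+64
```

Both calls decode to the same scores, which shows how the same read quality looks different on disk under the two offsets.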

Parser (returned by parser / create_parser)

Method / attribute Description
has_more() Return True if there may be more records to read.
next_record() Return the next record as a FastqRecord. Raises on EOF or parse error.
next_ref_as_record() Return the next record (from zero-copy ref) as a FastqRecord. Raises on EOF or parse error.
next_batch(max_records) Return a batch of up to max_records records as a FastqBatch. Returns a partial batch at EOF.
records Iterable over records: for rec in parser.records.
batches Iterable over batches (default 100 records per batch): for batch in parser.batches then for rec in batch.
batches_with_size(batch_size) Iterable over batches of the given size.
__iter__ / __next__ Iterator protocol; equivalent to iterating over records.
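
Because has_more() only reports that there may be more records and next_record() raises on EOF or parse error, a manual loop usually needs a try/except. The sketch below shows one way to structure it. The exception type raised by blazeseq is not documented here, so a broad `Exception` is caught as an assumption, which means EOF and parse errors are not distinguished. `StubParser` is a hypothetical stand-in so the snippet runs without blazeseq.

```python
def read_all(parser):
    """Drain a parser with has_more()/next_record(), stopping on the first error."""
    records = []
    error = None
    while parser.has_more():
        try:
            records.append(parser.next_record())
        except Exception as exc:  # exact exception type is an assumption
            error = exc
            break
    return records, error


class StubParser:
    """Hypothetical stand-in mimicking has_more()/next_record()."""

    def __init__(self, items):
        self._items = list(items)

    def has_more(self):
        return bool(self._items)

    def next_record(self):
        item = self._items.pop(0)
        if item is None:  # None marks a malformed record in this stub
            raise ValueError("malformed record")
        return item


recs, err = read_all(StubParser(["r1", "r2", None, "r3"]))
print(recs, repr(err))
```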

FastqRecord

Property / method Description
id Read identifier (without leading @).
sequence Sequence line (bases).
quality Quality line (raw quality string).
__len__() Sequence length (number of bases).
phred_scores Phred quality scores as a Python list of integers.
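
The record properties above are enough for a simple quality filter. The following sketch keeps reads whose mean Phred score and length meet a threshold; `mean_quality`, `passes`, and the thresholds are hypothetical, and the arithmetic mean of Phred scores is only a rough heuristic (it is not the mean error probability). Stand-in records replace real FastqRecord objects, and `rec.sequence` is assumed to behave like a string.

```python
from types import SimpleNamespace


def mean_quality(rec):
    """Arithmetic mean of the record's Phred scores (0.0 if there are none)."""
    scores = rec.phred_scores
    return sum(scores) / len(scores) if scores else 0.0


def passes(rec, min_mean=20.0, min_length=1):
    """True if the read is long enough and its mean quality meets the threshold."""
    return len(rec.sequence) >= min_length and mean_quality(rec) >= min_mean


# Stand-in records; with blazeseq these would come from parser.records.
good = SimpleNamespace(id="a", sequence="ACGT", phred_scores=[30, 30, 20, 20])
bad = SimpleNamespace(id="b", sequence="ACGT", phred_scores=[10, 10, 10, 10])
print([r.id for r in (good, bad) if passes(r)])
```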

FastqBatch

Method Description
num_records() Number of records in the batch.
get_record(index) Return the record at the given index as a FastqRecord.
__iter__ Iterate over records: for rec in batch.
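
Index-based access is an alternative to iterating when positions matter. A minimal sketch, assuming get_record uses 0-based indices (not stated above); `batch_ids` and `StubBatch` are hypothetical names used so the snippet runs without blazeseq.

```python
from types import SimpleNamespace


def batch_ids(batch):
    """Collect read IDs by index using num_records() and get_record(i)."""
    return [batch.get_record(i).id for i in range(batch.num_records())]


class StubBatch:
    """Hypothetical stand-in mimicking the FastqBatch methods listed above."""

    def __init__(self, records):
        self._records = list(records)

    def num_records(self):
        return len(self._records)

    def get_record(self, index):
        return self._records[index]  # 0-based indexing is an assumption


demo_batch = StubBatch([SimpleNamespace(id="r1"), SimpleNamespace(id="r2")])
print(batch_ids(demo_batch))
```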

Local development (uv)

From the repo root, after building the Mojo extension into python/blazeseq/_extension/:

uv pip install -e python/
uv run python tests/test_python_bindings.py

Build and upload to PyPI

Prerequisites: Build the Mojo extension so that python/blazeseq/_extension/ contains the platform-specific shared library (.so). Bump the version in pyproject.toml before each release.

  1. Install build tools and twine:

    uv pip install build twine
    
  2. Build the package (from the python/ directory):

    cd python
    uv run python -m build
    

    This produces dist/blazeseq-<version>.tar.gz (sdist) and dist/blazeseq-<version>-*.whl (wheel).

  3. Upload to PyPI (use a PyPI API token; create one at pypi.org):

    uv run twine upload dist/*
    

    For Test PyPI first:

    uv run twine upload --repository testpypi dist/*
    
