Python tools (backed by Rust) for sequence analysis

prseq (Python)

Python tools for sequence analysis, powered by Rust.

Overview

prseq provides Python bindings to a high-performance Rust library for FASTA and FASTQ parsing. It includes:

  • Pythonic API: Full type hints and Python-native data structures
  • CLI Tools: Ready-to-use command-line utilities
  • Rust Performance: Fast parsing with automatic compression detection
  • Memory Efficient: Streaming parsers for large files
  • Universal Input: Files, compressed files, and stdin support

The core parsing is implemented in the Rust prseq library.
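As an illustration of what "automatic compression detection" can mean, the sketch below classifies a byte stream by its leading magic bytes. It is a pure-Python illustration only: prseq performs detection inside the Rust library and its exact method is not described here. The names `sniff_compression` and `decode_text` are hypothetical and are not part of the prseq API.

```python
import bz2
import gzip

# Leading "magic" bytes that identify each compressed format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def sniff_compression(head: bytes) -> str:
    """Classify a stream from its first bytes: 'gzip', 'bzip2', or 'plain'."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "plain"


def decode_text(raw: bytes) -> str:
    """Return the text held in raw bytes, decompressing first when needed."""
    kind = sniff_compression(raw[:3])
    if kind == "gzip":
        raw = gzip.decompress(raw)
    elif kind == "bzip2":
        raw = bz2.decompress(raw)
    return raw.decode("ascii")


fasta = b">seq1\nATCG\n"
for payload in (fasta, gzip.compress(fasta), bz2.compress(fasta)):
    # Each payload decodes to the same FASTA text, whatever its compression.
    print(sniff_compression(payload[:3]), decode_text(payload) == fasta.decode("ascii"))
```

Detecting by content rather than by file extension is what lets the same approach work on stdin, where no file name is available.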

Installation

Using uv (recommended)

uv add prseq

Using pip

pip install prseq

From source (developers)

git clone https://github.com/VirologyCharite/prseq.git
cd prseq/python
pip install maturin
maturin develop

Quick Start

Command Line Tools

# Analyze a FASTA file
fasta-info sequences.fasta
fasta-stats sequences.fasta.gz  # Works with compressed files
fasta-filter 100 sequences.fasta  # Keep sequences ≥100bp

# Analyze a FASTQ file
fastq-info reads.fastq
fastq-stats reads.fastq.bz2
fastq-filter 50 reads.fastq  # Keep sequences ≥50bp

# All tools support stdin
cat sequences.fasta | fasta-stats
gunzip -c reads.fastq.gz | fastq-filter 75

Python API

import prseq

# FASTA files
records = prseq.read_fasta("sequences.fasta")
for record in records:
    print(f"{record.id}: {len(record.sequence)} bp")

# FASTQ files
records = prseq.read_fastq("reads.fastq")
for record in records:
    print(f"{record.id}: {len(record.sequence)} bp, quality: {len(record.quality)}")

# Streaming for large files
for record in prseq.FastaReader.from_file("large.fasta"):
    if len(record.sequence) > 1000:
        print(f"Long sequence: {record.id}")

# Works with stdin too
for record in prseq.FastqReader.from_stdin():
    print(f"Read: {record.id}")

Python API Reference

FASTA Support

from prseq import FastaRecord, FastaReader, read_fasta

# FastaRecord - represents a single sequence
record = FastaRecord(id="seq1", sequence="ATCG")
print(record.id)        # "seq1"
print(record.sequence)  # "ATCG"

# Read all records into memory
records = read_fasta("file.fasta")
records = read_fasta("file.fasta.gz")  # Auto-detects compression
records = read_fasta(None)  # Read from stdin

# Stream records (memory efficient)
reader = FastaReader.from_file("large.fasta")
# ...or stream from stdin instead:
# reader = FastaReader.from_stdin()

for record in reader:
    # Process one record at a time
    print(f"{record.id}: {len(record.sequence)}")

# Performance tuning
reader = FastaReader.from_file("file.fasta", sequence_size_hint=50000)
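The streaming reader pairs naturally with summary statistics that need only sequence lengths. The following is a minimal sketch, not the implementation behind `fasta-stats`: `length_stats` is a hypothetical helper, and `Record` is a stand-in with the same `id` and `sequence` attributes as `FastaRecord` so that the example runs without prseq installed.

```python
from collections import namedtuple
from typing import Dict, Iterable

# Stand-in for FastaRecord (assumption: only .id and .sequence are needed).
Record = namedtuple("Record", ["id", "sequence"])


def length_stats(records: Iterable) -> Dict[str, float]:
    """Summarise sequence lengths; only the lengths are kept in memory."""
    lengths = sorted((len(r.sequence) for r in records), reverse=True)
    if not lengths:
        return {"count": 0, "total": 0, "min": 0, "max": 0, "mean": 0.0, "n50": 0}
    total = sum(lengths)
    # N50: walking from longest to shortest, the length at which the
    # running total first reaches half of all bases.
    running = 0
    n50 = 0
    for length in lengths:
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(lengths),
        "total": total,
        "min": lengths[-1],
        "max": lengths[0],
        "mean": total / len(lengths),
        "n50": n50,
    }


demo = [
    Record("a", "AT"),
    Record("b", "ATCG"),
    Record("c", "ATCGAT"),
    Record("d", "ATCGATCG"),
]
stats = length_stats(demo)
print(stats)
```

With prseq installed, passing `FastaReader.from_file("large.fasta")` in place of `demo` should work the same way, since the function only iterates and reads `record.sequence`.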

FASTQ Support

from prseq import FastqRecord, FastqReader, read_fastq

# FastqRecord - represents a single read
record = FastqRecord(id="read1", sequence="ATCG", quality="IIII")
print(record.id)        # "read1"
print(record.sequence)  # "ATCG"
print(record.quality)   # "IIII"

# Read all records into memory
records = read_fastq("reads.fastq")
records = read_fastq("reads.fastq.bz2")  # Auto-detects compression
records = read_fastq(None)  # Read from stdin

# Stream records (memory efficient)
reader = FastqReader.from_file("large.fastq")
# ...or stream from stdin instead:
# reader = FastqReader.from_stdin()

for record in reader:
    # Validate quality length matches sequence
    assert len(record.sequence) == len(record.quality)
    print(f"{record.id}: {len(record.sequence)} bp")

# Performance tuning for short/long reads
reader = FastqReader.from_file("reads.fastq", sequence_size_hint=150)  # Short reads
reader = FastqReader.from_file("nanopore.fastq", sequence_size_hint=10000)  # Long reads
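The `quality` string can be turned into numeric scores with a few lines of Python. This sketch assumes the common Phred+33 (Sanger / Illumina 1.8+) encoding, which prseq itself does not guarantee; older files may use a different offset. The helper names below are hypothetical and not part of the prseq API.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Decode an ASCII quality string into per-base Phred scores."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality: str, offset: int = PHRED_OFFSET) -> float:
    """Arithmetic mean of the Phred scores; 0.0 for an empty string."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def error_probability(score: int) -> float:
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-score / 10)


print(phred_scores("II5!"))    # [40, 40, 20, 0]
print(mean_quality("II5!"))    # 25.0
print(error_probability(20))
```

Inside a streaming loop this would be called as `mean_quality(record.quality)`, for example to skip reads whose mean score falls below a threshold.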

Advanced Usage

import prseq

# Filter sequences by length
def filter_by_length(filename, min_length):
    for record in prseq.FastaReader.from_file(filename):
        if len(record.sequence) >= min_length:
            yield record

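# Calculate GC content (fraction of G and C bases, case-insensitive)
def gc_content(sequence):
    if not sequence:
        return 0.0
    upper = sequence.upper()  # upper-case once instead of once per count
    return (upper.count('G') + upper.count('C')) / len(sequence)

# Process compressed files
records = prseq.read_fasta("sequences.fasta.gz")
# Guard against an empty file to avoid dividing by zero
avg_gc = sum(gc_content(r.sequence) for r in records) / len(records) if records else 0.0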

# Convert FASTQ to FASTA
def fastq_to_fasta(fastq_file, fasta_file):
    with open(fasta_file, 'w') as f:
        for record in prseq.FastqReader.from_file(fastq_file):
            f.write(f">{record.id}\n{record.sequence}\n")
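The average above is a mean of per-record GC fractions and needs every record in memory. For large files, a pooled GC fraction (all G and C bases divided by all bases) can be computed in one streaming pass. The sketch below is one way to do that; `pooled_gc` is a hypothetical helper and `Record` is a stand-in for `FastaRecord` so the example runs without prseq.

```python
from collections import namedtuple
from typing import Iterable

# Stand-in for FastaRecord (assumption: only .sequence is read).
Record = namedtuple("Record", ["id", "sequence"])


def pooled_gc(records: Iterable) -> float:
    """GC fraction over all bases, computed one record at a time."""
    gc = 0
    total = 0
    for record in records:
        seq = record.sequence.upper()
        gc += seq.count("G") + seq.count("C")
        total += len(seq)
    return gc / total if total else 0.0


demo = [Record("a", "GGCC"), Record("b", "ATATATATATAT")]
print(pooled_gc(demo))  # 4 GC bases out of 16 -> 0.25
```

The two averages can differ noticeably: for `demo`, the mean of per-record fractions is 0.5, while the pooled value is 0.25, because the longer AT-only record carries more weight.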

CLI Tools

FASTA Tools

Command       Description                    Example
fasta-info    Show basic file information    fasta-info sequences.fasta
fasta-stats   Calculate sequence statistics  fasta-stats sequences.fasta.gz
fasta-filter  Filter by minimum length       fasta-filter 100 sequences.fasta

FASTQ Tools

Command       Description                    Example
fastq-info    Show basic file information    fastq-info reads.fastq
fastq-stats   Calculate sequence statistics  fastq-stats reads.fastq.bz2
fastq-filter  Filter by minimum length       fastq-filter 50 reads.fastq

CLI Examples

# Basic usage
fasta-info genome.fasta
fastq-stats reads.fastq

# With compressed files (auto-detected)
fasta-stats sequences.fasta.gz
fastq-info reads.fastq.bz2

# Using stdin (great for pipelines)
cat sequences.fasta | fasta-stats
gunzip -c reads.fastq.gz | fastq-filter 100

# Performance tuning for large sequences
fasta-stats --size-hint 50000 genome.fasta
fastq-filter --size-hint 10000 150 nanopore.fastq

Development

Prerequisites

  • Python 3.8-3.12
  • Rust 1.70+
  • maturin for building Python extensions

Setup

cd python
pip install maturin
maturin develop

Testing

# Run all tests
python -m pytest tests/ -v

# Run integration tests
python -m pytest tests/ -v --integration

# Type checking with mypy
mypy src/prseq

Building

# Development build
maturin develop

# Production wheel
maturin build --release

Publishing

cd python
maturin publish

Type Checking

The package ships full type hints and is configured for mypy with Python 3.8+ compatibility. Type stubs for the Rust extension modules are generated automatically.
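Downstream code can take advantage of these hints without depending on a concrete record class. The sketch below uses a structural `typing.Protocol` (available from Python 3.8) that matches anything exposing string `id` and `sequence` attributes. `SequenceRecord`, `DemoRecord`, and `ids_with_min_length` are hypothetical names for illustration; whether prseq's generated stubs satisfy this protocol under mypy has not been checked here.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Protocol


class SequenceRecord(Protocol):
    """Structural type: any object with readable `id` and `sequence` strings."""

    @property
    def id(self) -> str: ...

    @property
    def sequence(self) -> str: ...


@dataclass
class DemoRecord:
    """Stand-in record so the example runs without prseq installed."""

    id: str
    sequence: str


def ids_with_min_length(
    records: Iterable[SequenceRecord], min_length: int
) -> Iterator[str]:
    """Yield the ids of records whose sequence has at least min_length bases."""
    for record in records:
        if len(record.sequence) >= min_length:
            yield record.id


demo = [DemoRecord("a", "AT"), DemoRecord("b", "ATCG")]
print(list(ids_with_min_length(demo, 3)))  # ['b']
```

Declaring the attributes as read-only properties keeps the protocol permissive: it accepts classes with plain attributes as well as classes that expose them through getters.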

Rust Core

The Python package is built on top of the Rust prseq library, which provides the high-performance parsing implementation. If you need Rust-native parsing without Python, check out the Rust crate directly.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • prseq-0.0.22-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378.4 kB): PyPy, manylinux (glibc 2.17+), x86-64
  • prseq-0.0.22-cp313-cp313-win_amd64.whl (226.3 kB): CPython 3.13, Windows x86-64
  • prseq-0.0.22-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378.1 kB): CPython 3.13, manylinux (glibc 2.17+), x86-64
  • prseq-0.0.22-cp313-cp313-macosx_11_0_arm64.whl (316.4 kB): CPython 3.13, macOS 11.0+ ARM64
  • prseq-0.0.22-cp312-cp312-win_amd64.whl (226.5 kB): CPython 3.12, Windows x86-64
  • prseq-0.0.22-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (377.8 kB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • prseq-0.0.22-cp312-cp312-macosx_11_0_arm64.whl (316.8 kB): CPython 3.12, macOS 11.0+ ARM64
  • prseq-0.0.22-cp311-cp311-win_amd64.whl (226.3 kB): CPython 3.11, Windows x86-64
  • prseq-0.0.22-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378.4 kB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • prseq-0.0.22-cp311-cp311-macosx_11_0_arm64.whl (319.1 kB): CPython 3.11, macOS 11.0+ ARM64
  • prseq-0.0.22-cp310-cp310-win_amd64.whl (226.5 kB): CPython 3.10, Windows x86-64
  • prseq-0.0.22-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (378.5 kB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • prseq-0.0.22-cp310-cp310-macosx_11_0_arm64.whl (319.3 kB): CPython 3.10, macOS 11.0+ ARM64
