Python tools (backed by Rust) for sequence analysis

prseq (Python)

Python tools for sequence analysis, powered by Rust.

Overview

prseq provides Python bindings to a high-performance Rust library for FASTA and FASTQ parsing. It includes:

- Pythonic API: Full type hints and Python-native data structures
- CLI Tools: Ready-to-use command-line utilities
- Rust Performance: Fast parsing with automatic compression detection
- Memory Efficient: Streaming parsers for large files
- Universal Input: Files, compressed files, and stdin support

The core parsing is implemented in the Rust prseq library.
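
The README does not say how the automatic compression detection works. A common approach is to inspect the first bytes of the input rather than trust the file extension, and the sketch below illustrates that idea in pure Python. The helper names `sniff_compression` and `open_text` are hypothetical and are not part of prseq; the Rust implementation may do this differently.

```python
import bz2
import gzip
import io

# Leading "magic" bytes that identify each container format.
GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def sniff_compression(head: bytes) -> str:
    """Classify a stream from its first bytes (hypothetical helper, not prseq API)."""
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "plain"


def open_text(raw: bytes) -> io.StringIO:
    """Return a text stream over raw bytes, decompressing first when needed."""
    kind = sniff_compression(raw[:3])
    if kind == "gzip":
        raw = gzip.decompress(raw)
    elif kind == "bzip2":
        raw = bz2.decompress(raw)
    return io.StringIO(raw.decode("ascii"))
```

Sniffing the content means a gzip file without a `.gz` suffix, or compressed data arriving on stdin, can still be handled.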

Installation

Using uv (recommended)

uv add prseq

Using pip

pip install prseq

From source (developers)

git clone https://github.com/VirologyCharite/prseq.git
cd prseq/python
pip install maturin
maturin develop

Quick Start

Command Line Tools

# Analyze a FASTA file
fasta-info sequences.fasta
fasta-stats sequences.fasta.gz  # Works with compressed files
fasta-filter 100 sequences.fasta  # Keep sequences ≥100bp

# Analyze a FASTQ file
fastq-info reads.fastq
fastq-stats reads.fastq.bz2
fastq-filter 50 reads.fastq  # Keep sequences ≥50bp

# All tools support stdin
cat sequences.fasta | fasta-stats
gunzip -c reads.fastq.gz | fastq-filter 75

Python API

import prseq

# FASTA files
records = prseq.read_fasta("sequences.fasta")
for record in records:
    print(f"{record.id}: {len(record.sequence)} bp")

# FASTQ files
records = prseq.read_fastq("reads.fastq")
for record in records:
    print(f"{record.id}: {len(record.sequence)} bp, quality: {len(record.quality)}")

# Streaming for large files
for record in prseq.FastaReader.from_file("large.fasta"):
    if len(record.sequence) > 1000:
        print(f"Long sequence: {record.id}")

# Works with stdin too
for record in prseq.FastqReader.from_stdin():
    print(f"Read: {record.id}")

Python API Reference

FASTA Support

from prseq import FastaRecord, FastaReader, read_fasta

# FastaRecord - represents a single sequence
record = FastaRecord(id="seq1", sequence="ATCG")
print(record.id)        # "seq1"
print(record.sequence)  # "ATCG"

# Read all records into memory
records = read_fasta("file.fasta")
records = read_fasta("file.fasta.gz")  # Auto-detects compression
records = read_fasta(None)  # Read from stdin

# Stream records (memory efficient)
reader = FastaReader.from_file("large.fasta")
reader = FastaReader.from_stdin()

for record in reader:
    # Process one record at a time
    print(f"{record.id}: {len(record.sequence)}")

# Performance tuning
reader = FastaReader.from_file("file.fasta", sequence_size_hint=50000)
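
To make the streaming model concrete, here is a minimal pure-Python sketch of a generator that holds only one record in memory at a time. It is an illustration only, not prseq's implementation: `Record` is a stand-in for `FastaRecord`, `iter_fasta` is a hypothetical name, and treating the whole header line as the `id` is an assumption that may not match prseq.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, Optional, TextIO


@dataclass
class Record:
    """Stand-in for prseq.FastaRecord (same two fields)."""
    id: str
    sequence: str


def iter_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield records one at a time; only the current record is buffered."""
    header: Optional[str] = None
    chunks: List[str] = []
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield Record(header, "".join(chunks))
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them into one string.
            chunks.append(line.strip())
    if header is not None:
        yield Record(header, "".join(chunks))
```

Because the generator never builds a list, memory use is bounded by the longest single sequence rather than by the file size.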

FASTQ Support

from prseq import FastqRecord, FastqReader, read_fastq

# FastqRecord - represents a single read
record = FastqRecord(id="read1", sequence="ATCG", quality="IIII")
print(record.id)        # "read1"
print(record.sequence)  # "ATCG"
print(record.quality)   # "IIII"

# Read all records into memory
records = read_fastq("reads.fastq")
records = read_fastq("reads.fastq.bz2")  # Auto-detects compression
records = read_fastq(None)  # Read from stdin

# Stream records (memory efficient)
reader = FastqReader.from_file("large.fastq")
reader = FastqReader.from_stdin()

for record in reader:
    # Validate quality length matches sequence
    assert len(record.sequence) == len(record.quality)
    print(f"{record.id}: {len(record.sequence)} bp")

# Performance tuning for short/long reads
reader = FastqReader.from_file("reads.fastq", sequence_size_hint=150)  # Short reads
reader = FastqReader.from_file("nanopore.fastq", sequence_size_hint=10000)  # Long reads
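
The `quality` string holds one character per base. The sketch below decodes it into numeric scores, assuming the Phred+33 encoding (Sanger and Illumina 1.8+); the README does not state which encoding prseq expects, so the offset is an assumption. `phred_scores` and `mean_quality` are hypothetical helpers, not part of prseq.

```python
from typing import List

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding


def phred_scores(quality: str, offset: int = PHRED_OFFSET) -> List[int]:
    """Convert each quality character to its integer Phred score."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality: str) -> float:
    """Arithmetic mean of the Phred scores; 0.0 for an empty string."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0
```

With this offset, the `"IIII"` string from the example above decodes to a score of 40 per base. The arithmetic mean of Phred scores is a simplification: it is not the same as the Phred score of the mean error probability.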

Advanced Usage

import prseq

# Filter sequences by length
def filter_by_length(filename, min_length):
    for record in prseq.FastaReader.from_file(filename):
        if len(record.sequence) >= min_length:
            yield record

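# Calculate GC content as a fraction between 0 and 1
def gc_content(sequence):
    upper = sequence.upper()  # Uppercase once so lowercase bases are counted
    gc_count = upper.count('G') + upper.count('C')
    return gc_count / len(sequence) if sequence else 0.0

# Process compressed files
records = prseq.read_fasta("sequences.fasta.gz")
# Guard against an empty file to avoid a ZeroDivisionError
avg_gc = sum(gc_content(r.sequence) for r in records) / len(records) if records else 0.0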

# Convert FASTQ to FASTA
def fastq_to_fasta(fastq_file, fasta_file):
    with open(fasta_file, 'w') as f:
        for record in prseq.FastqReader.from_file(fastq_file):
            f.write(f">{record.id}\n{record.sequence}\n")
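
The average GC calculation above reads every record into memory first. A streaming variant, sketched below, keeps only a running total and a count, so it should work with any iterable of records. `Record` is a stand-in for `FastaRecord` so the example runs without prseq installed, and `gc_fraction` and `mean_gc` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Record:
    """Stand-in for prseq.FastaRecord (same two fields)."""
    id: str
    sequence: str


def gc_fraction(sequence: str) -> float:
    """Fraction of G and C bases in a sequence; 0.0 if it is empty."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def mean_gc(records: Iterable[Record]) -> float:
    """Mean per-record GC fraction, computed in a single pass."""
    total = 0.0
    count = 0
    for record in records:
        total += gc_fraction(record.sequence)
        count += 1
    return total / count if count else 0.0
```

Passing `prseq.FastaReader.from_file("sequences.fasta.gz")` as `records` should work, provided the reader yields objects with a `sequence` attribute as the examples above indicate.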

CLI Tools

FASTA Tools

Command	Description	Example
`fasta-info`	Show basic file information	`fasta-info sequences.fasta`
`fasta-stats`	Calculate sequence statistics	`fasta-stats sequences.fasta.gz`
`fasta-filter`	Filter by minimum length	`fasta-filter 100 sequences.fasta`

FASTQ Tools

Command	Description	Example
`fastq-info`	Show basic file information	`fastq-info reads.fastq`
`fastq-stats`	Calculate sequence statistics	`fastq-stats reads.fastq.bz2`
`fastq-filter`	Filter by minimum length	`fastq-filter 50 reads.fastq`

CLI Examples

# Basic usage
fasta-info genome.fasta
fastq-stats reads.fastq

# With compressed files (auto-detected)
fasta-stats sequences.fasta.gz
fastq-info reads.fastq.bz2

# Using stdin (great for pipelines)
cat sequences.fasta | fasta-stats
gunzip -c reads.fastq.gz | fastq-filter 100

# Performance tuning for large sequences
fasta-stats --size-hint 50000 genome.fasta
fastq-filter --size-hint 10000 150 nanopore.fastq
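
The same commands can be driven from a Python script through `subprocess`. The sketch below only uses the tool names, argument order, and `--size-hint` flag shown above; `build_command` and `run_tool` are hypothetical helpers, and it is an assumption that the tools write their results to stdout.

```python
import subprocess
from typing import List, Optional


def build_command(tool: str, *args: object, size_hint: Optional[int] = None) -> List[str]:
    """Assemble an argument list such as: fastq-filter --size-hint 10000 150 file."""
    cmd = [tool]
    if size_hint is not None:
        cmd += ["--size-hint", str(size_hint)]
    cmd += [str(arg) for arg in args]
    return cmd


def run_tool(cmd: List[str], stdin_text: Optional[str] = None) -> str:
    """Run a prseq CLI tool and return its stdout (assumes output goes to stdout)."""
    result = subprocess.run(
        cmd, input=stdin_text, capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `build_command("fastq-filter", 150, "nanopore.fastq", size_hint=10000)` reproduces the last command above as a list, which avoids shell quoting issues when file names contain spaces.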

Development

Prerequisites

- Python 3.8-3.12
- Rust 1.70+
- maturin for building Python extensions

Setup

cd python
pip install maturin
maturin develop

Testing

# Run all tests
python -m pytest tests/ -v

# Run integration tests
python -m pytest tests/ -v --integration

# Type checking with mypy
mypy src/prseq

Building

# Development build
maturin develop

# Production wheel
maturin build --release

Publishing

cd python
maturin publish

Type Checking

The package includes full type hints and is configured for mypy, targeting Python 3.8 and later. Type stubs for the Rust extension modules are generated automatically.
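
One way to benefit from the type hints in your own code is to write helpers against a structural type, so that the same function accepts FASTA and FASTQ records alike. The sketch below uses `typing.Protocol` (available from Python 3.8). `HasSequence`, `total_bases`, and `DemoRecord` are hypothetical names for illustration; whether prseq's record classes satisfy this protocol under mypy depends on how their stubs declare `id` and `sequence`, which is an assumption here.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class HasSequence(Protocol):
    """Anything exposing read-only `id` and `sequence` string attributes."""

    @property
    def id(self) -> str: ...

    @property
    def sequence(self) -> str: ...


def total_bases(records: Iterable[HasSequence]) -> int:
    """Sum the sequence lengths of all records in one pass."""
    return sum(len(record.sequence) for record in records)


@dataclass
class DemoRecord:
    """Stand-in record used only to exercise the function without prseq."""
    id: str
    sequence: str
```

Declaring the members as read-only properties keeps the protocol permissive: it matches plain attributes as well as properties.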

Rust Core

The Python package is built on top of the Rust prseq library, which provides the high-performance parsing implementation. If you need Rust-native parsing without Python, check out the Rust crate directly.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Release history

Version	Release date
0.0.33	Nov 23, 2025
0.0.32	Nov 11, 2025
0.0.31	Nov 4, 2025
0.0.30	Oct 20, 2025
0.0.29	Oct 15, 2025
0.0.28	Oct 12, 2025
0.0.27	Oct 12, 2025
0.0.26	Oct 12, 2025
0.0.25	Oct 12, 2025
0.0.24	Oct 12, 2025
0.0.23	Oct 11, 2025
0.0.22	Oct 11, 2025
0.0.19	Oct 11, 2025
0.0.18	Oct 11, 2025
0.0.17 (this version)	Oct 8, 2025
0.0.16	Oct 8, 2025
0.0.15	Oct 7, 2025
0.0.14	Oct 7, 2025
0.0.13	Oct 7, 2025
0.0.11	Oct 7, 2025
0.0.10	Oct 7, 2025
0.0.9	Oct 7, 2025
0.0.8	Oct 7, 2025
0.0.7	Oct 7, 2025
0.0.6	Oct 7, 2025
0.0.5	Oct 7, 2025
0.0.4	Oct 5, 2025
0.0.2	Oct 5, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

File	Size	Uploaded	Interpreter	Platform
prseq-0.0.17-cp313-cp313-macosx_11_0_arm64.whl	308.2 kB	Oct 8, 2025	CPython 3.13	macOS 11.0+ ARM64
prseq-0.0.17-cp39-cp39-win_amd64.whl	221.8 kB	Oct 8, 2025	CPython 3.9	Windows x86-64
prseq-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl	372.6 kB	Oct 8, 2025	CPython 3.8	manylinux: glibc 2.17+ x86-64

File details

Details for the file prseq-0.0.17-cp313-cp313-macosx_11_0_arm64.whl.

File metadata

Download URL: prseq-0.0.17-cp313-cp313-macosx_11_0_arm64.whl
Upload date: Oct 8, 2025
Size: 308.2 kB
Tags: CPython 3.13, macOS 11.0+ ARM64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prseq-0.0.17-cp313-cp313-macosx_11_0_arm64.whl
Algorithm	Hash digest
SHA256	`929a38dda904a5bf5a9efe77142a0fad5a05b36a1fff72bac4db94d129bf0cba`
MD5	`1bd6514d77b3e8028a4898859f3e4dc0`
BLAKE2b-256	`136deb276185e664a23c7f4ababe884236d69f712128df40625bf2b6860a1c02`
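
A downloaded wheel can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; `sha256_of_file` and `matches_digest` are hypothetical helper names, and the wheel path you pass in is whatever location you saved the file to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

pip can perform the same check at install time when a requirements file pins hashes and `--require-hashes` is used.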

Provenance

The following attestation bundles were made for prseq-0.0.17-cp313-cp313-macosx_11_0_arm64.whl:

Publisher: workflow.yaml on VirologyCharite/prseq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prseq-0.0.17-cp313-cp313-macosx_11_0_arm64.whl
- Subject digest: 929a38dda904a5bf5a9efe77142a0fad5a05b36a1fff72bac4db94d129bf0cba
- Sigstore transparency entry: 592570458
- Sigstore integration time: Oct 8, 2025
Source repository:
- Permalink: VirologyCharite/prseq@15e812f1c9e84a162c3cb1100a46558297c5baf8
- Branch / Tag: refs/tags/v0.0.17
- Owner: https://github.com/VirologyCharite
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@15e812f1c9e84a162c3cb1100a46558297c5baf8
- Trigger Event: push

File details

Details for the file prseq-0.0.17-cp39-cp39-win_amd64.whl.

File metadata

Download URL: prseq-0.0.17-cp39-cp39-win_amd64.whl
Upload date: Oct 8, 2025
Size: 221.8 kB
Tags: CPython 3.9, Windows x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prseq-0.0.17-cp39-cp39-win_amd64.whl
Algorithm	Hash digest
SHA256	`9847d8ab61dcfeee043bebcc3e64da5ce7007c6d399797da59a86679470c827d`
MD5	`f3f45ab9fba548d56cb79ff8114992db`
BLAKE2b-256	`a476039e9ddd46cc3737dc110428876acbd8947b7c630669078e9e6b4cdd84e3`

Provenance

The following attestation bundles were made for prseq-0.0.17-cp39-cp39-win_amd64.whl:

Publisher: workflow.yaml on VirologyCharite/prseq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prseq-0.0.17-cp39-cp39-win_amd64.whl
- Subject digest: 9847d8ab61dcfeee043bebcc3e64da5ce7007c6d399797da59a86679470c827d
- Sigstore transparency entry: 592570473
- Sigstore integration time: Oct 8, 2025
Source repository:
- Permalink: VirologyCharite/prseq@15e812f1c9e84a162c3cb1100a46558297c5baf8
- Branch / Tag: refs/tags/v0.0.17
- Owner: https://github.com/VirologyCharite
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@15e812f1c9e84a162c3cb1100a46558297c5baf8
- Trigger Event: push

File details

Details for the file prseq-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

Download URL: prseq-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Upload date: Oct 8, 2025
Size: 372.6 kB
Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for prseq-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`917dc06735c2979ecd6ef0d334551795334c0198211681497bb3688aeadc5251`
MD5	`99848bd4e70f1761b62ea02c56f73ccd`
BLAKE2b-256	`f4d39d8e46f6fa63b0520cd0bae94377fec03bcc2e9d62fdf6360a59b7ca5749`

Provenance

The following attestation bundles were made for prseq-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: workflow.yaml on VirologyCharite/prseq

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: prseq-0.0.17-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Subject digest: 917dc06735c2979ecd6ef0d334551795334c0198211681497bb3688aeadc5251
- Sigstore transparency entry: 592570484
- Sigstore integration time: Oct 8, 2025
Source repository:
- Permalink: VirologyCharite/prseq@15e812f1c9e84a162c3cb1100a46558297c5baf8
- Branch / Tag: refs/tags/v0.0.17
- Owner: https://github.com/VirologyCharite
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: workflow.yaml@15e812f1c9e84a162c3cb1100a46558297c5baf8
- Trigger Event: push
