
raidx

High-performance FASTA file reader with Python bindings

raidx is a drop-in replacement for pyfaidx implemented in Rust. It maintains full API compatibility with pyfaidx while running FASTA file operations 2.9-4.1x faster in the benchmarks below.

⚡ Performance

Timings for common operations (lower is better):

Operation              | pyfaidx (ms) | raidx (ms) | Speedup
🚀 File Opening        | 0.254        | 0.068      | 3.72x faster
🧬 Sequence Access     | 0.252        | 0.061      | 4.13x faster
✂️ Sequence Slicing    | 0.259        | 0.077      | 3.35x faster
🔍 get_seq Method      | 0.268        | 0.071      | 3.76x faster
🔄 Reverse Complement  | 0.287        | 0.071      | 4.03x faster
🔁 Sequence Iteration  | 0.299        | 0.097      | 3.08x faster
🎯 Random Access       | 3.403        | 1.172      | 2.90x faster

📊 Benchmarked on the hg38 human genome assembly with 1000 iterations per test
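The table reports times in milliseconds. Assuming each figure is the mean wall-clock time per call over the 1000 iterations, a comparable measurement for any callable can be sketched with the standard library's time.perf_counter. The helpers mean_ms and speedup are hypothetical and not part of raidx.

```python
import time
from typing import Callable


def mean_ms(fn: Callable[[], object], iterations: int = 1000) -> float:
    """Return the mean wall-clock time of fn() in milliseconds (hypothetical helper)."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter()
    for _ in range(iterations):
        fn()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / iterations


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_ms / candidate_ms


# The Random Access row of the table: 3.403 ms (pyfaidx) vs 1.172 ms (raidx)
print(f"{speedup(3.403, 1.172):.2f}x faster")  # 2.90x faster
```

With both libraries installed, you could pass a callable such as lambda: genome['chr1'][1000:1100] for each library's Fasta object and compare the two means.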

Installation

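pip install raidx

To install from a checkout of the source tree instead:

pip install .

A prebuilt wheel is currently published only for CPython 3.11 on macOS 11.0+ (ARM64); on other platforms pip builds raidx from the source distribution, which requires a Rust toolchain.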

Quick Start

raidx provides the same API as pyfaidx:

>>> from raidx import Fasta
>>> genome = Fasta('genome.fasta')
>>> genome
Fasta("genome.fasta")

# Access sequences like a dictionary
>>> genome['chr1'][1000:1100]  
>chr1:1001-1100
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC...

# Get sequence metadata
>>> seq = genome['chr1'][1000:1100]
>>> seq.name
'chr1'
>>> seq.start  # 1-based, inclusive
1001
>>> seq.end    # 1-based, inclusive
1100

# String-like operations
>>> genome['chr1'][1000:1100].complement
>chr1 (complement):1001-1100
TACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACGTACG...

>>> -genome['chr1'][1000:1100]  # reverse complement
>chr1 (complement):1100-1001
GCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCAT...

# Method-based access
>>> genome.get_seq('chr1', 1001, 1100)
>chr1:1001-1100
ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC...

# Iteration
>>> for record in genome:
...     print(f"{record.name}: {len(record)} bp")
chr1: 248956422 bp
chr2: 242193529 bp
...
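The coordinate and strand conventions in the session above can be summarized in plain Python. This is an illustrative sketch, assuming pyfaidx's conventions as shown above: Python slices are 0-based and half-open, while Sequence.start/end and the get_seq() arguments are 1-based and inclusive. The function names are hypothetical, not raidx APIs, and the complement table covers only A, C, G, T and N.

```python
from typing import Tuple


def slice_to_one_based(start: int, stop: int) -> Tuple[int, int]:
    """Convert a 0-based, half-open slice [start:stop) to 1-based, inclusive coordinates."""
    return start + 1, stop


def one_based_to_slice(start: int, end: int) -> slice:
    """Convert 1-based, inclusive coordinates (as passed to get_seq) to a Python slice."""
    return slice(start - 1, end)


# Upper- and lower-case A/C/G/T plus N; other IUPAC ambiguity codes
# are deliberately left out of this sketch.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def complement(seq: str) -> str:
    """Complement each base without changing the order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement, then reverse: the operation unary minus performs in the session above."""
    return complement(seq)[::-1]


print(slice_to_one_based(1000, 1100))  # (1001, 1100)
print(one_based_to_slice(1001, 1100))  # slice(1000, 1100, None)
print(complement("ATGCATGC"))          # TACGTACG
print(reverse_complement("ATGCATGC"))  # GCATGCAT
```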

Key Features

  • Drop-in replacement for pyfaidx: same API and behavior, so existing pyfaidx code works unchanged
  • Memory-mapped I/O for efficient file access
  • Rust performance with Python convenience
  • Indexing via .fai files compatible with samtools
  • Rich sequence objects with metadata and methods
  • String-like operations (slicing, reverse, complement)
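The .fai index and memory-mapped reads can be illustrated with a small pure-Python sketch. This is not raidx's Rust implementation. It assumes a well-formed FASTA in which every line of a record has the same width except possibly the last, which the .fai format requires. Each index entry holds the record name, sequence length, byte offset of the first base, bases per line, and bytes per line (including the newline). The names build_fai, write_fai and fetch are hypothetical.

```python
import mmap
import os
import tempfile
from typing import Dict, Tuple

# name -> (length, offset, line_bases, line_width): the .fai columns after the name
FaiIndex = Dict[str, Tuple[int, int, int, int]]


def build_fai(path: str) -> FaiIndex:
    """Scan a FASTA file once and collect samtools-style .fai entries."""
    index: FaiIndex = {}
    name = None
    length = offset = line_bases = line_width = 0
    pos = 0  # byte position of the start of the current line
    with open(path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if name is not None:
                    index[name] = (length, offset, line_bases, line_width)
                # The record name is the header text up to the first whitespace
                name = line[1:].split()[0].decode()
                length = line_bases = line_width = 0
                offset = pos + len(line)  # first base starts right after the header
            else:
                bases = len(line.rstrip(b"\r\n"))
                if line_bases == 0:  # the first sequence line fixes the layout
                    line_bases, line_width = bases, len(line)
                length += bases
            pos += len(line)
    if name is not None:
        index[name] = (length, offset, line_bases, line_width)
    return index


def write_fai(index: FaiIndex, fai_path: str) -> None:
    """Write the index in the tab-separated .fai layout."""
    with open(fai_path, "w") as out:
        for name, (length, offset, line_bases, line_width) in index.items():
            out.write(f"{name}\t{length}\t{offset}\t{line_bases}\t{line_width}\n")


def fetch(path: str, index: FaiIndex, name: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, inclusive) by reading a memory map."""
    length, offset, line_bases, line_width = index[name]
    if not 1 <= start <= end <= length:
        raise ValueError(f"{name}:{start}-{end} is outside 1-{length}")

    def byte_at(pos0: int) -> int:
        # Map a 0-based base position to its byte offset, skipping line endings
        return offset + (pos0 // line_bases) * line_width + pos0 % line_bases

    with open(path, "rb") as handle, mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        raw = mm[byte_at(start - 1): byte_at(end - 1) + 1]
    # Only the requested span is read; drop the line endings inside it
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode()


with tempfile.TemporaryDirectory() as tmp:
    fasta_path = os.path.join(tmp, "toy.fasta")
    with open(fasta_path, "wb") as out:
        out.write(b">chrA description\nACGTACGTAC\nGGGGCCCCTT\nAAA\n>chrB\nTTTT\n")
    idx = build_fai(fasta_path)
    write_fai(idx, fasta_path + ".fai")
    print(idx)  # {'chrA': (23, 18, 10, 11), 'chrB': (4, 50, 4, 5)}
    print(fetch(fasta_path, idx, "chrA", 8, 13))  # TACGGG
```

Because a region's byte range is computed directly from the index, a lookup touches only the pages that hold that region rather than scanning the file.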

API Compatibility

raidx implements the complete pyfaidx API:

# All pyfaidx features work identically
from raidx import Fasta

# Indexing and slicing  
genome = Fasta('genome.fasta')
genome['chr1'][1000:2000]
genome[0][:100]  # First sequence, first 100 bp

# Sequence operations
seq = genome['chr1'][1000:1100]
seq.complement
seq.reverse  
-seq  # reverse complement

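# Method calls
genome.get_seq('chr1', 1000, 2000)  # 1-based, inclusive: chr1:1000-2000, unlike the slice above
genome.keys()                       # sequence names
len(genome)                         # number of sequences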

# Iteration
for record in genome:
    print(record.name, len(record))
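Because the API is meant to match, code that should run whether or not raidx is installed can prefer it and fall back to pyfaidx. A minimal sketch using importlib; load_fasta_class is a hypothetical helper, and it assumes both packages expose Fasta at the top level (as pyfaidx does and as the raidx examples above show).

```python
import importlib
from typing import Sequence


def load_fasta_class(candidates: Sequence[str] = ("raidx", "pyfaidx")):
    """Return the Fasta class from the first importable module in candidates (hypothetical helper)."""
    errors = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            errors.append(f"{module_name}: {exc}")
            continue
        return module.Fasta
    raise ImportError("no FASTA backend available (" + "; ".join(errors) + ")")


# Usage (requires raidx or pyfaidx to be installed):
# Fasta = load_fasta_class()
# genome = Fasta("genome.fasta")
# print(genome["chr1"][1000:1100])
```

A plain try/except ImportError around the two import statements works just as well; the helper form only makes the preference order explicit and reports why each candidate failed.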

Benchmarking

raidx includes two benchmarking approaches for different use cases:

pytest-benchmark

For development, CI/CD, and detailed performance analysis, run the suites in the benchmarks/ directory with pytest-benchmark:

# Install benchmark dependencies
pip install -e ".[benchmark]"

# Run all benchmarks
pytest benchmarks/

# Run specific benchmark categories
pytest benchmarks/benchmark_file_ops.py      # File operations
pytest benchmarks/benchmark_sequence_ops.py  # Sequence operations

# Save a baseline run, then compare a later run against the most recent saved run
pytest benchmarks/ --benchmark-save=baseline
pytest benchmarks/ --benchmark-compare

Standalone Benchmarks

For quick comparisons on your own FASTA files, use the standalone benchmark_raidx.py script:

# Benchmark your files
python benchmark_raidx.py your_genome.fasta

# Adjust benchmarking details
python benchmark_raidx.py genome.fasta --iterations 1000 --random-access 500
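A random-access benchmark needs a reproducible set of query intervals. One way to draw them is sketched below; it is an assumption about what such a test might do, not a description of what the script's --random-access option does. random_regions is a hypothetical helper that samples fixed-width intervals with a seeded generator and returns them in the 1-based, inclusive coordinates that get_seq() takes.

```python
import random
from typing import Dict, List, Tuple


def random_regions(
    lengths: Dict[str, int], n: int, width: int = 100, seed: int = 0
) -> List[Tuple[str, int, int]]:
    """Draw n (name, start, end) intervals, 1-based and inclusive, each `width` bases long."""
    rng = random.Random(seed)  # fixed seed so repeated runs query the same regions
    # Only records long enough to hold a full-width interval are eligible
    names = [name for name, length in lengths.items() if length >= width]
    if not names:
        raise ValueError("no record is long enough for the requested width")
    regions = []
    for _ in range(n):
        name = rng.choice(names)
        start = rng.randint(1, lengths[name] - width + 1)
        regions.append((name, start, start + width - 1))
    return regions


# Record lengths as printed by the iteration example (hg38 chr1 and chr2)
lengths = {"chr1": 248956422, "chr2": 242193529}
for name, start, end in random_regions(lengths, n=3):
    print(f"{name}:{start}-{end}")
```

Each region could then be passed to genome.get_seq(name, start, end) inside the timed loop. Records are chosen uniformly here; weighting by length with rng.choices(names, weights=...) would spread queries evenly across the genome instead.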

Why raidx? It keeps the familiar pyfaidx interface while running on Rust underneath, which makes it a good fit for pipelines that need to scale.

Download files

Source Distribution

raidx-0.1.0.tar.gz (103.5 kB)

Built Distribution


raidx-0.1.0-cp311-cp311-macosx_11_0_arm64.whl (255.1 kB), for CPython 3.11 on macOS 11.0+ (ARM64)

File details

Details for the file raidx-0.1.0.tar.gz.

File metadata

  • Download URL: raidx-0.1.0.tar.gz
  • Size: 103.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for raidx-0.1.0.tar.gz
Algorithm Hash digest
SHA256 445604d658ec38c802dd021e1e033fd99c9837cfe12c980dee93ad444249d25b
MD5 dbc58e8c24c89ca1d543759363e8a301
BLAKE2b-256 5600f5292eb270ec399a282bf8c4d00b3db15950f8f3a15c9094ee0f63799faa


File details

Details for the file raidx-0.1.0-cp311-cp311-macosx_11_0_arm64.whl.


File hashes

Hashes for raidx-0.1.0-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ba8c25fc375d03100e5529daab1bc7325225fa5f435054ad3900e431942827ea
MD5 0a30ff18542b5180fea9fa75daf34c6f
BLAKE2b-256 92c2dbfe652de07638455f2bc4d21a149bfc11ed7ba5e4804b5057282feca97f

