Efficient random access to subsequences in large FASTA files

fastaccess

Efficient random access to subsequences in FASTA files using byte-level seeking.

Installation

pip install fastaccess

From source, run inside a clone of the repository (includes the C++ backend for better performance):

pip install -e .

The C++ backend requires a C++17 compiler and CMake 3.15+. If either is unavailable, the package falls back to the pure-Python implementation.

Quick Start

from fastaccess import FastaStore

fa = FastaStore("genome.fa")  # Builds index, caches for next time
seq = fa.fetch("chr1", 1000, 2000)  # 1-based inclusive coordinates

API

FastaStore(path, use_cache=True, cache_dir=None)

  • path: Path to FASTA file (plain or gzip-compressed .fa.gz)
  • use_cache: Save/load index from .fidx cache file
  • cache_dir: Custom directory for cache file (useful for read-only FASTA directories)

Methods

Method                                               Description
fetch(name, start, stop, reverse_complement=False)   Fetch subsequence (1-based inclusive)
fetch_many(queries)                                  Batch fetch list of (name, start, stop) tuples
list_sequences()                                     Get all sequence names
get_length(name)                                     Get sequence length
get_description(name)                                Get FASTA header description
get_info(name)                                       Get dict with name, description, length
rebuild_index()                                      Force rebuild index and update cache
is_cached()                                          Check if loaded from cache
cache_exists()                                       Check if cache file exists
get_cache_path()                                     Get cache file path
delete_cache()                                       Delete cache file

Errors

  • KeyError: Sequence name not found
  • ValueError: Invalid coordinates (start < 1, stop < start, or stop > length)
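
The coordinate rules above can be sketched in a few lines. This is an illustration only: validate_range and to_slice are hypothetical helpers, not part of the fastaccess API, and the exact error messages fastaccess raises are not documented here.

```python
def validate_range(start: int, stop: int, length: int) -> None:
    """Raise ValueError for the three invalid cases (hypothetical helper)."""
    if start < 1:
        raise ValueError(f"start must be >= 1, got {start}")
    if stop < start:
        raise ValueError(f"stop ({stop}) is less than start ({start})")
    if stop > length:
        raise ValueError(f"stop ({stop}) exceeds sequence length ({length})")


def to_slice(start: int, stop: int) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based half-open slice."""
    return slice(start - 1, stop)


sequence = "ACGTACGTAC"  # toy 10 bp sequence
validate_range(3, 6, len(sequence))
subseq = sequence[to_slice(3, 6)]  # bases 3 through 6, inclusive
```

A 1-based inclusive range (start, stop) always covers stop - start + 1 bases, which is why the slice subtracts one from start only.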

Features

  • Random access: Uses seek() to fetch only required bytes
  • Index caching: 7-40x faster reloading via .fidx cache files
  • Gzip support: Reads .fa.gz files directly
  • 1-based inclusive coordinates: Standard bioinformatics convention
  • Format support: Wrapped/unwrapped sequences, Unix/Windows line endings
  • Uppercase output: All sequences returned uppercase
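
To make the seek-based approach concrete, here is a minimal pure-Python sketch in the style of a samtools faidx index. It is not fastaccess's implementation: build_index and fetch are hypothetical names, the index layout is an assumption, and the sketch assumes every line of a sequence except the last has the same width.

```python
import os
import tempfile


def build_index(path):
    """Scan the file once. Per sequence, record its length, the byte offset of
    its first base, bases per line, and bytes per line (newline included)."""
    index = {}
    name = None
    with open(path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line:
                break
            if line.startswith(b">"):
                name = line[1:].split()[0].decode("ascii")
                index[name] = {"length": 0, "offset": fh.tell(),
                               "line_bases": 0, "line_bytes": 0}
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                entry = index[name]
                if entry["line_bases"] == 0:
                    entry["line_bases"] = bases
                    entry["line_bytes"] = len(line)
                entry["length"] += bases
    return index


def fetch(path, index, name, start, stop):
    """Read only the bytes that cover bases start..stop (1-based inclusive)."""
    entry = index[name]  # KeyError if the name is unknown
    if start < 1 or stop < start or stop > entry["length"]:
        raise ValueError("invalid coordinates")

    def byte_pos(i):  # i is a 0-based base position
        return (entry["offset"]
                + (i // entry["line_bases"]) * entry["line_bytes"]
                + i % entry["line_bases"])

    first = byte_pos(start - 1)
    last = byte_pos(stop - 1)
    with open(path, "rb") as fh:
        fh.seek(first)
        raw = fh.read(last - first + 1)
    # Drop the line breaks that fall inside the range, then uppercase.
    return raw.replace(b"\n", b"").replace(b"\r", b"").decode("ascii").upper()


# Toy file: chr1 is wrapped at 10 bases per line, chr2 is a single line.
fasta_bytes = b">chr1 test\nACGTACGTAC\nGTACGTACGT\nacgt\n>chr2\nTTTTGGGG\n"
with tempfile.NamedTemporaryFile(suffix=".fa", delete=False) as tmp:
    tmp.write(fasta_bytes)
    toy_path = tmp.name
try:
    toy_index = build_index(toy_path)
    mid = fetch(toy_path, toy_index, "chr1", 8, 13)    # spans a line break
    tail = fetch(toy_path, toy_index, "chr1", 21, 24)  # lowercase input line
    other = fetch(toy_path, toy_index, "chr2", 5, 8)
finally:
    os.remove(toy_path)
```

Because bytes per line is measured rather than assumed, the same arithmetic works for both Unix and Windows line endings.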

Performance

C++ Backend

Operation                   Python     C++        Speedup
Index build (10 MB)         70 ms      5 ms       13x
Reverse complement (8 KB)   0.21 ms    0.015 ms   14x
Small fetch (100 bp)        0.017 ms   0.017 ms   1x
Large fetch (100 KB)        0.36 ms    0.35 ms    1x

Check if C++ backend is active:

from fastaccess import using_cpp_backend
print(using_cpp_backend())  # True if the C++ backend is loaded, False for the pure-Python fallback

Index Caching

Human genome (3 GB):
  First load:  ~2 seconds (builds index)
  With cache:  0.05 seconds (40x faster)

Cache is automatically invalidated when the FASTA file changes.
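
How fastaccess detects a changed FASTA file is not described here. One common approach, sketched below under that caveat, stores the file's size and modification time next to the index and rejects the cache when either differs. The JSON layout and the function names are assumptions for illustration. They do not describe the real .fidx format.

```python
import json
import os
import tempfile


def file_signature(path):
    """Cheap fingerprint of a file: size in bytes and modification time (ns)."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def save_cache(cache_path, fasta_path, index):
    """Write the index together with the FASTA file's current signature."""
    with open(cache_path, "w") as fh:
        json.dump({"signature": file_signature(fasta_path), "index": index}, fh)


def load_cache(cache_path, fasta_path):
    """Return the cached index, or None if the cache is missing or stale."""
    try:
        with open(cache_path) as fh:
            payload = json.load(fh)
    except (OSError, ValueError):  # missing, unreadable, or corrupt cache
        return None
    if payload.get("signature") != file_signature(fasta_path):
        return None  # the FASTA file changed after the cache was written
    return payload["index"]


with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "toy.fa")
    cache = os.path.join(tmp, "toy.fa.fidx")
    with open(fasta, "w") as fh:
        fh.write(">chr1\nACGT\n")
    save_cache(cache, fasta, {"chr1": {"length": 4}})
    fresh = load_cache(cache, fasta)  # signature matches, index is reused
    with open(fasta, "a") as fh:
        fh.write(">chr2\nGG\n")       # file grows, so the signature changes
    stale = load_cache(cache, fasta)  # None, so the caller rebuilds the index
```

A size-plus-mtime check is fast but not cryptographic. A content hash would be stricter at the cost of reading the whole file.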

Example

from fastaccess import FastaStore

fa = FastaStore("hg38.fa")

# Get sequence info
print(fa.list_sequences())  # ["chr1", "chr2", ...]
print(fa.get_length("chr1"))  # 248956422

# Fetch regions
seq = fa.fetch("chr1", 1000, 2000)
rc = fa.fetch("chr1", 1000, 2000, reverse_complement=True)

# Batch fetch
regions = [("chr1", 1, 100), ("chr2", 500, 600)]
sequences = fa.fetch_many(regions)
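
For readers unfamiliar with the reverse_complement option, the operation itself can be sketched in plain Python. This is a standalone illustration, not fastaccess's code, and it assumes only A, C, G, T, and N appear. How fastaccess treats other IUPAC codes is not stated here.

```python
# Translation table mapping each base to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


rc_example = reverse_complement("AACGT")
```

Applying the function twice returns the original (uppercased) sequence, which makes a convenient sanity check.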

Requirements

  • Python 3.8+
  • No runtime dependencies (pure Python fallback always works)

C++ backend (optional):

  • C++17 compiler
  • CMake 3.15+

Limitations

  • ASCII sequences only (DNA/RNA)
  • Gzip files require full decompression (no random access within compressed data)
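
If a gzip-compressed file is queried repeatedly, one possible workaround is to decompress it once to a plain file and seek within that copy. The sketch below uses only the standard library. decompress_to is a hypothetical helper, not something fastaccess provides.

```python
import gzip
import os
import shutil
import tempfile


def decompress_to(gz_path, plain_path):
    """Stream-decompress a gzip file to disk without loading it into memory."""
    with gzip.open(gz_path, "rb") as src, open(plain_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return plain_path


with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "toy.fa.gz")
    plain_path = os.path.join(tmp, "toy.fa")
    with gzip.open(gz_path, "wb") as fh:
        fh.write(b">chr1\nACGTACGT\n")
    decompress_to(gz_path, plain_path)
    with open(plain_path, "rb") as fh:
        fh.seek(8)  # 6 header bytes (">chr1\n") plus 2 bases
        chunk = fh.read(4).decode("ascii")
```

The plain copy costs disk space roughly equal to the uncompressed size, so this trade-off suits files that are read many times.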

License

MIT
