Efficient random access to subsequences in large FASTA files
fastaccess
Efficient random access to subsequences in FASTA files using byte-level seeking.
Installation
pip install fastaccess
From source (includes C++ backend for better performance):
pip install -e .
The C++ backend requires a C++17 compiler and CMake 3.15+. If either is unavailable, the package falls back to the pure Python implementation.
Quick Start
from fastaccess import FastaStore
fa = FastaStore("genome.fa") # Builds index, caches for next time
seq = fa.fetch("chr1", 1000, 2000) # 1-based inclusive coordinates
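To make the coordinate convention concrete, here is a small sketch of how 1-based inclusive coordinates map onto Python's 0-based, half-open slices. The helper `to_slice` is hypothetical and not part of fastaccess; it only illustrates the arithmetic.

```python
def to_slice(start: int, stop: int) -> slice:
    """Convert 1-based inclusive coordinates to a 0-based, half-open slice."""
    # Base number `start` sits at index start - 1. The inclusive `stop`
    # base sits at index stop - 1, so the exclusive end is simply `stop`.
    return slice(start - 1, stop)


sequence = "ACGTACGTAC"            # toy 10-base sequence
region = sequence[to_slice(3, 6)]  # bases 3, 4, 5 and 6
print(region)                      # GTAC

# An inclusive range holds stop - start + 1 bases.
print(2000 - 1000 + 1)             # 1001
```

Under this convention, the range 1000 to 2000 in the Quick Start covers 1001 bases, not 1000.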
API
FastaStore(path, use_cache=True, cache_dir=None)
- `path`: Path to FASTA file (plain or gzip-compressed `.fa.gz`)
- `use_cache`: Save/load index from `.fidx` cache file
- `cache_dir`: Custom directory for cache file (useful for read-only FASTA directories)
Methods
| Method | Description |
|---|---|
| `fetch(name, start, stop, reverse_complement=False)` | Fetch subsequence (1-based inclusive) |
| `fetch_many(queries)` | Batch fetch list of (name, start, stop) tuples |
| `list_sequences()` | Get all sequence names |
| `get_length(name)` | Get sequence length |
| `get_description(name)` | Get FASTA header description |
| `get_info(name)` | Get dict with name, description, length |
| `rebuild_index()` | Force rebuild index and update cache |
| `is_cached()` | Check if loaded from cache |
| `cache_exists()` | Check if cache file exists |
| `get_cache_path()` | Get cache file path |
| `delete_cache()` | Delete cache file |
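For readers unfamiliar with the `reverse_complement` option of `fetch`, the following is a minimal pure Python sketch of the operation. It is not the library's implementation; the function name and the handling of `N` are assumptions made for illustration.

```python
# Translation table mapping each DNA base to its complement.
# N (unknown base) is assumed to map to itself.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the result."""
    return seq.translate(_COMPLEMENT)[::-1]


print(reverse_complement("AACGT"))  # ACGTT
```

Applying the function twice returns the original sequence, which is a handy sanity check.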
Errors
- `KeyError`: Sequence name not found
- `ValueError`: Invalid coordinates (start < 1, stop < start, stop > length)
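The coordinate rules above can be sketched as a standalone check. The function `validate_region` is hypothetical and the error messages are invented; only the three conditions come from the list above.

```python
def validate_region(start: int, stop: int, length: int) -> None:
    """Raise ValueError for the three invalid-coordinate cases."""
    if start < 1:
        raise ValueError(f"start must be >= 1, got {start}")
    if stop < start:
        raise ValueError(f"stop ({stop}) is before start ({start})")
    if stop > length:
        raise ValueError(f"stop ({stop}) exceeds sequence length ({length})")


# Try one valid and three invalid regions against a 10-base sequence.
for region in [(1, 10), (0, 10), (8, 5), (5, 11)]:
    try:
        validate_region(*region, length=10)
        print(region, "ok")
    except ValueError as exc:
        print(region, "rejected:", exc)
```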
Features
- Random access: Uses `seek()` to fetch only required bytes
- Index caching: 7-40x faster reloading via `.fidx` cache files
- Gzip support: Reads `.fa.gz` files directly
- 1-based inclusive coordinates: Standard bioinformatics convention
- Format support: Wrapped/unwrapped sequences, Unix/Windows line endings
- Uppercase output: All sequences returned uppercase
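The random-access idea in the list above can be illustrated with a simplified sketch. This is not the fastaccess implementation: the `Entry` layout and the helpers `build_index` and `fetch` are assumptions. It also assumes every sequence line of a record, except possibly the last, has the same width.

```python
import os
import tempfile
from typing import Dict, NamedTuple


class Entry(NamedTuple):
    offset: int      # byte offset of the first base of the record
    length: int      # total number of bases
    line_bases: int  # bases per full sequence line
    line_bytes: int  # bytes per full line, including the line ending


def build_index(path: str) -> Dict[str, Entry]:
    """Scan the file once and record where each sequence starts."""
    index: Dict[str, Entry] = {}
    name = None
    pos = 0
    with open(path, "rb") as handle:
        for raw in handle:
            line = raw.rstrip(b"\r\n")
            if raw.startswith(b">"):
                name = (line[1:].split() or [b""])[0].decode("ascii")
                index[name] = Entry(pos + len(raw), 0, 0, 0)
            elif name is not None and line:
                entry = index[name]
                if entry.line_bases == 0:  # first sequence line sets the width
                    entry = entry._replace(line_bases=len(line), line_bytes=len(raw))
                index[name] = entry._replace(length=entry.length + len(line))
            pos += len(raw)
    return index


def fetch(path: str, index: Dict[str, Entry], name: str, start: int, stop: int) -> str:
    """Read only the bytes covering start..stop (1-based inclusive)."""
    entry = index[name]  # raises KeyError for unknown names
    if start < 1 or stop < start or stop > entry.length:
        raise ValueError("invalid coordinates")

    def byte_pos(i: int) -> int:  # i is a 0-based base index
        return entry.offset + (i // entry.line_bases) * entry.line_bytes + i % entry.line_bases

    first, last = byte_pos(start - 1), byte_pos(stop - 1)
    with open(path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    # Drop line endings that fall inside the range, then uppercase.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii").upper()


# Demo on a small temporary FASTA file with wrapped, mixed-case lines.
with tempfile.NamedTemporaryFile(suffix=".fa", delete=False) as tmp:
    tmp.write(b">chr1 demo\nACGTACGT\nacgtac\n>chr2\nTTTT\n")
try:
    idx = build_index(tmp.name)
    print(fetch(tmp.name, idx, "chr1", 7, 10))  # GTAC
finally:
    os.remove(tmp.name)
```

The byte position of a base is computed from the line width, so a fetch reads only the bytes in the requested range regardless of file size.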
Performance
C++ Backend
| Operation | Python | C++ | Speedup |
|---|---|---|---|
| Index build (10 MB) | 70 ms | 5 ms | 13x |
| Reverse complement (8 KB) | 0.21 ms | 0.015 ms | 14x |
| Small fetch (100 bp) | 0.017 ms | 0.017 ms | 1x |
| Large fetch (100 KB) | 0.36 ms | 0.35 ms | 1x |
Check if C++ backend is active:
from fastaccess import using_cpp_backend
print(using_cpp_backend()) # True if available
Index Caching
Human genome (3 GB):
First load: ~2 seconds (builds index)
With cache: 0.05 seconds (40x faster)
Cache is automatically invalidated when the FASTA file changes.
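The README does not say how a changed FASTA file is detected. One common approach, shown here purely as an assumption, is to store the file's size and modification time next to the index and rebuild when they differ. The JSON layout and the names `fingerprint` and `load_or_build` are hypothetical and unrelated to the real `.fidx` format.

```python
import json
import os
import tempfile


def fingerprint(path):
    """Cheap change detector: file size plus modification time (assumed scheme)."""
    st = os.stat(path)
    return {"size": st.st_size, "mtime_ns": st.st_mtime_ns}


def load_or_build(fasta_path, cache_path, build):
    """Return (index, from_cache). Rebuild when the cache is missing or stale."""
    current = fingerprint(fasta_path)
    try:
        with open(cache_path) as handle:
            cached = json.load(handle)
        if cached["fingerprint"] == current:
            return cached["index"], True
    except (OSError, ValueError, KeyError, TypeError):
        pass  # missing, unreadable, or malformed cache: fall through and rebuild
    index = build(fasta_path)
    with open(cache_path, "w") as handle:
        json.dump({"fingerprint": current, "index": index}, handle)
    return index, False


# Demo with a stand-in "index" that only records the file size.
with tempfile.TemporaryDirectory() as tmp:
    fasta = os.path.join(tmp, "toy.fa")
    cache = os.path.join(tmp, "toy.cache.json")
    with open(fasta, "w") as handle:
        handle.write(">chr1\nACGT\n")
    toy_build = lambda path: {"bytes": os.path.getsize(path)}
    print(load_or_build(fasta, cache, toy_build)[1])  # False: index was built
    print(load_or_build(fasta, cache, toy_build)[1])  # True: loaded from cache
```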
Example
from fastaccess import FastaStore
fa = FastaStore("hg38.fa")
# Get sequence info
print(fa.list_sequences()) # ["chr1", "chr2", ...]
print(fa.get_length("chr1")) # 248956422
# Fetch regions
seq = fa.fetch("chr1", 1000, 2000)
rc = fa.fetch("chr1", 1000, 2000, reverse_complement=True)
# Batch fetch
regions = [("chr1", 1, 100), ("chr2", 500, 600)]
sequences = fa.fetch_many(regions)
Requirements
- Python 3.8+
- No runtime dependencies (pure Python fallback always works)
C++ backend (optional):
- C++17 compiler
- CMake 3.15+
Limitations
- ASCII sequences only (DNA/RNA)
- Gzip files require full decompression (no random access within compressed data)
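The gzip limitation follows from the format itself: a plain gzip stream can only be decoded from the beginning, so there is no byte offset to seek to. The sketch below shows the simplest workaround, decompressing everything into memory and slicing. It is an illustration with a hypothetical helper `load_gzip_fasta`, not the library's code.

```python
import gzip
import os
import tempfile


def load_gzip_fasta(path):
    """Decompress the whole file and return {name: uppercase sequence}."""
    parts = {}
    name = None
    with gzip.open(path, "rt", encoding="ascii") as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                name = (line[1:].split() or [""])[0]
                parts[name] = []
            elif name is not None and line:
                parts[name].append(line.upper())
    return {n: "".join(chunks) for n, chunks in parts.items()}


# Demo: write a tiny compressed FASTA, then slice a region from memory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.fa.gz")
    with gzip.open(path, "wb") as handle:
        handle.write(b">chr1\nACGTAC\ngtacgt\n")
    sequences = load_gzip_fasta(path)
    start, stop = 5, 8                       # 1-based inclusive
    print(sequences["chr1"][start - 1:stop])  # ACGT
```

Memory use grows with the size of the decompressed file, which is the practical cost of this limitation.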
License
MIT