A fast, pure-Python package for reading .2bit files (used by the UCSC genome browser)
twobitreader
twobitreader is a small, fast Python package for reading UCSC .2bit genome
files. It supports random access by sequence name and genomic interval, making
it useful for pulling slices from large genome files without loading whole
chromosomes into memory.
The package reads .2bit files only; it does not write them.
Performance in v4
Version 4 keeps decoding pure Python while reducing startup cost and speeding up
common slice paths. The main changes are lazy construction of the large
two-byte lookup table, faster N-block lookup with bisect, and decoded sequence
buffers backed by plain Python character lists instead of deprecated
array('u') buffers.
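The N-block lookup mentioned above can be illustrated with a minimal sketch. It assumes N-blocks are stored as two parallel lists, sorted by start and non-overlapping, as in the .2bit format. The function names `overlapping_n_blocks` and `mask_n` are hypothetical and are not taken from twobitreader, whose implementation may differ.

```python
from bisect import bisect_right


def overlapping_n_blocks(starts, sizes, start, end):
    """Return N-block intervals clipped to the half-open range [start, end).

    `starts` must be sorted ascending and the blocks must not overlap.
    """
    # bisect_right finds the first block that begins after `start`; the
    # block just before it is the only earlier one that can cover `start`.
    i = max(bisect_right(starts, start) - 1, 0)
    hits = []
    while i < len(starts) and starts[i] < end:
        block_end = starts[i] + sizes[i]
        if block_end > start:
            hits.append((max(starts[i], start), min(block_end, end)))
        i += 1
    return hits


def mask_n(bases, starts, sizes, start, end):
    """Overwrite decoded bases for [start, end) with 'N' inside N-blocks."""
    out = list(bases)
    for block_start, block_end in overlapping_n_blocks(starts, sizes, start, end):
        # Shift genome coordinates into the coordinates of the decoded slice.
        out[block_start - start:block_end - start] = "N" * (block_end - block_start)
    return "".join(out)
```

With a binary search, a short slice inspects only the few blocks near it instead of scanning every N-block in the sequence, which is the kind of case the 50k N-block benchmark exercises.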
Benchmarks below compare v4.0.0 with v3.1.8 on Python 3.14.5, using
synthetic 5 Mb .2bit files. The v3.1.9 tag has the same reader
implementation as v3.1.8, plus release/CI packaging changes.
| Benchmark | v3.1.8 | v4.0.0 | Change |
|---|---|---|---|
| Cold import time | 179.6 ms | 35.6 ms | 5.0x faster |
| Peak import memory | 14.18 MB | 2.22 MB | 6.4x less |
| Plain 1 Mb slice | 135.6 ms | 17.3 ms | 7.8x faster |
| 10 bp slice with 50k N-blocks | 0.749 ms | 0.0026 ms | 290x faster |
Installation
Install the latest released package from PyPI:
pip install twobitreader
For local development, clone the repository and install it in editable mode:
git clone https://github.com/benjschiller/twobitreader.git
cd twobitreader
pip install -e ".[dev,docs]"
pre-commit install
Python Usage
Open a .2bit file with TwoBitFile. It behaves like a dictionary whose keys
are sequence names and whose values are sliceable sequence objects.
from twobitreader import TwoBitFile

with TwoBitFile("hg19.2bit") as genome:
    print(genome.keys())
    print(genome.sequence_sizes()["chr1"])
    sequence = genome["chr1"][100_000:100_050]
    print(sequence)
Coordinates follow Python and UCSC BED conventions: they are 0-based and
end-open. For example, genome["chr1"][10:20] returns 10 bases.
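Positions copied from the UCSC Genome Browser, such as `chr1:11-20`, are 1-based and inclusive, so they need converting before slicing. The helper below is a hypothetical sketch of that conversion; `region_to_slice` is not part of twobitreader.

```python
import re

# Matches "chrom:start-end"; commas are allowed as thousands separators.
_REGION = re.compile(r"^(?P<chrom>[^:]+):(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


def region_to_slice(region):
    """Convert a 1-based inclusive region string to (chrom, 0-based slice)."""
    match = _REGION.match(region.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    first = int(match["start"].replace(",", ""))
    last = int(match["end"].replace(",", ""))
    if first < 1 or last < first:
        raise ValueError(f"invalid coordinates in region: {region!r}")
    # Subtract 1 from the start only: an inclusive 1-based end is numerically
    # the same as an exclusive 0-based end.
    return match["chrom"], slice(first - 1, last)
```

Assuming `genome` is an open `TwoBitFile`, the result can be used as `chrom, window = region_to_slice("chr1:11-20")` followed by `genome[chrom][window.start:window.stop]`, which selects the same 10 bases as `genome["chr1"][10:20]`.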
Converting an entire chromosome to a string works, but can use a lot of memory:
with TwoBitFile("hg19.2bit") as genome:
    chr_m = str(genome["chrM"])
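When a whole chromosome is too large to hold as one string, it can be processed in fixed-size windows instead. The sketch below relies only on slicing and `str()`, and takes the sequence length as an argument (for example from `genome.sequence_sizes()`). The names `iter_windows` and `gc_fraction` are hypothetical and not part of twobitreader.

```python
def iter_windows(sequence, total_length, window=1_000_000):
    """Yield (start, end, text) for consecutive windows of a sliceable sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(0, total_length, window):
        end = min(start + window, total_length)
        # Only one window of decoded text is alive at a time.
        yield start, end, str(sequence[start:end])


def gc_fraction(sequence, total_length, window=1_000_000):
    """Return the G+C fraction over A/C/G/T bases, ignoring N and case."""
    gc = acgt = 0
    for _, _, text in iter_windows(sequence, total_length, window):
        upper = text.upper()
        gc_here = upper.count("G") + upper.count("C")
        gc += gc_here
        acgt += gc_here + upper.count("A") + upper.count("T")
    return gc / acgt if acgt else 0.0
```

With an open `TwoBitFile` named `genome`, a call would look like `gc_fraction(genome["chr1"], genome.sequence_sizes()["chr1"])`. Peak memory is then bounded by the window size rather than the chromosome length.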
Command-Line Usage
twobitreader can also read BED-style intervals from standard input and write
FASTA records to standard output:
python -m twobitreader genome.2bit < regions.bed > regions.fa
Input lines should have at least three whitespace-separated fields:
chrom start end
chr1 100000 100050
chr2 250 300
Invalid regions are skipped with warnings written to standard error. Intervals that extend past the end of a sequence are truncated.
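The skip-and-truncate behavior described above can be sketched as follows. This is an illustration rather than the package's own parser: `parse_bed_line` is a hypothetical name, and the exact rules for what counts as an invalid region (too few fields, non-integer coordinates, unknown sequence, empty or negative interval) are assumptions.

```python
import sys


def parse_bed_line(line, sizes, warn=sys.stderr):
    """Return (chrom, start, end) for a usable BED line, else None.

    `sizes` maps sequence names to lengths. The validity rules here are
    assumptions made for illustration.
    """
    fields = line.split()
    if len(fields) < 3:
        print(f"skipping line with fewer than 3 fields: {line!r}", file=warn)
        return None
    chrom, start_text, end_text = fields[:3]
    try:
        start, end = int(start_text), int(end_text)
    except ValueError:
        print(f"skipping line with non-integer coordinates: {line!r}", file=warn)
        return None
    if chrom not in sizes:
        print(f"skipping unknown sequence: {chrom}", file=warn)
        return None
    # Truncate intervals that run past the end of the sequence.
    end = min(end, sizes[chrom])
    if start < 0 or start >= end:
        print(f"skipping empty or negative interval: {line!r}", file=warn)
        return None
    return chrom, start, end
```

Because the end coordinate is clamped before the emptiness check, an interval that starts beyond the end of its sequence is skipped rather than emitted as an empty record.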
Downloading Genomes
The twobitreader.download module can fetch .2bit genomes from UCSC:
python -m twobitreader.download hg19
Please follow UCSC's usage guidelines and avoid excessive automated downloads.
Development
Run the full test suite with:
python3 -m unittest discover -s tests
Run the lightweight package smoke test with:
python3 test_package.py
Build the package with:
python3 -m build
Build the Sphinx documentation with:
sphinx-build -W --keep-going -b html doc doc/_build/html
Run formatting and repository checks with:
pre-commit run --all-files
The Makefile uses python in a few targets. If your environment only provides
python3, run the equivalent command directly with python3.
License
twobitreader is licensed under the Perl Artistic License 2.0. See
LICENSE.txt and COPYRIGHT for details.
No warranty is provided, express or implied.