Universal Variant ID - compact 128-bit identifiers for human genetic variation
uvid
Compact 128-bit Universal Variant IDs for human genetic variation.
Encode any human genomic variant -- SNP, indel, MNV -- into a deterministic 128-bit identifier that sorts in natural genomic order. No central authority, no database round-trips, no coordination required.
from uvid import UVID
uvid = UVID.encode("chr1", 100, "A", "G", "GRCh38")
print(uvid) # 00000064-40000001-00000000-00000006
Read the docs for full API reference, normalization guide, and bit-layout details.
Why UVID?
Genomic variant databases typically assign arbitrary integer or string IDs to variants, requiring a round-trip to the database to discover whether a variant already exists before inserting it. UVID eliminates that problem: the ID is computed deterministically from the variant itself, so identical variants always receive identical IDs regardless of where or when they are encoded.
| Property | Detail |
|---|---|
| Deterministic | Same variant = same ID, anywhere, without coordination |
| Compact | 16 bytes per variant (fits in a UUID column) |
| Sortable | Natural genomic order when compared as unsigned 128-bit integers |
| Streaming-friendly | ID is known before database interaction -- bulk upsert in a single pass |
| Shard-friendly | ID-driven partitioning for distributed variant stores |
| UUIDv5 compatible | Deterministic SHA-1 mapping for systems that expect standard UUIDs |
| Sequence-searchable | Alleles up to 20 bp are stored exactly; longer alleles keep length + a 17-bit Rabin fingerprint |
| HGVS support | Bidirectional conversion between HGVS genomic notation (g./m.) and UVIDs |
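As a rough illustration of the Compact and Sortable rows, the sketch below uses only the Python standard library, not the uvid package. It assumes the printed form is 32 hex digits separated by hyphens, as in the examples on this page; the helper name `uvid_to_int` is hypothetical.

```python
import uuid


def uvid_to_int(text: str) -> int:
    """Parse the hyphenated hex form into an unsigned 128-bit integer."""
    return int(text.replace("-", ""), 16)


ids = [
    "00003039-40000001-00000000-00000006",  # position field 0x3039 = 12345
    "00000064-40000001-00000000-00000006",  # position field 0x64 = 100
]

# Sorting by integer value puts the lower genomic position first.
ordered = sorted(ids, key=uvid_to_int)

# 128 bits are exactly 16 bytes, the same width as a UUID column.
raw = uvid_to_int(ordered[0]).to_bytes(16, "big")
as_uuid = uuid.UUID(bytes=raw)

print(ordered[0])
print(len(raw), as_uuid)
```

Big-endian bytes compare in the same order as the integers, so a byte-wise sort on a 16-byte column should give the same result.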
128-bit layout (MSB to LSB)
 127       96 95   94  93  92        47  46  45         0
+------------+-------+----+------------+----+------------+
|  position  |  as   | rm |    REF     | am |    ALT     |
|    (32)    |  (2)  |(1) |    (46)    |(1) |    (46)    |
+------------+-------+----+------------+----+------------+
| Bits | Width | Field |
|---|---|---|
| 127-96 | 32 | Linearized genome position |
| 95-94 | 2 | Assembly (0 = GRCh37, 1 = GRCh38) |
| 93 | 1 | REF mode (0 = string, 1 = length) |
| 92-47 | 46 | REF payload |
| 46 | 1 | ALT mode (0 = string, 1 = length) |
| 45-0 | 46 | ALT payload |
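The table maps directly onto shifts and masks. The following is a minimal sketch that splits a 128-bit value into the six fields listed above using plain Python integers; `Fields` and `split_fields` are hypothetical names, not part of the uvid API, and the payloads are returned as raw integers without interpretation.

```python
from typing import NamedTuple

PAYLOAD_MASK = (1 << 46) - 1  # 46-bit REF / ALT payloads


class Fields(NamedTuple):
    position: int
    assembly: int
    ref_mode: int
    ref_payload: int
    alt_mode: int
    alt_payload: int


def split_fields(value: int) -> Fields:
    """Split an unsigned 128-bit integer according to the layout table."""
    return Fields(
        position=value >> 96,                      # bits 127-96
        assembly=(value >> 94) & 0b11,             # bits 95-94
        ref_mode=(value >> 93) & 1,                # bit 93
        ref_payload=(value >> 47) & PAYLOAD_MASK,  # bits 92-47
        alt_mode=(value >> 46) & 1,                # bit 46
        alt_payload=value & PAYLOAD_MASK,          # bits 45-0
    )


value = int("00000064-40000001-00000000-00000006".replace("-", ""), 16)
fields = split_fields(value)
print(fields.position, fields.assembly, fields.ref_mode, fields.alt_mode)
# prints: 100 1 0 0
```

For the example ID this yields position 100, assembly 1 (GRCh38), and string mode for both alleles, which matches the `chr1 100 A G` input.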
Each allele payload is independently encoded in one of two modes:
- String mode (mode=0): 5-bit length + 40-bit 2-bit-encoded DNA. Stores up to 20 bases exactly.
- Length mode (mode=1): 28-bit length + 17-bit Rabin fingerprint. Used for sequences >20 bases or containing non-ACGT characters.
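To make the two modes concrete, here is a sketch of the mode-selection rule and of one possible string-mode packing. The 2-bit base codes (A=0, C=1, G=2, T=3), the placement of the length above the bases, and the left alignment are assumptions made for illustration; the library's actual bit ordering may differ, so this code should not be expected to reproduce real UVIDs. The Rabin fingerprint used in length mode is not sketched here.

```python
BASE_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit codes
MAX_EXACT = 20  # 20 bases * 2 bits = 40 bits


def choose_mode(allele: str) -> int:
    """Return 0 (string mode) or 1 (length mode) per the rule above."""
    fits = len(allele) <= MAX_EXACT and all(b in BASE_CODES for b in allele)
    return 0 if fits else 1


def pack_string(allele: str) -> int:
    """Pack a 5-bit length above 40 bits of left-aligned 2-bit bases (assumed order)."""
    if choose_mode(allele) != 0:
        raise ValueError("allele needs length mode")
    bits = 0
    for base in allele:
        bits = (bits << 2) | BASE_CODES[base]
    bits <<= 2 * (MAX_EXACT - len(allele))  # left-align within the 40-bit field
    return (len(allele) << 40) | bits


def unpack_string(payload: int) -> str:
    """Invert pack_string: read the length, then that many 2-bit bases."""
    length = payload >> 40
    bits = payload & ((1 << 40) - 1)
    return "".join(
        "ACGT"[(bits >> (2 * (MAX_EXACT - 1 - i))) & 0b11] for i in range(length)
    )


print(choose_mode("ACGT"), choose_mode("A" * 21), choose_mode("ACGN"))
print(unpack_string(pack_string("ACGT")))
```

Storing the length alongside the bases is what lets a run of `A` (all zero bits) be told apart from unused padding.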
Limitations
- Two assemblies supported: GRCh37/hg19 and GRCh38/hg38 (2 reserved slots remain).
- Alleles longer than 20 bases cannot be decoded exactly from the ID alone; the original sequence remains recoverable from the reference genome or the source VCF.
- Focused on human genomics.
Installation
Requires Python 3.10+ and a Rust toolchain (for building from source).
# From PyPI
uv pip install uvid
# From source
uv pip install .
# As a CLI tool
uv tool install uvid
Quick Start
CLI
# Encode a variant
uvid encode chr1 100 A G
# Decode a UVID
uvid decode 00000064-40000001-00000000-00000006
# HGVS encode -- convert HGVS notation to UVID
uvid hgvs-encode "NC_000001.11:g.12345A>G"
# HGVS decode -- convert UVID back to HGVS notation
uvid hgvs-decode 00003039-40000001-00000000-00000006
# Annotate a VCF with UVIDs in the ID column
uvid vcf input.vcf output.vcf -a GRCh38
# Add a VCF to a .uvid collection
uvid add collection.uvid sample.vcf
# Search by region
uvid search collection.uvid --sample sample__NA12878 --chr chr1 --start 10000 --end 20000
# Collection info
uvid info collection.uvid
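The HGVS commands above pair a RefSeq accession with a position and a substitution. As a rough sketch of what that notation carries, the snippet below parses a simple genomic substitution with a regular expression. It handles only the `g.` substitution form, the accession table lists just the two chromosome 1 accessions, and `parse_hgvs_substitution` is a hypothetical helper, not the library's parser.

```python
import re

# RefSeq accessions for chromosome 1 in the two supported assemblies.
ACCESSIONS = {
    "NC_000001.10": ("chr1", "GRCh37"),
    "NC_000001.11": ("chr1", "GRCh38"),
}

HGVS_SUB = re.compile(r"^(NC_\d+\.\d+):g\.(\d+)([ACGT])>([ACGT])$")


def parse_hgvs_substitution(text: str) -> tuple[str, int, str, str, str]:
    """Return (chrom, pos, ref, alt, assembly) for a simple g. substitution."""
    match = HGVS_SUB.match(text)
    if match is None:
        raise ValueError(f"unsupported HGVS expression: {text!r}")
    accession, pos, ref, alt = match.groups()
    chrom, assembly = ACCESSIONS[accession]
    return chrom, int(pos), ref, alt, assembly


parsed = parse_hgvs_substitution("NC_000001.11:g.12345A>G")
print(parsed)
print(format(parsed[1], "08x"))  # 12345 as 8 hex digits: 00003039
```

The hex form of the position, `00003039`, is the leading group of the UVID passed to `hgvs-decode` above.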
Python
from uvid import UVID, Collection, hgvs_to_uvid, uvid_to_hgvs, vcf_passthrough
# Encode / decode
uvid = UVID.encode("chr1", 100, "A", "G", "GRCh38")
fields = uvid.decode()
# {'chr': '1', 'pos': 100, 'ref': 'A', 'alt': 'G', 'assembly': 'GRCh38', ...}
# HGVS conversion
uvid = hgvs_to_uvid("NC_000001.11:g.12345A>G")
hgvs_str, warnings = uvid_to_hgvs(uvid.to_hex())
# hgvs_str = "NC_000001.11:g.12345A>G"
# UUIDv5 conversion
print(uvid.uuid5()) # deterministic UUID
# Range queries
lower, upper = UVID.range("chr1", 10000, 20000, "GRCh38")
# .uvid collections (DuckDB-backed)
store = Collection("my_variants.uvid")
store.add_vcf("sample.vcf", "GRCh38")
results = store.search_region("sample__NA12878", "chr1", 10000, 20000)
# VCF passthrough -- stamp UVIDs into the ID column
count = vcf_passthrough("input.vcf", "output.vcf", assembly="GRCh38")
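Because the position occupies the most significant 32 bits, a region query can be expressed as a pair of integer bounds. The sketch below assumes you already have linearized positions (the chromosome offsets used by the library are not reproduced here). `region_bounds` is a hypothetical helper rather than `UVID.range` itself, which may additionally take the assembly into account.

```python
LOW_96 = (1 << 96) - 1  # every bit below the position field set


def region_bounds(lin_start: int, lin_end: int) -> tuple[int, int]:
    """Return inclusive 128-bit bounds covering all IDs in [lin_start, lin_end]."""
    if not 0 <= lin_start <= lin_end < (1 << 32):
        raise ValueError("positions must satisfy 0 <= start <= end < 2**32")
    lower = lin_start << 96           # smallest ID at the start position
    upper = (lin_end << 96) | LOW_96  # largest ID at the end position
    return lower, upper


lower, upper = region_bounds(10000, 20000)

inside = int("00003039400000010000000000000006", 16)   # position field 12345
outside = int("00000064400000010000000000000006", 16)  # position field 100
print(lower <= inside <= upper, lower <= outside <= upper)
# prints: True False
```

A store that indexes the ID as an unsigned 128-bit integer can then answer the region query with a single `BETWEEN`-style comparison.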
Variant Normalization
uvid includes a built-in normalizer based on Tan et al. 2015 (the same algorithm used by bcftools and vt) to ensure consistent IDs across differently-represented variants:
# Normalize and encode in one pass
uvid vcf input.vcf output.vcf -a GRCh38 --normalize
See the normalization guide for details on reference genome setup.
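For readers unfamiliar with the algorithm, here is a minimal sketch of left-alignment and trimming for a single biallelic variant, following the published description by Tan et al. It operates on an in-memory reference string with 1-based positions and is not the library's Rust implementation; the function name and signature are illustrative only.

```python
def normalize(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str, str]:
    """Left-align and trim one biallelic variant; pos is 1-based."""
    if ref == alt:
        raise ValueError("REF and ALT are identical")
    changed = True
    while changed:
        changed = False
        # Step 1: drop a shared rightmost base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # Step 2: if an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend left of the first base")
            pos -= 1
            base = ref_seq[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Step 3: drop shared leftmost bases while both alleles keep at least one base.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


reference = "GGGCACACACAGGG"
# A CA deletion written in the middle of the repeat shifts to its leftmost form.
print(normalize(reference, 8, "CAC", "C"))  # (3, 'GCA', 'G')
```

Two VCF records that describe the same deletion at different offsets in the repeat end up with the same position and alleles, and therefore the same UVID.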
Architecture
                Python (typer CLI / library API)
                               |
                            PyO3 FFI
                               |
+----------+----------+----------+-----------+---------+--------+
| uvid128  | assembly | vcf      | normalize | store   | hgvs   |
| (encode/ | (chr     | (noodles | (Tan et   | (DuckDB | (HGVS  |
| decode)  | offsets) | parser)  | al. 2015) | I/O)    | g./m.) |
+----------+----------+----------+-----------+---------+--------+
                            Rust core
- Rust core: UVID encoding/decoding, VCF parsing via noodles, variant normalization, DuckDB bulk I/O, HGVS notation support
- Python bindings: PyO3 + maturin
- CLI: Typer wrapping the native library
License
Apache-2.0