Cross-Temporal and Cross-Database Biological Identifier Mapping
Modern biology constantly mixes identifiers from different years, databases, and genome builds. The result is a familiar set of problems: IDs disappear, symbols change, references disagree, and “the same gene” isn’t always represented the same way across datasets.
IDTrack is built for that reality. It provides a time-aware, audit-friendly way to translate and harmonize biological identifiers across Ensembl releases and across external namespaces (HGNC, UniProt, RefSeq, Entrez, …), while keeping ambiguity explicit instead of silently forcing a single answer.
What makes IDTrack different
Time-aware mapping: treat Ensembl releases as a “time axis” and travel forward/backward through identifier history.
Assembly-aware mapping: harmonize identifiers across genome builds (e.g. GRCh37 ↔ GRCh38) and respect external databases that are assembly-scoped.
Snapshot boundary for reproducibility: build a release-bounded graph snapshot so results are stable and repeatable.
Explicit external database opt-in: choose which external namespaces participate via a small, editable YAML contract.
Transparency over coercion: conversions are naturally classified as 1→0 (no match), 1→1 (clean), or 1→n (ambiguous).
Scale-ready workflows: caching and snapshot reuse make repeated conversions and multi-dataset harmonization practical.
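To make the time-axis and 1→0 / 1→1 / 1→n ideas above concrete, here is a minimal sketch in plain Python. It does not use IDTrack's API. The `HISTORY` table, the `GENE_*` identifiers, and the `resolve` and `classify` helpers are all hypothetical and exist only for illustration. The sketch follows successor links forward until it reaches identifiers with no further history, then labels the outcome by how many targets remain.

```python
# Hypothetical toy history: retired ID -> successor IDs in a later release.
# These identifiers are invented; they are not Ensembl IDs or IDTrack data.
HISTORY = {
    "GENE_A": ["GENE_A2"],             # renamed once...
    "GENE_A2": ["GENE_A3"],            # ...and then renamed again
    "GENE_B": ["GENE_B1", "GENE_B2"],  # split into two successors
    "GENE_C": [],                      # retired with no successor
}


def resolve(identifier, history):
    """Follow successor links forward and return the surviving IDs, sorted."""
    finals = set()
    seen = set()
    stack = [identifier]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against cycles in the toy history
        seen.add(current)
        if current not in history:
            finals.add(current)  # no recorded retirement: treat as still current
        else:
            stack.extend(history[current])
    return sorted(finals)


def classify(targets):
    """Label a conversion by the number of targets it produced."""
    if not targets:
        return "no_match"   # 1→0
    if len(targets) == 1:
        return "clean"      # 1→1
    return "ambiguous"      # 1→n


for gene in ("GENE_A", "GENE_B", "GENE_C"):
    targets = resolve(gene, HISTORY)
    print(gene, targets, classify(targets))
```

Keeping the full target list next to the label, rather than picking one target, is what lets ambiguous conversions stay visible for later review.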
Who is it for?
Wet-lab researchers who need a reliable, step-by-step path from “my gene list is old” to “my analysis is reproducible”.
Bioinformaticians who want release-pinned, auditable conversions in notebooks, pipelines, and integration workflows.
Atlas builders / integrators who need to harmonize gene identifiers across many cohorts (different Ensembl releases, symbols, and external IDs), keep an explicit audit trail of what mapped/failed/was ambiguous, and ship a release-pinned, reproducible feature space for downstream integration and publication.
Common use cases
Dataset harmonization before integration (single-cell, bulk, atlas-scale collections).
Legacy data rescue (old Ensembl releases, mixed symbols/IDs, retired identifiers).
Publication-grade reproducibility (pin a snapshot boundary + share the exact external configuration).
Cross-database interoperability when collaborators use different identifier conventions.
Documentation and tutorials
The documentation includes a full tutorial suite designed to be the primary learning resource:
Documentation: Documentation
Tutorials: start from the “Tutorials” section in the docs (Part 0 → Part 7).
Download files
File details
Details for the file idtrack-0.0.5.tar.gz.
File metadata
- Download URL: idtrack-0.0.5.tar.gz
- Upload date:
- Size: 221.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ad6a90422f64bea4cf7454838b3d5746c36720585e8772de9c49c46e4972569e |
| MD5 | 90bb081e8a2d5242cb1144b9e19fe0a3 |
| BLAKE2b-256 | 836456e4159abb9bf3705203c4811636e97fecbd32094b91475d1557ac8026d4 |
File details
Details for the file idtrack-0.0.5-py3-none-any.whl.
File metadata
- Download URL: idtrack-0.0.5-py3-none-any.whl
- Upload date:
- Size: 242.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be6c5dc6fca2a98d0be696ba0f17ae401d282ee68805e69a750f5fa08920d354 |
| MD5 | 95c39569634eaac306a4dbb0adb53881 |
| BLAKE2b-256 | d89a98c5a7fe5a712aa2ab860352233fd21e1c54356ac732c456c5bab53fd0f8 |
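The SHA256 digests listed for the two files can be checked against a local download using only Python's standard library. The following is a minimal sketch. The `sha256_of` and `verify` helpers are illustrative names, and the sketch assumes the downloaded files keep their original file names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details listed above.
EXPECTED_SHA256 = {
    "idtrack-0.0.5.tar.gz":
        "ad6a90422f64bea4cf7454838b3d5746c36720585e8772de9c49c46e4972569e",
    "idtrack-0.0.5-py3-none-any.whl":
        "be6c5dc6fca2a98d0be696ba0f17ae401d282ee68805e69a750f5fa08920d354",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the digest listed for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no listed digest for {path.name}")
    return sha256_of(path) == expected
```

For example, `verify("idtrack-0.0.5.tar.gz")` returns `True` only when the local file is byte-for-byte identical to the published source distribution.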