Transcripts for HGVS libraries
cdot
cdot loads transcripts for the two most popular Python HGVS libraries: Biocommons HGVS and PyHGVS.
It works by:
- Converting RefSeq and Ensembl GTFs into a JSON format
- Providing loaders from that JSON format (or a REST service)
We currently support 788k transcripts, 5.5x as many as the Universal Transcript Archive.
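The conversion step listed above can be sketched roughly as follows. This is only an illustration of the idea of grouping GTF exon records by transcript and serialising them to JSON. The function name `gtf_to_transcripts`, the sample GTF lines, and the output layout are assumptions made for this example and are not cdot's actual code or JSON schema.

```python
import json
import re
from collections import defaultdict

# Hypothetical GTF snippet: the transcript ID and coordinates are made up.
GTF_TEXT = """\
chr1\tRefSeq\texon\t100\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "NM_000001.1";
chr1\tRefSeq\texon\t300\t400\t.\t+\t.\tgene_id "GENE1"; transcript_id "NM_000001.1";
chr1\tRefSeq\tCDS\t150\t200\t.\t+\t0\tgene_id "GENE1"; transcript_id "NM_000001.1";
"""

# GTF attributes look like: key "value"; key "value";
ATTRIBUTE_RE = re.compile(r'(\S+) "([^"]*)"')


def gtf_to_transcripts(gtf_text):
    """Group exon records by transcript_id (illustrative layout, not cdot's schema)."""
    transcripts = defaultdict(lambda: {"contig": None, "strand": None, "exons": []})
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        contig, _source, feature, start, end, _score, strand, _frame, attrs = line.split("\t")
        if feature != "exon":
            continue  # this sketch only keeps exon coordinates
        attributes = dict(ATTRIBUTE_RE.findall(attrs))
        record = transcripts[attributes["transcript_id"]]
        record["contig"] = contig
        record["strand"] = strand
        record["exons"].append([int(start), int(end)])
    for record in transcripts.values():
        record["exons"].sort()  # order exons by genomic start
    return dict(transcripts)


transcripts = gtf_to_transcripts(GTF_TEXT)
json_text = json.dumps(transcripts, indent=2)
print(json_text)
```

A loader then only needs to read the JSON back and look up a transcript by its versioned accession, which is what keeps this approach simple.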
Examples
Biocommons HGVS example:
import hgvs.parser
from hgvs.assemblymapper import AssemblyMapper
from cdot.hgvs.dataproviders import JSONDataProvider, RESTDataProvider

# Uses local JSON file; the file must hold transcripts for the build named in the key
hdp = JSONDataProvider({"GRCh37": "./cdot_220119.grch38.json.gz"})
# hdp = RESTDataProvider()  # Uses API server at cdot.cc

am = AssemblyMapper(hdp,
                    assembly_name='GRCh37',
                    alt_aln_method='splign', replace_reference=True)

hp = hgvs.parser.Parser()
var_c = hp.parse_hgvs_variant('NM_001637.3:c.1582G>A')
am.c_to_g(var_c)  # Map the coding (c.) variant to genomic (g.) coordinates
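To give a feel for what a c. to g. conversion needs from the transcript data, here is a deliberately simplified sketch. It is not how the Biocommons HGVS library implements `c_to_g`: it ignores alignment gaps, intronic offsets, and UTR positions. The function `c_to_g_position`, its parameters, and the exon coordinates are hypothetical and exist only for this example.

```python
def c_to_g_position(c_pos, exons, cds_start_tx, strand="+"):
    """Map a 1-based coding (c.) position to a 1-based genomic position.

    exons: (start, end) genomic intervals, 1-based inclusive, in any order.
    cds_start_tx: 1-based position of the first coding base in the spliced transcript.
    """
    if c_pos < 1:
        raise ValueError("this sketch only handles positions inside the CDS")
    # Position along the spliced transcript, counted from its 5' end.
    remaining = cds_start_tx + c_pos - 1
    # Walk exons in transcript order: ascending on '+', descending on '-'.
    for start, end in sorted(exons, reverse=(strand == "-")):
        length = end - start + 1
        if remaining <= length:
            if strand == "+":
                return start + remaining - 1
            return end - remaining + 1
        remaining -= length
    raise ValueError("position lies beyond the last exon")


# Made-up transcript: two exons, coding sequence starting at transcript base 51.
exons = [(100, 200), (300, 400)]
print(c_to_g_position(1, exons, cds_start_tx=51))   # first coding base
print(c_to_g_position(52, exons, cds_start_tx=51))  # first base of the second exon
```

The exon coordinates and strand are exactly the kind of per-transcript data that the JSON files supply to the HGVS library.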
PyHGVS example:
# TODO
Philosophical differences from Universal Transcript Archive
cdot aims to be as simple as possible: it converts existing Ensembl/RefSeq GTFs into JSON format.
The Universal Transcript Archive (UTA) is an excellent and ambitious project that:
- Performs its own mapping of transcript sequences to reference genomes
- Stores the transcript version data (exons etc) in a SQL database
This has some advantages, namely that you can resolve a GRCh37 coordinate for a transcript that was never officially released for that build.
However, the complexity causes a few downsides:
- Alignments may not exactly match those in official Ensembl/RefSeq releases
- Local install requires a PostgreSQL installation
- The Internet-hosted UTA is a PostgreSQL server, so it requires client Postgres libraries and is inaccessible behind firewalls. A REST server has been planned since 2014
- A complex, manual release process means UTA does not support Ensembl and takes a while to make RefSeq releases