Library for working with reference genomes and gene GTF/GFFs
PyReference
A Python library for working with reference gene annotations. It supports RefSeq and Ensembl annotations for GRCh37/GRCh38, as well as other species.
A GTF/GFF3 file can take minutes to load. PyReference pre-processes it into JSON once, so that later loads take seconds rather than minutes.
PyReference makes it easy to write genomics code that can be run across different genomes or annotation versions.
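To illustrate the pre-processing idea, here is a minimal sketch of converting GTF lines to a JSON file once and loading that JSON afterwards. It is not PyReference's actual converter or file layout: the function names (`parse_gtf_line`, `gtf_to_json`, `load_genes`), the JSON structure, and the two-line demo annotation are all assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict


def parse_gtf_line(line):
    """Split one GTF line into its fixed columns and an attribute dict."""
    cols = line.rstrip("\n").split("\t")
    chrom, _source, feature, start, end, _score, strand, _frame, attrs = cols
    attributes = {}
    # Simplification: assumes attribute values never contain ';'.
    for field in attrs.strip().split(";"):
        field = field.strip()
        if field:
            key, _, value = field.partition(" ")
            attributes[key] = value.strip('"')
    return {
        "chrom": chrom,
        "feature": feature,
        "start": int(start) - 1,  # GTF is 1-based inclusive; store 0-based half-open
        "end": int(end),
        "strand": strand,
        "attributes": attributes,
    }


def gtf_to_json(gtf_lines, json_path):
    """Group exons by gene and transcript, then write them out as JSON (hypothetical layout)."""
    genes = defaultdict(lambda: defaultdict(list))
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        record = parse_gtf_line(line)
        if record["feature"] != "exon":
            continue
        attrs = record["attributes"]
        genes[attrs["gene_id"]][attrs["transcript_id"]].append(
            [record["chrom"], record["start"], record["end"], record["strand"]]
        )
    with open(json_path, "w") as f:
        json.dump(genes, f)


def load_genes(json_path):
    """Load the pre-processed annotation; no GTF parsing happens here."""
    with open(json_path) as f:
        return json.load(f)


# Made-up two-exon annotation, used only to exercise the functions above.
GTF_EXAMPLE = [
    'chrX\tdemo\texon\t101\t200\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";\n',
    'chrX\tdemo\texon\t301\t400\t.\t+\t.\tgene_id "GENE1"; transcript_id "TX1";\n',
]

json_path = os.path.join(tempfile.mkdtemp(), "genes.json")
gtf_to_json(GTF_EXAMPLE, json_path)  # slow step, done once
genes = load_genes(json_path)        # fast step, done on every run
print(genes["GENE1"]["TX1"])
```

The point of the split is that the string parsing and grouping are paid for once, while every later run only deserializes an already-structured file.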
Example
import numpy as np
from pyreference import Reference

reference = Reference()  # uses ~/pyreference.cfg default_build

my_gene_symbols = ["MSN", "GATA2", "ZEB1"]
for gene in reference[my_gene_symbols]:
    average_length = np.mean([t.length for t in gene.transcripts])
    print("%s average length = %.2f" % (gene, average_length))
    print(gene.iv)
    for transcript in gene.transcripts:
        if transcript.is_coding:
            threep_utr = transcript.get_3putr_sequence()
            print("%s end of 3putr: %s" % (transcript.get_id(), threep_utr[-20:]))
Outputs:
MSN (MSN) 1 transcripts average length = 3970.00
chrX:[64887510,64961793)/+
NM_002444 end of 3putr: TAAAATTTAGGAAGACTTCA
GATA2 (GATA2) 3 transcripts average length = 3367.67
chr3:[128198264,128212030)/-
NM_001145662 end of 3putr: AATACTTTTTGTGAATGCCC
NM_001145661 end of 3putr: AATACTTTTTGTGAATGCCC
NM_032638 end of 3putr: AATACTTTTTGTGAATGCCC
ZEB1 (ZEB1) 6 transcripts average length = 6037.83
chr10:[31608100,31818742)/+
NM_001174093 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001174094 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_030751 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001174096 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001174095 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001128128 end of 3putr: CTTCTTTTTCTATTGCCTTA
This takes 4 seconds to load on my machine.
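The `gene.iv` lines in the output above use a `chrom:[start,end)/strand` notation. The sketch below parses such a string and computes the span it covers. It assumes that the bracket notation means a half-open interval (start included, end excluded); `parse_interval` is a hypothetical helper written for this example, not part of PyReference.

```python
import re

# Matches strings such as "chrX:[64887510,64961793)/+".
IV_PATTERN = re.compile(
    r"^(?P<chrom>[^:]+):\[(?P<start>\d+),(?P<end>\d+)\)/(?P<strand>[+\-.])$"
)


def parse_interval(text):
    """Return (chrom, start, end, strand) from a printed interval string."""
    match = IV_PATTERN.match(text)
    if match is None:
        raise ValueError("Unrecognised interval: %r" % text)
    return match["chrom"], int(match["start"]), int(match["end"]), match["strand"]


chrom, start, end, strand = parse_interval("chrX:[64887510,64961793)/+")
span = end - start  # valid for a half-open interval (assumption stated above)
print("%s %s strand, %d bp" % (chrom, strand, span))
```

Under the half-open assumption, the MSN gene interval spans 74283 bp.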
pyreference biotype
PyReference also includes a command-line tool (pyreference_biotype.py), which reports the biotypes that small RNA fragments map to.
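As a rough illustration of what mapping fragments to biotypes involves, the sketch below counts the biotypes of annotated regions that each fragment overlaps. It is not the implementation of pyreference_biotype.py: the annotation, the fragments, the function names, and the "intergenic" fallback label are all made up for this example.

```python
from collections import Counter

# Hypothetical annotation: (chrom, start, end, biotype), 0-based half-open.
ANNOTATION = [
    ("chr1", 100, 200, "miRNA"),
    ("chr1", 500, 900, "protein_coding"),
    ("chr2", 50, 120, "snoRNA"),
]


def biotypes_at(chrom, start, end, annotation):
    """Return the set of biotypes whose regions overlap [start, end)."""
    hits = {
        biotype
        for (a_chrom, a_start, a_end, biotype) in annotation
        if a_chrom == chrom and start < a_end and a_start < end
    }
    # Label chosen for this sketch when nothing overlaps.
    return hits or {"intergenic"}


def count_biotypes(fragments, annotation):
    """Count how many fragments overlap each biotype."""
    counts = Counter()
    for chrom, start, end in fragments:
        for biotype in biotypes_at(chrom, start, end, annotation):
            counts[biotype] += 1
    return counts


# Made-up small RNA fragments: (chrom, start, end).
FRAGMENTS = [
    ("chr1", 110, 132),
    ("chr1", 150, 171),
    ("chr1", 880, 905),
    ("chr2", 300, 322),
]

biotype_counts = count_biotypes(FRAGMENTS, ANNOTATION)
for biotype, n in sorted(biotype_counts.items()):
    print("%s\t%d" % (biotype, n))
```

The linear scan over the annotation is only suitable for a toy example; a real annotation would need an indexed interval lookup.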
Installation
sudo pip install pyreference
Then you will need to: