Library for working with reference genomes and gene GTF/GFFs

PyReference

A Python library for working with reference gene annotations. It supports RefSeq and Ensembl annotations for GRCh37/GRCh38, as well as other species.

Parsing a GTF/GFF3 file can take minutes. PyReference pre-processes it into JSON once, so that later loads are much faster.
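
The idea of pre-processing can be illustrated with a minimal sketch. This is not PyReference's actual converter or JSON layout. The function names (`parse_gtf_attributes`, `gtf_to_dict`), the two demo GTF lines, and the nested gene/transcript/exon structure are all assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict


def parse_gtf_attributes(field):
    """Parse a GTF attribute column such as 'gene_id "G1"; transcript_id "T1";'."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip('"')
    return attributes


def gtf_to_dict(lines):
    """Collect exon coordinates per gene and transcript (hypothetical layout)."""
    genes = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if columns[2] != "exon":
            continue  # this sketch only keeps exon records
        attributes = parse_gtf_attributes(columns[8])
        # GTF coordinates are 1-based and inclusive; they are stored unchanged.
        exon = [int(columns[3]), int(columns[4])]
        genes[attributes["gene_id"]][attributes["transcript_id"]].append(exon)
    return genes


# Made-up demo records, not real annotation data.
gtf_lines = [
    'chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
    'chr1\tdemo\texon\t300\t400\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n',
]

# Slow step, done once: parse the text annotation and write it out as JSON.
json_path = os.path.join(tempfile.mkdtemp(), "annotation.json")
with open(json_path, "w") as handle:
    json.dump(gtf_to_dict(gtf_lines), handle)

# Fast step, done on every run: read the JSON back without re-parsing the GTF.
with open(json_path) as handle:
    annotation = json.load(handle)

print(annotation["G1"]["T1"])
```

The design point is that the expensive text parsing happens once, and every later run only pays for `json.load`.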

PyReference makes it easy to write genomics code that runs unchanged across different genomes or annotation versions.

Example

import numpy as np
from pyreference import Reference 

reference = Reference()  # uses ~/pyreference.cfg default_build

my_gene_symbols = ["MSN", "GATA2", "ZEB1"]
for gene in reference[my_gene_symbols]:
    average_length = np.mean([t.length for t in gene.transcripts])
    print("%s average length = %.2f" % (gene, average_length))
    print(gene.iv)
    for transcript in gene.transcripts:
        if transcript.is_coding:
            threep_utr = transcript.get_3putr_sequence()
            print("%s end of 3putr: %s" % (transcript.get_id(), threep_utr[-20:]))

Outputs:

MSN (MSN) 1 transcripts average length = 3970.00
chrX:[64887510,64961793)/+
NM_002444 end of 3putr: TAAAATTTAGGAAGACTTCA

GATA2 (GATA2) 3 transcripts average length = 3367.67
chr3:[128198264,128212030)/-
NM_001145662 end of 3putr: AATACTTTTTGTGAATGCCC
NM_001145661 end of 3putr: AATACTTTTTGTGAATGCCC
NM_032638 end of 3putr: AATACTTTTTGTGAATGCCC

ZEB1 (ZEB1) 6 transcripts average length = 6037.83
chr10:[31608100,31818742)/+
NM_001174093 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001174094 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_030751 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001174096 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001174095 end of 3putr: CTTCTTTTTCTATTGCCTTA
NM_001128128 end of 3putr: CTTCTTTTTCTATTGCCTTA

This takes 4 seconds to load on my machine.
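
The interval strings printed by `gene.iv` above, such as `chrX:[64887510,64961793)/+`, can be picked apart with a small helper if they need to be handled as plain text. The sketch below is an illustration and not part of PyReference. `parse_interval` is a hypothetical name, and reading `[start,end)` as a half-open interval (start included, end excluded) is an assumption based on the bracket notation.

```python
import re

# chromosome:[start,end)/strand, as it appears in the example output
INTERVAL_PATTERN = re.compile(
    r"^(?P<chrom>[^:]+):\[(?P<start>\d+),(?P<end>\d+)\)/(?P<strand>[+\-.])$"
)


def parse_interval(text):
    """Split an interval string into (chrom, start, end, strand)."""
    match = INTERVAL_PATTERN.match(text)
    if match is None:
        raise ValueError("Unrecognised interval string: %r" % text)
    return (
        match.group("chrom"),
        int(match.group("start")),
        int(match.group("end")),
        match.group("strand"),
    )


chrom, start, end, strand = parse_interval("chrX:[64887510,64961793)/+")
# Under the half-open assumption the end is excluded, so the span is end - start.
print(chrom, strand, end - start)
```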

pyreference biotype

Also included is a command line tool (pyreference_biotype.py) that reports which biotypes small RNA fragments map to.

Installation

sudo pip install pyreference

Then you will need to:

Download files

Source Distribution

pyreference-0.7.5.tar.gz (23.5 kB)

Uploaded Source

Built Distribution

pyreference-0.7.5-py3-none-any.whl (23.8 kB)

Uploaded Python 3

File details

Details for the file pyreference-0.7.5.tar.gz.

File metadata

  • Download URL: pyreference-0.7.5.tar.gz
  • Upload date:
  • Size: 23.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for pyreference-0.7.5.tar.gz

Algorithm    Hash digest
SHA256       bd62cc25adc102284808bd524d7958764f12b8f0be222fab776e7f9257e99c6d
MD5          a7fdbe69540b70eda8e372103e3830ae
BLAKE2b-256  9755dc03590e6b34c0fa3c1db137e4df4d5ac9a7573ccc572bf57e6b8c6b1159
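
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The helper names (`sha256_of_file`, `matches_published_hash`) are hypothetical, and the local file path in the usage comment is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest for pyreference-0.7.5.tar.gz, copied from the table above.
EXPECTED_SHA256 = "bd62cc25adc102284808bd524d7958764f12b8f0be222fab776e7f9257e99c6d"

# Usage (the path is an assumption about where the file was saved):
# matches_published_hash("pyreference-0.7.5.tar.gz", EXPECTED_SHA256)
```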

File details

Details for the file pyreference-0.7.5-py3-none-any.whl.

File metadata

  • Download URL: pyreference-0.7.5-py3-none-any.whl
  • Upload date:
  • Size: 23.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for pyreference-0.7.5-py3-none-any.whl

Algorithm    Hash digest
SHA256       6716bcf6bfdd31be36018faa2cb3c3fd3548d1ce47cc2ce2b41f948a57f40f18
MD5          f6dd06c5608a3ba0c5ad37200f42832f
BLAKE2b-256  74d2ab42ffa5ccccc926b59aa6352a9e79169a9a395c38d87ec24bb686c3aa73
