gtf_to_genes

Fast GTF parser

***************************************
Overview
***************************************
We want an extremely fast, lightweight way to access gene data stored in GTF format.

The parsed data is held in an intuitive hierarchy::

    Gene
        -> transcript
        -> transcript

with exons being stored as intervals.
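As a rough illustration of this layout, the hierarchy could be modelled as in the sketch below. The class and field names (`Gene`, `Transcript`, `exons`, `gene_type`) are assumptions made for this example and are not the classes defined by gtf_to_genes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical names for illustration only; not the gtf_to_genes classes.
Interval = Tuple[int, int]  # (start, end) of one exon


@dataclass
class Transcript:
    transcript_id: str
    exons: List[Interval] = field(default_factory=list)


@dataclass
class Gene:
    gene_id: str
    gene_type: str
    transcripts: List[Transcript] = field(default_factory=list)

    def exon_count(self) -> int:
        # Total number of exon intervals across all transcripts of this gene
        return sum(len(t.exons) for t in self.transcripts)


# One gene with two transcripts, each holding its exons as int pairs
gene = Gene(
    gene_id="gene_1",
    gene_type="protein_coding",
    transcripts=[
        Transcript("transcript_1", [(100, 200), (300, 400)]),
        Transcript("transcript_2", [(100, 200)]),
    ],
)
print(gene.exon_count())
```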

Our aim is to

* cache the parsed data in a binary format, which can be
* re-read in < 10s, even for the largest genomes

Currently, the initial parse of Ensembl Homo sapiens release 56 takes around 4.5 minutes.
The binary data can then be reloaded in < 10s.
The cache contains *all* of the data structure in the original GTF file.
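The parse-once, reload-from-binary idea can be sketched as follows. This is a generic pattern built on `pickle`, not the package's own cache format (which is not described here); `load_with_cache`, `toy_parse`, and the file names are hypothetical.

```python
import os
import pickle
import tempfile


def load_with_cache(source_path, cache_path, parse, ignore_cache=False):
    """Return parsed data, reusing a binary cache when it is up to date."""
    cache_is_fresh = (
        os.path.exists(cache_path)
        and os.path.getmtime(cache_path) >= os.path.getmtime(source_path)
    )
    if cache_is_fresh and not ignore_cache:
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)  # fast path: reload the binary data

    data = parse(source_path)  # slow path: parse the text file
    with open(cache_path, "wb") as fh:
        pickle.dump(data, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return data


calls = []  # records how often the slow parser actually runs


def toy_parse(path):
    # Stand-in for a real GTF parser: split each line on tabs
    calls.append(path)
    with open(path) as fh:
        return [line.rstrip("\n").split("\t") for line in fh]


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "toy.gtf")
    cache = os.path.join(tmp, "toy.cache")
    with open(src, "w") as fh:
        fh.write("chr1\tsource\texon\t100\t200\n")
    first = load_with_cache(src, cache, toy_parse)   # parses and writes cache
    second = load_with_cache(src, cache, toy_parse)  # served from the cache
```

Comparing modification times means the cache is rebuilt whenever the source file is newer, and `ignore_cache=True` forces a fresh parse.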

Note that we sacrifice memory usage for speed. This is seldom a problem for modern computers
and genome sizes (there are around 400,000 exons, but these are stored as intervals / int pairs).
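To put the exon figure in perspective, the sketch below estimates the size of that many (start, end) pairs when packed into a flat integer array. The flat `array` layout is an assumption made for the estimate, not a description of how the package stores intervals internally.

```python
from array import array

n_exons = 400_000  # approximate exon count quoted above

# Hypothetical flat layout: start_0, end_0, start_1, end_1, ...
coords = array("i", [0]) * (2 * n_exons)

# itemsize is platform dependent (commonly 4 bytes for a C int)
bytes_used = coords.itemsize * len(coords)
print(f"{bytes_used / 1e6:.1f} MB for {n_exons} (start, end) pairs")
```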

***************************************
A simple example
***************************************
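::

    # Assumes t_parse_gtf has been imported from the gtf_to_genes package.
    # The wrapper function name is illustrative; gtf_file is the path to the
    # GTF file and logger is used for progress and summary messages.
    def get_protein_coding_genes(gtf_file, logger):
        gene_structures = t_parse_gtf("Mus musculus")

        #
        # use cached data for speed
        #
        ignore_cache = False

        #
        # get protein coding genes only
        #
        genes_by_type = gene_structures.get_genes(gtf_file, logger, ["protein_coding"], ignore_cache = ignore_cache)

        #
        # print out gene counts
        #
        t_parse_gtf.log_gene_types(logger, genes_by_type)

        return genes_by_type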
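The example takes a `logger` without showing how it is created. Assuming a standard `logging.Logger` is what is expected (the text above does not say so explicitly), one could be set up along these lines; the logger name and message format are arbitrary choices.

```python
import logging
import sys

# Logger name is arbitrary; chosen only for this sketch
logger = logging.getLogger("gtf_to_genes_example")
logger.setLevel(logging.INFO)

# Attach a single stderr handler, avoiding duplicates on repeated runs
if not logger.handlers:
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)

logger.info("logger ready")
```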


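***************************************
Release history
***************************************

=========  ============  ==========================
Version    Date          Notes
=========  ============  ==========================
1.40       Dec 4, 2014
1.31       Dec 1, 2014
1.30       Dec 1, 2014
1.09       Jul 12, 2012
1.08       Dec 2, 2011
1.07       Jul 1, 2010
1.06       Jun 23, 2010
1.04       Jun 15, 2010
1.03       May 28, 2010
1.02       May 28, 2010
1.01       May 28, 2010
1.0        May 28, 2010
1.0beta2   Apr 13, 2010  pre-release (this version)
1.0beta    Mar 12, 2010  pre-release
=========  ============  ==========================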

***************************************
Download files
***************************************
Source distribution: gtf_to_genes-1.0beta2.tar.gz (18.5 kB), uploaded Apr 13, 2010.
Uploaded using Trusted Publishing: No.

Hashes for gtf_to_genes-1.0beta2.tar.gz:

===========  ====================================================================
Algorithm    Hash digest
===========  ====================================================================
SHA256       `ad86d85d3c555e8605a32ca6d9ad34c537e5c2b7c79094b895bc28b45b30718f`
MD5          `6de7b3da40b7147fc92ceb6b048cde9c`
BLAKE2b-256  `d89bdbb884d7ddba138899608665954b0da8a749127f9d800fb1b67c606acf67`
===========  ====================================================================
