
Interval tree convenience classes for genomic data

Project description

Convenience classes for loading UCSC genomic annotation records into a set of interval tree data structures.

Installation

The easiest way to install most Python packages is via easy_install or pip:

$ pip install intervaltree-bio

The package requires the intervaltree package (which is normally installed automatically when using pip or easy_install).

Usage

One of the major uses of interval tree data structures is in bioinformatics, where the intervals correspond to genes or other features of the genome.

As genomes typically consist of a set of chromosomes, a separate interval tree has to be maintained for each chromosome. Thus, rather than using a single interval tree, you would typically use something like defaultdict(IntervalTree) to index genomic features. The module intervaltree_bio offers GenomeIntervalTree, a similar convenience data structure. In addition to specific methods for working with genomic intervals, it provides facilities for reading BED files and the refGene table from UCSC.
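The one-index-per-chromosome idea can be sketched with the standard library alone. In the sketch below, NaiveIntervalIndex is a hypothetical stand-in for IntervalTree (it scans a list linearly instead of using a tree), intervals are assumed to be half-open [begin, end), and the coordinates and feature names are made-up toy values. In practice you would put IntervalTree in its place.

```python
from collections import defaultdict


class NaiveIntervalIndex:
    """Hypothetical stand-in for an interval tree.

    Stores half-open [begin, end) intervals in a plain list and answers
    overlap queries by linear scan. Not part of intervaltree or
    intervaltree_bio.
    """

    def __init__(self):
        self._items = []

    def add(self, begin, end, data=None):
        if begin >= end:
            raise ValueError("interval must satisfy begin < end")
        self._items.append((begin, end, data))

    def overlap(self, begin, end):
        """Return all stored intervals that overlap [begin, end)."""
        return [iv for iv in self._items if iv[0] < end and begin < iv[1]]


# One index per chromosome, created automatically on first access.
genome = defaultdict(NaiveIntervalIndex)

# Toy features (illustrative values only).
genome["chr1"].add(100, 200, "featA")
genome["chr1"].add(150, 300, "featB")
genome["chr2"].add(100, 200, "featC")

# Query a region on chr1; features on other chromosomes are never touched.
hits = genome["chr1"].overlap(180, 190)
names = sorted(data for _, _, data in hits)
print(names)
```

Keying by chromosome keeps coordinates from different chromosomes from ever being compared, which is the reason a single shared tree is not enough.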

The core example is loading the transcription regions of the knownGene table from the UCSC website:

>>> from intervaltree_bio import GenomeIntervalTree
>>> knownGene = GenomeIntervalTree.from_table()
>>> len(knownGene)

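It is then possible to use the data structure to search for known genes within a given interval:

>>> result = knownGene[b'chr1'].search(100000, 138529)

Note that search is the intervaltree 2.x method name; intervaltree 3.0 replaced it with overlap, so with a newer intervaltree the equivalent IntervalTree call is overlap(100000, 138529).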

It is possible to load UCSC tables other than knownGene, or to specify a custom URL or file to read the table from. Consult the docstring of the GenomeIntervalTree.from_table method for more details.
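To give a sense of what reading a BED file into per-chromosome intervals involves, here is a minimal standard-library sketch. It is not the package's own loader: read_bed_lines is a hypothetical helper, the sample records are made-up toy values, and chromosome keys are kept as str for simplicity. It relies only on the BED convention that the first three tab-separated columns are chromosome, 0-based start, and exclusive end.

```python
from collections import defaultdict


def read_bed_lines(lines):
    """Group BED records by chromosome.

    Hypothetical helper: returns a dict mapping chromosome name to a list
    of (start, end, extra_fields) tuples, where start is 0-based and end
    is exclusive, as in the BED format.
    """
    by_chrom = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        by_chrom[chrom].append((start, end, fields[3:]))
    return by_chrom


# Toy input; any iterable of lines works, including an open file object.
sample = [
    "track name=example\n",
    "chr1\t100\t200\tfeatA\n",
    "chr1\t150\t300\tfeatB\n",
    "chr2\t100\t200\tfeatC\n",
]
bed_index = read_bed_lines(sample)
print(sorted(bed_index))
```

Each (start, end, extra_fields) tuple could then be added to the interval tree for its chromosome.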


