Skip to main content

An interval tree data structure

Project description

Build Status License: MIT Coverage Status

itree

itree is an interval tree data structure based on a self-balancing AVL binary search tree. Suitable for use with sequence features in bioinformatics.

Why itree?

  • itree is fast

itree implements an augmented search tree optmized for searching sets of intervals. The following benchmarks the performance of inserting, removing and searching for random intervals taken from the human chromosome 12 Gencode genes[1]:

benchmarking
  • itree is convenient

itree has a second-level interface for groups of objects (e.g. chromosomes):

>>> import itree, collections
>>> bed_records = [tuple(l.split()[:3]) for l in open('gencode.bed')] 
>>> i = collections.namedtuple('MyInterval', ['chrom','start','end'])
>>> t = itree.GroupedITree('chrom', [i(f[0], int(f[1]), int(f[2])) for f in bed_records])
>>> t.search(i('chr15', 45167200, 45167300)) 
[MyInterval(chrom='chr15', start=45167213, end=45187956),
 MyInterval(chrom='chr15', start=45167250, end=45187952),
 MyInterval(chrom='chr15', start=45167213, end=45201175),
 MyInterval(chrom='chr15', start=45152663, end=45167526)]

Getting Started

  • Construction

Creating an interval tree object:

>>> import itree
>>> t = itree.ITree()
  • Insertion

Any item inserted into an interval tree must contain "start" and "end" attributes as integers.

>>> import collections
>>> i = collections.namedtuple('MyInterval', ['start','end'])
>>> t.insert(i(1,15))
>>> t.insert(i(3,20))
>>> t.insert(i(4,20))
>>> t.insert(i(5,15))
>>> t.insert(i(6,7))
  • Search

Search for all intervals overlapping a given interval

>>> t.search(i(1,4)) 
[MyInterval(start=3, end=20), MyInterval(start=4, end=20), MyInterval(start=1, end=15)]
  • Removal

Remove an interval exactly matching the given interval by its start and end attributes (but not necessarily the same object).

>>> t.pstring()
      ┌–(1,15)
(3,20)
            ┌–(4,20)
      └–(5,15)
            └–(6,7)

>>> t.remove(i(1,15))
>>> t.pstring()
      ┌–(3,20)
            └–(4,20)
(5,15)
      └–(6,7)

The pstring method is mostly for debugging, but here we illustrate the rebalancing of the tree.

  • Grouping

A second-level itree object, GroupedITree, works as a proxy to itree objects which can be grouped by any hashable attribute or function:

>>> import itree, collections
>>> i = collections.namedtuple('Appointment', ['day','start','end'])
>>> appts = [i('Monday', 9, 13), i('Monday', 16, 17), i('Tuesday', 14, 15)]
>>> t = itree.GroupedITree(key='day', intervals=appts)
>>> t.search(i('Monday', 11, 12))
[Appointment(day='Monday', start=9, end=13)]
>>> t.search(i('Monday', 14, 15))
[]

You may also use any arbitrary hashable value returned from a function as a key:

>>> i = collections.namedtuple('Appointment', ['day','month','start','end'])
>>> date_key = lambda appt: "{} {}".format(appt.day, appt.month)
>>> appts = [i(5, 'Jan', 9, 13), i(6, 'Jan', 16, 17), i(5, 'Feb', 14, 15)]
>>> t = itree.GroupedITree(key=date_key, intervals=appts)
>>> t.search(i(5, 'Jan', 16, 17))   
[]

Notes

[1]. generated with python3 scripts/benchmarking.py scripts/gencode.chr12.bed 500 10000 500 > benchmarking.txt.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

itree-0.0.5.tar.gz (7.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

itree-0.0.5-py3-none-any.whl (8.1 kB view details)

Uploaded Python 3

File details

Details for the file itree-0.0.5.tar.gz.

File metadata

  • Download URL: itree-0.0.5.tar.gz
  • Upload date:
  • Size: 7.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.1

File hashes

Hashes for itree-0.0.5.tar.gz
Algorithm Hash digest
SHA256 78a3ece4eb5aad76326914419c7922b96fface8ad3a1ae44d7a217ce73095679
MD5 f334663aa253daddc12e18237976e3cd
BLAKE2b-256 44b6149dae09305a66eeb13e7bea0fb9bdff7a318bb9567d6215bcf4909ffbf6

See more details on using hashes here.

File details

Details for the file itree-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: itree-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.1

File hashes

Hashes for itree-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 713db41a2e5270f8f5964cbab406d89576b542b363b834e1fe4a40e8e3b37a80
MD5 b4919df9fc585db8e02327d279364e84
BLAKE2b-256 61de3999aa179241a30bb2cc4c63ac49f877cfbcbd618946409db2b7ede427b3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page