Skip to main content

Algorithms and data structures for my Python projects

Project description

ddalg

Algorithms and data structures for my Python projects.

Build Status

Interval tree

Interval tree is a data structure for holding intervals and to allow efficiently find intervals that overlap with given interval or point. Read more on Wikipedia.

Implementation note

This implementation uses half-open intervals, where begin coordinate is excluded. Half-open intervals are used in e.g. BED genomic format.

The current implementation needs to rebuild the tree after each insert, hence the tree is not efficient for using in read/write fashion.

Usage

  • implement your custom interval object while extending Interval. Two properties need to be overwritten:
    • begin - 0-based (excluded) begin coordinate of the interval
    • end - 0-based (included) end coordinate of the interval
      from ddalg.model import Interval
      
      class YourInterval(Interval):
      
        def __init__(self, begin: int, end: int):
          self._begin = begin
          self._end = end
          # anything else
      
        @property
        def begin(self):
          return self._begin
      
        @property
        def end(self):
          return self._end
      
  • create a collection of your intervals and store them in the interval tree:
    from ddalg.itree import IntervalTree
     
    itree = IntervalTree([YourInterval(0, 3), YourInterval(1, 4)])
    # ... 
    
  • query itree:
    • using 1-based position:
      itree.search(1)
      

      returns (0,3)

    • using half-open interval coordinates:
      itree.get_overlaps(0, 1) 
      

      returns (0,3), effectively the same query as above

    • for intervals with minimal required coverage
      itree.fuzzy_query(0, 1, coverage=.90)
      

      return intervals with >=.9 overlap with respect to query coordinates

    • for intervals with minimal jaccard index
      itree.jaccard_query(0, 1, min_jaccard=.90)
      

      return intervals having jaccard_index>=.9 with respect to query coordinates

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for ddalg, version 0.0.3.post0
Filename, size File type Python version Upload date Hashes
Filename, size ddalg-0.0.3.post0-py3.6.egg (27.1 kB) File type Egg Python version 3.6 Upload date Hashes View
Filename, size ddalg-0.0.3.post0.tar.gz (8.6 kB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page