Skip to main content

Python package for Augmented Interval List

Project description

Augmented Interval List

Build Status PyPI version Coffee

Augmented interval list (AIList) is a data structure for enumerating intersections between a query interval and an interval set. AILists have previously been shown to be faster than interval tree, NCList, and BEDTools.

This implementation is a Python wrapper of the one used in the original AIList library.

Additonal wrapper functions have been created which allow easy user interface.

All citations should reference to original paper.

For full usage and installation documentation

Install

If you dont already have numpy and scipy installed, it is best to download Anaconda, a python distribution that has them included.

    https://continuum.io/downloads

Dependencies can be installed by:

    pip install -r requirements.txt

PyPI install, presuming you have all its requirements installed:

    pip install ailist

Benchmark

Test numpy random integers:

# ailist version: 0.1.7
from ailist import AIList
# ncls version: 0.0.53
from ncls import NCLS
# numpy version: 1.18.4
import numpy as np
# pandas version: 1.0.3
import pandas as pd
# quicksect version: 0.2.2
import quicksect

# Set seed
np.random.seed(100)


# First values
starts1 = np.random.randint(0, 100000, 100000)
ends1 = starts1 + np.random.randint(1, 10000, 100000)
ids1 = np.arange(len(starts1))
values1 = np.ones(len(starts1))

# Second values
starts2 = np.random.randint(0, 100000, 100000)
ends2 = starts2 + np.random.randint(1, 10000, 100000)
ids2 = np.arange(len(starts2))
values2 = np.ones(len(starts2))
Library Function Time (µs)
ncls single overlap 1170
pandas single overlap 924
quicksect single overlap 550
ailist single overlap 73
Library Function Time (s) Max Memory (GB)
ncls bulk overlap 151 s >50
ailist bulk overlap 17.8 s ~9

Usage

from ailist import AIList
import numpy as np

i = AIList()
i.add(15, 20)
i.add(10, 30)
i.add(17, 19)
i.add(5, 20)
i.add(12, 15)
i.add(30, 40)

# Print intervals
i.display()
# (15-20) (10-30) (17-19) (5-20) (12-15) (30-40)

# Find overlapping intervals
o = i.intersect(6, 15)
o.display()
# (5-20) (10-30) (12-15)

# Find index of overlaps
i.intersect_index(6, 15)
# array([3, 1, 4])

# Now i has been constructed/sorted
i.display()
# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40)

# Can be done manually as well at any time
i.construct()

# Iterate over intervals
for x in i:
   print(x)
# Interval(5-20, 3)
# Interval(10-30, 1)
# Interval(12-15, 4)
# Interval(15-20, 0)
# Interval(17-19, 2)
# Interval(30-40, 5)

# Interval comparisons
j = AIList()
j.add(5, 15)
j.add(50, 60)

# Subtract regions
s = i - j #also: i.subtract(j)
s.display()
# (15-20) (15-30) (15-20) (17-19) (30-40) 

# Common regions
i + j #also: i.common(j)
# AIList
#  range: (5-15)
#    (5-15, 3)
#    (10-15, 1)
#    (12-15, 4)

# AIList can also add to from arrays
starts = np.arange(10,1000,100)
ends = starts + 50
ids = starts
values = np.ones(10)
i.from_array(starts, ends, ids, values)
i.display()
# (5-20) (10-30) (12-15) (15-20) (17-19) (30-40) 
# (10-60) (110-160) (210-260) (310-360) (410-460) 
# (510-560) (610-660) (710-760) (810-860) (910-960)

# Merge overlapping intervals
m = i.merge(gap=10)
m.display()
# (5-60) (110-160) (210-260) (310-360) (410-460) 
# (510-560) (610-660) (710-760) (810-860) (910-960)

# Find array of coverage
c = i.coverage()
c.head()
# 5    1.0
# 6    1.0
# 7    1.0
# 8    1.0
# 9    1.0
# dtype: float64

# Calculate window protection score
w = i.wps(5)
w.head()
# 5   -1.0
# 6   -1.0
# 7    1.0
# 8   -1.0
# 9   -1.0
# dtype: float64

# Filter to interval lengths between 3 and 20
fi = i.filter(3,20)
fi.display()
# (5-20) (10-30) (15-20) (30-40)

# Query by array
i.intersect_from_array(starts, ends, ids)
# (array([ 10,  10,  10,  10,  10,  10,  10, 110, 210, 310, 410, 510, 610,
#         710, 810, 910]),
# array([  5,   2,   0,   4,  10,   1,   3, 110, 210, 310, 410, 510, 610,
#        710, 810, 910]))

Original paper

Jianglin Feng, Aakrosh Ratan, Nathan C Sheffield; Augmented Interval List: a novel data structure for efficient genomic interval search, Bioinformatics, btz407, https://doi.org/10.1093/bioinformatics/btz407

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ailist-2.1.7.tar.gz (546.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

ailist-2.1.7-cp312-cp312-macosx_15_0_x86_64.whl (567.6 kB view details)

Uploaded CPython 3.12macOS 15.0+ x86-64

File details

Details for the file ailist-2.1.7.tar.gz.

File metadata

  • Download URL: ailist-2.1.7.tar.gz
  • Upload date:
  • Size: 546.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.3 Darwin/24.4.0

File hashes

Hashes for ailist-2.1.7.tar.gz
Algorithm Hash digest
SHA256 31ece019a067c545515666f9e177b1a196208efbd15c2d3d86d46088b78ead29
MD5 7e06ad7e2c4672099f477bc4586f5c89
BLAKE2b-256 c8998bd8538b14fe08d8f180a061499f7903ae7aed7a7c04158d3cee9390be34

See more details on using hashes here.

File details

Details for the file ailist-2.1.7-cp312-cp312-macosx_15_0_x86_64.whl.

File metadata

  • Download URL: ailist-2.1.7-cp312-cp312-macosx_15_0_x86_64.whl
  • Upload date:
  • Size: 567.6 kB
  • Tags: CPython 3.12, macOS 15.0+ x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.3 Darwin/24.4.0

File hashes

Hashes for ailist-2.1.7-cp312-cp312-macosx_15_0_x86_64.whl
Algorithm Hash digest
SHA256 77477c0333b797e0dd366accb111a94bb271ebc51806d55b52a35815b80c501d
MD5 4e43f751864e8a3b669a91954c9665c7
BLAKE2b-256 f8c54e89cef6477dbed6febf562d135c2f7daf7fbbaddf675f67a05396836553

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page