Skip to main content

A fast interval tree-like implementation in C, wrapped for the Python ecosystem.

Project description

Nested containment list

Build Status PyPI version

The Nested Containment List is a datastructure for interval overlap queries, like the interval tree. It is usually an order of magnitude faster than the interval tree both for building and query lookups.

The implementation here is a revived version of the one used in the now defunct PyGr library, which died of bitrot. I have made it less memory-consuming and created wrapper functions which allows batch-querying the NCLS for further speed gains.

It was implemented to be the cornerstone of the PyRanges project, but I have made it available to the Python community as a stand-alone library. Enjoy.

Original Paper: https://academic.oup.com/bioinformatics/article/23/11/1386/199545 Cite: http://dx.doi.org/10.1093/bioinformatics/btz615

Cite

If you use this library in published research cite

http://dx.doi.org/10.1093/bioinformatics/btz615

Install

pip install ncls

Usage

from ncls import NCLS

import pandas as pd

starts = pd.Series(range(0, 5))
ends = starts + 100
ids = starts

subject_df = pd.DataFrame({"Start": starts, "End": ends}, index=ids)

print(subject_df)
#    Start  End
# 0      0  100
# 1      1  101
# 2      2  102
# 3      3  103
# 4      4  104

ncls = NCLS(starts.values, ends.values, ids.values)

# python API, slower
it = ncls.find_overlap(0, 2)
for i in it:
    print(i)
# (0, 100, 0)
# (1, 101, 1)

starts_query = pd.Series([1, 3])
ends_query = pd.Series([52, 14])
indexes_query = pd.Series([10000, 100])

query_df = pd.DataFrame({"Start": starts_query.values, "End": ends_query.values}, index=indexes_query.values)

query_df
#        Start  End
# 10000      1   52
# 100        3   14


# everything done in C/Cython; faster
l_idxs, r_idxs = ncls.all_overlaps_both(starts_query.values, ends_query.values, indexes_query.values)
l_idxs, r_idxs
# (array([10000, 10000, 10000, 10000, 10000,   100,   100,   100,   100,
#          100]), array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4]))

print(query_df.loc[l_idxs])
#        Start  End
# 10000      1   52
# 10000      1   52
# 10000      1   52
# 10000      1   52
# 10000      1   52
# 100        3   14
# 100        3   14
# 100        3   14
# 100        3   14
# 100        3   14
print(subject_df.loc[r_idxs])
#    Start  End
# 0      0  100
# 1      1  101
# 2      2  102
# 3      3  103
# 4      4  104
# 0      0  100
# 1      1  101
# 2      2  102
# 3      3  103
# 4      4  104

# return intervals in python (slow/mem-consuming)
intervals = ncls.intervals()
intervals
# [(0, 100, 0), (1, 101, 1), (2, 102, 2), (3, 103, 3), (4, 104, 4)]

There is also an experimental floating point version of the NCLS called FNCLS. See the examples folder.

Benchmark

Test file of 100 million intervals (created by subsetting gencode gtf with replacement):

Library Function Time (s) Memory (GB)
bx-python build 161.7 2.5
ncls build 3.15 0.5
bx-python overlap 148.4 4.3
ncls overlap 7.2 0.5

Building is 50 times faster and overlap queries are 20 times faster. Memory usage is one fifth and one ninth.

Original paper

Alexander V. Alekseyenko, Christopher J. Lee; Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases, Bioinformatics, Volume 23, Issue 11, 1 June 2007, Pages 1386–1393, https://doi.org/10.1093/bioinformatics/btl647

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ncls-0.0.68.tar.gz (483.1 kB view hashes)

Uploaded Source

Built Distributions

ncls-0.0.68-pp39-pypy39_pp73-win_amd64.whl (734.0 kB view hashes)

Uploaded PyPy Windows x86-64

ncls-0.0.68-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (824.8 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

ncls-0.0.68-pp39-pypy39_pp73-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (822.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-pp39-pypy39_pp73-macosx_10_9_x86_64.whl (762.5 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

ncls-0.0.68-pp38-pypy38_pp73-win_amd64.whl (734.6 kB view hashes)

Uploaded PyPy Windows x86-64

ncls-0.0.68-pp38-pypy38_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (835.2 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

ncls-0.0.68-pp38-pypy38_pp73-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (826.9 kB view hashes)

Uploaded PyPy manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-pp38-pypy38_pp73-macosx_10_9_x86_64.whl (761.5 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

ncls-0.0.68-pp37-pypy37_pp73-win_amd64.whl (734.9 kB view hashes)

Uploaded PyPy Windows x86-64

ncls-0.0.68-pp37-pypy37_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (835.7 kB view hashes)

Uploaded PyPy manylinux: glibc 2.17+ x86-64

ncls-0.0.68-pp37-pypy37_pp73-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (827.5 kB view hashes)

Uploaded PyPy manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-pp37-pypy37_pp73-macosx_10_9_x86_64.whl (761.7 kB view hashes)

Uploaded PyPy macOS 10.9+ x86-64

ncls-0.0.68-cp311-cp311-win_amd64.whl (763.4 kB view hashes)

Uploaded CPython 3.11 Windows x86-64

ncls-0.0.68-cp311-cp311-win32.whl (725.7 kB view hashes)

Uploaded CPython 3.11 Windows x86

ncls-0.0.68-cp311-cp311-musllinux_1_1_x86_64.whl (2.5 MB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ x86-64

ncls-0.0.68-cp311-cp311-musllinux_1_1_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.11 musllinux: musl 1.1+ i686

ncls-0.0.68-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.17+ x86-64

ncls-0.0.68-cp311-cp311-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.11 manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-cp311-cp311-macosx_10_9_x86_64.whl (850.9 kB view hashes)

Uploaded CPython 3.11 macOS 10.9+ x86-64

ncls-0.0.68-cp310-cp310-win_amd64.whl (765.1 kB view hashes)

Uploaded CPython 3.10 Windows x86-64

ncls-0.0.68-cp310-cp310-win32.whl (727.5 kB view hashes)

Uploaded CPython 3.10 Windows x86

ncls-0.0.68-cp310-cp310-musllinux_1_1_x86_64.whl (2.4 MB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ x86-64

ncls-0.0.68-cp310-cp310-musllinux_1_1_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.10 musllinux: musl 1.1+ i686

ncls-0.0.68-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

ncls-0.0.68-cp310-cp310-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.10 manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-cp310-cp310-macosx_10_9_x86_64.whl (855.3 kB view hashes)

Uploaded CPython 3.10 macOS 10.9+ x86-64

ncls-0.0.68-cp39-cp39-win_amd64.whl (772.9 kB view hashes)

Uploaded CPython 3.9 Windows x86-64

ncls-0.0.68-cp39-cp39-win32.whl (738.2 kB view hashes)

Uploaded CPython 3.9 Windows x86

ncls-0.0.68-cp39-cp39-musllinux_1_1_x86_64.whl (2.5 MB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ x86-64

ncls-0.0.68-cp39-cp39-musllinux_1_1_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.9 musllinux: musl 1.1+ i686

ncls-0.0.68-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.4 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

ncls-0.0.68-cp39-cp39-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.9 manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-cp39-cp39-macosx_10_9_x86_64.whl (865.7 kB view hashes)

Uploaded CPython 3.9 macOS 10.9+ x86-64

ncls-0.0.68-cp38-cp38-win_amd64.whl (773.1 kB view hashes)

Uploaded CPython 3.8 Windows x86-64

ncls-0.0.68-cp38-cp38-win32.whl (738.2 kB view hashes)

Uploaded CPython 3.8 Windows x86

ncls-0.0.68-cp38-cp38-musllinux_1_1_x86_64.whl (2.5 MB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ x86-64

ncls-0.0.68-cp38-cp38-musllinux_1_1_i686.whl (2.4 MB view hashes)

Uploaded CPython 3.8 musllinux: musl 1.1+ i686

ncls-0.0.68-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.5 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

ncls-0.0.68-cp38-cp38-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.3 MB view hashes)

Uploaded CPython 3.8 manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-cp38-cp38-macosx_10_9_x86_64.whl (852.0 kB view hashes)

Uploaded CPython 3.8 macOS 10.9+ x86-64

ncls-0.0.68-cp37-cp37m-win_amd64.whl (767.6 kB view hashes)

Uploaded CPython 3.7m Windows x86-64

ncls-0.0.68-cp37-cp37m-win32.whl (731.8 kB view hashes)

Uploaded CPython 3.7m Windows x86

ncls-0.0.68-cp37-cp37m-musllinux_1_1_x86_64.whl (2.3 MB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ x86-64

ncls-0.0.68-cp37-cp37m-musllinux_1_1_i686.whl (2.2 MB view hashes)

Uploaded CPython 3.7m musllinux: musl 1.1+ i686

ncls-0.0.68-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.3 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

ncls-0.0.68-cp37-cp37m-manylinux_2_12_i686.manylinux2010_i686.manylinux_2_17_i686.manylinux2014_i686.whl (2.2 MB view hashes)

Uploaded CPython 3.7m manylinux: glibc 2.12+ i686 manylinux: glibc 2.17+ i686

ncls-0.0.68-cp37-cp37m-macosx_10_9_x86_64.whl (851.6 kB view hashes)

Uploaded CPython 3.7m macOS 10.9+ x86-64

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page