Nearest Neighbor Detection for Bioconductor

Python bindings to knncolle

Overview

The knncolle Python package implements Python bindings to the C++ library of the same name for nearest neighbor (NN) searches. Downstream packages can re-use the NN search algorithms in knncolle, either via Python or by directly calling C++ through shared pointers. This is inspired by the BiocNeighbors Bioconductor package, which does the same thing for R packages.

Quick start

Install it:

pip install knncolle

And run the desired search:

# Mocking up data with 20 dimensions, 1000 observations
import numpy
y = numpy.random.rand(20, 1000) 

# Building a search index with vantage point trees:
import knncolle
params = knncolle.VptreeParameters()
idx = knncolle.build_index(params, y)

# Performing the search:
res = knncolle.find_knn(idx, num_neighbors=10)
res.index
res.distance

Check out the reference documentation for details.
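
To make the index and distance fields of the result concrete, here is a minimal pure-Python sketch of what an exact k-nearest-neighbor search computes. It is not how knncolle is implemented: brute_force_knn is a hypothetical helper written only for illustration, it takes a plain list of observations rather than a NumPy matrix, and it assumes that an observation is never reported as its own neighbor.

```python
import math

def brute_force_knn(observations, num_neighbors):
    """Exact k-nearest-neighbor search by exhaustive comparison.

    `observations` is a list of coordinate lists, one per observation.
    Returns (index, distance): for each observation, the positions of its
    nearest neighbors and the matching Euclidean distances, closest first.
    """
    index, distance = [], []
    for i, x in enumerate(observations):
        # Distance from observation i to every other observation.
        candidates = [
            (math.dist(x, other), j)
            for j, other in enumerate(observations)
            if j != i  # assumption: an observation is not its own neighbor
        ]
        candidates.sort()
        nearest = candidates[:num_neighbors]
        index.append([j for _, j in nearest])
        distance.append([d for d, _ in nearest])
    return index, distance

# Four observations in two dimensions.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
nn_index, nn_distance = brute_force_knn(points, num_neighbors=2)
print(nn_index[0], nn_distance[0])
```

For the first observation, the two closest neighbors are the observations at positions 1 and 2, at distances 1.0 and 3.0. Tree-based and approximate algorithms aim to return the same kind of answer without comparing every pair.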

Switching algorithms

We can switch to a different algorithm by passing a different Parameters object to build_index(). For example, we could use the Approximate Nearest Neighbors Oh Yeah (Annoy) algorithm:

an_params = knncolle.AnnoyParameters()
an_idx = knncolle.build_index(an_params, y)

We can also tweak the search parameters in our Parameters object, during or after construction. For example, with the hierarchical navigable small worlds (HNSW) algorithm:

h_params = knncolle.HnswParameters(num_links=20, distance="Manhattan")
h_params.ef_construction = 150
h_idx = knncolle.build_index(h_params, y)

Currently, we support Annoy, HNSW, vantage point trees, k-means k-nearest neighbors, and (for testing) an exhaustive brute-force search. More algorithms can be added by extending knncolle as described below without any change to end-user code.

Other searches

Given a query dataset, we can find the nearest neighbors in the prebuilt search index:

q = numpy.random.rand(20, 50)
qres = knncolle.query_knn(idx, q, num_neighbors=10)
qres.index
qres.distance

We can ask find_knn() to report variable numbers of neighbors for each observation:

variable_k = (numpy.random.rand(y.shape[1]) * 10).astype(numpy.uint32)
var_res = knncolle.find_knn(idx, num_neighbors=variable_k)
var_res.index
var_res.distance

We can find all observations within a distance threshold of each observation via find_neighbors(). A variable threshold for each observation is also supported, and observations in a separate dataset can be queried via query_neighbors().

range_res = knncolle.find_neighbors(idx, threshold=10)
range_res.index
range_res.distance
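
As a rough illustration of what a range search returns, the sketch below does the same job by exhaustive comparison in pure Python. It is not knncolle's implementation: brute_force_range is a hypothetical helper, the threshold is assumed to be inclusive, and an observation is assumed not to count as its own neighbor. It accepts either one threshold for all observations or one threshold per observation.

```python
import math

def brute_force_range(observations, threshold):
    """Find all other observations within `threshold` of each observation.

    `threshold` is either a single number applied to every observation or
    a sequence holding one threshold per observation.
    """
    if isinstance(threshold, (int, float)):
        thresholds = [threshold] * len(observations)
    else:
        thresholds = list(threshold)
        if len(thresholds) != len(observations):
            raise ValueError("need one threshold per observation")

    index, distance = [], []
    for i, x in enumerate(observations):
        ranked = sorted(
            (math.dist(x, other), j)
            for j, other in enumerate(observations)
            if j != i
        )
        # assumption: the threshold is inclusive
        hits = [(d, j) for d, j in ranked if d <= thresholds[i]]
        index.append([j for _, j in hits])
        distance.append([d for d, _ in hits])
    return index, distance

points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
same_index, same_distance = brute_force_range(points, threshold=3.0)
var_index, var_distance = brute_force_range(points, threshold=[1.0, 0.5, 10.0, 6.0])
print(same_index)
print(var_index)
```

Unlike a k-nearest-neighbor search, the number of hits differs between observations, so the results are ragged: some observations have several neighbors within range and others have none.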

Use with C++

The raison d'être of the knncolle Python package is to enable re-use within (pybind11-wrapped) C++ code in other Python packages. The idea is that downstream packages will link against the knncolle C++ interface so that they can re-use the search indices created by the knncolle Python package. This allows downstream packages to (i) save time by avoiding the need to re-compile all algorithms and (ii) support more algorithms in knncolle extensions. To do so:

  1. Add assorthead.includes() to the compiler's include path for your package. This can be done through include_dirs= of the Extension() definition in setup.py or by adding a target_include_directories() in CMake, depending on your build system.
  2. Call knncolle.build_index() to construct a GenericIndex instance. This exposes a shared pointer to the C++-allocated index via its ptr property.
  3. Pass ptr to pybind11-wrapped C++ code as a shared pointer to a knncolle::Prebuilt, which can be interrogated as described in the knncolle documentation.

So, for example, the C++ code in our downstream package might look like this:

int do_something(
    const std::shared_ptr<
        knncolle::Prebuilt<uint32_t, uint32_t, double> 
    >& index) 
{
    // Do something with the search index interface.
    return 1;
}

PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something", &do_something);
}

This can then be called from Python:

from . import lib_downstream as lib
from knncolle import GenericIndex

def do_something(idx: GenericIndex):
    return lib.do_something(idx.ptr)

In some scenarios, it may be more convenient to construct the search index inside C++, e.g., if the dataset to be searched is not available before the call to the C++ function. This can be accommodated by accepting a shared pointer to a knncolle::Builder in the C++ code:

int do_something_mk2(
    const std::shared_ptr<
        knncolle::Builder<
            knncolle::SimpleMatrix<uint32_t, uint32_t, double>,
            double
        >
    >& builder) 
{
    // The builder is an algorithm-specific factory that accepts a matrix and
    // returns a search index for that algorithm. Presumably we construct a
    // new search index inside this function and use it.
    return 1;
}

PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something_mk2", &do_something_mk2);
}

A pointer to the knncolle::Builder is then created in Python by the define_builder() function and passed to C++:

from . import lib_downstream as lib
from knncolle import define_builder, Parameters

def do_something(param: Parameters):
    builder, cls = define_builder(param)
    return lib.do_something_mk2(builder.ptr)

See also the definitions in lib/src/def.h for the types of the pointers to be used in the C++ code.

Extending to more algorithms

Via define_builder()

The best way to extend knncolle is to do so in C++. This involves writing subclasses of the Builder, Prebuilt and Searcher interfaces in the knncolle library. Once this is done, it is a simple matter of writing the following Python bindings:

  • Implement a SomeNewParameters class that inherits from Parameters.
  • Implement a SomeNewIndex class that inherits from GenericIndex. This should accept a single ptr in its constructor and have a ptr property that returns the same value.
  • Register a define_builder() method that dispatches on SomeNewParameters. This should call into C++ and return a tuple containing a pybind11-wrapped BuilderPointer and the SomeNewIndex constructor.

No new methods are required for find_knn(), build_index(), etc. as the default method will work automatically if a define_builder() method is available. This approach also allows the new method to be used in C++ code of downstream packages that accept a PrebuiltPointer instance.
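
The sketch below shows one way these pieces could fit together, using stand-in classes so that it runs without knncolle or any compiled code. Everything in it is an assumption made for illustration: the stand-in Parameters, GenericIndex, define_builder() and build_index() only mimic the roles described above, dispatch is modelled with functools.singledispatch, and a plain Python function (fake_builder) takes the place of the pybind11-wrapped BuilderPointer.

```python
from functools import singledispatch

# Stand-ins for knncolle's classes; the real ones wrap C++ pointers.
class Parameters:
    pass

class GenericIndex:
    def __init__(self, ptr):
        self._ptr = ptr

    @property
    def ptr(self):
        return self._ptr

@singledispatch
def define_builder(param):
    raise NotImplementedError(f"no builder for {type(param).__name__}")

def build_index(param, data):
    # Default path: any Parameters subclass with a define_builder() method
    # gets an index without registering anything else.
    builder, index_class = define_builder(param)
    return index_class(builder(data))

# --- what an extension package would add ---
class SomeNewParameters(Parameters):
    pass

class SomeNewIndex(GenericIndex):
    pass

@define_builder.register(SomeNewParameters)
def _(param):
    # Hypothetical: a real extension would call into C++ and return a
    # pybind11-wrapped BuilderPointer; a plain function plays that role here.
    def fake_builder(data):
        return ("prebuilt", len(data))
    return fake_builder, SomeNewIndex

new_index = build_index(SomeNewParameters(), [[0.0, 1.0], [2.0, 3.0]])
print(type(new_index).__name__, new_index.ptr)
```

The extension only supplies the parameters class, the index class and the define_builder() method; the generic build_index() is left untouched.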

Without define_builder()

If it is not possible to implement the search algorithm in C++, we can still extend knncolle in Python. Each extension package should:

  • Implement a SomeNewParameters class that inherits from Parameters.
  • Implement a SomeNewIndex class that inherits from Index. This can have an arbitrary structure, i.e., it does not need to have a ptr property.
  • Register a build_index() method that dispatches on SomeNewParameters. This should return an instance of SomeNewIndex.
  • Register a method for any number of these generics: find_knn(), find_distance(), find_neighbors(), query_knn(), query_distance(), query_neighbors(). These methods should dispatch on SomeNewIndex (the first argument of each generic) and return the appropriate result object.

This approach will not support re-use by C++ code in other Python packages.
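
A pure-Python extension along these lines might look like the sketch below. It uses stand-ins throughout so that it runs on its own: Parameters, Index, build_index(), find_knn() and the KnnResults tuple are hypothetical substitutes for the knncolle equivalents, and dispatch is modelled with functools.singledispatch on the first argument of each generic. The search itself is a plain exhaustive comparison, chosen only to keep the example short.

```python
import math
from collections import namedtuple
from functools import singledispatch

# Stand-ins for knncolle's base classes, generics and result type.
class Parameters:
    pass

class Index:
    pass

KnnResults = namedtuple("KnnResults", ["index", "distance"])

@singledispatch
def build_index(param, x):
    raise NotImplementedError(f"no build_index() for {type(param).__name__}")

@singledispatch
def find_knn(index, num_neighbors):
    raise NotImplementedError(f"no find_knn() for {type(index).__name__}")

# --- what a pure-Python extension package would add ---
class SomeNewParameters(Parameters):
    pass

class SomeNewIndex(Index):
    def __init__(self, observations):
        # Arbitrary structure: here, just a copy of the observations.
        self.observations = [list(obs) for obs in observations]

@build_index.register(SomeNewParameters)
def _(param, x):
    return SomeNewIndex(x)

@find_knn.register(SomeNewIndex)
def _(index, num_neighbors):
    all_index, all_distance = [], []
    for i, obs in enumerate(index.observations):
        # Rank every other observation by Euclidean distance.
        ranked = sorted(
            (math.dist(obs, other), j)
            for j, other in enumerate(index.observations)
            if j != i
        )[:num_neighbors]
        all_index.append([j for _, j in ranked])
        all_distance.append([d for d, _ in ranked])
    return KnnResults(all_index, all_distance)

new_idx = build_index(SomeNewParameters(), [[0.0], [1.0], [3.0]])
new_res = find_knn(new_idx, 1)
print(new_res.index, new_res.distance)
```

Because SomeNewIndex holds ordinary Python data rather than a pointer to a C++ object, there is nothing for pybind11-wrapped code in another package to pick up.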
