
Nearest Neighbor Detection for Bioconductor



Python bindings to knncolle

Overview

The knncolle Python package implements Python bindings to the C++ library of the same name for nearest neighbor (NN) searches. Downstream packages can re-use the NN search algorithms in knncolle, either via Python or by directly calling C++ through shared pointers. This is inspired by the BiocNeighbors Bioconductor package, which does the same thing for R packages.

Quick start

Install it:

pip install knncolle

And run the desired search:

# Mocking up data with 20 dimensions (rows) and 1000 observations (columns)
import numpy
y = numpy.random.rand(20, 1000)

# Building a search index with vantage point trees:
import knncolle
params = knncolle.VptreeParameters()
idx = knncolle.build_index(params, y)

# Finding the 10 nearest neighbors of each observation:
res = knncolle.find_knn(idx, num_neighbors=10)
res.index     # indices of each observation's neighbors
res.distance  # distances to those neighbors

Check out the reference documentation for details.
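To make the meaning of res.index and res.distance concrete, the sketch below computes the same kind of result by exhaustive search in pure Python. It is an illustration under stated assumptions, not knncolle's implementation: we assume, as in the mock data above, that rows are dimensions and columns are observations, that the distance is Euclidean, and that each observation is excluded from its own neighbor list. The helper brute_force_knn() is hypothetical, and it returns nested lists rather than whatever array types knncolle actually uses.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_knn(data, k):
    """Hypothetical reference: the k nearest neighbors of every observation.

    `data` has one row per dimension and one column per observation.
    Each observation is excluded from its own neighbor list, and
    neighbors are sorted by increasing distance (ties broken by index).
    """
    observations = list(zip(*data))  # transpose: columns -> points
    index, distance = [], []
    for i, obs in enumerate(observations):
        nearest = sorted(
            (euclidean(obs, other), j)
            for j, other in enumerate(observations)
            if j != i
        )[:k]
        index.append([j for _, j in nearest])
        distance.append([d for d, _ in nearest])
    return index, distance

# 2 dimensions (rows) x 4 observations (columns).
data = [[0.0, 1.0, 3.0, 7.0],
        [0.0, 0.0, 0.0, 0.0]]
index, distance = brute_force_knn(data, k=2)
print(index)        # [[1, 2], [0, 2], [1, 0], [2, 1]]
print(distance[0])  # [1.0, 3.0]
```

An approximate algorithm such as Annoy or HNSW may not return exactly these neighbors, which is the trade-off it makes for speed.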

Switching algorithms

We can easily switch to a different algorithm by just passing a different params object. For example, we could use the Approximate Nearest Neighbors Oh Yeah (Annoy) algorithm:

an_params = knncolle.AnnoyParameters()
an_idx = knncolle.build_index(an_params, y)

We can also tweak the search parameters in our Parameters object, during or after construction. For example, with the hierarchical navigable small worlds (HNSW) algorithm:

h_params = knncolle.HnswParameters(num_links=20, distance="Manhattan")
h_params.ef_construction = 150
h_idx = knncolle.build_index(h_params, y)
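Setting distance="Manhattan" changes the metric used to rank neighbors, and the two metrics do not always agree on which point is closer. This small, self-contained sketch (plain Python, not knncolle code; the function names are our own) shows a case where they disagree.

```python
import math

def euclidean(a, b):
    # Square root of the summed squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # Sum of the absolute coordinate differences ("city block" distance).
    return sum(abs(x - y) for x, y in zip(a, b))

origin, a, b = (0.0, 0.0), (3.0, 0.0), (2.0, 2.0)

# Euclidean: b is closer (about 2.83 versus 3.0).
print(euclidean(origin, b) < euclidean(origin, a))  # True
# Manhattan: a is closer (3.0 versus 4.0).
print(manhattan(origin, a) < manhattan(origin, b))  # True
```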

Currently, we support Annoy, HNSW, vantage point trees, k-means for k-nearest neighbors (KMKNN), and (for testing) an exhaustive brute-force search. More algorithms can be added by extending knncolle, as described below, without any change to end-user code.

Other searches

Given a query dataset, we can find the nearest neighbors in the prebuilt search index:

q = numpy.random.rand(20, 50)
qres = knncolle.query_knn(idx, q, num_neighbors=10)
qres.index
qres.distance
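A query search differs from find_knn() in that the query points are not part of the index, so no point needs to be excluded from its own results. Here is a hypothetical pure-Python sketch of that behavior, assuming the same layout as above (rows are dimensions, columns are observations) and Euclidean distance; it returns nested lists and is not knncolle's implementation.

```python
import math

def brute_force_query_knn(reference, query, k):
    """Hypothetical sketch: the k closest reference observations for each query.

    Both matrices have one row per dimension and one column per observation.
    Query points are not part of the reference set, so nothing is excluded.
    """
    ref_points = list(zip(*reference))
    index, distance = [], []
    for q in zip(*query):
        nearest = sorted(
            (math.dist(q, r), j) for j, r in enumerate(ref_points)
        )[:k]
        index.append([j for _, j in nearest])
        distance.append([d for d, _ in nearest])
    return index, distance

reference = [[0.0, 1.0, 3.0, 7.0],
             [0.0, 0.0, 0.0, 0.0]]
query = [[2.9, 6.0],
         [0.0, 0.0]]
qindex, qdistance = brute_force_query_knn(reference, query, k=2)
print(qindex)        # [[2, 1], [3, 2]]
print(qdistance[1])  # [1.0, 3.0]
```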

We can ask find_knn() to report variable numbers of neighbors for each observation:

variable_k = (numpy.random.rand(y.shape[1]) * 10).astype(numpy.uint32)
var_res = knncolle.find_knn(idx, num_neighbors=variable_k)
var_res.index
var_res.distance
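Because each observation can ask for a different number of neighbors, the per-observation results generally have different lengths. Note that variable_k above is truncated to integers from 0 to 9, so some observations may request no neighbors at all. The pure-Python sketch below illustrates per-observation counts; it takes a list of observations directly, treats a count of zero as an empty result, and is a hypothetical helper rather than knncolle code (we have not checked how knncolle represents empty results).

```python
import math

def variable_knn(points, ks):
    """Hypothetical sketch: observation i gets its own neighbor count ks[i].

    `points` is a list of observations (coordinate tuples). Each observation
    is excluded from its own list; a count of zero gives an empty list.
    """
    if len(ks) != len(points):
        raise ValueError("need one neighbor count per observation")
    result = []
    for i, (p, k) in enumerate(zip(points, ks)):
        ranked = sorted(
            (math.dist(p, other), j)
            for j, other in enumerate(points)
            if j != i
        )
        result.append([j for _, j in ranked[:k]])
    return result

points = [(0.0,), (1.0,), (3.0,), (7.0,)]
print(variable_knn(points, [0, 1, 2, 3]))  # [[], [0], [1, 0], [2, 1, 0]]
```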

We can find all observations within a distance threshold of each observation via find_neighbors(). This also supports a variable threshold for each observation as well as querying of observations in a separate dataset.

range_res = knncolle.find_neighbors(idx, threshold=10)
range_res.index
range_res.distance
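A range search returns every observation within the threshold rather than a fixed number of neighbors, so some results can be empty. The sketch below is a hypothetical pure-Python version that accepts either a single threshold or one threshold per observation, mirroring the variable-threshold option mentioned above. Treating the boundary as inclusive (<=) and using Euclidean distance are assumptions of this sketch, not confirmed knncolle behavior.

```python
import math

def brute_force_neighbors(points, threshold):
    """Hypothetical sketch of a range search over every observation.

    `threshold` is a single number shared by all observations or a sequence
    with one threshold per observation. Each observation is excluded from its
    own result, and neighbors are sorted by increasing distance.
    """
    if isinstance(threshold, (int, float)):
        threshold = [threshold] * len(points)
    result = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, other), j)
            for j, other in enumerate(points)
            if j != i
        )
        # Inclusive boundary (<=) is an assumption of this sketch.
        result.append([j for d, j in ranked if d <= threshold[i]])
    return result

points = [(0.0,), (1.0,), (3.0,), (7.0,)]
print(brute_force_neighbors(points, 2.0))                   # [[1], [0, 2], [1], []]
print(brute_force_neighbors(points, [0.5, 0.5, 4.0, 4.0]))  # [[], [], [1, 0, 3], [2]]
```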

Use with C++

The raison d'être of the knncolle Python package is to enable re-use within (pybind11-wrapped) C++ code in other Python packages. The idea is that downstream packages will link against the knncolle C++ interface so that they can re-use the search indices created by the knncolle Python package. This allows downstream packages to (i) save time by avoiding the need to re-compile all algorithms and (ii) support more algorithms in knncolle extensions. To do so:

  1. Add assorthead.includes() to the compiler's include path for your package. This can be done through include_dirs= of the Extension() definition in setup.py or by adding a target_include_directories() in CMake, depending on your build system.
  2. Call knncolle.build_index() to construct a GenericIndex instance. This exposes a shared pointer to the C++-allocated index via its ptr property.
  3. Pass ptr to pybind11-wrapped C++ code as a shared pointer to a knncolle::Prebuilt, which can be interrogated as described in the knncolle documentation.

So, for example, the C++ code in our downstream package might look like this:

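#include <cstdint>
#include <memory>
#include "pybind11/pybind11.h"
#include "knncolle/knncolle.hpp"

int do_something(
    const std::shared_ptr<
        knncolle::Prebuilt<uint32_t, uint32_t, double>
    >& index)
{
    // Do something with the search index interface.
    return 1;
}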

PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something", &do_something);
}

Which can then be called from Python:

from . import lib_downstream as lib
from knncolle import GenericIndex

def do_something(idx: GenericIndex):
    return lib.do_something(idx.ptr)

In some scenarios, it may be more convenient to construct the search index inside C++, e.g., if the dataset to be searched is not available before the call to the C++ function. This can be accommodated by accepting a shared pointer to a knncolle::Builder in the C++ code:

int do_something_mk2(
    const std::shared_ptr<
        knncolle::Builder<
            knncolle::SimpleMatrix<uint32_t, uint32_t, double>,
            double
        >
    >& builder) 
{
    // The builder is an algorithm-specific factory that accepts a matrix and
    // returns a search index for that algorithm. Presumably we construct a
    // new search index inside this function and use it.
    return 1;
}

PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something_mk2", &do_something_mk2);
}

A pointer to the knncolle::Builder is then created in Python by the define_builder() function and passed to C++:

from . import lib_downstream as lib
from knncolle import define_builder, Parameters

def do_something(param: Parameters):
    builder, cls = define_builder(param)
    return lib.do_something_mk2(builder.ptr)

See also the definitions in lib/src/def.h for the types of the pointers to be used in the C++ code.

Extending to more algorithms

Via define_builder()

The best way to extend knncolle is to do so in C++. This involves writing subclasses of the Builder, Prebuilt and Searcher interfaces in the knncolle library. Once this is done, it is a simple matter of writing the following Python bindings:

  • Implement a SomeNewParameters class that inherits from Parameters.
  • Implement a SomeNewIndex class that inherits from GenericIndex. This should accept a single ptr in its constructor and have a ptr property that returns the same value.
  • Register a define_builder() method that dispatches on SomeNewParameters. This should call into C++ and return a tuple containing a pybind11-wrapped BuilderPointer and the SomeNewIndex constructor.
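On the C++ side, the three interfaces form a chain: a Builder turns a data matrix into a Prebuilt index, and the Prebuilt creates Searcher objects that answer individual queries. The following is a rough Python analogue of that layering with an exhaustive search plugged in. The class and method names only loosely mirror the C++ interfaces and are hypothetical; consult the knncolle C++ documentation for the real signatures.

```python
import math
from abc import ABC, abstractmethod

class Searcher(ABC):
    @abstractmethod
    def search(self, i, k):
        """Return (indices, distances) of the k nearest neighbors of observation i."""

class Prebuilt(ABC):
    @abstractmethod
    def initialize(self) -> Searcher:
        """Create a fresh searcher for this index."""

class Builder(ABC):
    @abstractmethod
    def build(self, points) -> Prebuilt:
        """Turn a list of observations into a search index."""

# A hypothetical exhaustive implementation of the three layers.
class BruteforceSearcher(Searcher):
    def __init__(self, points):
        self.points = points

    def search(self, i, k):
        ranked = sorted(
            (math.dist(self.points[i], p), j)
            for j, p in enumerate(self.points)
            if j != i
        )[:k]
        return [j for _, j in ranked], [d for d, _ in ranked]

class BruteforcePrebuilt(Prebuilt):
    def __init__(self, points):
        self.points = [tuple(p) for p in points]

    def initialize(self):
        return BruteforceSearcher(self.points)

class BruteforceBuilder(Builder):
    def build(self, points):
        return BruteforcePrebuilt(points)

prebuilt = BruteforceBuilder().build([(0.0,), (1.0,), (3.0,)])
print(prebuilt.initialize().search(0, 2))  # ([1, 2], [1.0, 3.0])
```

Separating the searcher from the prebuilt index makes it natural to hold one searcher per thread while sharing a single read-only index.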

No new methods are required for find_knn(), build_index(), etc., as the default methods will work automatically if a define_builder() method is available. This approach also allows the new algorithm to be used in the C++ code of downstream packages that accept a PrebuiltPointer instance.
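The reason no extra methods are needed is that the generic functions can fall back to a default that relies only on define_builder(). The following sketch imitates that arrangement with functools.singledispatch. Everything in it is a hypothetical stand-in: the Parameters, SomeNewParameters and SomeNewIndex classes, and the fake builder, which packages its inputs into a tuple instead of calling into C++. We are also assuming that knncolle's generics behave like singledispatch functions.

```python
from functools import singledispatch

class Parameters:
    """Hypothetical stand-in for knncolle.Parameters."""

class SomeNewParameters(Parameters):
    def __init__(self, num_trees=10):
        self.num_trees = num_trees  # an illustrative tuning knob

class SomeNewIndex:
    """Hypothetical stand-in for a GenericIndex subclass holding a pointer."""
    def __init__(self, ptr):
        self._ptr = ptr

    @property
    def ptr(self):
        return self._ptr

@singledispatch
def define_builder(param):
    raise NotImplementedError(f"no builder for {type(param).__name__}")

@define_builder.register
def _(param: SomeNewParameters):
    # A real extension would return a pybind11-wrapped builder from C++;
    # this fake "builder" just packages its inputs into a tuple.
    def builder(x):
        return ("prebuilt", param.num_trees, x)
    return builder, SomeNewIndex

@singledispatch
def build_index(param, x):
    # Default method: works for any parameters class that has a registered
    # define_builder(), so new algorithms need no build_index() of their own.
    builder, cls = define_builder(param)
    return cls(builder(x))

idx = build_index(SomeNewParameters(num_trees=5), [[0.0, 1.0]])
print(type(idx).__name__, idx.ptr[1])  # SomeNewIndex 5
```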

Without define_builder()

If it is not possible to implement the search algorithm in C++, we can still extend knncolle in Python. Each extension package should:

  • Implement a SomeNewParameters class that inherits from Parameters.
  • Implement a SomeNewIndex class that inherits from Index. This can have an arbitrary structure, i.e., it does not need to have a ptr property.
  • Register a build_index() method that dispatches on SomeNewParameters. This should return an instance of SomeNewIndex.
  • Register a method for any number of these generics: find_knn(), find_distance(), find_neighbors(), query_knn(), query_distance(), query_neighbors(). These methods should dispatch on SomeNewIndex and return the appropriate result object.

This approach will not support re-use by C++ code in other Python packages.
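For the Python-only route, the registration pattern might look like the following sketch. It assumes the knncolle generics are functools.singledispatch functions, so local stand-ins for build_index() and find_knn() are defined to keep the example self-contained; in a real extension you would register on knncolle's own generics and inherit from knncolle.Parameters and knncolle.Index. Because find_knn() receives the index as its first argument, the method is registered on the index class. For brevity it returns plain lists instead of a proper result object.

```python
import math
from functools import singledispatch

# Hypothetical stand-ins for the knncolle generics, so this runs on its own.
@singledispatch
def build_index(param, x):
    raise NotImplementedError(type(param).__name__)

@singledispatch
def find_knn(index, num_neighbors):
    raise NotImplementedError(type(index).__name__)

class PurePythonParameters:  # would inherit from knncolle.Parameters
    pass

class PurePythonIndex:  # would inherit from knncolle.Index; no ptr needed
    def __init__(self, points):
        self.points = points

@build_index.register
def _(param: PurePythonParameters, x):
    # x has one row per dimension and one column per observation.
    return PurePythonIndex([tuple(col) for col in zip(*x)])

@find_knn.register
def _(index: PurePythonIndex, num_neighbors):
    result = []
    for i, p in enumerate(index.points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(index.points) if j != i
        )
        result.append([j for _, j in ranked[:num_neighbors]])
    return result

idx = build_index(PurePythonParameters(), [[0.0, 1.0, 3.0]])
print(find_knn(idx, num_neighbors=1))  # [[1], [0], [1]]
```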


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

knncolle-0.1.1.tar.gz (39.9 kB)

Tags: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

knncolle-0.1.1-cp313-cp313-musllinux_1_2_x86_64.whl (1.2 MB)

Tags: CPython 3.13, musllinux: musl 1.2+ x86-64

knncolle-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (230.5 kB)

Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64

knncolle-0.1.1-cp313-cp313-macosx_11_0_arm64.whl (175.1 kB)

Tags: CPython 3.13, macOS 11.0+ ARM64

knncolle-0.1.1-cp313-cp313-macosx_10_13_x86_64.whl (204.7 kB)

Tags: CPython 3.13, macOS 10.13+ x86-64

knncolle-0.1.1-cp312-cp312-musllinux_1_2_x86_64.whl (1.2 MB)

Tags: CPython 3.12, musllinux: musl 1.2+ x86-64

knncolle-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (229.4 kB)

Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64

knncolle-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (175.0 kB)

Tags: CPython 3.12, macOS 11.0+ ARM64

knncolle-0.1.1-cp312-cp312-macosx_10_13_x86_64.whl (204.6 kB)

Tags: CPython 3.12, macOS 10.13+ x86-64

knncolle-0.1.1-cp311-cp311-musllinux_1_2_x86_64.whl (1.2 MB)

Tags: CPython 3.11, musllinux: musl 1.2+ x86-64

knncolle-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (231.5 kB)

Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64

knncolle-0.1.1-cp311-cp311-macosx_11_0_arm64.whl (175.7 kB)

Tags: CPython 3.11, macOS 11.0+ ARM64

knncolle-0.1.1-cp311-cp311-macosx_10_9_x86_64.whl (205.0 kB)

Tags: CPython 3.11, macOS 10.9+ x86-64

knncolle-0.1.1-cp310-cp310-musllinux_1_2_x86_64.whl (1.2 MB)

Tags: CPython 3.10, musllinux: musl 1.2+ x86-64

knncolle-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (228.9 kB)

Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64

knncolle-0.1.1-cp310-cp310-macosx_11_0_arm64.whl (174.3 kB)

Tags: CPython 3.10, macOS 11.0+ ARM64

knncolle-0.1.1-cp310-cp310-macosx_10_9_x86_64.whl (203.6 kB)

Tags: CPython 3.10, macOS 10.9+ x86-64

knncolle-0.1.1-cp39-cp39-musllinux_1_2_x86_64.whl (1.2 MB)

Tags: CPython 3.9, musllinux: musl 1.2+ x86-64

knncolle-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (228.9 kB)

Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64

knncolle-0.1.1-cp39-cp39-macosx_11_0_arm64.whl (174.4 kB)

Tags: CPython 3.9, macOS 11.0+ ARM64

knncolle-0.1.1-cp39-cp39-macosx_10_9_x86_64.whl (203.7 kB)

Tags: CPython 3.9, macOS 10.9+ x86-64

File details

Details for the file knncolle-0.1.1.tar.gz.

File metadata

  • Download URL: knncolle-0.1.1.tar.gz
  • Size: 39.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for knncolle-0.1.1.tar.gz
Algorithm Hash digest
SHA256 f41d3ccf6851cf8c8dc67e8fa51690e3450ff2746b3a0b49cd3bb79bffb9e216
MD5 039002a8a3078788e10a7a7158590ada
BLAKE2b-256 94416398b15f0424487bced2db7260cd9a24b3e0eb8cb6c3dff1653bb49060c3

See more details on using hashes here.

Details for the file knncolle-0.1.1-cp313-cp313-musllinux_1_2_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp313-cp313-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 35fc3433ea03fbc28571ed31b4fd4afd0a6604a44e71de4235d0ee5e4508720d
MD5 5d7ad3b5be9f0a1af844fa4cc31c4fcd
BLAKE2b-256 e12f20deb8470e8bc3b8a87f4769250a806ae06f5f9998141f323be406284e86

Details for the file knncolle-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4e7658ab50be818cc81b7c4ca0d6326f8ce02007dd9b21d8baa261bfccbdd324
MD5 55bf096cca14e24f739424ca83b46435
BLAKE2b-256 52c490599e5ac07124c46569cffb7a93831eafd2f318195ca0e7d89f13be7fab

Details for the file knncolle-0.1.1-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for knncolle-0.1.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fc1958c28de0920ffe3f7310fc258a634901498c537e102e1d0f7e66bd5eb8c9
MD5 eee366c11d5c048f2910b495a69e8ea4
BLAKE2b-256 b7de52d5de5446c2632c5f41c3e048e0be8ef8ffd328e50fd25095dfe1d0547e

Details for the file knncolle-0.1.1-cp313-cp313-macosx_10_13_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp313-cp313-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 08dcc6f617f5a279403ba12483b69fa0cf65c17167360800479fd58b35ad901f
MD5 ac65a3dbfd15df502b47df363b1e6d66
BLAKE2b-256 480ee0fa2d25c96e802385c325a369fa8cfaeafc0f294117bcadda315886920b

Details for the file knncolle-0.1.1-cp312-cp312-musllinux_1_2_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp312-cp312-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 0ff8f36d007792b4e37c8bd6e547aca7b7eac2a7c212fe305b1d88023c5e4a39
MD5 d7a881825bef940788e6b3a7a5098fc6
BLAKE2b-256 5efa24843674b72e6edc97add15170a81cd75b9d860deac4a5f8204eaa441147

Details for the file knncolle-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fb20dc4b8e384c2a19cf79593fbd535c9b4e0976fc7cb77dcc10f1678157c234
MD5 b97de3fde26dbb15406d314f28425ebe
BLAKE2b-256 d77394190c7ce2dcd4505e86f3f7312243723de776ee6938a80928e5f9654f5b

Details for the file knncolle-0.1.1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for knncolle-0.1.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 1bd60127c3830a459e8269ec1c049e1c7dd7065e99c809fb8ad687bd7c83497c
MD5 d5716422adfb26eb7412cf66f853c1f6
BLAKE2b-256 067a13b2040e7c1766be8c3fc86ebe5559b40ef483c163ab3aff3a3ec1b00c81

Details for the file knncolle-0.1.1-cp312-cp312-macosx_10_13_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp312-cp312-macosx_10_13_x86_64.whl
Algorithm Hash digest
SHA256 68d4d6b67234ac87e997695c4b7b46699e158e08e71b43fdb7e6c39e5848f150
MD5 66899217894c4c2474c951392eac11e8
BLAKE2b-256 d2690987574894d2457027399b1bce4411b2349079761a43f148db1333297c47

Details for the file knncolle-0.1.1-cp311-cp311-musllinux_1_2_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp311-cp311-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 d00ccc21d810d7c0ad4f4da4d1821aefcb07b0e85fba22c217a18c3c2fca7909
MD5 dc5b918225d99f8b67acf8f79fd17f95
BLAKE2b-256 a5edc02281b52696ccfd70fed3895be87e11e6c0c5d6b203b56ad134c71d251c

Details for the file knncolle-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a3df90a377438f4a7fd588de4c8c8861e10719390f129fa1d68b61fcfbf7b5c6
MD5 5e56d00c59ce1c1597552bf511404665
BLAKE2b-256 0ceb54903232883a8c0a5ab7cde04da2694ef6290e6d3af1a0cb8d67009b4d53

Details for the file knncolle-0.1.1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for knncolle-0.1.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 51da974f68c0bb6f8ab4c4757f247ba0a3ce97be284544e24a21dd2ac0ce3674
MD5 c4cf922122b3e1bb7f1aea6894450f0a
BLAKE2b-256 5ac6f6421e51f42467f3ad2fe5c529ac3f6208ef502c47248d4da2326b9db4d5

Details for the file knncolle-0.1.1-cp311-cp311-macosx_10_9_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp311-cp311-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 de81f8600da4cf43e71b9c1d7cb6af6020f8e87e389d5a2fa0faf26b40545829
MD5 64c5b91e6bb608ee1cbf80a388ed903a
BLAKE2b-256 1efb16e9f10469b1f709c3f40747be2eb15dbd24ea970edf90501ea935045038

Details for the file knncolle-0.1.1-cp310-cp310-musllinux_1_2_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp310-cp310-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 6a13bfba5297c1a3fecb609370dca526efccdc86ec2dee7318584744eb14adf9
MD5 382d4a9289cd6384fe4d2cd787c6104b
BLAKE2b-256 517f0159aacf908f1540d2476a8a86aaf36e7e19eaa7ce1368e52f4bdbf7428b

Details for the file knncolle-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1dde181989760b74b0f852ef403f78bcd371c66b5e55f8c9d7b6cc6b011ca4f4
MD5 704114592f4975a4f7876eaf295b2104
BLAKE2b-256 49a1097d4a3915cdb3d72b23064d6126b8977eff67708b67731c0b2877874701

Details for the file knncolle-0.1.1-cp310-cp310-macosx_11_0_arm64.whl.

File hashes

Hashes for knncolle-0.1.1-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 5460756aaa82766be23dff036c8e10cfb837c5c33241fa17c6251c808965d463
MD5 9e6f3e050db2400928e382ae829478db
BLAKE2b-256 c263739cb7d40a8c6c5d6d313018af67316895b20ff05f77a648f66b82b0eb55

Details for the file knncolle-0.1.1-cp310-cp310-macosx_10_9_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp310-cp310-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 c10ba9ab06dfdc7ad2146e51224b973d70527b931ed1ab724bcb56a4e17a9abf
MD5 d4b3362df9d7074a0aa9b7fb0cceb005
BLAKE2b-256 739a3f74ddbf4924267ea7262da459c567317d910efac14980a8ffeab74e81d5

Details for the file knncolle-0.1.1-cp39-cp39-musllinux_1_2_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp39-cp39-musllinux_1_2_x86_64.whl
Algorithm Hash digest
SHA256 2420c47bfed5b27665381e9c5e595edac403cb7a6e17a95eeccf6d185e7f81c1
MD5 14df46acbd1184dc466745c280f62a6f
BLAKE2b-256 b7bd089e08f5901b6e214b6f813ba6314fd88af3640e5fe3c8dd9c52eb431f29

Details for the file knncolle-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a190d46f154f994978c4600b59ad17ba363b08f053842aafb3d7a8056eea44d8
MD5 baf92d0ed3f5ed37575e513a351fc140
BLAKE2b-256 49db79fda65529977dd1d6bd726cd3baefd392fdaf740f3df5872212a5c61a7d

Details for the file knncolle-0.1.1-cp39-cp39-macosx_11_0_arm64.whl.

File hashes

Hashes for knncolle-0.1.1-cp39-cp39-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4c15a3eddf15afd869ff0dee0990f228af462985dd0603b7416d3fc1d287cfa2
MD5 65505e07dae5ec0d15007fea7acf7ce8
BLAKE2b-256 21c5d040dcb489c74c0457b46359473de33f88ee49a5f0c4d01144fedf6d0b5b

Details for the file knncolle-0.1.1-cp39-cp39-macosx_10_9_x86_64.whl.

File hashes

Hashes for knncolle-0.1.1-cp39-cp39-macosx_10_9_x86_64.whl
Algorithm Hash digest
SHA256 49312c82e5ec02cbf767e9fdca27f12df1e63b348092eea822c0d78b53768eb5
MD5 4d0090463fa70a1b511f3792d7624d4b
BLAKE2b-256 423e2426305723b66d51e349964b24eecf59035376e5019ef260e517b43691aa
