Nearest Neighbor Detection for Bioconductor
Python bindings to knncolle
Overview
The knncolle Python package implements Python bindings to the C++ library of the same name for nearest neighbor (NN) searches. Downstream packages can re-use the NN search algorithms in knncolle, either via Python or by directly calling C++ through shared pointers. This is inspired by the BiocNeighbors Bioconductor package, which does the same thing for R packages.
Quick start
Install it:
pip install knncolle
And run the desired search:
# Mocking up data with 20 dimensions, 1000 observations
import numpy
y = numpy.random.rand(20, 1000)
# Building a search index with vantage point trees:
import knncolle
params = knncolle.VptreeParameters()
idx = knncolle.build_index(params, y)
# Performing the search:
res = knncolle.find_knn(idx, num_neighbors=10)
res.index
res.distance
Check out the reference documentation for details.
Switching algorithms
We can easily switch to a different algorithm by just passing a different params object.
For example, we could use the Approximate Nearest Neighbors Oh Yeah (Annoy) algorithm:
an_params = knncolle.AnnoyParameters()
an_idx = knncolle.build_index(an_params, y)
We can also tweak the search parameters in our Parameters object, during or after construction.
For example, with the hierarchical navigable small worlds (HNSW) algorithm:
h_params = knncolle.HnswParameters(num_links=20, distance="Manhattan")
h_params.ef_construction = 150
h_idx = knncolle.build_index(h_params, y)
Currently, we support Annoy, HNSW, vantage point trees, k-means k-nearest neighbors, and (for testing) an exhaustive brute-force search. More algorithms can be added by extending knncolle as described below without any change to end-user code.
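To make concrete what the exhaustive brute-force option computes, and what the approximate algorithms are approximating, here is a minimal pure-Python sketch. It does not use knncolle. The function name `brute_force_knn` and the toy data are hypothetical, observations are stored as a plain list of coordinate lists rather than as columns of a NumPy array, and it assumes Euclidean distances with each observation excluded from its own neighbor list.

```python
import math
import random


def brute_force_knn(observations, k):
    """Exact k-nearest-neighbor self-search by exhaustive comparison.

    Each entry of `observations` is one observation (a sequence of
    coordinates). Returns (index, distance): index[i] lists the k closest
    other observations to observation i, and distance[i] holds the matching
    Euclidean distances, both ordered from nearest to farthest.
    """
    index, distance = [], []
    for i, point in enumerate(observations):
        # Distance from observation i to every other observation.
        candidates = [
            (math.dist(point, other), j)
            for j, other in enumerate(observations)
            if j != i  # assumption: an observation is not its own neighbor
        ]
        candidates.sort()
        nearest = candidates[:k]
        index.append([j for _, j in nearest])
        distance.append([d for d, _ in nearest])
    return index, distance


random.seed(0)
# Hypothetical toy data: 50 observations with 5 dimensions each.
toy = [[random.random() for _ in range(5)] for _ in range(50)]
nn_index, nn_distance = brute_force_knn(toy, k=3)
```

The cost grows quadratically with the number of observations, which is why tree-based and approximate indices are preferred for anything but testing.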
Other searches
Given a query dataset, we can find the nearest neighbors in the prebuilt search index:
q = numpy.random.rand(20, 50)
qres = knncolle.query_knn(idx, q, num_neighbors=10)
qres.index
qres.distance
We can ask find_knn() to report variable numbers of neighbors for each observation:
variable_k = (numpy.random.rand(y.shape[1]) * 10).astype(numpy.uint32)
var_res = knncolle.find_knn(idx, num_neighbors=variable_k)
var_res.index
var_res.distance
We can find all observations within a distance threshold of each observation via find_neighbors().
A separate threshold can be supplied for each observation, and observations in a separate dataset can be queried in the same way with query_neighbors().
range_res = knncolle.find_neighbors(idx, threshold=10)
range_res.index
range_res.distance
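Unlike a k-nearest-neighbor search, a range search returns a different number of neighbors for each observation. The following pure-Python sketch illustrates these semantics without using knncolle. The function name `brute_force_range` is hypothetical, and both the inclusive boundary and the nearest-first ordering are assumptions made for illustration rather than documented behavior of find_neighbors().

```python
import math


def brute_force_range(observations, threshold):
    """Exact range self-search by exhaustive comparison.

    `threshold` is a single number or a sequence with one value per
    observation. Returns (index, distance), where index[i] lists every other
    observation within the threshold of observation i, nearest first.
    """
    n = len(observations)
    if isinstance(threshold, (int, float)):
        thresholds = [threshold] * n
    else:
        thresholds = list(threshold)
        if len(thresholds) != n:
            raise ValueError("expected one threshold per observation")

    index, distance = [], []
    for i, point in enumerate(observations):
        hits = []
        for j, other in enumerate(observations):
            if j == i:
                continue  # assumption: an observation is not its own neighbor
            d = math.dist(point, other)
            if d <= thresholds[i]:  # assumption: the boundary is inclusive
                hits.append((d, j))
        hits.sort()
        index.append([j for _, j in hits])
        distance.append([d for d, _ in hits])
    return index, distance


# Four one-dimensional observations on a line.
line = [[0.0], [1.0], [2.5], [6.0]]
same_index, same_distance = brute_force_range(line, 1.5)
var_index, var_distance = brute_force_range(line, [0.5, 1.0, 4.0, 0.1])
```

With a shared threshold of 1.5, the isolated observation at 6.0 has no neighbors at all, so downstream code should not assume that every row of the result is non-empty.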
Use with C++
The raison d'être of the knncolle Python package is to enable re-use within (pybind11-wrapped) C++ code in other Python packages. The idea is that downstream packages will link against the knncolle C++ interface so that they can re-use the search indices created by the knncolle Python package. This allows downstream packages to (i) save time by avoiding the need to re-compile all algorithms and (ii) support more algorithms in knncolle extensions. To do so:
- Add assorthead.includes() to the compiler's include path for your package. This can be done through include_dirs= of the Extension() definition in setup.py or by adding a target_include_directories() in CMake, depending on your build system.
- Call knncolle.build_index() to construct a GenericIndex instance. This exposes a shared pointer to the C++-allocated index via its ptr property.
- Pass ptr to pybind11-wrapped C++ code as a shared pointer to a knncolle::Prebuilt, which can be interrogated as described in the knncolle documentation.
So, for example, the C++ code in our downstream package might look like this:
#include <cstdint>
#include <memory>

#include "pybind11/pybind11.h"
#include "knncolle/knncolle.hpp"

int do_something(
    const std::shared_ptr<
        knncolle::Prebuilt<uint32_t, uint32_t, double>
    >& index)
{
    // Do something with the search index interface.
    return 1;
}

PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something", &do_something);
}
Which can then be called from Python:
from . import lib_downstream as lib
from knncolle import GenericIndex

def do_something(idx: GenericIndex):
    return lib.do_something(idx.ptr)
In some scenarios, it may be more convenient to construct the search index inside C++,
e.g., if the dataset to be searched is not available before the call to the C++ function.
This can be accommodated by accepting a shared pointer to a knncolle::Builder in the C++ code:
#include <cstdint>
#include <memory>

#include "pybind11/pybind11.h"
#include "knncolle/knncolle.hpp"

int do_something_mk2(
    const std::shared_ptr<
        knncolle::Builder<
            knncolle::SimpleMatrix<uint32_t, uint32_t, double>,
            double
        >
    >& builder)
{
    // The builder is an algorithm-specific factory that accepts a matrix and
    // returns a search index for that algorithm. Presumably we construct a
    // new search index inside this function and use it.
    return 1;
}

PYBIND11_MODULE(lib_downstream, m) {
    m.def("do_something_mk2", &do_something_mk2);
}
A pointer to the knncolle::Builder is then created in Python by the define_builder() function and passed to C++:
from . import lib_downstream as lib
from knncolle import define_builder, Parameters

def do_something(param: Parameters):
    builder, cls = define_builder(param)
    return lib.do_something_mk2(builder.ptr)
See also the definitions in lib/src/def.h for the types of the pointers to be used in the C++ code.
Extending to more algorithms
Via define_builder()
The best way to extend knncolle is to do so in C++.
This involves writing subclasses of the Builder, Prebuilt and Searcher interfaces in the knncolle library.
Once this is done, it is a simple matter of writing the following Python bindings:
- Implement a SomeNewParameters class that inherits from Parameters.
- Implement a SomeNewIndex class that inherits from GenericIndex. This should accept a single ptr in its constructor and have a ptr property that returns the same value.
- Register a define_builder() method that dispatches on SomeNewParameters. This should call into C++ and return a tuple containing a pybind11-wrapped BuilderPointer and the SomeNewIndex constructor.
No new methods are required for find_knn(), build_index(), etc. as the default method will work automatically if a define_builder() method is available.
This approach also allows the new method to be used in C++ code of downstream packages that accept a PrebuiltPointer instance.
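The Python side of these steps might look roughly like the sketch below. To keep it runnable without knncolle or a compiled extension, `Parameters`, `GenericIndex` and `define_builder` are local stand-ins for the objects of the same name in knncolle, and the generic is assumed to behave like a `functools.singledispatch` function. The `num_probes` option, the `PlaceholderBuilder` class and the `_create_builder_in_cpp()` helper are all hypothetical; in a real extension the helper would be a pybind11-wrapped function returning the builder pointer.

```python
from functools import singledispatch


# Stand-ins for knncolle.Parameters, knncolle.GenericIndex and
# knncolle.define_builder, so that this sketch runs on its own.
class Parameters:
    pass


class GenericIndex:
    pass


@singledispatch
def define_builder(param):
    raise NotImplementedError(f"no builder for {type(param).__name__}")


class SomeNewParameters(Parameters):
    """Algorithm-specific settings (`num_probes` is a hypothetical option)."""

    def __init__(self, num_probes=10):
        self.num_probes = num_probes


class SomeNewIndex(GenericIndex):
    """Accepts a single pointer and exposes it again via `ptr`."""

    def __init__(self, ptr):
        self._ptr = ptr

    @property
    def ptr(self):
        return self._ptr


class PlaceholderBuilder:
    """Hypothetical stand-in for the pybind11-wrapped builder pointer."""

    def __init__(self, num_probes):
        self.num_probes = num_probes


def _create_builder_in_cpp(num_probes):
    # A real extension would call into its compiled C++ module here.
    return PlaceholderBuilder(num_probes)


@define_builder.register
def _define_builder_some_new(param: SomeNewParameters):
    # Return the builder together with the constructor for the index class.
    return _create_builder_in_cpp(param.num_probes), SomeNewIndex


builder, index_class = define_builder(SomeNewParameters(num_probes=4))
```

Returning the index constructor alongside the builder lets generic code wrap whatever pointer the builder produces in the right index class without knowing about the new algorithm.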
Without define_builder()
If it is not possible to implement the search algorithm in C++, we can still extend knncolle in Python. Each extension package should:
- Implement a SomeNewParameters class that inherits from Parameters.
- Implement a SomeNewIndex class that inherits from Index. This can have an arbitrary structure, i.e., it does not need to have a ptr property.
- Register a build_index() method that dispatches on SomeNewParameters. This should return an instance of SomeNewIndex.
- Register a method for any number of these generics: find_knn(), find_distance(), find_neighbors(), query_knn(), query_distance(), query_neighbors(). These methods should dispatch on SomeNewIndex, which is their first argument, and return the appropriate result object.
This approach will not support re-use by C++ code in other Python packages.
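A pure-Python extension along these lines could be sketched as follows. As before, `Parameters`, `Index`, `build_index` and `find_knn` are local stand-ins for the knncolle objects of the same name so that the example runs by itself, and they are assumed to behave like `functools.singledispatch` generics. The `KnnResult` container is hypothetical and only mirrors the index and distance fields used earlier. Because find_knn() is called with the index as its first argument, the search method in this sketch is registered against the index class, and the data are passed as a plain list of observations to avoid any dependencies.

```python
import math
from collections import namedtuple
from functools import singledispatch


# Stand-ins for knncolle's base classes and generics.
class Parameters:
    pass


class Index:
    pass


@singledispatch
def build_index(param, x):
    raise NotImplementedError(type(param).__name__)


@singledispatch
def find_knn(idx, num_neighbors):
    raise NotImplementedError(type(idx).__name__)


# Hypothetical result container with the two fields used in this document.
KnnResult = namedtuple("KnnResult", ["index", "distance"])


class SomeNewParameters(Parameters):
    pass


class SomeNewIndex(Index):
    """Arbitrary structure: here it simply stores the observations."""

    def __init__(self, observations):
        self.observations = observations


@build_index.register
def _build_some_new(param: SomeNewParameters, x):
    return SomeNewIndex([list(obs) for obs in x])


@find_knn.register
def _find_knn_some_new(idx: SomeNewIndex, num_neighbors):
    index, distance = [], []
    for i, point in enumerate(idx.observations):
        # Rank all other observations by Euclidean distance.
        ranked = sorted(
            (math.dist(point, other), j)
            for j, other in enumerate(idx.observations)
            if j != i
        )[:num_neighbors]
        index.append([j for _, j in ranked])
        distance.append([d for d, _ in ranked])
    return KnnResult(index, distance)


new_idx = build_index(SomeNewParameters(), [[0.0], [1.0], [3.0]])
new_res = find_knn(new_idx, num_neighbors=1)
```

End-user code stays the same as in the quick start: only the parameters object changes, and dispatch selects the new methods.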