A Python library to create vantage point trees (VP-trees)
Project description
py-vptree
A library to create vantage point trees in Python.
Installation
You can install this library from PyPI:
pip install py-vptree
[!WARNING] This library has only been tested with Python 3.11. It may not work with other versions of Python. Contributors are encouraged to test it with other versions of Python.
Usage
You can create a new VP tree with the VPTree
class:
from vptree import VPTree
tree = VPTree(
points=list(range(10000)),
dist_fn=lambda x, y: bin(x ^ y).count("1"),
)
Both arguments are optional. If the points
argument is not provided, an empty tree will be created. If the dist_fn
argument is not provided, the default hamming distance function will be used.
The points
arguments can be anything you want as long as it's measurable with the dist_fn
function. The same two dist_fn
arguments must return the same numeric distance.
The dist_fn
takes two points as arguments and returns a positive numeric distance. E.g.:
hamming = lambda x, y: bin(x ^ y).count("1")
Get all points in the tree
To get all points in the tree, you can use the .all()
method:
>>> sorted(list(tree.all()))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ...]
The method returns a generator that yields all points in the tree (unordered).
Insert a point to the tree
To insert a point to the tree, you can use the insert
method:
>>> tree.insert(2)
Remove a point from the tree
To remove a point from the tree, you can use the remove
method:
>>> tree.remove(2)
This method won't rebuild the tree, but instead it'll promote another node to cover the space left by the removed point. No errors will be raised if the point is not in the tree.
Count the number of points in the tree
To count the number of points in the tree, you can use the len
built-in function:
>>> len(tree)
10000
Search for the k-nearest neighbors
To get the k-nearest neighbors, you can use the knn
method:
>>> knn = tree.knn(query=2, k=5)
>>> knn
[(2, 0), (0, 1), (3, 1), (1, 2), (4, 2)]
The result will be a list of tuples, where the first element is the point and the second element is the distance from the query
.
Search within a radius
To search within a radius, you can use the within
method:
>>> within = tree.within(query=2, radius=2)
>>> within
[(66, 1), (3, 1), (6, 1), (18, 1), (2, 0), (34, 1), (10, 1), (0, 1)]
The result will be a list of tuples, where the first element is the point and the second element is the distance from the query
. The distance of the results will be less than the radius
(equal points aren't included).
Dislike knn
, this method will not return points ordered by distance. If you want to order the results by distance, you can use the sort
method:
>>> within.sort(key=lambda x: x[1])
>>> within
[(2, 0), (66, 1), (3, 1), (6, 1), (18, 1), (34, 1), (10, 1), (0, 1)]
Custom points and distance function
The examples above use plain integers and a simple hamming distance function, but you can use the tree however you need it.
Example using named tuples
import random
import collections
from vptree import VPTree
Item = collections.namedtuple("Item", ["id", "value"])
tree = VPTree(
points=[
Item(id=i, value=random.randint(0, 10000)) for i in range(10000)
],
dist_fn=lambda x, y: bin(x[1] ^ y[1]).count("1"),
)
tree.knn((2, 2), 5)
# [(Item(id=4885, value=8322), 2), (Item(id=3622, value=22), 2), (Item(id=8197, value=8195), 2), (Item(id=9380, value=4610), 2), (Item(id=984, value=7), 2)]
Example using euclidean distance
from math import sqrt
import random
import collections
from vptree import VPTree
Item = collections.namedtuple("Item", ["id", "value"])
tree = VPTree(
points=[
Item(id=i, value=random.uniform(0, 10000)) for i in range(10000)
],
dist_fn=lambda x, y: sqrt((x[1] - y[1]) ** 2),
)
tree.knn((2, 2), 5)
# [(Item(id=7562, value=235.7541538751584), 233.7541538751584), (Item(id=5077, value=235.89421426943758), 233.89421426943758), (Item(id=5772, value=235.92818023762007), 233.92818023762007), (Item(id=6621, value=236.29613677601412), 234.29613677601412), (Item(id=6293, value=238.94108967773886), 236.94108967773886)]
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file py_vptree-1.0.0.tar.gz
.
File metadata
- Download URL: py_vptree-1.0.0.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.6 Windows/10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7691868eedcad7cc70c865d8bc65813aa45e3e2ca1f469c1c3306b7179fb0d8c |
|
MD5 | 7ce147d23d049ecee559ebd4e7997f52 |
|
BLAKE2b-256 | 3e7cd5bd105ecf8fe8b78671741a968567f77517f048ed8d0b557caaf0e7e576 |
File details
Details for the file py_vptree-1.0.0-py3-none-any.whl
.
File metadata
- Download URL: py_vptree-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.6 Windows/10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | cdfe7b61814b85b2618bd6f0f62c095e080595d1b0161558fca431eae3776326 |
|
MD5 | 576d2051e074f578776b2a1102ef9fee |
|
BLAKE2b-256 | 14716e2c80ec4f82e1c829f7a015eb88cc076f8f48729a37732a3edd76afd9a8 |