A dict with a vector index for fast lookup of nearest neighbors
vdict
This is a very thin wrapper around hnswlib that makes it look like a Python dictionary whose keys are NumPy arrays. Install with `pip install vdict`.
```python
from vdict import vdict
import numpy as np

data = vdict()
v1 = np.random.rand(32)
v2 = np.random.rand(32)
data[v1] = 'hello'
data[v2] = 32
assert data[v1] == 'hello'
```
You can make lookups raise an `IndexError` for keys that don't exist by setting a small tolerance (`tol`):

```python
from vdict import vdict
import numpy as np

data = vdict(tol=0.001)
v1 = np.random.rand(32)
v2 = np.random.rand(32)
data[v1] = 'hello'
# raises IndexError: v2 was never added, and no stored key is close enough to it
print(data[v2])
```
The default tolerance is 1, so lookups generally do not raise errors; set it to a smaller value to make them stricter.
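To make the tolerance rule concrete, here is a minimal brute-force sketch of the behaviour described above: find the stored key closest to the query and raise `IndexError` if it is farther away than `tol`. It assumes `tol` is compared against plain Euclidean distance; vdict delegates the search to hnswlib (whose `l2` space reports squared distance), so the exact quantity vdict compares against `tol` may differ. `ToyVDict` is a hypothetical name, not part of vdict, and it does an exact linear scan where vdict uses approximate search.

```python
import math


class ToyVDict:
    """Hypothetical brute-force stand-in for vdict's lookup rule (not vdict's code)."""

    def __init__(self, tol=1.0):
        self.tol = tol      # max allowed distance between a query and its closest key (assumed Euclidean)
        self._keys = []     # stored key vectors, as tuples of floats
        self._values = []   # values, parallel to self._keys

    def __setitem__(self, key, value):
        key = tuple(float(x) for x in key)
        if self._keys and len(key) != len(self._keys[0]):
            raise ValueError("all vectors must be the same length")
        if key in self._keys:
            # Exact repeat of an existing key: overwrite, like a normal dict.
            self._values[self._keys.index(key)] = value
        else:
            self._keys.append(key)
            self._values.append(value)

    def __getitem__(self, query):
        query = tuple(float(x) for x in query)
        if not self._keys:
            raise IndexError("the dictionary is empty")
        # Exact nearest neighbor by Euclidean distance; hnswlib would find it approximately.
        dists = [math.dist(query, k) for k in self._keys]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] > self.tol:
            raise IndexError(f"closest key is {dists[best]:.3g} away, more than tol={self.tol}")
        return self._values[best]

    def __len__(self):
        return len(self._keys)
```

With the default `tol=1.0`, a query near any stored key returns that key's value, which mirrors the note that the default generally does not raise; with `tol=0.001`, only near-exact matches succeed.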
Details
- All vectors must be the same length
- Indexing with a vector returns the value stored under the closest key vector
- Lookup uses approximate nearest neighbor search, so results are not guaranteed to be exact. You can tune the accuracy (see Usage below)
- You can have millions of vectors in the dictionary
- If you know the approximate size, pass `est_nelements` to `vdict()` to reduce how often the index is resized
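hnswlib indexes are created with a fixed capacity (`max_elements` in `init_index`) and enlarged with `resize_index`, which is why a good size estimate helps. The arithmetic below is a sketch only: it assumes a hypothetical policy where the index doubles its capacity whenever it fills up. vdict's actual growth policy is not documented here, and `count_resizes` is an illustrative helper, not part of vdict.

```python
def count_resizes(n_inserts, initial_capacity, growth_factor=2):
    """Count how often an index must grow to hold n_inserts items.

    Assumes (hypothetically) the index starts with `initial_capacity` slots
    and multiplies its capacity by `growth_factor` each time it is full.
    """
    if initial_capacity < 1 or growth_factor < 2:
        raise ValueError("need initial_capacity >= 1 and growth_factor >= 2")
    capacity, resizes = initial_capacity, 0
    while capacity < n_inserts:
        capacity *= growth_factor  # one resize step
        resizes += 1
    return resizes


# Starting small vs. sizing up front for one million vectors.
print(count_resizes(1_000_000, 1_000))      # 10
print(count_resizes(1_000_000, 1_000_000))  # 0
```

Under this assumption, starting from 1,000 slots costs 10 resizes to reach one million vectors, while an accurate up-front estimate avoids resizing entirely.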
Usage
The `vdict` class has reasonable defaults, but you may need to tune them for your use case. They are adjustable in the constructor, and the parameters are described in the hnswlib documentation. Briefly, the most important ones are:
- `M` - the number of neighbors to consider when building the graph (higher `M` means more accurate, but more memory). 12-48 is typical.
- `space` - the distance metric to use. The default is `l2`, but you can also use `cosine` or `ip` (inner product).
- `ef_construction` - parameter that controls the speed/accuracy trade-off during index construction. 50-200 is typical.
```python
from vdict import vdict
import numpy as np

data = vdict(M=16, space='cosine', ef_construction=100)
# add some vectors
data[np.random.rand(32)] = 'hello'
data[np.random.rand(32)] = 'world'
```
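The `space` options correspond to the distance formulas documented by hnswlib: squared L2 `sum((a_i - b_i)^2)`, inner product `1 - sum(a_i * b_i)`, and cosine `1 - sum(a_i * b_i) / sqrt(sum(a_i^2) * sum(b_i^2))`. The pure-Python functions below restate those formulas for illustration only; hnswlib computes them in optimized C++, and the function names here are hypothetical.

```python
import math


def _check(a, b):
    # Mirrors the README rule that all vectors must be the same length.
    if len(a) != len(b):
        raise ValueError("vectors must be the same length")


def l2(a, b):
    """Squared Euclidean distance (hnswlib's 'l2' space)."""
    _check(a, b)
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip(a, b):
    """One minus the dot product (hnswlib's 'ip' space); can be negative."""
    _check(a, b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """One minus cosine similarity (hnswlib's 'cosine' space)."""
    _check(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return 1.0 - dot / norms


print(l2([1, 2], [2, 4]))      # 5
print(cosine([1, 2], [2, 4]))  # 0.0
```

The usage lines show the practical difference: under `cosine`, a vector and any positive multiple of it are at distance 0 and would resolve to the same key, while under `l2` they are distinct.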
License
MIT
Author
Andrew White
File details
Details for the file vdict-0.1.0.tar.gz.
File metadata
- Download URL: vdict-0.1.0.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4f8d2083b6587b682b6dfc930f183c8ffd3da4e918982c3835854059f220ff83
MD5 | 68f3d3b00387d08df73835f8ef750157
BLAKE2b-256 | 13d4e051236105e6a265613c0b3aea3cdb24e0cf65d63b3a1a6cb5887a72b7bf
File details
Details for the file vdict-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vdict-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0e39bc767f5be37ed680c2a19e47fdd0c8c2d4c6d514e00ec4bb07e64fbc7f4f
MD5 | 28f6e85d6138cc2129e6997a651acdb5
BLAKE2b-256 | 7d95d73121e8bea5fa3566f4253fd2fe09c42fbaa3e3dd4f0c7c3ad30b994e74