A dict with a vector index for fast lookup of nearest neighbors

vdict

This is a very thin wrapper around hnswlib that makes it look like a Python dictionary whose keys are numpy arrays. Install with pip install vdict.

from vdict import vdict
import numpy as np

data = vdict()
v1 = np.random.rand(32)
v2 = np.random.rand(32)
data[v1] = 'hello'
data[v2] = 32
assert data[v1] == 'hello'

You can have it raise an IndexError when you access a key that doesn't exist by setting a tolerance:

data = vdict(tol=0.001)
v1 = np.random.rand(32)
v2 = np.random.rand(32)
data[v1] = 'hello'
# this will raise an IndexError because we haven't added v2 yet!
print(data[v2])

The default tolerance is 1, which generally does not raise errors. Set it to a smaller value to make lookups stricter.
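To make the tolerance behaviour concrete, here is a minimal brute-force sketch in plain Python. It is not how vdict is implemented (vdict delegates to hnswlib's approximate index). The class name `TinyVDict`, the exact linear search, and the use of squared Euclidean distance for the tolerance check are all assumptions made for illustration.

```python
class TinyVDict:
    """Hypothetical brute-force stand-in for vdict (illustration only)."""

    def __init__(self, tol=1.0):
        self.tol = tol
        self._keys = []    # stored key vectors, as tuples of floats
        self._values = []  # value stored for each key, in the same order

    @staticmethod
    def _sq_dist(a, b):
        # Squared Euclidean distance (the metric assumed for this sketch).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    def __setitem__(self, key, value):
        key = tuple(float(x) for x in key)
        if self._keys and len(key) != len(self._keys[0]):
            raise ValueError("all vectors must be the same length")
        self._keys.append(key)
        self._values.append(value)

    def __getitem__(self, query):
        if not self._keys:
            raise IndexError("dictionary is empty")
        query = tuple(float(x) for x in query)
        # Exact search: compare the query against every stored key.
        dists = [self._sq_dist(query, k) for k in self._keys]
        best = min(range(len(dists)), key=dists.__getitem__)
        if dists[best] > self.tol:
            raise IndexError("no key within tolerance")
        return self._values[best]


strict = TinyVDict(tol=0.001)
strict[[0.0, 0.0]] = "hello"

near = strict[[0.0, 0.01]]   # close enough to the stored key: returns "hello"

try:
    strict[[0.5, 0.5]]       # too far from any stored key
    missed = False
except IndexError:
    missed = True
```

With a large tolerance the second lookup would simply return the nearest value, which is why a loose default rarely raises.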

Details

  • All vectors must be the same length
  • Accessing with a vector returns the value keyed by the closest stored vector
  • The algorithm is approximate nearest neighbor search. You can tune the accuracy (see below)
  • You can have millions of vectors in the dictionary
  • If you know the approximate size, pass est_nelements to vdict() to reduce how often the index is resized

Usage

The vdict class has reasonable defaults, but you may need to tune it for your use case. The parameters are adjustable in the constructor and are described in the hnswlib documentation. Briefly, the most important ones are:

  • M - the number of neighbors to consider when building the graph (higher M means more accurate, but more memory). 12-48 is typical.
  • space - the distance metric to use. The default is l2, but you can also use cosine or ip (inner product).
  • ef_construction - controls the speed/accuracy trade-off during index construction. 50-200 is typical.

from vdict import vdict
import numpy as np

data = vdict(M=16, space='cosine', ef_construction=100)

# add some vectors
data[np.random.rand(32)] = 'hello'
data[np.random.rand(32)] = 'world'
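The space option selects which distance hnswlib minimises. As a rough guide, the three spaces can be written out in plain Python as below. The function names are hypothetical, and the formulas follow the definitions given in the hnswlib README: l2 is the squared Euclidean distance, ip is one minus the inner product, and cosine is one minus the cosine similarity.

```python
import math


def l2_distance(a, b):
    # 'l2' space: squared Euclidean distance (no square root is taken).
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ip_distance(a, b):
    # 'ip' space: one minus the inner product.
    return 1.0 - sum(x * y for x, y in zip(a, b))


def cosine_distance(a, b):
    # 'cosine' space: one minus the cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 0.0]
b = [0.0, 2.0]
c = [3.0, 0.0]

d_l2 = l2_distance(a, b)        # depends on both direction and length
d_ip = ip_distance(a, b)        # orthogonal vectors have zero inner product
d_cos = cosine_distance(a, c)   # same direction, different length
```

Because cosine ignores vector length, it suits embeddings whose magnitude carries no meaning. Note also that l2 is a squared distance, so if tol is compared against the raw hnswlib distance (an assumption, not something this README confirms), the threshold is on the squared scale.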

License

MIT

Author

Andrew White
