
Lightweight Nearest Neighbors with Flexible Backends


Vicinity: The Lightweight Vector Store


Vicinity is a lightweight vector store: put in some vectors, compute query vectors, and off you go. It provides a simple, intuitive API for nearest neighbor search, with support for several backends.

Quickstart

Install the package with:

pip install vicinity

The following code snippet demonstrates how to use Vicinity for nearest neighbor search:

import numpy as np
from vicinity import Vicinity
from vicinity.datatypes import Backend

# Create some dummy data
items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)

# Initialize the Vicinity instance (using the basic backend)
vicinity = Vicinity.from_vectors_and_items(vectors=vectors, items=items, backend_type=Backend.BASIC)

# Query for nearest neighbors with a top-k search
query_vector = np.random.rand(128)
results = vicinity.query([query_vector], k=3)

# Query for nearest neighbors with a threshold search
results = vicinity.query_threshold([query_vector], threshold=0.9)

# Save the vector store
vicinity.save('my_vector_store')

# Load the vector store
vicinity = Vicinity.load('my_vector_store')

Main Features

Vicinity provides the following features:

  • Lightweight: Minimal dependencies and fast performance.
  • Flexible Backend Support: Use different backends for vector storage and search.
  • Serialization: Save and load vector stores for persistence.
  • Easy to Use: Simple and intuitive API.

Supported Backends

The following backends are supported:

  • BASIC: A simple flat index that performs exact (brute-force) search.
  • HNSW: Hierarchical Navigable Small World (HNSW) graphs for ANN search using hnswlib.
  • FAISS: Search using FAISS, with a range of index types (see the index_type parameter below).
  • ANNOY: ANN search using Annoy ("Approximate Nearest Neighbors Oh Yeah").
  • PYNNDESCENT: ANN search using PyNNDescent.
  • USEARCH: ANN search using USearch, a highly optimized implementation of the HNSW algorithm.
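Because the ANN backends trade exactness for speed, it can be worth checking how closely one matches an exact flat (BASIC) search on your own data. The sketch below computes recall@k from two sets of neighbor lists. The `recall_at_k` helper and the sample data are hypothetical, and the sketch assumes you have already reduced each backend's query results to plain lists of items (how to do that depends on Vicinity's result format, which is not spelled out here).

```python
def recall_at_k(approx, exact):
    """Fraction of the exact neighbors that the approximate search also returned.

    `approx` and `exact` each hold one list of neighbor items per query
    (hypothetical helper, not part of Vicinity).
    """
    if len(approx) != len(exact):
        raise ValueError("need one neighbor list per query in both inputs")
    # Count exact neighbors that also appear in the approximate result, per query.
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx, exact))
    total = sum(len(e) for e in exact)
    return hits / total if total else 1.0


# Illustrative neighbor lists for two queries (made-up data).
exact = [["triforce", "bow", "ocarina"], ["hookshot", "boomerang", "bow"]]
approx = [["triforce", "ocarina", "hookshot"], ["hookshot", "boomerang", "bow"]]
print(recall_at_k(approx, exact))  # 5 of the 6 exact neighbors were found
```

A recall close to 1.0 means the ANN backend is returning nearly the same neighbors as exhaustive search; parameters such as ef_construction or expansion_search (listed below) are the usual knobs for moving it.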

NOTE: The ANN backends do not support dynamic deletion; to delete items, you need to rebuild the index. Insertion is supported by the FAISS, HNSW, and USearch backends. The BASIC backend supports both insertion and deletion.
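Since deleting from an ANN backend means rebuilding, one way to do it is to filter the original items and vectors and build a new index from what remains. A minimal sketch, assuming you still have the original items and vectors in memory; `drop_items` is a hypothetical helper, not part of Vicinity.

```python
import numpy as np


def drop_items(items, vectors, to_delete):
    """Remove every item in `to_delete`, keeping items and vector rows aligned."""
    to_delete = set(to_delete)
    # Boolean mask over rows: True for each row we keep.
    keep = np.array([item not in to_delete for item in items], dtype=bool)
    kept_items = [item for item, k in zip(items, keep) if k]
    return kept_items, np.asarray(vectors)[keep]


items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)
items, vectors = drop_items(items, vectors, ["hookshot"])
# Then rebuild the index from the remaining data, e.g.:
# vicinity = Vicinity.from_vectors_and_items(vectors=vectors, items=items, backend_type=Backend.HNSW)
```

Rebuilding costs a full index construction, so for frequent deletions it is usually cheaper to batch them up and rebuild once.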

Backend Parameters

| Backend | Parameter | Description | Default Value |
|---|---|---|---|
| Annoy | metric | Similarity metric to use (dot, euclidean, cosine). | "cosine" |
| Annoy | trees | Number of trees to use for indexing. | 100 |
| Annoy | length | Optional length of the dataset. | None |
| FAISS | metric | Similarity metric to use (cosine, l2). | "cosine" |
| FAISS | index_type | Type of FAISS index (flat, ivf, hnsw, lsh, scalar, pq, ivf_scalar, ivfpq, ivfpqr). | "hnsw" |
| FAISS | nlist | Number of cells for IVF indexes. | 100 |
| FAISS | m | Number of subquantizers for PQ indexes; number of neighbors per node for HNSW indexes. | 8 |
| FAISS | nbits | Number of bits for LSH and PQ indexes. | 8 |
| FAISS | refine_nbits | Number of bits for the refinement stage in IVFPQR indexes. | 8 |
| HNSW | metric | Similarity space to use (cosine, l2). | "cosine" |
| HNSW | ef_construction | Size of the dynamic candidate list during index construction. | 200 |
| HNSW | m | Number of connections per layer. | 16 |
| PyNNDescent | metric | Similarity metric to use (cosine, euclidean, manhattan). | "cosine" |
| PyNNDescent | n_neighbors | Number of neighbors used to build the k-nearest-neighbor graph. | 15 |
| USearch | metric | Similarity metric to use (cos, ip, l2sq, hamming, tanimoto). | "cos" |
| USearch | connectivity | Number of connections per node in the graph. | 16 |
| USearch | expansion_add | Number of candidates considered during graph construction. | 128 |
| USearch | expansion_search | Number of candidates considered during search. | 64 |
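Several backends offer both a cosine and a Euclidean (l2 or l2sq) metric. For vectors scaled to unit length the two rank neighbors identically, because the squared Euclidean distance is ||a - b||^2 = 2 - 2 cos(a, b). The short check below verifies this numerically on random data; it makes no assumption about whether a given backend normalizes vectors for you.

```python
import numpy as np


def unit(x):
    """Scale vectors (along the last axis) to unit L2 norm."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


rng = np.random.default_rng(0)
vectors = unit(rng.random((5, 128)))
query = unit(rng.random(128))

cos_sim = vectors @ query                       # cosine similarity per stored vector
sq_l2 = np.sum((vectors - query) ** 2, axis=1)  # squared Euclidean distance per stored vector

# For unit vectors: squared L2 distance == 2 - 2 * cosine similarity ...
assert np.allclose(sq_l2, 2 - 2 * cos_sim)
# ... so ascending L2 distance and descending cosine similarity give the same order.
assert (np.argsort(sq_l2) == np.argsort(-cos_sim)).all()
```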

Usage

Creating a Vector Store

You can create a Vicinity instance by providing items and their corresponding vectors:

from vicinity import Vicinity
import numpy as np

items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)

vicinity = Vicinity.from_vectors_and_items(vectors=vectors, items=items)

Querying

Find the k nearest neighbors for a given vector:

query_vector = np.random.rand(128)
results = vicinity.query([query_vector], k=3)
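Conceptually, a flat cosine index answers a top-k query by scoring every stored vector against each query and keeping the best k. The sketch below is not Vicinity's implementation: `top_k` is a hypothetical helper, and returning one list of (item, score) pairs per query is an assumption about the result shape, chosen to mirror the fact that query takes a list of query vectors.

```python
import numpy as np


def top_k(items, vectors, queries, k=3):
    """Exact cosine top-k: one list of (item, score) pairs per query, best first."""
    # Normalize rows so that a dot product equals cosine similarity.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = np.atleast_2d(np.asarray(queries, dtype=float))
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    sims = q @ v.T  # shape: (n_queries, n_items)
    k = min(k, len(items))
    # argpartition moves the k highest scores of each row to the front
    # without a full sort; only those k candidates are then sorted.
    candidates = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    results = []
    for row, cand in zip(sims, candidates):
        cand = cand[np.argsort(-row[cand])]
        results.append([(items[i], float(row[i])) for i in cand])
    return results


items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)
query_vector = np.random.rand(128)
results = top_k(items, vectors, [query_vector], k=3)
```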

Find all neighbors within a given threshold:

query_vector = np.random.rand(128)
results = vicinity.query_threshold([query_vector], threshold=0.9)
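Unlike a top-k query, a threshold query can return any number of results, including none. A minimal sketch of the idea for a flat cosine index; `within_threshold` is hypothetical, and it assumes the threshold is a lower bound on cosine similarity. This README does not say whether Vicinity's threshold is a similarity or a distance, so check before relying on either reading.

```python
import numpy as np


def within_threshold(items, vectors, query, threshold):
    """All (item, cosine similarity) pairs with similarity >= threshold, best first."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = np.asarray(query, dtype=float)
    q = q / np.linalg.norm(q)
    sims = v @ q  # one score per stored vector
    hits = np.flatnonzero(sims >= threshold)
    hits = hits[np.argsort(-sims[hits])]  # highest similarity first
    return [(items[i], float(sims[i])) for i in hits]


items = ["a", "b", "c"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
matches = within_threshold(items, vectors, np.array([1.0, 0.0]), threshold=0.7)
```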

Inserting and Deleting Items

Insert new items:

new_items = ["ocarina", "bow"]
new_vectors = np.random.rand(2, 128)
vicinity.insert(new_items, new_vectors)
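For a flat index, insertion amounts to appending rows: the new vectors are stacked under the existing ones and the new items are appended in the same order. A minimal sketch; `append_items` is a hypothetical helper, and the checks it performs (one vector per item, matching dimensionality) are what one would expect rather than documented Vicinity behavior.

```python
import numpy as np


def append_items(items, vectors, new_items, new_vectors):
    """Hypothetical flat-index insert: extend items and stack vectors row-wise."""
    new_vectors = np.atleast_2d(np.asarray(new_vectors, dtype=float))
    if len(new_items) != len(new_vectors):
        raise ValueError("need exactly one vector per new item")
    if new_vectors.shape[1] != vectors.shape[1]:
        raise ValueError(f"expected dimension {vectors.shape[1]}, got {new_vectors.shape[1]}")
    return list(items) + list(new_items), np.vstack([vectors, new_vectors])


items = ["triforce", "master sword"]
vectors = np.random.rand(2, 128)
items, vectors = append_items(items, vectors, ["ocarina", "bow"], np.random.rand(2, 128))
```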

Delete items:

vicinity.delete(["hookshot"])

Saving and Loading

Save the vector store:

vicinity.save('my_vector_store')

Load the vector store:

vicinity = Vicinity.load('my_vector_store')
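Persistence for a flat store boils down to writing the vectors and the items side by side and reading them back in the same order. The sketch below uses a hypothetical format (a folder holding vectors.npy and items.json) chosen purely for illustration; it is not Vicinity's on-disk layout, and it assumes the items are JSON-serializable (strings here).

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def save_store(folder, items, vectors):
    """Write vectors as .npy and items as JSON into `folder` (created if missing)."""
    path = Path(folder)
    path.mkdir(parents=True, exist_ok=True)
    np.save(path / "vectors.npy", np.asarray(vectors))
    (path / "items.json").write_text(json.dumps(list(items)), encoding="utf-8")


def load_store(folder):
    """Read back the (items, vectors) pair written by save_store."""
    path = Path(folder)
    items = json.loads((path / "items.json").read_text(encoding="utf-8"))
    vectors = np.load(path / "vectors.npy")
    return items, vectors


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    items = ["triforce", "master sword", "hylian shield"]
    vectors = np.random.rand(len(items), 128)
    save_store(Path(tmp) / "my_vector_store", items, vectors)
    loaded_items, loaded_vectors = load_store(Path(tmp) / "my_vector_store")
```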

License

MIT



Download files


Source Distribution

vicinity-0.2.1.tar.gz (80.7 kB)

Built Distribution

vicinity-0.2.1-py3-none-any.whl (20.9 kB)

File details

Details for the file vicinity-0.2.1.tar.gz.

File metadata

  • Download URL: vicinity-0.2.1.tar.gz
  • Size: 80.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for vicinity-0.2.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4fb95c2b5c362d66b370655af587772a2af20ac6756ab3b64a7fb771a6e77f35 |
| MD5 | 0ec64152ecf3839825f098e6a1597ebc |
| BLAKE2b-256 | 2c448d208f9ffea7b40b88ecd3fb46d7465e345cc608e85114da069b3c2c01b7 |


File details

Details for the file vicinity-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: vicinity-0.2.1-py3-none-any.whl
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.14

File hashes

Hashes for vicinity-0.2.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95763bbe7da3a41238bdf72705c254cc01894f8633d1e3ff768d4e7835f6c26c |
| MD5 | a50aa0222abd2470752f5edc3ece6293 |
| BLAKE2b-256 | 46fa01517c3eb4741ba7b55fd929bc99b7a8feef88cb715eda128af29f40b6ac |

