Lightweight Nearest Neighbors with Flexible Backends
Vicinity: The Lightweight Vector Store
Vicinity is a lightweight vector store: put in some vectors, compute query vectors, and off you go. It provides a simple and intuitive API for nearest neighbor search, with support for several backends.
Quickstart
Install the package with:
pip install vicinity
The following code snippet demonstrates how to use Vicinity for nearest neighbor search:
import numpy as np
from vicinity import Vicinity
from vicinity.datatypes import Backend
# Create some dummy data
items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)
# Initialize the Vicinity instance (using the basic backend)
vicinity = Vicinity.from_vectors_and_items(vectors=vectors, items=items, backend_type=Backend.BASIC)
# Query for nearest neighbors with a top-k search
query_vector = np.random.rand(128)
results = vicinity.query([query_vector], k=3)
# Query for nearest neighbors with a threshold search
results = vicinity.query_threshold([query_vector], threshold=0.9)
# Save the vector store
vicinity.save('my_vector_store')
# Load the vector store
vicinity = Vicinity.load('my_vector_store')
Main Features
Vicinity provides the following features:
- Lightweight: Minimal dependencies and fast performance.
- Flexible Backend Support: Use different backends for vector storage and search.
- Serialization: Save and load vector stores for persistence.
- Easy to Use: Simple and intuitive API.
Supported Backends
The following backends are supported:
- BASIC: A simple flat index for vector storage and search.
- HNSW: Hierarchical Navigable Small World Graph (HNSW) for ANN search using hnswlib.
- FAISS: ANN search using FAISS. All FAISS indexes are supported.
- ANNOY: "Approximate Nearest Neighbors Oh Yeah" for approximate nearest neighbor search.
- PYNNDescent: ANN search using PyNNDescent.
- USEARCH: ANN search using Usearch. This uses a highly optimized version of the HNSW algorithm.
NOTE: the ANN backends do not support dynamic deletion. To delete items, you need to recreate the index. Insertion is supported in the following backends: FAISS, HNSW, and Usearch. The BASIC backend supports both insertion and deletion.
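To illustrate why a flat index can support both operations cheaply, here is a minimal sketch in plain NumPy. The `FlatIndex` class is hypothetical and written only for this illustration; it is not Vicinity's implementation. The point is that a flat index has no graph or tree to keep consistent, so insertion is a row append and deletion is a row filter.

```python
import numpy as np


class FlatIndex:
    """Hypothetical brute-force index; not Vicinity's actual implementation."""

    def __init__(self, items, vectors):
        self.items = list(items)
        self.vectors = np.asarray(vectors, dtype=float)

    def insert(self, items, vectors):
        # Appending rows is all a flat index needs: no auxiliary structure to update.
        self.items.extend(items)
        self.vectors = np.vstack([self.vectors, np.asarray(vectors, dtype=float)])

    def delete(self, items):
        # Deleting is a row filter over the stored vectors.
        to_drop = set(items)
        missing = to_drop - set(self.items)
        if missing:
            raise KeyError(f"items not in index: {sorted(missing)}")
        keep = np.array([item not in to_drop for item in self.items], dtype=bool)
        self.items = [item for item in self.items if item not in to_drop]
        self.vectors = self.vectors[keep]


rng = np.random.default_rng(0)
index = FlatIndex(["triforce", "boomerang", "hookshot"], rng.random((3, 8)))
index.insert(["ocarina"], rng.random((1, 8)))
index.delete(["hookshot"])
print(index.items, index.vectors.shape)
```

Graph-based ANN structures, by contrast, link each vector to its neighbors, so removing one entry can leave other entries pointing at a node that no longer exists. That is one common reason such backends require rebuilding the index to delete items.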
Backend Parameters
Backend | Parameter | Description | Default Value |
---|---|---|---|
Annoy | metric | Similarity metric to use (dot, euclidean, cosine). | "cosine" |
Annoy | trees | Number of trees to use for indexing. | 100 |
Annoy | length | Optional length of the dataset. | None |
FAISS | metric | Similarity metric to use (cosine, l2). | "cosine" |
FAISS | index_type | Type of FAISS index (flat, ivf, hnsw, lsh, scalar, pq, ivf_scalar, ivfpq, ivfpqr). | "hnsw" |
FAISS | nlist | Number of cells for IVF indexes. | 100 |
FAISS | m | Number of subquantizers for PQ and HNSW indexes. | 8 |
FAISS | nbits | Number of bits for LSH and PQ indexes. | 8 |
FAISS | refine_nbits | Number of bits for the refinement stage in IVFPQR indexes. | 8 |
HNSW | metric | Similarity space to use (cosine, l2). | "cosine" |
HNSW | ef_construction | Size of the dynamic list during index construction. | 200 |
HNSW | m | Number of connections per layer. | 16 |
PyNNDescent | metric | Similarity metric to use (cosine, euclidean, manhattan). | "cosine" |
PyNNDescent | n_neighbors | Number of neighbors to use for search. | 15 |
Usearch | metric | Similarity metric to use (cos, ip, l2sq, hamming, tanimoto). | "cos" |
Usearch | connectivity | Number of connections per node in the graph. | 16 |
Usearch | expansion_add | Number of candidates considered during graph construction. | 128 |
Usearch | expansion_search | Number of candidates considered during search. | 64 |
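Several of the metric options in the table name standard measures. The sketch below computes them in plain NumPy for two small vectors, independent of any backend. The mapping of names is an assumption based on common usage (cosine/cos, euclidean/l2, l2sq for squared Euclidean, dot/ip for inner product); exact conventions, such as whether a backend reports squared distances, vary by library.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])  # same direction as a, twice the length

# Cosine distance ignores magnitude: vectors pointing the same way are at distance 0.
cosine_distance = 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean (L2) distance and its square depend on magnitude as well as direction.
euclidean_distance = np.linalg.norm(a - b)
squared_l2 = np.sum((a - b) ** 2)

# Manhattan (L1) distance sums the absolute coordinate differences.
manhattan_distance = np.sum(np.abs(a - b))

# The inner (dot) product is a similarity, not a distance: larger means more similar.
inner_product = a @ b

print(cosine_distance, euclidean_distance, squared_l2, manhattan_distance, inner_product)
```

Here the two vectors are identical under cosine distance but five units apart under Euclidean distance, which is why the choice of metric matters when vectors are not normalized.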
Usage
Creating a Vector Store
You can create a Vicinity instance by providing items and their corresponding vectors:
from vicinity import Vicinity
import numpy as np
items = ["triforce", "master sword", "hylian shield", "boomerang", "hookshot"]
vectors = np.random.rand(len(items), 128)
vicinity = Vicinity.from_vectors_and_items(vectors=vectors, items=items)
Querying
Find the k nearest neighbors for a given vector:
query_vector = np.random.rand(128)
results = vicinity.query([query_vector], k=3)
Find all neighbors within a given threshold:
query_vector = np.random.rand(128)
results = vicinity.query_threshold([query_vector], threshold=0.9)
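To make the difference between the two query styles concrete, here is a minimal brute-force sketch in plain NumPy. It is not Vicinity's implementation: the helper names `top_k` and `within_threshold` are hypothetical, and the sketch assumes cosine distance with the threshold acting as a maximum distance. Check the library's own convention before relying on that interpretation.

```python
import numpy as np


def cosine_distances(query, vectors):
    # Cosine distance = 1 - cosine similarity; 0 means the same direction.
    query = query / np.linalg.norm(query)
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return 1.0 - vectors @ query


def top_k(query, vectors, items, k):
    # Always returns k results (if available), however far away they are.
    distances = cosine_distances(query, vectors)
    order = np.argsort(distances)[:k]
    return [(items[i], float(distances[i])) for i in order]


def within_threshold(query, vectors, items, threshold):
    # Assumption: threshold is a maximum cosine distance.
    # Returns a variable number of results, possibly none.
    distances = cosine_distances(query, vectors)
    hits = np.flatnonzero(distances <= threshold)
    hits = hits[np.argsort(distances[hits])]
    return [(items[i], float(distances[i])) for i in hits]


items = ["triforce", "master sword", "hylian shield"]
vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([1.0, 0.0])

nearest = top_k(query, vectors, items, k=2)
close = within_threshold(query, vectors, items, threshold=0.1)
print(nearest)
print(close)
```

A top-k search fixes the number of results, while a threshold search fixes the quality bar and lets the number of results vary.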
Inserting and Deleting Items
Insert new items:
new_items = ["ocarina", "bow"]
new_vectors = np.random.rand(2, 128)
vicinity.insert(new_items, new_vectors)
Delete items:
vicinity.delete(["hookshot"])
Saving and Loading
Save the vector store:
vicinity.save('my_vector_store')
Load the vector store:
vicinity = Vicinity.load('my_vector_store')
License
MIT