Tools for storing embeddings in a database and querying them
Project description
Vectorium
This Python package provides a simple and flexible interface for creating, manipulating and querying vector databases. Vectorium makes it easy to add or remove vectors, compare them using various similarity metrics, and perform various aggregate operations.
Features
- Flexible Vector Database: Store and manage your high dimensional vectors in an efficient manner.
- Multiple Compare Functions: Includes functions like cosine similarity, euclidean distance, dot product.
- Various Aggregate Functions: Supports aggregation of result vectors using mean, sum, max, min or no operation.
- Vector Operations: Add or remove vectors from your database, save or load your vector collection, and update your vector database as needed.
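The three compare functions named above are standard vector operations. As a minimal sketch in plain NumPy (the function names here are illustrative and are not taken from Vectorium's source), they could be written as:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 1.0 means the same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Straight-line distance between a and b; 0.0 means identical vectors."""
    return float(np.linalg.norm(a - b))

def dot_product(a, b):
    """Unnormalised inner product of a and b."""
    return float(np.dot(a, b))

a = np.array([1.0, 0.0])
b = np.array([0.0, 2.0])

print(cosine_similarity(a, b))   # orthogonal vectors score 0
print(euclidean_distance(a, b))  # sqrt(1 + 4)
print(dot_product(a, a))
```

Note that cosine similarity and dot product are "higher is better" scores, while euclidean distance is "lower is better", which matters when ranking results.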
Installation
This package is not yet available on PyPI. Please clone this repository to your local machine and import the VectorDatabase class.
Usage
This is a brief example of how to use the VectorDatabase class:
from vectorium import VectorDatabase
import numpy as np
# Create a new database named 'my_collection'
db = VectorDatabase('my_collection', dim=128)
# Add a new vector associated with the key 'my_key'
db.add('my_key', np.random.randn(128))
# Compare an input vector with the database
results = db.compare(np.random.randn(128), func='cosine', aggregate='mean')
# Remove a key from the database
db.remove('my_key')
# Save the database
db.save()
# Load the database
db.load('my_collection_path')
Class: VectorDatabase
Parameters
- collection - The name of the database. Used as the filename when saving/loading.
- dim (optional) - The dimensionality of the vectors. If None, it is inferred from the first added vector.
- collection_path (optional) - The path where the database file is stored.
Methods
add(key, vec)
Add a new vector associated with the given key to the database. If the key already exists, the vector will be appended to the existing ones.
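To illustrate the append-on-existing-key behaviour and the dimension inference described for dim, here is a small stand-in written for this README. The class name KeyedVectors, its attributes, and the dimension check are assumptions made for the example; Vectorium's internals may differ.

```python
import numpy as np

class KeyedVectors:
    """Hypothetical stand-in, not Vectorium's actual implementation."""

    def __init__(self, dim=None):
        self.dim = dim
        self.vectors = {}  # key -> list of 1-D arrays

    def add(self, key, vec):
        vec = np.asarray(vec, dtype=float)
        if vec.ndim != 1:
            raise ValueError("expected a 1-D vector")
        if self.dim is None:
            # Infer the dimension from the first vector that is added.
            self.dim = vec.shape[0]
        elif vec.shape[0] != self.dim:
            # Assumed behaviour: reject vectors of the wrong size.
            raise ValueError(f"expected dimension {self.dim}, got {vec.shape[0]}")
        # An existing key keeps its old vectors; the new one is appended.
        self.vectors.setdefault(key, []).append(vec)

    def remove(self, key):
        # Drops every vector stored under the key.
        self.vectors.pop(key, None)

store = KeyedVectors()
store.add("cat", [1.0, 0.0, 0.0])
store.add("cat", [0.0, 1.0, 0.0])  # second vector under the same key
store.add("dog", [0.0, 0.0, 1.0])
print(store.dim, len(store.vectors["cat"]))
```

Because a key can hold several vectors, a query against that key produces several scores, which is where the aggregate argument of compare comes in.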
remove(key)
Removes the vectors associated with the given key from the database.
compare(input_vector, func='cosine', aggregate='mean')
Compares an input vector with the vectors in the database using the given compare function (default is 'cosine') and returns the results aggregated using the given aggregate function (default is 'mean').
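One plausible reading of this method is that every vector stored under a key is scored against the input, and the scores for that key are then reduced with the aggregate function. The sketch below follows that reading; the standalone compare function, the AGGREGATES table, and the dictionary return type are assumptions for illustration rather than the package's documented behaviour.

```python
import numpy as np

# Maps an aggregate name to a reduction; None leaves the scores untouched.
AGGREGATES = {
    "mean": np.mean,
    "sum": np.sum,
    "max": np.max,
    "min": np.min,
    None: lambda scores: scores,
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def compare(vectors, query, aggregate="mean"):
    """Score `query` against every stored vector, then aggregate per key."""
    agg = AGGREGATES[aggregate]
    results = {}
    for key, vecs in vectors.items():
        scores = np.array([cosine(query, v) for v in vecs])
        results[key] = agg(scores)
    return results

vectors = {
    "x_axis": [np.array([1.0, 0.0]), np.array([2.0, 0.0])],
    "y_axis": [np.array([0.0, 1.0])],
}
query = np.array([1.0, 0.0])

results = compare(vectors, query, aggregate="mean")
print(results)
```

With aggregate=None the per-vector scores are returned as an array for each key, which is useful for inspecting individual matches.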
topk(input_vector, k=10, func='cosine', aggregate='mean', reverse=False)
Similar to compare, but only returns the top k results.
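The ranking step can be sketched as a sort over per-key scores. The meaning of reverse is not spelled out above, so this example assumes that the default ranks the highest scores first and that reverse=True ranks the lowest first (which would suit a distance metric such as euclidean). The standalone topk function below is hypothetical.

```python
def topk(scores, k=10, reverse=False):
    """Return the k best (key, score) pairs from a dict of scores.

    Assumption: higher is better by default; reverse=True prefers lower scores.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=not reverse)
    return ranked[:k]

scores = {"a": 0.9, "b": 0.1, "c": 0.5}

best = topk(scores, k=2)                  # highest similarity first
worst = topk(scores, k=2, reverse=True)   # lowest score first
print(best)
print(worst)
```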
reset()
Empties the database.
save()
Saves the database to a .npz file with the name of the collection.
load(collection_path)
Loads the database from a .npz file located at the given path.
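For readers unfamiliar with the .npz format, it is NumPy's archive of named arrays, written with np.savez and read back with np.load. The round trip below is a sketch of how a key-to-vectors mapping could be stored this way; the layout (one 2-D array per key) is an assumption and may not match the file Vectorium writes.

```python
import os
import tempfile
import numpy as np

# Assumed layout: one array per key, with one row per stored vector.
vectors = {
    "cat": np.array([[1.0, 0.0], [0.5, 0.5]]),
    "dog": np.array([[0.0, 1.0]]),
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "my_collection.npz")

    # Each keyword argument becomes a named array inside the archive.
    np.savez(path, **vectors)

    # np.load returns a lazy archive; copy the arrays out before closing it.
    with np.load(path) as archive:
        restored = {key: archive[key] for key in archive.files}

print(sorted(restored))
```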
update()
Updates the internal list representation of the database. This is called automatically after each add, remove, reset, load and save operation.
Requirements
- Python 3.6+
- NumPy
- PyTorch
Contribution
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License
File details
Details for the file vectorium-0.1.0.tar.gz.
File metadata
- Download URL: vectorium-0.1.0.tar.gz
- Upload date:
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 7cd016f921002867342cf7e42e1f8aaafa16cb8b85fda6398bd600b39d06f613
MD5 | 4c1f3bce5f21dbc4c009131a5ac2b84a
BLAKE2b-256 | 002ac26a990c9181b17f565d031eb7d9ecaff70610527a63a73662af258400d1
File details
Details for the file vectorium-0.1.0-py3-none-any.whl.
File metadata
- Download URL: vectorium-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | 72f5f7e7f2f03b8b4e45e8e77839caeec4aa7cfeeca36ffb8985b933812cfd02
MD5 | 299b0dbe1d5b386caa22ce12084a6646
BLAKE2b-256 | 5d63243cc0d7ba7e87843b356169b2d3129b8b62b69aef5012015cff385cec8e