Skip to main content

Tools for storing embeddings in a database and querying them

Project description

Vectorium

This Python package provides a simple and flexible interface for creating, manipulating and querying vector databases. Vectorium makes it easy to add or remove vectors, compare them using various similarity metrics, and perform various aggregate operations.

Features

  • Flexible Vector Database: Store and manage your high dimensional vectors in an efficient manner.
  • Multiple Compare Functions: Includes functions like cosine similarity, euclidean distance, dot product.
  • Various Aggregate Functions: Supports aggregation of result vectors using mean, sum, max, min or no operation.
  • Vector Operations: Add or remove vectors from your database, save or load your vector collection, and update your vector database as needed.

Installation

This package is not yet available on PyPi. Please clone this repository to your local machine and import the VectorDatabase class.

Usage

This is a brief example of how to use the VectorDatabase class:

from vectorium import VectorDatabase
import numpy as np

# Create a new database named 'my_collection'
db = VectorDatabase('my_collection', dim=128)

# Add a new vector associated with the key 'my_key'
db.add('my_key', np.random.randn(128))

# Compare an input vector with the database
results = db.compare(np.random.randn(128), func='cosine', aggregate='mean')

# Remove a key from the database
db.remove('my_key')

# Save the database
db.save()

# Load the database
db.load('my_collection_path')

Class: VectorDatabase

Parameters

  • collection - The name of the database. Will be used as a filename when saving/loading.
  • dim (optional) - The dimensions of the vectors. If None, will be inferred from the first added vector.
  • collection_path (optional) - The path where the database file will be stored.

Methods

add(key, vec)

Add a new vector associated with the given key to the database. If the key already exists, the vector will be appended to the existing ones.

remove(key)

Removes the vectors associated with the given key from the database.

compare(input_vector, func='cosine', aggregate='mean')

Compares an input vector with the vectors in the database using the given compare function (default is 'cosine') and returns the results aggregated using the given aggregate function (default is 'mean').

topk(input_vector, k=10, func='cosine', aggregate='mean', reverse=False)

Similar to compare, but only returns the top k results.

reset()

Empties the database.

save()

Saves the database to a .npz file with the name of the collection.

load(collection_path)

Loads the database from a .npz file located at the given path.

update()

Updates the internal list representation of the database. This is called automatically after each add, remove, reset, load and save operation.

Requirements

  • Python 3.6+
  • NumPy
  • PyTorch

Contribution

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vectorium-0.1.0.tar.gz (4.0 kB view details)

Uploaded Source

Built Distribution

vectorium-0.1.0-py3-none-any.whl (4.5 kB view details)

Uploaded Python 3

File details

Details for the file vectorium-0.1.0.tar.gz.

File metadata

  • Download URL: vectorium-0.1.0.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for vectorium-0.1.0.tar.gz
Algorithm Hash digest
SHA256 7cd016f921002867342cf7e42e1f8aaafa16cb8b85fda6398bd600b39d06f613
MD5 4c1f3bce5f21dbc4c009131a5ac2b84a
BLAKE2b-256 002ac26a990c9181b17f565d031eb7d9ecaff70610527a63a73662af258400d1

See more details on using hashes here.

File details

Details for the file vectorium-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: vectorium-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 4.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for vectorium-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 72f5f7e7f2f03b8b4e45e8e77839caeec4aa7cfeeca36ffb8985b933812cfd02
MD5 299b0dbe1d5b386caa22ce12084a6646
BLAKE2b-256 5d63243cc0d7ba7e87843b356169b2d3129b8b62b69aef5012015cff385cec8e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page