VectorizeDB is a database for vectorized data and metadata, allowing for fast similarity search and retrieval.
Overview
VectorizeDB is a Python package designed for the efficient storage and retrieval of high-dimensional vectors. It’s particularly useful in applications like machine learning and information retrieval. The package utilizes hnswlib for fast approximate nearest neighbor searches and LMDB for scalable and reliable storage.
Installation
VectorizeDB requires Python 3.10 or higher. Install it with pip:
pip install vectorizedb
Usage
Initialization
from vectorizedb import Database
# Initialize a new database
db = Database(path="path/to/db", dim=128, readonly=False, similarity="cosine", max_elements=1000000)
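The `similarity="cosine"` argument selects the metric used to rank neighbors. As a rough illustration, and assuming that reported distances follow hnswlib's cosine convention (distance = 1 - cosine similarity, which this description does not confirm for VectorizeDB itself), the quantity can be sketched in plain Python. `cosine_distance` is a hypothetical helper written for this example, not part of the package.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors.

    Hypothetical helper for illustration; not part of VectorizeDB.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # same direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 3.0]))  # orthogonal -> 1.0
```

Under this convention, vector length does not affect the result: only direction matters, so smaller values mean more similar vectors.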
Adding Data
import numpy as np
# Add a vector with an associated key
db.add(key="sample_key", vector=np.random.rand(128))
# Add a vector with metadata
db.add(key="another_key", vector=np.random.rand(128), metadata={"info": "sample metadata"})
# Another way to add data
db["yet_another_key"] = (np.random.rand(128), {"info": "sample metadata"})
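Each entry pairs a vector with optional metadata, and a key-value store such as LMDB holds values as raw bytes. The sketch below shows one way such a pair could be packed into a single bytes value and read back. The layout (a 4-byte dimension header, float32 components, then JSON) and the names `encode_record` and `decode_record` are assumptions made for illustration; this is not VectorizeDB's actual on-disk format.

```python
import json
import struct


def encode_record(vector, metadata=None):
    """Pack a vector and optional metadata into one bytes value.

    Hypothetical layout: uint32 dimension, float32 components, UTF-8 JSON.
    """
    dim = len(vector)
    header = struct.pack("<I", dim)            # little-endian dimension
    body = struct.pack(f"<{dim}f", *vector)    # components stored as float32
    meta = b"" if metadata is None else json.dumps(metadata).encode("utf-8")
    return header + body + meta


def decode_record(blob):
    """Inverse of encode_record: return (vector, metadata)."""
    (dim,) = struct.unpack_from("<I", blob, 0)
    vector = list(struct.unpack_from(f"<{dim}f", blob, 4))
    meta_bytes = blob[4 + 4 * dim:]
    metadata = json.loads(meta_bytes.decode("utf-8")) if meta_bytes else None
    return vector, metadata


blob = encode_record([0.5, -1.25, 2.0], {"info": "sample metadata"})
print(decode_record(blob))
```

Storing components as float32 halves the size compared with Python floats but rounds values that float32 cannot represent exactly, so a round trip is only exact for values like those in the example.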
Retrieving Data
# Retrieve vector and metadata by key
vector, metadata = db["sample_key"]
# Check if a key exists in the database
exists = "sample_key" in db
Iterating Through Data
# Iterate through all keys, vectors and metadata in the database
for key, vector, metadata in db:
    print(key, metadata)
Deleting Data
# Delete a vector from the database by key
del db["sample_key"]
Searching
# Search for nearest neighbors of a vector
results = db.search(vector=np.random.rand(128), k=5)
for key, vector, distance, metadata in results:
    print(key, distance, metadata)
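Because the search is approximate, it can occasionally miss a true nearest neighbor. On a small dataset, results can be sanity-checked against an exact brute-force scan. The following is a minimal sketch in plain Python, assuming cosine distance defined as 1 - cosine similarity; `cosine_distance` and `exact_search` are hypothetical helpers, not part of VectorizeDB.

```python
import heapq
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity (hypothetical helper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_search(items, query, k):
    """Return the k (key, distance) pairs closest to query, nearest first.

    Scans every stored vector, so the cost grows linearly with the data size.
    """
    scored = ((cosine_distance(vec, query), key) for key, vec in items.items())
    return [(key, dist) for dist, key in heapq.nsmallest(k, scored)]


# Toy data standing in for vectors stored in a database.
items = {
    "east": [1.0, 0.0],
    "north": [0.0, 1.0],
    "north_east": [1.0, 1.0],
}
for key, distance in exact_search(items, [1.0, 0.1], k=2):
    print(key, round(distance, 3))
```

Comparing the keys returned by the exact scan with those from the approximate search gives a simple recall estimate for a chosen `k`.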
Database Length
# Get the number of entries in the database
length = len(db)
License
VectorizeDB is released under the Apache License. For more details, see the LICENSE file included in the package.
Hashes for vectorizedb-0.0.3-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | cf7f5285c0e579d1335172a6835bc0b4bf32538f5d94c4799bff291eb6b1de69
MD5 | 692572224d4e670ccc5be3d657927c8a
BLAKE2b-256 | 8596bb5cc61a48d787ea67d2ee56466ea2ebaf0ae786381d832b5a9b74afb431