
Pre-release of Vectrs, a decentralized and distributed vector database network

Vectrs - Decentralized & Distributed Vector Database

Overview

Vectrs is a decentralized, distributed vector database for storing and retrieving vector embeddings. It runs on commodity hardware and scales horizontally by adding nodes, which is intended to make it more cost-effective than a traditional centralized database. Data is managed through a distributed hash table (DHT) rather than a central coordinator, which provides scalability and fault tolerance.
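
To make the DHT idea concrete, here is a minimal sketch of how a Kademlia-style table decides which nodes are responsible for a key: IDs are compared with XOR, and the nodes with the smallest XOR distance to the key store it. This is a generic illustration, not Vectrs's internal code. The key format "db1:vec1", the SHA-1 ID derivation, and the helper names are all assumptions made for the example.

```python
import hashlib
from typing import List

def node_id(name: str) -> int:
    """Derive a 160-bit integer ID from a string (assumed scheme: SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: bitwise XOR of two IDs."""
    return a ^ b

def closest_nodes(key: str, nodes: List[str], k: int = 2) -> List[str]:
    """Return the k nodes whose IDs are closest to the key's ID."""
    key_id = node_id(key)
    return sorted(nodes, key=lambda n: xor_distance(node_id(n), key_id))[:k]

# Hypothetical node addresses and key, used only for illustration.
nodes = ["127.0.0.1:8468", "127.0.0.1:8469", "127.0.0.1:8470"]
responsible = closest_nodes("db1:vec1", nodes, k=2)
print(responsible)
```

Because every node can compute the same distances, any node can work out where a key lives without asking a central server.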

Features

  • Distributed Storage: Data is distributed across multiple nodes for scalability and fault tolerance.
  • Cost-Effective: Utilizes commodity hardware to reduce costs.
  • Horizontal Scalability: Easily add more nodes to handle increased data load.
  • Efficient Vector Operations: Optimized for storing and querying vector embeddings.
  • OpenAI Integration: Supports storing and retrieving vector embeddings generated by OpenAI models.

Installation

You can install Vectrs from PyPI using pip:


pip install vectrs


Usage

Initializing a Vectrs Node

To initialize a Vectrs node, create a VectorDBManager, attach it to a KademliaNode, and start the node:

import asyncio
from vectrs.network import KademliaNode
from vectrs.database import VectorDBManager

async def start_node():
    db_manager = VectorDBManager()
    node = KademliaNode(host='127.0.0.1', port=8468)
    node.set_local_db_manager(db_manager)
    await node.start()
    return node

if __name__ == "__main__":
    asyncio.run(start_node())
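
Note that asyncio.run() creates a new event loop and closes it when the coroutine returns, so anything a node sets up on that loop (sockets, background tasks) does not survive into a second asyncio.run() call. A minimal sketch of the safer pattern, starting and using the node inside one coroutine, is shown here with a hypothetical stand-in class so that it runs without a network. StandInNode and its methods are not part of Vectrs.

```python
import asyncio

class StandInNode:
    """Hypothetical stand-in for a network node; not part of Vectrs."""

    def __init__(self):
        self.loop = None
        self.store = {}

    async def start(self):
        # Remember which event loop the node was started on.
        self.loop = asyncio.get_running_loop()

    async def add_vector(self, vector_id, vector):
        # Refuse to run on any loop other than the one the node was started on.
        if self.loop is not asyncio.get_running_loop():
            raise RuntimeError("node used on a different event loop")
        self.store[vector_id] = vector

    async def stop(self):
        self.loop = None

async def main():
    node = StandInNode()
    await node.start()
    try:
        # All work happens on the same loop that started the node.
        await node.add_vector("vec1", [0.1, 0.2, 0.3])
        return dict(node.store)
    finally:
        await node.stop()

result = asyncio.run(main())
print(result)
```

The same shape applies to a real node: one main() coroutine that awaits the start call, does its work, and stops the node before returning.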


Adding and Querying Vectors

Adding Vectors

You can add vectors to the database by using the add_vector method:

import numpy as np

async def add_vectors(node, db_id):
    vectors = {
        "vec1": np.random.rand(1024).astype(np.float32),
        "vec2": np.random.rand(1024).astype(np.float32),
    }
    metadata = "Example metadata"
    for vector_id, vector in vectors.items():
        await node.add_vector(db_id, vector_id, vector, metadata)
        print(f"Added vector with ID: {vector_id} and metadata: {metadata}")

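async def main():
    # start_node() is defined in the previous example. The node is started
    # and used inside a single event loop.
    node = await start_node()
    db_id = node.local_db_manager.create_database(dim=1024)
    print(f"Created database with ID: {db_id}")
    await add_vectors(node, db_id)

if __name__ == "__main__":
    asyncio.run(main())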


Querying Vectors

You can query vectors from the database using the query_vector method:

async def query_vector(node, db_id, vector_id):
    vector = await node.query_vector(db_id, vector_id)
    print(f"Retrieved vector: {vector}")

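async def main():
    # start_node() and add_vectors() are defined in the earlier examples.
    node = await start_node()
    db_id = node.local_db_manager.create_database(dim=1024)
    print(f"Created database with ID: {db_id}")
    # Add vectors first so that "vec1" exists before it is queried.
    await add_vectors(node, db_id)
    await query_vector(node, db_id, "vec1")

if __name__ == "__main__":
    asyncio.run(main())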


Example with OpenAI Embeddings

You can store OpenAI-generated vector embeddings in Vectrs:

import openai
import numpy as np

openai.api_key = 'your_openai_api_key'

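async def store_openai_embedding(node, db_id, text):
    # openai.Embedding.create is the legacy (pre-1.0) interface of the openai
    # package; openai>=1.0 exposes the same endpoint as client.embeddings.create.
    response = openai.Embedding.create(input=[text], model="text-embedding-ada-002")
    # Convert the returned list of floats to a float32 array before storing it.
    vector = np.array(response['data'][0]['embedding'], dtype=np.float32)
    await node.add_vector(db_id, "openai_vec", vector, "OpenAI generated embedding")
    print("Stored OpenAI embedding")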

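async def main():
    # start_node() is defined in the first example.
    node = await start_node()
    # text-embedding-ada-002 returns 1536-dimensional vectors, so the
    # database is created with dim=1536 rather than 1024.
    db_id = node.local_db_manager.create_database(dim=1536)
    print(f"Created database with ID: {db_id}")
    await store_openai_embedding(node, db_id, "Example text for embedding")

if __name__ == "__main__":
    asyncio.run(main())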
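
The dimension passed to create_database presumably has to match the length of the embeddings being stored, and text-embedding-ada-002 returns 1536 values per input. A small sketch of a guard that picks the dimension from the model name and checks an embedding before it is stored follows. The helper names are hypothetical and not part of Vectrs or the openai package. The sizes listed are the models' default output sizes.

```python
# Default output sizes of some OpenAI embedding models.
EMBEDDING_DIMS = {
    "text-embedding-ada-002": 1536,
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}

def database_dim_for(model):
    """Return the dimension a database needs for the given model (hypothetical helper)."""
    if model not in EMBEDDING_DIMS:
        raise ValueError(f"Unknown embedding model: {model!r}")
    return EMBEDDING_DIMS[model]

def validate_embedding(embedding, db_dim):
    """Raise early if an embedding does not fit the database dimension."""
    if len(embedding) != db_dim:
        raise ValueError(
            f"Embedding has {len(embedding)} values but the database expects {db_dim}"
        )
    return embedding

dim = database_dim_for("text-embedding-ada-002")
checked = validate_embedding([0.0] * dim, dim)
print(dim, len(checked))
```

With such a guard, the value returned by database_dim_for would be passed as dim when the database is created, and validate_embedding would run just before add_vector.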


Retrieving Vector and Log Hashes

After adding a vector, you can retrieve its hash and log hash as follows:

async def add_vector_and_get_hashes(node, db_id):
    vector = np.random.rand(1024).astype(np.float32)
    vector_id = "vec1"
    metadata = "Example metadata"

    # Add the vector
    await node.add_vector(db_id, vector_id, vector, metadata)

    # Retrieve vector hash and log hash
    vector_hash = node.local_db_manager.get_vector_hash(db_id, vector_id)
    log_hash = node.local_db_manager.get_log_hash(db_id, vector_id)

    print(f"Added vector with ID: {vector_id} and metadata: {metadata}")
    print(f"Vector hash: {vector_hash}")
    print(f"Log hash: {log_hash}")

async def main():
    # start_node() is defined in the first example.
    node = await start_node()
    db_id = node.local_db_manager.create_database(dim=1024)
    print(f"Created database with ID: {db_id}")
    await add_vector_and_get_hashes(node, db_id)

if __name__ == "__main__":
    asyncio.run(main())
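
This README does not describe how Vectrs computes these hashes. To show what such values typically represent, here is a minimal sketch under stated assumptions: the vector hash is a SHA-256 digest of the vector's float32 bytes, and the log hash is a SHA-256 digest of a JSON record describing the insertion. Both functions and the record layout are hypothetical and may differ from what get_vector_hash and get_log_hash return.

```python
import hashlib
import json
import struct

def sketch_vector_hash(vector):
    """SHA-256 over the vector packed as little-endian float32 (assumed scheme)."""
    payload = struct.pack(f"<{len(vector)}f", *vector)
    return hashlib.sha256(payload).hexdigest()

def sketch_log_hash(db_id, vector_id, vector_digest, metadata):
    """SHA-256 over a canonical JSON record of the insertion (assumed layout)."""
    record = json.dumps(
        {
            "db_id": db_id,
            "vector_id": vector_id,
            "vector_hash": vector_digest,
            "metadata": metadata,
        },
        sort_keys=True,  # stable key order so equal records hash equally
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

v_hash = sketch_vector_hash([0.1, 0.2, 0.3])
l_hash = sketch_log_hash("db1", "vec1", v_hash, "Example metadata")
print(v_hash)
print(l_hash)
```

A content hash like this lets a node detect whether a stored vector has changed, and hashing the log record ties the vector to its ID and metadata.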


API Reference

KademliaNode

Methods

  • start(): Starts the node and listens for connections.
  • stop(): Stops the node.
  • bootstrap(bootstrap_host, bootstrap_port): Bootstraps the node to an existing network.
  • add_vector(db_id, vector_id, vector, metadata=None): Adds a vector to the database.
  • query_vector(db_id, vector_id): Queries a vector from the database.
  • set_local_db_manager(db_manager): Sets the local database manager.
  • get_value(key): Retrieves a value from the DHT by key.

VectorDBManager

Methods

  • create_database(dim): Creates a new vector database with the specified dimensions and returns the database ID.
  • get_database(db_id): Retrieves a database by its ID.
  • get_vector_hash(db_id, vector_id): Retrieves the hash of a vector by its ID.
  • get_log_hash(db_id, vector_id): Retrieves the log hash of a vector by its ID.

Contribution

Contributions are welcome! Please open an issue or submit a pull request on GitHub.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Support

For support or inquiries, please contact sakib@paralex.tech
