
A simple Python wrapper for vector databases with minimal support for CRUD and retrieval.

Project description

PyVectorDB

Born to be simple. A simple Python wrapper that provides efficient support for CRUD operations and querying with vector databases.



🚀 Getting Started

Installation

To install the dependencies for every supported vector database (requires a lot of disk space; not recommended):

pip install pyvectordb[all]

If you only need a specific vector database engine, install just its extra (recommended):

pip install pyvectordb[pgvector]
pip install pyvectordb[qdrant]
pip install pyvectordb[chromadb]
pip install pyvectordb[milvus]
pip install pyvectordb[weaviate]

Usage examples

1. PGVector

pgvector is a PostgreSQL extension for storing, indexing, and querying vector embeddings. It is designed for vector similarity search, which is useful in machine learning applications such as natural language processing, image recognition, and recommendation systems. By storing embeddings as a native data type, pgvector enables efficient similarity search using distance metrics such as Euclidean (L2) distance, cosine distance, and inner product.
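To make these metrics concrete, here is a small pure-Python sketch (no database involved) that computes L2 distance, inner product, and cosine distance between the example embeddings used in the code below. It only illustrates the math; it is not how pgvector or pyvectordb compute distances internally.

```python
import math

def l2_distance(a, b):
    # Euclidean distance: square root of the summed squared differences
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a, b):
    # Dot product: larger means "more similar" when used as a similarity score
    return sum(x * y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; 0.0 means the vectors point in the same direction
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)

v1 = [2.0, 2.0, 1.0]
v2 = [2.0, 2.0, 2.0]
v3 = [2.0, 2.0, 3.0]

for name, v in [("v2", v2), ("v3", v3)]:
    print(name, l2_distance(v1, v), inner_product(v1, v), round(cosine_distance(v1, v), 4))
```

Note that v3 has the larger inner product with v1, yet v2 is closer by both L2 and cosine distance, so the choice of DistanceFunction can change which neighbors come back first.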

from dotenv import load_dotenv
load_dotenv()

import os
from pyvectordb import Vector
from pyvectordb.pgvector.pgvector import PgvectorDB
from pyvectordb.distance_function import DistanceFunction

v1 = Vector(
    embedding=[2., 2., 1.],
    metadata={"text": "hello from pyvectordb"}
)
v2 = Vector(
    embedding=[2., 2., 2.],
    metadata={"text": "hi"}
)
v3 = Vector(
    embedding=[2., 2., 3.],
    metadata={"text": "good morning!"}
)

vector_db = PgvectorDB(
    user=os.getenv("PG_USER"),
    password=os.getenv("PG_PASSWORD"),
    host=os.getenv("PG_HOST"),
    port=os.getenv("PG_PORT"),
    db_name=os.getenv("PG_NAME"),
    collection=os.getenv("PG_COLLECTION"),
    distance_function=DistanceFunction.L2,
)

# insert new vector
vector_db.insert_vector(v1)
vector_db.insert_vectors([v2, v3])

# read v1
v_from_db = vector_db.read_vector(v1.get_id())

# update v1 embedding
new_embedding = [2., 2., 4.]
v_from_db.embedding = new_embedding
vector_db.update_vector(v_from_db)

# read updated embedding and check
v_from_db_updated = vector_db.read_vector(v1.get_id())
assert list(v_from_db_updated.embedding) == list(new_embedding), "updated embedding not equal"

# restore v1's original embedding (v2 and v3 are rewritten unchanged) and check
vector_db.update_vectors([v1, v2, v3])
re_updated_embedding = vector_db.read_vector(v1.get_id()).embedding
assert list(re_updated_embedding) == list(v1.embedding), "re-updated embedding not equal"

for x in vector_db.get_neighbor_vectors(v1, 3):
    print(f"{x}")

vector_db.delete_vector(v1.get_id())
vector_db.delete_vectors([v2, v3])
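The example above reads its connection settings with os.getenv, which silently returns None when a variable is missing, so a typo in the .env file only surfaces later as a confusing connection error. A small helper like the one below fails fast instead. It is a hypothetical convenience, not part of pyvectordb; the variable names simply mirror the ones used above.

```python
import os

def require_env(*names: str) -> dict:
    """Return the requested environment variables, failing fast if any are unset or empty."""
    values = {name: os.getenv(name) for name in names}
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return values

# Hypothetical usage before building the client shown above:
# cfg = require_env("PG_USER", "PG_PASSWORD", "PG_HOST", "PG_PORT", "PG_NAME", "PG_COLLECTION")
# vector_db = PgvectorDB(user=cfg["PG_USER"], password=cfg["PG_PASSWORD"], host=cfg["PG_HOST"],
#                        port=cfg["PG_PORT"], db_name=cfg["PG_NAME"],
#                        collection=cfg["PG_COLLECTION"], distance_function=DistanceFunction.L2)
```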

2. Qdrant

Qdrant “is a vector similarity search engine that provides a production-ready service with a convenient API to store, search, and manage points (i.e. vectors) with an additional payload.” You can think of payloads as additional pieces of information that help you narrow down your search and return useful details to your users.

Using Qdrant in pyvectordb is simple: you only need to change the client to QdrantDB.

import os
from pyvectordb import QdrantDB
from pyvectordb.distance_function import DistanceFunction

vector_db = QdrantDB(
    host=os.getenv("Q_HOST"),
    api_key=os.getenv("Q_API_KEY"),
    port=os.getenv("Q_PORT"),
    collection=os.getenv("Q_COLLECTION"),
    vector_size=int(os.getenv("Q_SIZE")),
    distance_function=DistanceFunction.COSINE,
)
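To illustrate how payloads help narrow a search, the pure-Python sketch below first keeps only points whose payload matches a condition and then ranks the survivors by cosine similarity. It is a conceptual illustration, not Qdrant's API or its indexing strategy; the `lang` payload field and the point ids are made-up examples.

```python
import math
from typing import Any, Dict, List, Tuple

Point = Tuple[str, List[float], Dict[str, Any]]  # (id, vector, payload)

def cosine_similarity(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(points: List[Point], query: List[float],
                    must: Dict[str, Any], k: int) -> List[Tuple[str, float]]:
    # Keep only points whose payload matches every key/value pair in `must`
    candidates = [p for p in points if all(p[2].get(key) == val for key, val in must.items())]
    # Rank the remaining points by cosine similarity to the query, highest first
    scored = [(pid, cosine_similarity(vec, query)) for pid, vec, _ in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]

points = [
    ("a", [2.0, 2.0, 1.0], {"lang": "en"}),
    ("b", [2.0, 2.0, 2.0], {"lang": "id"}),
    ("c", [2.0, 2.0, 3.0], {"lang": "en"}),
]
print(filtered_search(points, [2.0, 2.0, 2.0], {"lang": "en"}, k=2))
```

Without the filter, point b would rank first because it points in exactly the same direction as the query; the payload condition removes it before ranking.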

3. Chroma DB

Chroma is the AI-native open-source vector database. Chroma makes it easy to build LLM apps by making knowledge, facts, and skills pluggable for LLMs.

import os
from pyvectordb import ChromaDB
from pyvectordb.distance_function import DistanceFunction

vector_db = ChromaDB(
    host=os.getenv("CH_HOST"),
    port=os.getenv("CH_PORT"),
    auth_provider=os.getenv("CH_AUTH_PROVIDER"),
    auth_credentials=os.getenv("CH_AUTH_CREDENTIALS"),
    collection_name=os.getenv("CH_COLLECTION_NAME"),
    distance_function=DistanceFunction.L2,
)

4. Milvus

Milvus is an open-source vector database designed for efficient similarity search and AI applications. It provides high-performance vector storage and retrieval with support for various distance metrics.

import os
from pyvectordb import MilvusDB
from pyvectordb.distance_function import DistanceFunction

vector_db = MilvusDB(
    host=os.getenv("MILVUS_HOST"),
    port=int(os.getenv("MILVUS_PORT", 19530)),
    collection=os.getenv("MILVUS_COLLECTION"),
    vector_size=int(os.getenv("MILVUS_VECTOR_SIZE")),
    distance_function=DistanceFunction.COSINE,
)

5. Weaviate

Weaviate is an open-source, cloud-native vector database that stores data objects and vector embeddings, enabling efficient similarity search. It supports semantic search, hybrid search, and retrieval-augmented generation (RAG) workflows.

import os
from pyvectordb import WeaviateDB
from pyvectordb.distance_function import DistanceFunction

vector_db = WeaviateDB(
    host=os.getenv("WEAVIATE_HOST", "localhost"),
    port=int(os.getenv("WEAVIATE_PORT", 8080)),
    grpc_port=int(os.getenv("WEAVIATE_GRPC_PORT", 50051)),
    api_key=os.getenv("WEAVIATE_API_KEY"),
    collection=os.getenv("WEAVIATE_COLLECTION"),
    vector_size=int(os.getenv("WEAVIATE_VECTOR_SIZE")),
    distance_function=DistanceFunction.COSINE,
)
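Hybrid search combines a keyword relevance score (for example BM25) with a vector similarity score. A minimal, generic sketch of that fusion step is below, assuming both score sets are min-max normalized and blended with a weight `alpha`. It illustrates the idea only; it is not Weaviate's exact ranking algorithm and not part of pyvectordb, and the document ids and scores are made up.

```python
from typing import Dict

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    # Rescale scores to [0, 1]; if all scores are equal, treat them all as 1.0
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}

def hybrid_scores(vector_scores: Dict[str, float],
                  keyword_scores: Dict[str, float],
                  alpha: float) -> Dict[str, float]:
    # alpha = 1.0 -> pure vector ranking, alpha = 0.0 -> pure keyword ranking;
    # a document missing from one result list contributes 0.0 for that part
    v, kw = min_max(vector_scores), min_max(keyword_scores)
    ids = set(v) | set(kw)
    return {i: alpha * v.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}

vector_scores = {"doc1": 0.9, "doc2": 0.7, "doc3": 0.2}   # e.g. cosine similarities
keyword_scores = {"doc1": 1.0, "doc3": 5.0}               # e.g. BM25 scores
fused = hybrid_scores(vector_scores, keyword_scores, alpha=0.75)
print(sorted(fused, key=fused.get, reverse=True))
```

Normalizing first matters because raw keyword and vector scores live on different scales; without it, whichever score happens to be larger would dominate regardless of `alpha`.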

Available functions

These are the functions available in this simple tool:

def insert_vector(self, vector: Vector) -> None: ...
def insert_vectors(self, vectors: List[Vector]) -> None: ...
def read_vector(self, id: str) -> Vector | None: ...
def update_vector(self, vector: Vector) -> None: ...
def update_vectors(self, vectors: List[Vector]) -> None: ...
def delete_vector(self, id: str) -> None: ...
def delete_vectors(self, ids: Union[List[str], List[Vector]]) -> None: ...
def get_neighbor_vectors(self, vector: Vector, n: int) -> List[VectorDistance]: ...
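For unit tests or quick experiments without a running database, the interface above can be mimicked by a tiny in-memory store. The sketch below is hypothetical and not part of pyvectordb: `SimpleVector`, `Neighbor`, and `InMemoryVectorDB` are illustrative stand-ins for the library's `Vector`, `VectorDistance`, and client classes, and the sketch assumes string ids and brute-force L2 distance.

```python
import copy
import math
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union

@dataclass
class SimpleVector:
    embedding: List[float]
    metadata: Dict = field(default_factory=dict)
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def get_id(self) -> str:
        return self.id

@dataclass
class Neighbor:
    vector: SimpleVector
    distance: float

class InMemoryVectorDB:
    """Dict-backed store using the same method names as the list above (L2 distance only)."""

    def __init__(self) -> None:
        self._store: Dict[str, SimpleVector] = {}

    def insert_vector(self, vector: SimpleVector) -> None:
        # Store a copy so later edits to the caller's object don't leak in
        self._store[vector.get_id()] = copy.deepcopy(vector)

    def insert_vectors(self, vectors: List[SimpleVector]) -> None:
        for v in vectors:
            self.insert_vector(v)

    def read_vector(self, id: str) -> Optional[SimpleVector]:
        # Return a copy (or None), mimicking a round trip to a real database
        return copy.deepcopy(self._store.get(id))

    def update_vector(self, vector: SimpleVector) -> None:
        # Overwrite an existing entry; unknown ids raise KeyError
        if vector.get_id() not in self._store:
            raise KeyError(vector.get_id())
        self._store[vector.get_id()] = copy.deepcopy(vector)

    def update_vectors(self, vectors: List[SimpleVector]) -> None:
        for v in vectors:
            self.update_vector(v)

    def delete_vector(self, id: str) -> None:
        self._store.pop(id, None)

    def delete_vectors(self, ids: Union[List[str], List[SimpleVector]]) -> None:
        # Accept either raw ids or vector objects, like the signature above
        for item in ids:
            self.delete_vector(item if isinstance(item, str) else item.get_id())

    def get_neighbor_vectors(self, vector: SimpleVector, n: int) -> List[Neighbor]:
        # Brute-force L2 search over every stored vector, nearest first
        scored = [Neighbor(copy.deepcopy(v), math.dist(vector.embedding, v.embedding))
                  for v in self._store.values()]
        return sorted(scored, key=lambda nb: nb.distance)[:n]
```

Because vectors are deep-copied on write and read, editing a vector you read back has no effect until you call update_vector, which matches how the PGVector walkthrough behaves against a real database.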

💬 Support & Contact

If you have any questions, feedback, or need support, feel free to reach out:

📧 Email: My Email
🌐 GitHub Issues: Submit an Issue
💼 LinkedIn: LinkedIn Profile


🙏 Support the Project

If you find this project helpful, consider supporting it by:

  • ⭐ Starring this repository
  • 🍴 Forking the project and contributing
  • 🗨 Sharing your feedback or feature requests

Every contribution helps make the project better!

Thank you!



Download files


Source Distribution

pyvectordb-0.1.8.tar.gz (111.2 kB)

Uploaded Source

Built Distribution


pyvectordb-0.1.8-py3-none-any.whl (19.4 kB)

Uploaded Python 3

File details

Details for the file pyvectordb-0.1.8.tar.gz.

File metadata

  • Download URL: pyvectordb-0.1.8.tar.gz
  • Size: 111.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pyvectordb-0.1.8.tar.gz
Algorithm Hash digest
SHA256 3d5f0ac4c0efa734e892bc3f5c5e45a3ddcbd933d336af149df7bb29a8d6fa99
MD5 5f170be9341cafe021eeccc5b55250be
BLAKE2b-256 b92ec2938d6b3f0f2edc52a30b30f6d4783345c7191ffd4ddc7a5add04990e5a


File details

Details for the file pyvectordb-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: pyvectordb-0.1.8-py3-none-any.whl
  • Size: 19.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for pyvectordb-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 aa79ba5aa973a9a23190fd2ea5b9e49ba90d58156d1a3484a358f2c0e0cc58ce
MD5 63ccd9ddc2887ecfe963a2d6f9fe6de4
BLAKE2b-256 1093ac97d261eef5b4497da3fa9ae588edb52ca6f7fb492c1f937ebe8e0509df

