A vector database management module for the ThothAI project

Thoth Vector Database Manager v2.0

A high-performance, Haystack-based vector database manager with support for multiple backends and local embedding capabilities.

🚀 Features

  • Multi-backend support: Qdrant, Weaviate, Chroma, PostgreSQL pgvector, Milvus, Pinecone
  • Haystack integration: Uses Haystack as an abstraction layer over vector stores
  • Local embeddings: Uses open-source Sentence Transformers for local embedding generation
  • Memory optimization: Lazy loading and efficient batch processing
  • API compatibility: Maintains backward compatibility with the existing ThothVectorStore API
  • Type safety: Full type hints and Pydantic validation
  • Flexible deployment: Multiple modes (memory, filesystem, server) for different use cases
  • Production-ready: Comprehensive testing and robust error handling

📦 Installation

⚠️ IMPORTANT: Backend Compatibility

Weaviate and Milvus cannot be installed together due to conflicting gRPC requirements. Choose one of the following installation options:

Option 1: Milvus Configuration (Recommended)

# Basic installation with Milvus support
pip install thoth-vdbmanager[milvus]

# All backends except Weaviate (includes Milvus)
pip install thoth-vdbmanager[all]

Option 2: Weaviate Configuration

# Basic installation with Weaviate support
pip install thoth-vdbmanager[weaviate]

# All backends except Milvus (includes Weaviate)
pip install thoth-vdbmanager[all-with-weaviate]

Option 3: Safe Backends Only

# No gRPC conflicts (Qdrant, Chroma, pgvector, Pinecone)
pip install thoth-vdbmanager[all-safe]

# Individual backends
pip install thoth-vdbmanager[qdrant]
pip install thoth-vdbmanager[chroma]
pip install thoth-vdbmanager[pgvector]
pip install thoth-vdbmanager[pinecone]

📖 For detailed compatibility information, see the Backend Compatibility Guide.
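
The conflict described above can be checked for before adding another extra to an existing environment. The following is a minimal sketch and not part of thoth-vdbmanager: `find_conflicts` and `installed_distributions` are hypothetical helpers, and the distribution names `weaviate-client` and `pymilvus` are assumed to be the packages that carry the conflicting gRPC requirements.

```python
from importlib import metadata

# Assumption: distribution names of the two conflicting client libraries.
CONFLICTING_PAIRS = [("weaviate-client", "pymilvus")]


def installed_distributions() -> set[str]:
    """Return normalized names of all distributions in this environment."""
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            names.add(name.lower().replace("_", "-"))
    return names


def find_conflicts(installed: set[str]) -> list[tuple[str, str]]:
    """Return every conflicting pair whose two members are both installed."""
    return [
        (first, second)
        for first, second in CONFLICTING_PAIRS
        if first in installed and second in installed
    ]
```

Calling `find_conflicts(installed_distributions())` returns an empty list when the environment is safe to use.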

🏗️ Architecture

The library is organized into four layers:

  • Core: Base interfaces and document types
  • Adapters: Backend-specific implementations using Haystack
  • Factory: Unified creation interface
  • Compatibility: Legacy API support
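
To make the layering concrete, here is a minimal, self-contained sketch of how a core interface, an adapter, and a factory can fit together. It is an illustration only: `SketchStore`, `InMemoryAdapter`, and `SketchFactory` are hypothetical names and do not reflect the library's actual classes or signatures.

```python
from abc import ABC, abstractmethod
from typing import Optional


class SketchStore(ABC):
    """Core layer: the interface every backend adapter implements."""

    @abstractmethod
    def add(self, doc_id: str, text: str) -> str: ...

    @abstractmethod
    def get(self, doc_id: str) -> Optional[str]: ...


class InMemoryAdapter(SketchStore):
    """Adapter layer: a toy backend that keeps documents in a dict."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self._docs: dict[str, str] = {}

    def add(self, doc_id: str, text: str) -> str:
        self._docs[doc_id] = text
        return doc_id

    def get(self, doc_id: str) -> Optional[str]:
        return self._docs.get(doc_id)


class SketchFactory:
    """Factory layer: maps a backend name to its adapter class."""

    _registry: dict[str, type[SketchStore]] = {"memory": InMemoryAdapter}

    @classmethod
    def create(cls, backend: str, collection: str, **kwargs) -> SketchStore:
        try:
            adapter_cls = cls._registry[backend]
        except KeyError:
            raise ValueError(f"Unsupported backend: {backend!r}") from None
        return adapter_cls(collection, **kwargs)

    @classmethod
    def list_backends(cls) -> list[str]:
        return sorted(cls._registry)
```

Callers depend only on the factory and the interface, so adding a backend means registering one more adapter class.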

🚀 Quick Start

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument, SqlDocument, HintDocument

# Create a vector store
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

doc_id = store.add_column_description(column_doc)

# Search documents
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)

Legacy API (Backward Compatible)

from vdbmanager import ThothVectorStore

# Works exactly like before
store = ThothVectorStore(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# All existing methods work
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

🔧 Configuration

Qdrant

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,  # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)

Weaviate (Production-Ready with Docker)

Docker Setup (Recommended):

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    use_docker=True,
    docker_compose_file="docker-compose-weaviate.yml"
)

Manual Configuration:

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    timeout=30,
    skip_init_checks=False,  # Set to True if startup checks fail because of gRPC issues
    api_key="your-api-key"  # Optional
)

📖 See the Weaviate Configuration Guide for detailed setup instructions.

Chroma (Multiple Modes)

Memory Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)

Filesystem Mode:

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)

📖 See the Chroma Configuration Guide for detailed setup instructions.

PostgreSQL pgvector

store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)
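
Passwords that contain characters such as `@`, `:` or `/` break a hand-written connection string. As a small sketch (the helper `build_pg_connection_string` is hypothetical, not part of the library), the URL can be assembled with percent-encoding from the standard library:

```python
from urllib.parse import quote


def build_pg_connection_string(user, password, host, port, dbname):
    """Assemble a postgresql:// URL, percent-encoding user, password and db name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


print(build_pg_connection_string("user", "p@ss:word/1", "localhost", 5432, "dbname"))
# postgresql://user:p%40ss%3Aword%2F1@localhost:5432/dbname
```

The result can then be passed as the `connection_string` argument shown above.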

Milvus (Multiple Modes)

Lite Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)

📖 See the Milvus Configuration Guide for detailed setup instructions.

Pinecone

store = VectorStoreFactory.create(
    backend="pinecone",
    collection="my-index",
    api_key="your-api-key",
    environment="us-west1-gcp-free"
)

📊 Performance Optimizations

Memory Usage

  • Lazy initialization: Embedders and connections are initialized on first use
  • Singleton pattern: Same configuration reuses existing instances
  • Batch processing: Efficient bulk operations
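
The three techniques above are general patterns. The sketch below shows one way they can look in plain Python. It is not the library's implementation: `LazyStore`, `get_store`, and `batched` are hypothetical names, and `object()` stands in for loading a real embedding model.

```python
from itertools import islice

_instances = {}  # cache: configuration key -> store instance


class LazyStore:
    """Toy store whose expensive resource is created on first use."""

    def __init__(self, backend, collection):
        self.backend = backend
        self.collection = collection
        self._embedder = None  # nothing loaded yet

    @property
    def embedder(self):
        if self._embedder is None:  # lazy initialization
            self._embedder = object()  # stand-in for loading a model
        return self._embedder


def get_store(backend, collection):
    """Return the cached instance for this configuration, creating it once."""
    key = (backend, collection)
    if key not in _instances:
        _instances[key] = LazyStore(backend, collection)
    return _instances[key]


def batched(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

Two calls to `get_store("qdrant", "docs")` return the same object, and `batched(docs, 100)` lets a caller insert documents 100 at a time instead of one by one.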

Performance Tuning

# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)
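
Raising `m` improves recall at the cost of memory. A back-of-the-envelope estimate helps when choosing values. The sketch below uses a common rule of thumb for HNSW indexes (a float32 vector plus roughly `2 * m` four-byte links per element). The numbers are an approximation under that assumption, not measurements of Qdrant or of this library, and `estimate_hnsw_bytes` is a hypothetical helper.

```python
def estimate_hnsw_bytes(num_vectors: int, dim: int, m: int) -> int:
    """Rough size: float32 vector plus about 2*m four-byte links per element."""
    bytes_per_vector = dim * 4
    bytes_per_links = m * 2 * 4
    return num_vectors * (bytes_per_vector + bytes_per_links)


for m in (16, 32):
    size_mb = estimate_hnsw_bytes(1_000_000, 384, m) / 1e6
    print(f"m={m}: ~{size_mb:.0f} MB for 1M 384-dim vectors")
# m=16: ~1664 MB for 1M 384-dim vectors
# m=32: ~1792 MB for 1M 384-dim vectors
```

Under this estimate the vectors dominate, so the embedding dimension matters more for memory than `m` does.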

🧪 Testing

# Run all tests
pytest

# Run specific backend tests
pytest tests/test_qdrant.py -v

# Run with coverage
pytest --cov=vdbmanager tests/

📈 Migration Guide

From v1.x to v2.x

Simple Migration

# Old code (v1.x)
from vdbmanager import QdrantHaystackStore

store = QdrantHaystackStore(
    collection="my_docs",
    host="localhost",
    port=6333
)

# New code (v2.x) - fully compatible
from vdbmanager import QdrantHaystackStore  # Still works!

# Or use new API
from vdbmanager import VectorStoreFactory

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

Advanced Migration

# Old code
from vdbmanager import ThothVectorStore

# New code - same interface, better internals
from vdbmanager import ThothVectorStore  # Still works with warnings

# Recommended new approach
from vdbmanager import QdrantAdapter

store = QdrantAdapter(
    collection="my_docs",
    host="localhost",
    port=6333
)
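
A legacy class that "still works with warnings" is usually a thin shim over the new implementation. The following sketch shows that pattern with the standard `warnings` module. `NewStore` and `LegacyStore` are hypothetical stand-ins, and the exact warning that ThothVectorStore emits is not specified here.

```python
import warnings


class NewStore:
    """Stand-in for a v2-style adapter."""

    def __init__(self, collection: str, **kwargs) -> None:
        self.collection = collection
        self.options = kwargs


class LegacyStore(NewStore):
    """Stand-in for a legacy class kept for backward compatibility."""

    def __init__(self, collection: str, **kwargs) -> None:
        # stacklevel=2 points the warning at the caller, not at this shim.
        warnings.warn(
            "LegacyStore is deprecated; use NewStore instead",
            DeprecationWarning,
            stacklevel=2,
        )
        super().__init__(collection, **kwargs)
```

During a migration, running the test suite with `python -W error::DeprecationWarning -m pytest` turns such warnings into failures, which makes remaining legacy call sites easy to find.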

🔍 API Reference

Core Classes

VectorStoreFactory

# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)

# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)

# List backends
backends = VectorStoreFactory.list_backends()
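
When the configuration lives in a file, it helps to validate its shape before handing it to the factory. The sketch below assumes only the two keys shown above, `backend` and `params`. `parse_store_config` is a hypothetical helper, and which entries belong inside `params` is not specified here.

```python
import json


def parse_store_config(text: str) -> dict:
    """Parse JSON text and check it has the 'backend'/'params' shape."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    if not isinstance(config.get("backend"), str):
        raise ValueError("config needs a string 'backend'")
    if not isinstance(config.get("params", {}), dict):
        raise ValueError("'params' must be a JSON object")
    config.setdefault("params", {})  # treat a missing 'params' as empty
    return config
```

The returned dict has the same shape as the `config` example above and could then be passed to `VectorStoreFactory.from_config`.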

Document Types

  • ColumnNameDocument: Column metadata
  • SqlDocument: SQL examples
  • HintDocument: General hints

Methods

  • add_column_description(doc): Add column metadata
  • add_sql(doc): Add SQL example
  • add_hint(doc): Add hint
  • search_similar(query, doc_type, top_k=5, score_threshold=0.7): Semantic search
  • get_document(doc_id): Retrieve by ID
  • bulk_add_documents(docs): Batch insert
  • get_collection_info(): Get stats
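
To illustrate what `top_k` and `score_threshold` mean in a semantic search, here is a pure-Python sketch that ranks vectors by cosine similarity. It is an assumption that the threshold acts as a lower bound on a cosine-style score. The real method embeds the query and delegates to the backend, and `search_similar_sketch` is a hypothetical name.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_similar_sketch(query_vec, docs, top_k=5, score_threshold=0.7):
    """Score (doc_id, vector) pairs, drop low scores, return the best top_k."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]
```

With this reading, a higher `score_threshold` can return fewer than `top_k` results, or none at all, when nothing in the collection is close enough to the query.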

🐛 Troubleshooting

Common Issues

Connection Errors

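# Check service availability
import requests

response = requests.get("http://localhost:6333", timeout=5)  # Qdrant
print(response.status_code)  # any HTTP response means the server is reachable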
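
Not every backend answers plain HTTP on its port, so a TCP-level probe is a useful first check. This is a minimal sketch using only the standard library. `is_port_open` is a hypothetical helper, and the port numbers are the ones used in the configuration examples in this document.

```python
import socket

# Ports taken from the configuration examples in this document.
DEFAULT_PORTS = {
    "qdrant": 6333,
    "weaviate": 8080,
    "chroma": 8000,
    "milvus": 19530,
    "pgvector": 5432,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, `is_port_open("localhost", DEFAULT_PORTS["qdrant"])` returning `False` suggests the service is not running or is listening on another port.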

Memory Issues

# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)

Performance Issues

# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License. See the LICENSE file for details.

Directory structure

vdbmanager/
├── core/                        # Base interfaces and document types
│   ├── base.py                  # Core document classes and interfaces
│   └── __init__.py
├── adapters/                    # Backend-specific implementations
│   ├── haystack_adapter.py      # Base Haystack adapter
│   ├── qdrant_adapter.py        # Qdrant implementation
│   ├── weaviate_adapter.py      # Weaviate implementation
│   ├── chroma_adapter.py        # Chroma implementation
│   ├── pgvector_adapter.py      # PostgreSQL pgvector
│   ├── milvus_adapter.py        # Milvus implementation
│   └── pinecone_adapter.py      # Pinecone implementation
├── factory.py                   # Unified creation interface
├── compat/                      # Legacy compatibility layer
│   ├── __init__.py
│   └── thoth_vector_store.py
└── __init__.py                  # Public API exports

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument

# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

# Use optimized methods
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

Old API (Fully Compatible)

from vdbmanager import ThothVectorStore  # Works with warnings

# Existing code continues to work
store = ThothVectorStore(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)
