A vector database management module for the ThothAI project

Thoth Vector Database Manager v2.0

A high-performance, Haystack-based vector database manager with support for multiple backends and local embedding capabilities.

🤖 MCP Server Support

This project is configured with MCP (Model Context Protocol) servers for enhanced AI-assisted development:

  • Context7: Enhanced context management
  • Serena: IDE assistance and development support

See docs/MCP_SETUP.md for details.

🚀 Features

  • Multi-backend support: Qdrant, Weaviate, Chroma, PostgreSQL pgvector, Milvus, Pinecone
  • Haystack integration: Uses Haystack as an abstraction layer over vector stores
  • Local embeddings: Uses open-source Sentence Transformers for local embedding generation
  • Memory optimization: Lazy loading and efficient batch processing
  • API compatibility: Maintains backward compatibility with existing ThothVectorStore API
  • Type safety: Full type hints and Pydantic validation
  • Flexible deployment: Multiple modes (memory, filesystem, server) for different use cases
  • Production-ready: Comprehensive testing and robust error handling

📦 Installation

⚠️ IMPORTANT: Backend Compatibility

Weaviate and Milvus cannot be installed together due to conflicting gRPC requirements. Choose one of the following installation options:

Option 1: Milvus Configuration (Recommended)

# Basic installation with Milvus support
pip install thoth-vdbmanager[milvus]

# All backends except Weaviate (includes Milvus)
pip install thoth-vdbmanager[all]

Option 2: Weaviate Configuration

# Basic installation with Weaviate support
pip install thoth-vdbmanager[weaviate]

# All backends except Milvus (includes Weaviate)
pip install thoth-vdbmanager[all-with-weaviate]

Option 3: Safe Backends Only

# No gRPC conflicts (Qdrant, Chroma, pgvector, Pinecone)
pip install thoth-vdbmanager[all-safe]

# Individual backends
pip install thoth-vdbmanager[qdrant]
pip install thoth-vdbmanager[chroma]
pip install thoth-vdbmanager[pgvector]
pip install thoth-vdbmanager[pinecone]

📖 For detailed compatibility information, see the Backend Compatibility Guide.
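
Because the Weaviate and Milvus extras conflict, it can help to check which client packages are already importable in an environment before choosing an install option. The following is a minimal sketch using only the standard library. The import names `weaviate` and `pymilvus` are assumptions about what the two client packages expose, and `conflicting_backends` and `has_conflict` are hypothetical helpers, not part of thoth-vdbmanager.

```python
from importlib.util import find_spec
from typing import Callable, Dict, List

# Assumed top-level import names of the client packages behind each extra.
GRPC_BACKEND_MODULES: Dict[str, str] = {
    "weaviate": "weaviate",
    "milvus": "pymilvus",
}


def _is_importable(module_name: str) -> bool:
    """Return True if the module can be located, without importing it."""
    return find_spec(module_name) is not None


def conflicting_backends(
    is_importable: Callable[[str], bool] = _is_importable,
) -> List[str]:
    """List the gRPC-sensitive backends whose client module is present."""
    return sorted(
        backend
        for backend, module in GRPC_BACKEND_MODULES.items()
        if is_importable(module)
    )


def has_conflict(is_importable: Callable[[str], bool] = _is_importable) -> bool:
    """True when both Weaviate and Milvus clients are present together."""
    return len(conflicting_backends(is_importable)) > 1
```

The `is_importable` parameter exists only so the check can be exercised without installing either package; in normal use both functions are called with no arguments.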

🏗️ Architecture

The library is built on a clean architecture with:

  • Core: Base interfaces and document types
  • Adapters: Backend-specific implementations using Haystack
  • Factory: Unified creation interface
  • Compatibility: Legacy API support
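
The layering above can be illustrated with a small, standard-library-only sketch. All names here (`Document`, `VectorStore`, `InMemoryAdapter`, `StoreFactory`) are hypothetical stand-ins chosen for illustration; they are not the library's actual classes, and the real adapters delegate to Haystack rather than a dict.

```python
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Document:
    """Core layer: a minimal document type (hypothetical)."""
    text: str
    doc_id: str = field(default_factory=lambda: uuid.uuid4().hex)


class VectorStore(ABC):
    """Core layer: the interface every adapter implements."""

    @abstractmethod
    def add(self, doc: Document) -> str:
        """Store a document and return its ID."""

    @abstractmethod
    def get(self, doc_id: str) -> Document:
        """Return the document stored under doc_id."""


class InMemoryAdapter(VectorStore):
    """Adapter layer: a stand-in backend that keeps documents in a dict."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self._docs: Dict[str, Document] = {}

    def add(self, doc: Document) -> str:
        self._docs[doc.doc_id] = doc
        return doc.doc_id

    def get(self, doc_id: str) -> Document:
        return self._docs[doc_id]


class StoreFactory:
    """Factory layer: maps a backend name to its adapter class."""

    _registry: Dict[str, Callable[..., VectorStore]] = {"memory": InMemoryAdapter}

    @classmethod
    def create(cls, backend: str, collection: str, **kwargs) -> VectorStore:
        try:
            adapter = cls._registry[backend]
        except KeyError:
            raise ValueError(f"Unknown backend: {backend!r}") from None
        return adapter(collection, **kwargs)

    @classmethod
    def list_backends(cls) -> List[str]:
        return sorted(cls._registry)
```

Callers depend only on the factory and the abstract interface, so a backend can be swapped by changing the `backend` string.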

🚀 Quick Start

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument, SqlDocument, HintDocument

# Create a vector store
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

doc_id = store.add_column_description(column_doc)

# Search documents
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)

Legacy API (Backward Compatible)

from vdbmanager import ThothVectorStore

# Works exactly like before
store = ThothVectorStore(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# All existing methods work
# (column_doc is the ColumnNameDocument built in the New API example above)
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

🔧 Configuration

Qdrant

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,  # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)

Weaviate (Production-Ready with Docker)

Docker Setup (Recommended):

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    use_docker=True,
    docker_compose_file="docker-compose-weaviate.yml"
)

Manual Configuration:

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    timeout=30,
    skip_init_checks=False,  # Set to True if gRPC issues
    api_key="your-api-key"  # Optional
)

📖 See the Weaviate Configuration Guide for detailed setup instructions.

Chroma (Multiple Modes)

Memory Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)

Filesystem Mode:

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)

📖 See the Chroma Configuration Guide for detailed setup instructions.

PostgreSQL pgvector

store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)

Milvus (Multiple Modes)

Lite Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)

📖 See the Milvus Configuration Guide for detailed setup instructions.

Pinecone

store = VectorStoreFactory.create(
    backend="pinecone",
    collection="my-index",
    api_key="your-api-key",
    environment="us-west1-gcp-free"
)

📊 Performance Optimizations

Memory Usage

  • Lazy initialization: Embedders and connections are initialized on first use
  • Singleton pattern: Same configuration reuses existing instances
  • Batch processing: Efficient bulk operations
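
As an illustration of the first two points, the sketch below defers an expensive load until first use and reuses one instance per configuration. `LazyEmbedder` and `get_embedder` are hypothetical names, and the "model" is a trivial placeholder; this shows the general pattern, not how the library implements it internally.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class LazyEmbedder:
    """Defers the expensive model load until the first embed() call."""

    def __init__(self, model_name: str) -> None:
        self.model_name = model_name
        self._model: Optional[Callable[[str], List[float]]] = None
        self.load_count = 0  # Exposed only to make the laziness observable.

    def _load(self) -> Callable[[str], List[float]]:
        # Placeholder for an expensive load, e.g. a Sentence Transformers model.
        self.load_count += 1
        return lambda text: [float(len(text))]

    def embed(self, text: str) -> List[float]:
        if self._model is None:
            self._model = self._load()
        return self._model(text)


# One shared instance per distinct configuration.
_instances: Dict[Tuple[Tuple[str, Any], ...], LazyEmbedder] = {}


def get_embedder(**config: Any) -> LazyEmbedder:
    """Return the embedder for this configuration, creating it only once."""
    key = tuple(sorted(config.items()))
    if key not in _instances:
        _instances[key] = LazyEmbedder(config["model_name"])
    return _instances[key]
```

Creating the object is cheap; the cost is paid once, on the first `embed()` call, and identical configurations share that cost.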

Performance Tuning

# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)
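
For bulk operations, a common approach is to split a large input into fixed-size batches on the caller's side. The helper below is a generic sketch (`batched` is a hypothetical name, and the batch size that works best depends on the backend and embedding model).

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded batch could then be passed to `bulk_add_documents(docs)`, so that only one batch of documents is held for embedding at a time.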

🧪 Testing

# Run all tests
pytest

# Run specific backend tests
pytest tests/test_qdrant.py -v

# Run with coverage
pytest --cov=vdbmanager tests/

📈 Migration Guide

From v1.x to v2.x

Simple Migration

# Old code (v1.x)
from vdbmanager import QdrantHaystackStore

store = QdrantHaystackStore(
    collection="my_docs",
    host="localhost",
    port=6333
)

# New code (v2.x) - fully compatible
from vdbmanager import QdrantHaystackStore  # Still works!

# Or use new API
from vdbmanager import VectorStoreFactory

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

Advanced Migration

# Old code
from vdbmanager import ThothVectorStore

# New code - same interface, better internals
from vdbmanager import ThothVectorStore  # Still works with warnings

# Recommended new approach
from vdbmanager import QdrantAdapter

store = QdrantAdapter(
    collection="my_docs",
    host="localhost",
    port=6333
)

🔍 API Reference

Core Classes

VectorStoreFactory

# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)

# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)

# List backends
backends = VectorStoreFactory.list_backends()
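
A configuration dictionary in the `{"backend": ..., "params": {...}}` shape shown above can be assembled and sanity-checked before it is handed to `from_config`. This is a sketch under assumptions: `make_config` is a hypothetical helper, the backend names are taken from the configuration examples in this README, and which keys `from_config` expects inside `"params"` is not specified here.

```python
from typing import Any, Dict

# Backend names as used in the configuration examples in this README.
KNOWN_BACKENDS = ("qdrant", "weaviate", "chroma", "pgvector", "milvus", "pinecone")


def make_config(backend: str, **params: Any) -> Dict[str, Any]:
    """Build a config dict, rejecting backend names that are not listed."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {KNOWN_BACKENDS}"
        )
    return {"backend": backend, "params": dict(params)}
```

Failing early on a misspelled backend name gives a clearer error than a connection failure later on.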

Document Types

  • ColumnNameDocument: Column metadata
  • SqlDocument: SQL examples
  • HintDocument: General hints

Methods

  • add_column_description(doc): Add column metadata
  • add_sql(doc): Add SQL example
  • add_hint(doc): Add hint
  • search_similar(query, doc_type, top_k=5, score_threshold=0.7): Semantic search
  • get_document(doc_id): Retrieve by ID
  • bulk_add_documents(docs): Batch insert
  • get_collection_info(): Get stats
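
To make the `top_k` and `score_threshold` parameters of `search_similar` concrete, the sketch below ranks candidates by cosine similarity, drops those below the threshold, and keeps the best `top_k`. It assumes that scores are similarities where higher is better and that the threshold is a minimum; the actual scoring depends on the backend, and `rank_by_similarity` is a hypothetical function, not the library's implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: Sequence[float],
    candidates: List[Tuple[str, Sequence[float]]],
    top_k: int = 5,
    score_threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs scoring at least the threshold."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in candidates
    ]
    kept = [item for item in scored if item[1] >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]
```

Under these assumptions, raising `score_threshold` trades recall for precision, while `top_k` only caps how many of the surviving results are returned.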

🐛 Troubleshooting

Common Issues

Connection Errors

# Check service availability
import requests
requests.get("http://localhost:6333")  # Qdrant
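
Where `requests` is not installed, or the backend does not speak HTTP on its port, a plain TCP check with a timeout is a possible alternative. The sketch below uses only the standard library; `is_port_open` is a hypothetical helper, and the port numbers are the ones used in the configuration examples in this README.

```python
import socket

# Ports taken from the configuration examples in this README.
DEFAULT_PORTS = {
    "qdrant": 6333,
    "weaviate": 8080,
    "chroma": 8000,
    "milvus": 19530,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `is_port_open("localhost", DEFAULT_PORTS["qdrant"])` reports whether anything is listening on the Qdrant port. It does not confirm that the listener is actually Qdrant.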

Memory Issues

# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)

Performance Issues

# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

Directory structure

vdbmanager/
├── core/                    # Base interfaces and document types
│   ├── base.py              # Core document classes and interfaces
│   └── __init__.py
├── adapters/                # Backend-specific implementations
│   ├── haystack_adapter.py  # Base Haystack adapter
│   ├── qdrant_adapter.py    # Qdrant implementation
│   ├── weaviate_adapter.py  # Weaviate implementation
│   ├── chroma_adapter.py    # Chroma implementation
│   ├── pgvector_adapter.py  # PostgreSQL pgvector
│   ├── milvus_adapter.py    # Milvus implementation
│   └── pinecone_adapter.py  # Pinecone implementation
├── factory.py               # Unified creation interface
├── compat/                  # Legacy compatibility layer
│   ├── __init__.py
│   └── thoth_vector_store.py
└── __init__.py              # Public API exports

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument

# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

# Use optimized methods
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

Old API (Fully Compatible)

from vdbmanager import ThothVectorStore  # Works with warnings

# Existing code continues to work
store = ThothVectorStore(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)
