A vector database management module for ThothAI Project

Thoth Vector Database Manager v2.0

A high-performance, Haystack-based vector database manager with support for multiple backends and local embedding capabilities.

🤖 MCP Server Support

This project is configured with MCP (Model Context Protocol) servers for enhanced AI-assisted development:

  • Context7: Enhanced context management
  • Serena: IDE assistance and development support

See docs/MCP_SETUP.md for details.

🚀 Features

  • Multi-backend support: Qdrant, Weaviate, Chroma, PostgreSQL pgvector, Milvus, Pinecone
  • Haystack integration: Uses Haystack as an abstraction layer over vector stores
  • Local embeddings: Uses open-source Sentence Transformers for local embedding generation
  • Memory optimization: Lazy loading and efficient batch processing
  • API compatibility: Maintains backward compatibility with existing ThothVectorStore API
  • Type safety: Full type hints and Pydantic validation
  • Flexible deployment: Multiple modes (memory, filesystem, server) for different use cases
  • Production-ready: Comprehensive testing and robust error handling

📦 Installation

⚠️ IMPORTANT: Backend Compatibility

Weaviate and Milvus cannot be installed together due to conflicting gRPC requirements. Choose one of the following installation options:

Option 1: Milvus Configuration (Recommended)

# Basic installation with Milvus support
pip install thoth-vdbmanager[milvus]

# All backends except Weaviate (includes Milvus)
pip install thoth-vdbmanager[all]

Option 2: Weaviate Configuration

# Basic installation with Weaviate support
pip install thoth-vdbmanager[weaviate]

# All backends except Milvus (includes Weaviate)
pip install thoth-vdbmanager[all-with-weaviate]

Option 3: Safe Backends Only

# No gRPC conflicts (Qdrant, Chroma, pgvector, Pinecone)
pip install thoth-vdbmanager[all-safe]

# Individual backends
pip install thoth-vdbmanager[qdrant]
pip install thoth-vdbmanager[chroma]
pip install thoth-vdbmanager[pgvector]
pip install thoth-vdbmanager[pinecone]

📖 For detailed compatibility information, see the Backend Compatibility Guide.
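
Before switching between the Milvus and Weaviate extras, it can help to check which of the two client libraries is already present in the environment. The following is a minimal sketch using only the standard library. The import names `weaviate` and `pymilvus` are assumptions about how the client packages expose themselves, and both helper functions are hypothetical, not part of this package.

```python
from importlib.util import find_spec

# Assumed import names for the two client libraries; adjust them if your
# environment exposes the clients under different module names.
CONFLICTING_MODULES = {"weaviate": "weaviate", "milvus": "pymilvus"}


def installed_conflicting_backends(modules=CONFLICTING_MODULES):
    """Return the backend names whose client module can be located."""
    return sorted(
        backend
        for backend, module in modules.items()
        if find_spec(module) is not None
    )


def check_backend_conflict(modules=CONFLICTING_MODULES):
    """Raise RuntimeError when more than one conflicting backend is installed."""
    found = installed_conflicting_backends(modules)
    if len(found) > 1:
        raise RuntimeError(
            f"Conflicting backends installed together: {', '.join(found)}"
        )
    return found


if __name__ == "__main__":
    print(check_backend_conflict())
```

Running this in a fresh virtual environment before `pip install` is a cheap way to confirm that only one of the two gRPC-dependent backends is present.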

🏗️ Architecture

The library is built on a clean architecture with:

  • Core: Base interfaces and document types
  • Adapters: Backend-specific implementations using Haystack
  • Factory: Unified creation interface
  • Compatibility: Legacy API support
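
The layering above can be illustrated with a small, self-contained sketch. None of the names below come from the library: `Document`, `VectorStore`, `InMemoryAdapter`, and `SketchFactory` are hypothetical stand-ins that show how a core interface, a backend adapter, and a factory might fit together.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional
from uuid import uuid4


@dataclass
class Document:
    """Core layer: a minimal document type (hypothetical)."""

    text: str
    doc_id: str = field(default_factory=lambda: uuid4().hex)


class VectorStore(ABC):
    """Core layer: the interface every adapter implements (hypothetical)."""

    @abstractmethod
    def add(self, doc: Document) -> str:
        """Store a document and return its ID."""

    @abstractmethod
    def get(self, doc_id: str) -> Optional[Document]:
        """Return the document with this ID, or None if it is unknown."""


class InMemoryAdapter(VectorStore):
    """Adapter layer: one backend-specific implementation."""

    def __init__(self, collection: str) -> None:
        self.collection = collection
        self._docs = {}

    def add(self, doc: Document) -> str:
        self._docs[doc.doc_id] = doc
        return doc.doc_id

    def get(self, doc_id: str) -> Optional[Document]:
        return self._docs.get(doc_id)


class SketchFactory:
    """Factory layer: maps a backend name to its adapter class."""

    _registry = {"memory": InMemoryAdapter}

    @classmethod
    def create(cls, backend: str, collection: str, **kwargs) -> VectorStore:
        try:
            adapter_cls = cls._registry[backend]
        except KeyError:
            raise ValueError(f"Unknown backend: {backend!r}") from None
        return adapter_cls(collection, **kwargs)


store = SketchFactory.create("memory", "my_collection")
doc_id = store.add(Document("User email address"))
print(store.get(doc_id).text)  # prints "User email address"
```

Callers depend only on the `VectorStore` interface, so adding a backend means registering one more adapter class rather than touching calling code.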

🚀 Quick Start

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument, SqlDocument, HintDocument

# Create a vector store
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

doc_id = store.add_column_description(column_doc)

# Search documents
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)

Legacy API (Backward Compatible)

from vdbmanager import ThothVectorStore

# Works exactly like before
store = ThothVectorStore(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# All existing methods work
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

🔧 Configuration

Qdrant

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,  # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)

Weaviate (Production-Ready with Docker)

Docker Setup (Recommended):

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    use_docker=True,
    docker_compose_file="docker-compose-weaviate.yml"
)

Manual Configuration:

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    timeout=30,
    skip_init_checks=False,  # Set to True if gRPC issues
    api_key="your-api-key"  # Optional
)

📖 See the Weaviate Configuration Guide for detailed setup instructions.

Chroma (Multiple Modes)

Memory Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)

Filesystem Mode:

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)

📖 See the Chroma Configuration Guide for detailed setup instructions.

PostgreSQL pgvector

store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)
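
Passwords that contain characters such as `@` or `/` will break a hand-written connection string. A minimal sketch of building the URL safely with the standard library follows. `build_postgres_url` is a hypothetical helper, not part of this package, and the defaults are illustrative.

```python
from urllib.parse import quote


def build_postgres_url(user, password, host="localhost", port=5432, dbname="postgres"):
    """Build a postgresql:// URL, percent-encoding user, password, and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(dbname, safe='')}"
    )


print(build_postgres_url("user", "p@ss/word", dbname="dbname"))
# prints postgresql://user:p%40ss%2Fword@localhost:5432/dbname
```

The resulting string can then be passed as the `connection_string` argument shown above.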

Milvus (Multiple Modes)

Lite Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)

📖 See the Milvus Configuration Guide for detailed setup instructions.

Pinecone

store = VectorStoreFactory.create(
    backend="pinecone",
    collection="my-index",
    api_key="your-api-key",
    environment="us-west1-gcp-free"
)

📊 Performance Optimizations

Memory Usage

  • Lazy initialization: Embedders and connections are initialized on first use
  • Singleton pattern: Same configuration reuses existing instances
  • Batch processing: Efficient bulk operations
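
The three ideas above can be sketched in a few lines of standard-library Python. This is an illustration of the general techniques, not the library's implementation: `LazyStore`, `get_store`, and `in_batches` are hypothetical names, and the embedder is a trivial stand-in for a real model.

```python
import json
from functools import cached_property
from itertools import islice

_instances = {}  # one shared store per distinct configuration


class LazyStore:
    """Hypothetical store whose expensive embedder is built on first access."""

    load_count = 0  # how many times an embedder has been built

    def __init__(self, **config):
        self.config = config

    @cached_property
    def embedder(self):
        # Stand-in for loading a Sentence Transformers model.
        LazyStore.load_count += 1
        return lambda text: [float(len(text))]


def get_store(**config):
    """Return the existing store for this configuration, or create one."""
    key = json.dumps(config, sort_keys=True)
    if key not in _instances:
        _instances[key] = LazyStore(**config)
    return _instances[key]


def in_batches(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


a = get_store(backend="qdrant", collection="docs")
b = get_store(collection="docs", backend="qdrant")
print(a is b)                # True: same configuration, same instance
print(LazyStore.load_count)  # 0: nothing has been loaded yet
a.embedder("hello")
print(LazyStore.load_count)  # 1: built on first use, cached afterwards
print(list(in_batches(range(5), 2)))  # [[0, 1], [2, 3], [4]]
```

Serializing the configuration with sorted keys makes the cache key independent of keyword order, which is why `a` and `b` resolve to the same object.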

Performance Tuning

# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)

🧪 Testing

# Run all tests
pytest

# Run specific backend tests
pytest tests/test_qdrant.py -v

# Run with coverage
pytest --cov=vdbmanager tests/

📈 Migration Guide

From v1.x to v2.x

Simple Migration

# Old code (v1.x)
from vdbmanager import QdrantHaystackStore

store = QdrantHaystackStore(
    collection="my_docs",
    host="localhost",
    port=6333
)

# New code (v2.x) - fully compatible
from vdbmanager import QdrantHaystackStore  # Still works!

# Or use new API
from vdbmanager import VectorStoreFactory

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

Advanced Migration

# Old code
from vdbmanager import ThothVectorStore

# New code - same interface, better internals
from vdbmanager import ThothVectorStore  # Still works with warnings

# Recommended new approach
from vdbmanager import QdrantAdapter

store = QdrantAdapter(
    collection="my_docs",
    host="localhost",
    port=6333
)
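
The comment "Still works with warnings" suggests the legacy classes are thin wrappers that warn and then delegate. How the library does this is not shown here, so the following is only a sketch of the common pattern. `NewStore` and `LegacyStore` are hypothetical, and the use of `DeprecationWarning` is an assumption.

```python
import warnings


class NewStore:
    """Stand-in for a new-style adapter (hypothetical)."""

    def __init__(self, collection, host="localhost", port=6333):
        self.collection = collection
        self.host = host
        self.port = port


class LegacyStore(NewStore):
    """Stand-in for a legacy class kept for backward compatibility."""

    def __init__(self, *args, **kwargs):
        warnings.warn(
            "LegacyStore is deprecated; use NewStore instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(*args, **kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    store = LegacyStore("my_docs")

print(len(caught), caught[0].category.__name__)  # 1 DeprecationWarning
```

If the library's warnings are in fact `DeprecationWarning`s, running a test suite with `python -W error::DeprecationWarning -m pytest` turns each remaining legacy call into a failure, which makes the migration easy to track.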

🔍 API Reference

Core Classes

VectorStoreFactory

# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)

# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)

# List backends
backends = VectorStoreFactory.list_backends()
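
A configuration dictionary of this shape is convenient to keep in a JSON file. The sketch below parses and validates one before it is handed to `from_config`. `load_store_config` is a hypothetical helper, and the keys placed inside `params` are assumptions, since the accepted parameters are elided above.

```python
import json


def load_store_config(text):
    """Parse and validate a {"backend": ..., "params": {...}} JSON document."""
    config = json.loads(text)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    backend = config.get("backend")
    if not isinstance(backend, str) or not backend:
        raise ValueError("config needs a non-empty 'backend' string")
    params = config.get("params", {})
    if not isinstance(params, dict):
        raise ValueError("'params' must be a JSON object")
    return {"backend": backend, "params": params}


# The keys under "params" are illustrative assumptions.
raw = (
    '{"backend": "qdrant", '
    '"params": {"collection": "my_docs", "host": "localhost", "port": 6333}}'
)
config = load_store_config(raw)
print(config["backend"], sorted(config["params"]))
# prints qdrant ['collection', 'host', 'port']
```

Validating the shape up front gives a clear error message at load time instead of a failure deep inside backend initialization.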

Document Types

  • ColumnNameDocument: Column metadata
  • SqlDocument: SQL examples
  • HintDocument: General hints

Methods

  • add_column_description(doc): Add column metadata
  • add_sql(doc): Add SQL example
  • add_hint(doc): Add hint
  • search_similar(query, doc_type, top_k=5, score_threshold=0.7): Semantic search
  • get_document(doc_id): Retrieve by ID
  • bulk_add_documents(docs): Batch insert
  • get_collection_info(): Get stats
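
To make the `top_k` and `score_threshold` parameters of `search_similar` concrete, here is a toy, pure-Python version of thresholded top-k retrieval over cosine similarity. It is a sketch of the general technique only: the actual scoring depends on the backend, and treating the threshold as inclusive is an assumption.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, indexed, top_k=5, score_threshold=0.7):
    """Return up to top_k (doc_id, score) pairs scoring at least score_threshold."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in indexed.items()]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy 2-dimensional "embeddings"; real models produce hundreds of dimensions.
indexed = {"email": [1.0, 0.0], "phone": [0.0, 1.0], "contact": [0.8, 0.6]}
for doc_id, score in search([1.0, 0.0], indexed, top_k=2):
    print(doc_id, round(score, 2))
```

Here `phone` is dropped by the threshold before ranking, so a high `score_threshold` can return fewer than `top_k` results.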

🐛 Troubleshooting

Common Issues

Connection Errors

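# Check service availability (6333 is the Qdrant port used in the examples above)
import requests

# Raises requests.ConnectionError if nothing is listening on the port
response = requests.get("http://localhost:6333", timeout=5)  # Qdrant
print(response.status_code)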
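
Not every backend answers plain HTTP on its main port, so a TCP-level check can be a useful first step. The sketch below uses only the standard library. `is_port_open` is a hypothetical helper, and the ports are the ones used in the configuration examples in this README.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports taken from the configuration examples in this README.
DEFAULT_PORTS = {"qdrant": 6333, "weaviate": 8080, "chroma": 8000, "milvus": 19530}

if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        reachable = is_port_open("localhost", port, timeout=0.5)
        print(f"{name}: localhost:{port} {'reachable' if reachable else 'unreachable'}")
```

An open port only shows that something is listening. It does not confirm that the service is healthy or that credentials are correct.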

Memory Issues

# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)

Performance Issues

# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

Directory structure

vdbmanager/
├── core/                      # Base interfaces and document types
│   ├── base.py                # Core document classes and interfaces
│   └── __init__.py
├── adapters/                  # Backend-specific implementations
│   ├── haystack_adapter.py    # Base Haystack adapter
│   ├── qdrant_adapter.py      # Qdrant implementation
│   ├── weaviate_adapter.py    # Weaviate implementation
│   ├── chroma_adapter.py      # Chroma implementation
│   ├── pgvector_adapter.py    # PostgreSQL pgvector
│   ├── milvus_adapter.py      # Milvus implementation
│   └── pinecone_adapter.py    # Pinecone implementation
├── factory.py                 # Unified creation interface
├── compat/                    # Legacy compatibility layer
│   ├── __init__.py
│   └── thoth_vector_store.py
└── __init__.py                # Public API exports

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument

# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

# Use optimized methods
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

Old API (Fully Compatible)

from vdbmanager import ThothVectorStore  # Works with warnings

# Existing code continues to work
store = ThothVectorStore(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)
