A vector database management module for ThothAI Project

Thoth Vector Database Manager v2.0

A high-performance, Haystack-based vector database manager with support for multiple backends and local embedding capabilities.

🤖 MCP Server Support

This project is configured with MCP (Model Context Protocol) servers for enhanced AI-assisted development:

  • Context7: Enhanced context management
  • Serena: IDE assistance and development support

See docs/MCP_SETUP.md for details.

🚀 Features

  • Multi-backend support: Qdrant, Weaviate, Chroma, PostgreSQL pgvector, Milvus, Pinecone
  • Haystack integration: Uses Haystack as an abstraction layer over vector stores
  • Local embeddings: Uses open-source Sentence Transformers for local embedding generation
  • Memory optimization: Lazy loading and efficient batch processing
  • API compatibility: Maintains backward compatibility with existing ThothVectorStore API
  • Type safety: Full type hints and Pydantic validation
  • Flexible deployment: Multiple modes (memory, filesystem, server) for different use cases
  • Production-ready: Comprehensive testing and robust error handling

📦 Installation

🚀 Recommended: uv Package Manager

This project now uses uv for fast, reliable Python package management. Install uv first:

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

⚠️ IMPORTANT: Backend Compatibility

Weaviate and Milvus cannot be installed together due to conflicting gRPC requirements. Choose one of the following installation options:

Option 1: Milvus Configuration (Recommended)

# Basic installation with Milvus support
# (extras are quoted so that shells such as zsh do not expand the brackets)
uv add "thoth-vdbmanager[milvus]"

# All backends except Weaviate (includes Milvus)
uv add "thoth-vdbmanager[all]"

Option 2: Weaviate Configuration

# Basic installation with Weaviate support
uv add "thoth-vdbmanager[weaviate]"

# All backends except Milvus (includes Weaviate)
uv add "thoth-vdbmanager[all-with-weaviate]"

Option 3: Safe Backends Only

# No gRPC conflicts (Qdrant, Chroma, pgvector, Pinecone)
uv add "thoth-vdbmanager[all-safe]"

# Individual backends
uv add "thoth-vdbmanager[qdrant]"
uv add "thoth-vdbmanager[chroma]"
uv add "thoth-vdbmanager[pgvector]"
uv add "thoth-vdbmanager[pinecone]"

🔄 Legacy pip Installation (Still Supported)

If you prefer pip, every command above works with uv add replaced by pip install:

# Example with pip
pip install "thoth-vdbmanager[all]"

📖 For detailed compatibility information, see the Backend Compatibility Guide.
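
To confirm that an environment does not contain both conflicting clients, a small check along these lines may help. It is only a sketch: the distribution names `pymilvus` and `weaviate-client` are assumptions about what the `milvus` and `weaviate` extras pull in, and the helper names are hypothetical.

```python
from importlib import metadata

# Assumed distribution names for the two conflicting client libraries.
CONFLICTING = ("pymilvus", "weaviate-client")


def installed_conflicts(installed=None):
    """Return the conflicting distributions that are present.

    `installed` may be passed explicitly (handy for testing); when it is
    None, the current environment is inspected instead.
    """
    if installed is None:
        installed = set()
        for name in CONFLICTING:
            try:
                metadata.version(name)
                installed.add(name)
            except metadata.PackageNotFoundError:
                pass
    return sorted(name for name in CONFLICTING if name in installed)


def check_environment(installed=None):
    """Return "ok", or a message naming the packages that clash."""
    found = installed_conflicts(installed)
    if len(found) > 1:
        return "conflict: " + " and ".join(found) + " are both installed"
    return "ok"
```

Calling `check_environment()` with no arguments inspects the active environment; passing a set of names makes the check easy to exercise in isolation.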

🏗️ Architecture

The library is built on a clean architecture with:

  • Core: Base interfaces and document types
  • Adapters: Backend-specific implementations using Haystack
  • Factory: Unified creation interface
  • Compatibility: Legacy API support
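
The layering listed above can be illustrated with a toy example. This is not the library's code: `Doc`, `BaseStore`, `InMemoryAdapter` and `ToyFactory` are hypothetical names, and the "backend" is a plain dict. It only shows how a core interface, an adapter and a factory fit together.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Doc:
    """Toy stand-in for the library's document types."""
    text: str


class BaseStore(ABC):
    """Core layer: the interface every adapter implements."""

    @abstractmethod
    def add(self, doc: Doc) -> str: ...

    @abstractmethod
    def count(self) -> int: ...


class InMemoryAdapter(BaseStore):
    """Adapter layer: one backend-specific implementation (here, a dict)."""

    def __init__(self, collection: str):
        self.collection = collection
        self._docs = {}

    def add(self, doc):
        doc_id = f"{self.collection}-{len(self._docs)}"
        self._docs[doc_id] = doc
        return doc_id

    def count(self):
        return len(self._docs)


class ToyFactory:
    """Factory layer: maps a backend name to an adapter class."""
    _registry = {"memory": InMemoryAdapter}

    @classmethod
    def create(cls, backend, collection, **kwargs):
        try:
            adapter_cls = cls._registry[backend]
        except KeyError:
            raise ValueError(f"unknown backend: {backend!r}") from None
        return adapter_cls(collection, **kwargs)


store = ToyFactory.create("memory", "demo")
first_id = store.add(Doc("hello"))
```

Callers depend only on the interface and the factory, so swapping backends means changing one string rather than the calling code.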

🚀 Quick Start

New API (Recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument, SqlDocument, HintDocument

# Create a vector store
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)

doc_id = store.add_column_description(column_doc)

# Search documents
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)

Legacy API (Backward Compatible)

from vdbmanager import ThothVectorStore

# Works exactly like before
store = ThothVectorStore(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)

# All existing methods work (column_doc is the ColumnNameDocument built in the New API example)
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

🔧 Configuration

Qdrant

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,  # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)

Weaviate (Production-Ready with Docker)

Docker Setup (Recommended):

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    use_docker=True,
    docker_compose_file="docker-compose-weaviate.yml"
)

Manual Configuration:

store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    timeout=30,
    skip_init_checks=False,  # Set to True if gRPC issues
    api_key="your-api-key"  # Optional
)

📖 See the Weaviate Configuration Guide for detailed setup instructions.

Chroma (Multiple Modes)

Memory Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)

Filesystem Mode:

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)

📖 See the Chroma Configuration Guide for detailed setup instructions.
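
One way to keep the three Chroma modes in a single place is to derive the keyword arguments from a deployment environment. The sketch below only builds dictionaries from the parameters shown above. The environment labels `test`, `dev` and `prod` and the helper name `chroma_kwargs` are hypothetical.

```python
def chroma_kwargs(env, collection="my_collection"):
    """Build keyword arguments for VectorStoreFactory.create by environment."""
    base = {"backend": "chroma", "collection": collection}
    if env == "test":
        # In-memory: fast and isolated, nothing is persisted.
        return {**base, "mode": "memory"}
    if env == "dev":
        # Local files: data survives restarts.
        return {**base, "mode": "filesystem", "persist_path": "./chroma_db"}
    if env == "prod":
        # Separate Chroma server process.
        return {**base, "mode": "server", "host": "localhost", "port": 8000}
    raise ValueError(f"unknown environment: {env!r}")
```

The result would then be passed on as `VectorStoreFactory.create(**chroma_kwargs("test"))`.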

PostgreSQL pgvector

store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)
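
Passwords containing characters such as "@" or ":" break a hand-written connection string. A small helper (hypothetical name, standard library only) can percent-encode the credentials before they are passed as `connection_string`:

```python
from urllib.parse import quote


def pg_connection_string(user, password, host, port, dbname):
    """Assemble a postgresql:// URL with percent-encoded credentials."""
    # safe="" makes quote() encode every reserved character, including "/".
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


conn = pg_connection_string("user", "p@ss:word", "localhost", 5432, "dbname")
```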

Milvus (Multiple Modes)

Lite Mode (Recommended for Testing):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)

Server Mode (Production):

store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)

📖 See the Milvus Configuration Guide for detailed setup instructions.

Pinecone

store = VectorStoreFactory.create(
    backend="pinecone",
    collection="my-index",
    api_key="your-api-key",
    environment="us-west1-gcp-free"
)

📊 Performance Optimizations

Memory Usage

  • Lazy initialization: Embedders and connections are initialized on first use
  • Singleton pattern: Same configuration reuses existing instances
  • Batch processing: Efficient bulk operations
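
The three techniques above can be sketched in a few lines. This is an illustration under stated assumptions, not the library's internals: `LazyEmbedder`, `get_embedder` and `batched` are hypothetical names, and the "model" is a stand-in function rather than a real Sentence Transformers model.

```python
class LazyEmbedder:
    """Loads its (stand-in) model only on first use."""
    loads = 0  # counts how many times the expensive load ran

    def __init__(self, model_name):
        self.model_name = model_name
        self._model = None  # nothing is loaded at construction time

    def _load(self):
        # Stand-in for an expensive model load.
        LazyEmbedder.loads += 1
        return lambda text: [float(len(text))]

    def embed(self, text):
        if self._model is None:  # lazy initialization
            self._model = self._load()
        return self._model(text)


_instances = {}


def get_embedder(model_name):
    """Singleton per configuration: the same name returns the same instance."""
    if model_name not in _instances:
        _instances[model_name] = LazyEmbedder(model_name)
    return _instances[model_name]


def batched(items, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Two calls to `get_embedder` with the same name share one instance, and the load runs once, on the first `embed` call rather than at creation.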

Performance Tuning

# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)

🧪 Testing

# Run all tests
pytest

# Run specific backend tests
pytest tests/test_qdrant.py -v

# Run with coverage
pytest --cov=vdbmanager tests/

📈 Migration Guide

From v1.x to v2.x

Simple Migration

# Old code (v1.x)
from vdbmanager import QdrantHaystackStore

store = QdrantHaystackStore(
    collection="my_docs",
    host="localhost",
    port=6333
)

# New code (v2.x) - fully compatible
from vdbmanager import QdrantHaystackStore  # Still works!

# Or use new API
from vdbmanager import VectorStoreFactory

store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

Advanced Migration

# Old code
from vdbmanager import ThothVectorStore

# New code - same interface, better internals
from vdbmanager import ThothVectorStore  # Still works with warnings

# Recommended new approach
from vdbmanager import QdrantAdapter

store = QdrantAdapter(
    collection="my_docs",
    host="localhost",
    port=6333
)
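
Since the legacy entry points are described as working "with warnings", it can be useful to surface those warnings during migration. The sketch below uses Python's standard `warnings` module with a toy function in place of the legacy class. That the library emits `DeprecationWarning` specifically is an assumption, and `legacy_store` is a hypothetical stand-in.

```python
import warnings


def legacy_store(**kwargs):
    """Toy stand-in for a legacy entry point that warns and still works."""
    warnings.warn("legacy API; prefer the factory", DeprecationWarning, stacklevel=2)
    return dict(kwargs)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is filtered out
    store = legacy_store(backend="qdrant", collection="my_docs")

# Collect the deprecation messages that were raised while the block ran.
messages = [
    str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)
]
```

Running a test suite with `python -W error::DeprecationWarning` turns such warnings into exceptions, which makes any remaining legacy call sites easy to find.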

🔍 API Reference

Core Classes

VectorStoreFactory

# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)

# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)

# List backends
backends = VectorStoreFactory.list_backends()
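
A configuration of the shape shown above can be kept in a JSON file and validated before it is handed to the factory. The following is a sketch: the helper name is hypothetical, the backend names are the ones used in this README, and the assumption is that `params` carries the same keyword arguments that `create` accepts.

```python
import json

# Backend identifiers as they appear in the configuration examples.
KNOWN_BACKENDS = {"qdrant", "weaviate", "chroma", "pgvector", "milvus", "pinecone"}


def load_store_config(text):
    """Parse and validate a JSON config of the form {"backend": ..., "params": {...}}."""
    config = json.loads(text)
    backend = config.get("backend")
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unsupported backend: {backend!r}")
    params = config.get("params", {})
    if not isinstance(params, dict):
        raise TypeError("'params' must be a JSON object")
    return {"backend": backend, "params": params}


raw = '{"backend": "qdrant", "params": {"collection": "my_docs", "host": "localhost", "port": 6333}}'
config = load_store_config(raw)
```

The validated dictionary would then be passed to `VectorStoreFactory.from_config(config)`.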

Document Types

  • ColumnNameDocument: Column metadata
  • SqlDocument: SQL examples
  • HintDocument: General hints

Methods

  • add_column_description(doc): Add column metadata
  • add_sql(doc): Add SQL example
  • add_hint(doc): Add hint
  • search_similar(query, doc_type, top_k=5, score_threshold=0.7): Semantic search
  • get_document(doc_id): Retrieve by ID
  • bulk_add_documents(docs): Batch insert
  • get_collection_info(): Get stats
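
To make the `top_k` and `score_threshold` parameters of `search_similar` concrete, here is a pure-Python illustration of filtering and ranking by cosine similarity. It is not the library's implementation: whether the backends use cosine similarity, and whether the threshold is applied before the cut-off, are assumptions, and `cosine` and `rank` are hypothetical helpers.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, candidates, top_k=5, score_threshold=0.7):
    """Return up to top_k (doc_id, score) pairs scoring at least score_threshold."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in candidates.items()]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)  # best match first
    return kept[:top_k]


candidates = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
hits = rank([1.0, 0.0], candidates, top_k=2)
```

Here "a" matches the query exactly, "b" scores about 0.707 and passes the 0.7 threshold, and "c" is orthogonal to the query and is dropped.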

🐛 Troubleshooting

Common Issues

Connection Errors

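# Check service availability
import requests

# A timeout keeps the check from hanging if the service is unreachable
response = requests.get("http://localhost:6333", timeout=5)  # Qdrant
print(response.status_code)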
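
When `requests` is not installed, or the backend does not speak HTTP on the port in question, a plain TCP check with the standard library gives a quick first answer. `port_is_open` is a hypothetical helper, and a successful connection only shows that something is listening, not that the service is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_is_open("localhost", 6333)` checks the Qdrant port used in this README and `port_is_open("localhost", 19530)` the Milvus one.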

Memory Issues

# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)

Performance Issues

# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE file for details.

Directory structure

vdbmanager/
├── core/                     # Base interfaces and document types
│   ├── base.py               # Core document classes and interfaces
│   └── __init__.py
├── adapters/                 # Backend-specific implementations
│   ├── haystack_adapter.py   # Base Haystack adapter
│   ├── qdrant_adapter.py     # Qdrant implementation
│   ├── weaviate_adapter.py   # Weaviate implementation
│   ├── chroma_adapter.py     # Chroma implementation
│   ├── pgvector_adapter.py   # PostgreSQL pgvector
│   ├── milvus_adapter.py     # Milvus implementation
│   └── pinecone_adapter.py   # Pinecone implementation
├── factory.py                # Unified creation interface
├── compat/                   # Legacy compatibility layer
│   ├── __init__.py
│   └── thoth_vector_store.py
└── __init__.py               # Public API exports

New API (recommended)

from vdbmanager import VectorStoreFactory, ColumnNameDocument

# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)

# Use optimized methods
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")

Old API (fully compatible)

from vdbmanager import ThothVectorStore  # Works with warnings

# Existing code continues to work
store = ThothVectorStore(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)
