A vector database management module for ThothAI Project
Thoth Vector Database Manager v2.0
A high-performance, Haystack-based vector database manager with support for multiple backends and local embedding capabilities.
MCP Server Support
This project is configured with MCP (Model Context Protocol) servers for enhanced AI-assisted development:
- Context7: Enhanced context management
- Serena: IDE assistance and development support
See docs/MCP_SETUP.md for details.
Features
- Multi-backend support: Qdrant, Weaviate, Chroma, PostgreSQL pgvector, Milvus, Pinecone
- Haystack integration: Uses Haystack as an abstraction layer over vector stores
- Local embeddings: Uses open-source Sentence Transformers for local embedding generation
- Memory optimization: Lazy loading and efficient batch processing
- API compatibility: Maintains backward compatibility with existing ThothVectorStore API
- Type safety: Full type hints and Pydantic validation
- Flexible deployment: Multiple modes (memory, filesystem, server) for different use cases
- Production-ready: Comprehensive testing and robust error handling
Installation
IMPORTANT: Backend Compatibility
Weaviate and Milvus cannot be installed together due to conflicting gRPC requirements. Choose one of the following installation options:
Option 1: Milvus Configuration (Recommended)
# Basic installation with Milvus support
# (quotes keep shells such as zsh from expanding the square brackets)
pip install "thoth-vdbmanager[milvus]"
# All backends except Weaviate (includes Milvus)
pip install "thoth-vdbmanager[all]"
Option 2: Weaviate Configuration
# Basic installation with Weaviate support
pip install "thoth-vdbmanager[weaviate]"
# All backends except Milvus (includes Weaviate)
pip install "thoth-vdbmanager[all-with-weaviate]"
Option 3: Safe Backends Only
# No gRPC conflicts (Qdrant, Chroma, pgvector, Pinecone)
pip install "thoth-vdbmanager[all-safe]"
# Individual backends
pip install "thoth-vdbmanager[qdrant]"
pip install "thoth-vdbmanager[chroma]"
pip install "thoth-vdbmanager[pgvector]"
pip install "thoth-vdbmanager[pinecone]"
For detailed compatibility information, see the Backend Compatibility Guide.
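To see whether an existing environment already contains both conflicting clients, a small check like the following can help. It is a minimal sketch, not part of the library: `find_conflicts` is a hypothetical helper, and it assumes the client libraries use their usual top-level import names, `weaviate` and `pymilvus`.

```python
from importlib.util import find_spec
from typing import Callable, Iterable

# Assumed top-level import names of the two clients that cannot coexist.
CONFLICTING_MODULES = ("weaviate", "pymilvus")


def _is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be located (without importing it)."""
    return find_spec(module_name) is not None


def find_conflicts(
    modules: Iterable[str] = CONFLICTING_MODULES,
    is_installed: Callable[[str], bool] = _is_installed,
) -> list[str]:
    """Return the conflicting modules if more than one is present, else []."""
    present = [name for name in modules if is_installed(name)]
    return present if len(present) > 1 else []


if __name__ == "__main__":
    conflicts = find_conflicts()
    if conflicts:
        print("Conflicting backends installed together:", ", ".join(conflicts))
    else:
        print("No Weaviate/Milvus conflict detected.")
```

The `is_installed` parameter exists only so the check can be exercised without installing either backend.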
Architecture
The library is built on a clean architecture with:
- Core: Base interfaces and document types
- Adapters: Backend-specific implementations using Haystack
- Factory: Unified creation interface
- Compatibility: Legacy API support
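As a rough illustration of how the four layers listed above could fit together, here is a minimal, self-contained sketch. Every name in it (`Document`, `VectorStore`, `InMemoryAdapter`, `StoreFactory`, `LegacyStore`) is hypothetical and is not the library's actual code; the real adapters wrap Haystack document stores, whereas this toy adapter keeps documents in a dict.

```python
from __future__ import annotations

import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Core layer: a minimal stand-in for the library's document types."""
    text: str
    doc_type: str
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


class VectorStore(ABC):
    """Core layer: the interface every backend adapter implements."""

    @abstractmethod
    def add(self, doc: Document) -> str: ...

    @abstractmethod
    def get(self, doc_id: str) -> Document | None: ...


class InMemoryAdapter(VectorStore):
    """Adapter layer: a toy backend that keeps documents in a dict."""

    def __init__(self, collection: str, **_: Any) -> None:
        self.collection = collection
        self._docs: dict[str, Document] = {}

    def add(self, doc: Document) -> str:
        self._docs[doc.id] = doc
        return doc.id

    def get(self, doc_id: str) -> Document | None:
        return self._docs.get(doc_id)


class StoreFactory:
    """Factory layer: maps a backend name to its adapter class."""

    _registry: dict[str, type[VectorStore]] = {"memory": InMemoryAdapter}

    @classmethod
    def create(cls, backend: str, collection: str, **kwargs: Any) -> VectorStore:
        try:
            adapter_cls = cls._registry[backend]
        except KeyError:
            raise ValueError(f"Unsupported backend: {backend!r}") from None
        return adapter_cls(collection=collection, **kwargs)

    @classmethod
    def list_backends(cls) -> list[str]:
        return sorted(cls._registry)


class LegacyStore:
    """Compatibility layer: old-style constructor that delegates to the factory."""

    def __init__(self, backend: str, collection: str, **kwargs: Any) -> None:
        self._store = StoreFactory.create(backend, collection, **kwargs)

    def __getattr__(self, name: str) -> Any:
        # Forward any attribute not defined here to the wrapped store.
        return getattr(self._store, name)
```

The point of the split is that callers depend only on the factory and the core interface, so a backend can be swapped by changing one string.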
Quick Start
New API (Recommended)
from vdbmanager import VectorStoreFactory, ColumnNameDocument, SqlDocument, HintDocument
# Create a vector store
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)
# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="user_email",
    column_description="User email address",
    value_description="Valid email format"
)
doc_id = store.add_column_description(column_doc)
# Search documents
results = store.search_similar(
    query="user email",
    doc_type="column_name",
    top_k=5
)
Legacy API (Backward Compatible)
from vdbmanager import ThothVectorStore
# Works exactly like before
store = ThothVectorStore(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333
)
# All existing methods work (column_doc is the ColumnNameDocument built in the New API example)
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")
Configuration
Qdrant
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    api_key="your-api-key",  # Optional
    embedding_dim=384,  # Optional
    hnsw_config={"m": 16, "ef_construct": 100}
)
Weaviate (Production-Ready with Docker)
Docker Setup (Recommended):
store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    use_docker=True,
    docker_compose_file="docker-compose-weaviate.yml"
)
Manual Configuration:
store = VectorStoreFactory.create(
    backend="weaviate",
    collection="MyCollection",
    url="http://localhost:8080",
    timeout=30,
    skip_init_checks=False,  # Set to True if you hit gRPC issues
    api_key="your-api-key"  # Optional
)
See the Weaviate Configuration Guide for detailed setup instructions.
Chroma (Multiple Modes)
Memory Mode (Recommended for Testing):
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="memory"  # Fast, isolated, no persistence
)
Filesystem Mode:
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="filesystem",
    persist_path="./chroma_db"
)
Server Mode (Production):
store = VectorStoreFactory.create(
    backend="chroma",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=8000
)
See the Chroma Configuration Guide for detailed setup instructions.
PostgreSQL pgvector
store = VectorStoreFactory.create(
    backend="pgvector",
    collection="my_table",
    connection_string="postgresql://user:pass@localhost:5432/dbname"
)
Milvus (Multiple Modes)
Lite Mode (Recommended for Testing):
store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="lite",
    connection_uri="./milvus.db"  # File-based storage
)
Server Mode (Production):
store = VectorStoreFactory.create(
    backend="milvus",
    collection="my_collection",
    mode="server",
    host="localhost",
    port=19530
)
See the Milvus Configuration Guide for detailed setup instructions.
Pinecone
store = VectorStoreFactory.create(
    backend="pinecone",
    collection="my-index",
    api_key="your-api-key",
    environment="us-west1-gcp-free"
)
Performance Optimizations
Memory Usage
- Lazy initialization: Embedders and connections are initialized on first use
- Singleton pattern: Same configuration reuses existing instances
- Batch processing: Efficient bulk operations
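The three ideas in this list can be sketched in a few lines of plain Python. This is an illustration of the general techniques only, not the library's implementation: `LazyStore`, `get_store`, and `batched` are hypothetical names, the "embedder" is a placeholder string, and the instance cache assumes every configuration value is hashable.

```python
from __future__ import annotations

from functools import cached_property
from itertools import islice
from typing import Any, Iterable, Iterator


class LazyStore:
    """Toy store whose expensive resource is created on first access only."""

    loads = 0  # counts how many times the pretend embedder was loaded

    def __init__(self, **config: Any) -> None:
        self.config = config

    @cached_property
    def embedder(self) -> str:
        # Lazy initialization: runs once, on first attribute access.
        LazyStore.loads += 1
        return f"embedder:{self.config.get('embedding_model', 'default')}"


_instances: dict[tuple, LazyStore] = {}


def get_store(**config: Any) -> LazyStore:
    """Return the cached store for this configuration, creating it if needed."""
    key = tuple(sorted(config.items()))  # requires hashable config values
    if key not in _instances:
        _instances[key] = LazyStore(**config)
    return _instances[key]


def batched(items: Iterable[Any], size: int) -> Iterator[list[Any]]:
    """Yield successive lists of at most `size` items for bulk operations."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch
```

Calling `get_store` twice with the same keyword arguments, in any order, returns the same object, so the embedder is loaded at most once per configuration.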
Performance Tuning
# Optimize for specific use cases
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="optimized",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",  # 384-dim, fast
    hnsw_config={"m": 32, "ef_construct": 200}  # Better search quality
)
Testing
# Run all tests
pytest
# Run specific backend tests
pytest tests/test_qdrant.py -v
# Run with coverage
pytest --cov=vdbmanager tests/
Migration Guide
From v1.x to v2.x
Simple Migration
# Old code (v1.x)
from vdbmanager import QdrantHaystackStore
store = QdrantHaystackStore(
    collection="my_docs",
    host="localhost",
    port=6333
)
# New code (v2.x) - fully compatible
from vdbmanager import QdrantHaystackStore  # Still works!
# Or use new API
from vdbmanager import VectorStoreFactory
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)
Advanced Migration
# Old code
from vdbmanager import ThothVectorStore
# New code - same interface, better internals
from vdbmanager import ThothVectorStore  # Still works with warnings
# Recommended new approach
from vdbmanager import QdrantAdapter
store = QdrantAdapter(
    collection="my_docs",
    host="localhost",
    port=6333
)
API Reference
Core Classes
VectorStoreFactory
# Create store
store = VectorStoreFactory.create(backend, collection, **kwargs)
# From config
config = {"backend": "qdrant", "params": {...}}
store = VectorStoreFactory.from_config(config)
# List backends
backends = VectorStoreFactory.list_backends()
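The `from_config` call above takes a dict with a backend name and a `params` mapping. The sketch below shows one way such a config could be loaded from JSON and checked before being handed to the factory. `parse_store_config` is a hypothetical helper, and treating `params` as optional is an assumption; the library's own validation may differ.

```python
import json
from typing import Any


def parse_store_config(raw: str) -> tuple[str, dict[str, Any]]:
    """Parse a JSON config and return (backend, params), raising ValueError if malformed."""
    config = json.loads(raw)
    if not isinstance(config, dict):
        raise ValueError("config must be a JSON object")
    backend = config.get("backend")
    if not isinstance(backend, str) or not backend:
        raise ValueError("config needs a non-empty 'backend' string")
    params = config.get("params", {})  # assumption: params may be omitted
    if not isinstance(params, dict):
        raise ValueError("'params' must be a JSON object")
    return backend, params
```

The returned pair can then be rebuilt into the `{"backend": ..., "params": ...}` dict that `from_config` expects.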
Document Types
- ColumnNameDocument: Column metadata
- SqlDocument: SQL examples
- HintDocument: General hints
Methods
- add_column_description(doc): Add column metadata
- add_sql(doc): Add SQL example
- add_hint(doc): Add hint
- search_similar(query, doc_type, top_k=5, score_threshold=0.7): Semantic search
- get_document(doc_id): Retrieve by ID
- bulk_add_documents(docs): Batch insert
- get_collection_info(): Get stats
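To make the `top_k` and `score_threshold` parameters of `search_similar` concrete, the following sketch ranks a few toy vectors by cosine similarity, drops those below the threshold, and keeps the best `top_k`. It is an assumption that the library scores with cosine similarity and applies the threshold this way; `rank_matches` is a hypothetical stand-in, not the library's method.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_matches(
    query_vec: Sequence[float],
    docs: dict[str, Sequence[float]],
    top_k: int = 5,
    score_threshold: float = 0.7,
) -> list[tuple[str, float]]:
    """Return up to top_k (doc_id, score) pairs scoring at least score_threshold."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    kept = [pair for pair in scored if pair[1] >= score_threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(rank_matches([1.0, 0.0], docs))
```

With these toy vectors, "b" is orthogonal to the query and is filtered out by the threshold, while "a" and "c" are returned in descending score order.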
Troubleshooting
Common Issues
Connection Errors
# Check service availability
import requests
requests.get("http://localhost:6333") # Qdrant
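The HTTP check above only works for services that speak HTTP and requires the third-party `requests` package. For a backend-agnostic check, a plain TCP probe using only the standard library may be enough. This is a sketch: `is_port_open` is a hypothetical helper, and the port table simply collects the ports used in the configuration examples in this document.

```python
import socket

# Ports taken from the configuration examples in this document.
DEFAULT_PORTS = {
    "qdrant": 6333,
    "weaviate": 8080,
    "chroma": 8000,
    "milvus": 19530,
    "pgvector": 5432,
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in DEFAULT_PORTS.items():
        status = "reachable" if is_port_open("localhost", port) else "unreachable"
        print(f"{name:<9} localhost:{port} {status}")
```

An open port only shows that something is listening; it does not confirm that the service is healthy or that credentials are valid.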
Memory Issues
# Use smaller embedding model
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    embedding_model="sentence-transformers/all-MiniLM-L6-v2"  # 384-dim
)
Performance Issues
# Tune HNSW parameters
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    hnsw_config={"m": 16, "ef_construct": 100}
)
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
MIT License - see LICENSE file for details.
Directory structure
vdbmanager/
├── core/                    # Base interfaces and document types
│   ├── base.py              # Core document classes and interfaces
│   └── __init__.py
├── adapters/                # Backend-specific implementations
│   ├── haystack_adapter.py  # Base Haystack adapter
│   ├── qdrant_adapter.py    # Qdrant implementation
│   ├── weaviate_adapter.py  # Weaviate implementation
│   ├── chroma_adapter.py    # Chroma implementation
│   ├── pgvector_adapter.py  # PostgreSQL pgvector
│   ├── milvus_adapter.py    # Milvus implementation
│   └── pinecone_adapter.py  # Pinecone implementation
├── factory.py               # Unified creation interface
├── compat/                  # Legacy compatibility layer
│   ├── __init__.py
│   └── thoth_vector_store.py
└── __init__.py              # Public API exports
New API (Recommended)
from vdbmanager import VectorStoreFactory, ColumnNameDocument
# Create any backend
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)
# Use optimized methods
doc_id = store.add_column_description(column_doc)
results = store.search_similar("user email", "column_name")
Old API (Fully Compatible)
from vdbmanager import ThothVectorStore  # Works with warnings
# Existing code continues to work
store = ThothVectorStore(
    backend="qdrant",
    collection="my_docs",
    host="localhost",
    port=6333
)