Native Qdrant implementation for ThothAI Vector Database

ThothAI Qdrant

A native Qdrant implementation for the ThothAI Vector Database system, providing high-performance vector storage and similarity search capabilities without Haystack dependencies.

Features

  • Native Qdrant Integration: Direct use of the Qdrant client, with no Haystack dependency
  • Full API Compatibility: Same interface as thoth_vdb2 for seamless integration
  • External Embeddings: Support for OpenAI, Cohere, Mistral, and HuggingFace
  • Document Types: EvidenceDocument, SqlDocument, ColumnNameDocument
  • Similarity Search: Native Qdrant search with document type filtering
  • Batch Operations: Efficient bulk document insertion
  • Caching: Intelligent embedding cache for performance

Installation

# Basic installation
pip install thoth-qdrant

# With OpenAI embeddings support (quotes keep shells such as zsh from expanding the brackets)
pip install "thoth-qdrant[openai]"

# With all embedding providers
pip install "thoth-qdrant[all-providers]"

Configuration

Environment Variables (v0.1.10+)

Starting from version 0.1.10, embedding configuration is managed exclusively through environment variables:

# Required: Embedding provider configuration
export EMBEDDING_PROVIDER=openai
export EMBEDDING_MODEL=text-embedding-3-small
export OPENAI_API_KEY=your-api-key

# Optional: Advanced configuration
export EMBEDDING_BASE_URL=https://api.openai.com/v1  # Custom API endpoint
export EMBEDDING_BATCH_SIZE=100  # Batch size for embedding requests
export EMBEDDING_TIMEOUT=30  # Request timeout in seconds

# Provider-specific API keys (set the one that matches EMBEDDING_PROVIDER)
export OPENAI_API_KEY=sk-...
export COHERE_API_KEY=...
export MISTRAL_API_KEY=...
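
As a rough illustration of how an application might validate these variables before creating a store, the sketch below reads them with os.environ. The EmbeddingSettings dataclass and the load_embedding_settings function are hypothetical helpers, not part of thoth-qdrant, and the fallback values (100 and 30) are assumptions copied from the example above rather than documented library defaults.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical container for the embedding variables listed above."""
    provider: str
    model: str
    base_url: Optional[str]
    batch_size: int
    timeout: int


def load_embedding_settings(env: Mapping[str, str] = os.environ) -> EmbeddingSettings:
    """Read the embedding variables, failing early if a required one is missing."""
    required = ("EMBEDDING_PROVIDER", "EMBEDDING_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return EmbeddingSettings(
        provider=env["EMBEDDING_PROVIDER"],
        model=env["EMBEDDING_MODEL"],
        base_url=env.get("EMBEDDING_BASE_URL"),
        # Assumed fallbacks, taken from the example values above.
        batch_size=int(env.get("EMBEDDING_BATCH_SIZE", "100")),
        timeout=int(env.get("EMBEDDING_TIMEOUT", "30")),
    )
```

Calling load_embedding_settings() at startup surfaces a missing EMBEDDING_PROVIDER or EMBEDDING_MODEL immediately instead of at the first embedding request.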

Qdrant Setup

Ensure Qdrant is running locally:

docker run -p 6333:6333 qdrant/qdrant
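
Before creating a store, it can help to confirm that something is listening on the Qdrant port. The following is a minimal sketch using only the Python standard library. The is_port_open function is a hypothetical helper, not part of thoth-qdrant, and a successful TCP connection only shows that the port accepts connections, not that Qdrant itself is healthy.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 6333, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

With the container above running, is_port_open() should return True; if it returns False, check that the container is running and that port 6333 is published.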

Usage

from thoth_qdrant import VectorStoreFactory
from thoth_qdrant.core.base import (
    ColumnNameDocument,
    SqlDocument,
    EvidenceDocument,
    ThothType,
)

# Create vector store
store = VectorStoreFactory.create(
    backend="qdrant",
    collection="my_collection",
    host="localhost",
    port=6333,
    embedding_provider="openai",
    embedding_model="text-embedding-3-small"
)

# Add documents
column_doc = ColumnNameDocument(
    table_name="users",
    column_name="email",
    original_column_name="email_address",
    column_description="User email for authentication",
    value_description="Valid email format"
)
doc_id = store.add_column_description(column_doc)

sql_doc = SqlDocument(
    question="How to find recent users?",
    sql="SELECT * FROM users WHERE created_at > NOW() - INTERVAL '30 days'",
    evidence="Filter by date using interval"
)
store.add_sql(sql_doc)

# Search similar documents
results = store.search_similar(
    query="user email authentication",
    doc_type=ThothType.COLUMN_NAME,
    top_k=5,
    score_threshold=0.7
)

# Bulk operations
documents = [column_doc, sql_doc]
doc_ids = store.bulk_add_documents(documents)

# Get document by ID
doc = store.get_document(doc_id)

# Delete document
store.delete_document(doc_id)

# Get all documents by type
all_columns = store.get_all_column_documents()
all_sql = store.get_all_sql_documents()

# Collection info
info = store.get_collection_info()
print(info)

# Get embedding configuration (v0.1.10+)
embedding_config = store.get_embedding_config()
print(f"Provider: {embedding_config['provider']}")
print(f"Model: {embedding_config['model']}")
print(f"Dimensions: {embedding_config['dimensions']}")
print(f"Cache enabled: {embedding_config['cache_enabled']}")
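
To make the top_k and score_threshold parameters of search_similar concrete, here is a self-contained sketch of threshold-then-rank filtering over cosine similarity in plain Python. It assumes that score_threshold is a minimum similarity and that top_k caps the number of results. The cosine and filter_top_k helpers and the toy vectors are hypothetical; the library hands the actual search to Qdrant rather than running code like this.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def filter_top_k(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    top_k: int = 5,
    score_threshold: float = 0.7,
) -> List[Tuple[str, float]]:
    """Keep items scoring at least score_threshold, best first, at most top_k."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    kept = [item for item in scored if item[1] >= score_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


# Toy 2-D "embeddings"; real embeddings have far more dimensions.
toy_vectors = {
    "users.email": [1.0, 0.0],
    "users.login": [0.8, 0.6],
    "orders.total": [0.0, 1.0],
}
hits = filter_top_k([1.0, 0.0], toy_vectors, top_k=2, score_threshold=0.7)
print([doc_id for doc_id, _ in hits])
```

Raising score_threshold trades recall for precision, while top_k only limits how many of the surviving matches are returned.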

API Reference

VectorStoreInterface Methods

  • add_column_description(doc: ColumnNameDocument) -> str
  • add_sql(doc: SqlDocument) -> str
  • add_evidence(doc: EvidenceDocument) -> str
  • search_similar(query: str, doc_type: ThothType, top_k: int = 5, score_threshold: float = 0.7) -> List[BaseThothDocument]
  • get_document(doc_id: str) -> Optional[BaseThothDocument]
  • delete_document(doc_id: str) -> None
  • bulk_add_documents(documents: List[BaseThothDocument]) -> List[str]
  • delete_collection(thoth_type: ThothType) -> None
  • get_all_column_documents() -> List[ColumnNameDocument]
  • get_all_sql_documents() -> List[SqlDocument]
  • get_all_evidence_documents() -> List[EvidenceDocument]
  • get_collection_info() -> Dict[str, Any]
  • get_embedding_config() -> Dict[str, Any] (v0.1.10+)

Testing

# Run tests with local Qdrant
pytest tests/

# Run specific test
pytest tests/test_qdrant_adapter.py -v

# With coverage
pytest --cov=thoth_qdrant tests/

Development

# Install development dependencies (quotes keep shells such as zsh from expanding the brackets)
pip install -e ".[dev,test]"

# Format code
black thoth_qdrant tests
isort thoth_qdrant tests

# Type checking
mypy thoth_qdrant

# Linting
ruff check thoth_qdrant

License

Apache License 2.0 - See LICENSE.md for details

Compatibility

This library is fully compatible with the thoth_vdb2 API, allowing seamless migration from Haystack-based implementations to native Qdrant.
