An integration package connecting ZeusDB and LangChain.

LangChain ZeusDB Integration

A high-performance LangChain integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LangChain applications.

Features

🚀 High Performance

  • Rust-powered vector database backend
  • Advanced HNSW indexing for sub-millisecond search
  • Product Quantization for 4x-256x memory compression
  • Concurrent search with automatic parallelization

🎯 LangChain Native

  • Full VectorStore API compliance
  • Async/await support for all operations
  • Seamless integration with LangChain retrievers
  • Maximal Marginal Relevance (MMR) search

🏢 Enterprise Ready

  • Structured logging with performance monitoring
  • Index persistence with complete state preservation
  • Advanced metadata filtering
  • Graceful error handling and fallback mechanisms

Quick Start

Installation

pip install -qU langchain-zeusdb

Basic Usage

from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from zeusdb import VectorDatabase

# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Create ZeusDB index
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine"
)

# Create vector store
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)

# Add documents
from langchain_core.documents import Document

docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]

vector_store.add_documents(docs)

# Search
results = vector_store.similarity_search("fast database", k=2)
print(f"Found {len(results)} results")

Factory Methods

# Create from texts
vector_store = ZeusDBVectorStore.from_texts(
    texts=["Hello world", "Goodbye world"],
    embedding=embeddings,
    metadatas=[{"source": "text1"}, {"source": "text2"}]
)

# Create from documents
vector_store = ZeusDBVectorStore.from_documents(
    documents=docs,
    embedding=embeddings
)

Advanced Features

Memory-Efficient Setup with Quantization

For large datasets, use Product Quantization to reduce memory usage:

# Create quantized index for memory efficiency
quantization_config = {
    'type': 'pq',
    'subvectors': 8,
    'bits': 8,
    'training_size': 10000
}

vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine",
    quantization_config=quantization_config
)

vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
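
To get a feel for what the settings above imply, here is a back-of-the-envelope sketch of standard Product Quantization arithmetic. It assumes raw vectors are stored as 32-bit floats and counts only the PQ codes; codebooks, HNSW graph links, and any raw vectors the index may keep are ignored, so treat the result as an upper bound and not as a figure ZeusDB reports. The helper names are hypothetical and not part of the ZeusDB API.

```python
def pq_code_bytes(subvectors: int, bits: int) -> float:
    """Bytes for one PQ code: one `bits`-wide centroid id per subvector."""
    return subvectors * bits / 8


def pq_compression_ratio(dim: int, subvectors: int, bits: int,
                         bytes_per_float: int = 4) -> float:
    """Ratio of raw vector size to PQ code size (codes only)."""
    # Standard PQ splits the vector into equal-sized subvectors
    if dim % subvectors != 0:
        raise ValueError("dim must be divisible by subvectors")
    raw_bytes = dim * bytes_per_float
    return raw_bytes / pq_code_bytes(subvectors, bits)


# Values taken from the quantization_config above (assumed float32 input)
raw = 1536 * 4
code = pq_code_bytes(subvectors=8, bits=8)
ratio = pq_compression_ratio(dim=1536, subvectors=8, bits=8)
print(f"raw vector: {raw} bytes, PQ code: {code:.0f} bytes, ratio: {ratio:.0f}x")
```

Increasing `subvectors` or `bits` makes each code larger and usually improves recall, at the cost of a lower compression ratio.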

Persistence

Save and load your vector store:

# Save index
vector_store.save_index("my_index.zdb")

# Load index
loaded_store = ZeusDBVectorStore.load_index(
    path="my_index.zdb",
    embedding=embeddings
)

Advanced Search Options

# Similarity search with scores
results = vector_store.similarity_search_with_score(
    query="machine learning",
    k=5
)

# MMR search for diversity
results = vector_store.max_marginal_relevance_search(
    query="AI applications",
    k=5,
    fetch_k=20,
    lambda_mult=0.7  # Balance relevance vs diversity
)

# Search with metadata filtering
results = vector_store.similarity_search(
    query="database performance",
    k=3,
    filter={"source": "documentation"}
)
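
The effect of `lambda_mult` is easier to see in a small standalone sketch of the MMR selection rule. This is a plain-Python illustration of the general technique (greedy selection by `lambda_mult * relevance - (1 - lambda_mult) * redundancy`), not the code path ZeusDB or LangChain actually runs; the function names and toy vectors are made up for the example. In the real call, `fetch_k` controls how many candidates are retrieved before this re-ranking step.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, candidate_vecs, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(query_vec, candidate_vecs[i])
            # Similarity to the closest already-selected item (0 if none yet)
            redundancy = max(
                (cosine(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected


query = [1.0, 0.0]
candidates = [
    [1.0, 0.0],    # identical to the query
    [0.99, 0.14],  # near-duplicate of candidate 0
    [0.6, 0.8],    # less relevant, but different
]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # [0, 2]
```

With `lambda_mult=1.0` the ranking is pure relevance, so the near-duplicate is returned second. Lowering it to `0.3` penalizes redundancy enough that the more distinct candidate wins the second slot.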

As a Retriever

# Convert to retriever for use in chains
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.8}
)

# Use in a chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever
)

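# .run() is deprecated in recent LangChain releases; invoke() returns a dict
answer = qa_chain.invoke({"query": "What is ZeusDB?"})["result"]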

Async Support

All operations support async/await:

# Async operations (run these inside an async function)
await vector_store.aadd_documents(docs)
results = await vector_store.asimilarity_search("query", k=5)
await vector_store.adelete(ids=["doc1", "doc2"])
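
Bare `await` statements only work inside a coroutine, so a script needs an `async def` entry point driven by `asyncio.run`. The sketch below shows that pattern, plus `asyncio.gather` for running independent queries concurrently. To keep it runnable without a ZeusDB index or an embedding API key, it uses `StandInStore`, a hypothetical toy class that only mimics the `aadd_texts` and `asimilarity_search` method names from the LangChain `VectorStore` interface; swap in a real `ZeusDBVectorStore` in practice.

```python
import asyncio


class StandInStore:
    """Hypothetical stand-in, NOT the real ZeusDBVectorStore."""

    def __init__(self):
        self._texts = []

    async def aadd_texts(self, texts):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self._texts.extend(texts)

    async def asimilarity_search(self, query, k=5):
        await asyncio.sleep(0)
        # Toy matching: keep texts that share at least one word with the query
        words = set(query.lower().split())
        hits = [t for t in self._texts if words & set(t.lower().split())]
        return hits[:k]


async def main():
    store = StandInStore()
    await store.aadd_texts(["ZeusDB is fast", "LangChain is powerful"])
    # Independent queries can be awaited together instead of one by one
    return await asyncio.gather(
        store.asimilarity_search("fast database", k=1),
        store.asimilarity_search("powerful framework", k=1),
    )


results = asyncio.run(main())
print(results)  # [['ZeusDB is fast'], ['LangChain is powerful']]
```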

Monitoring and Observability

Performance Monitoring

# Get index statistics
stats = vector_store.get_zeusdb_stats()
print(f"Index size: {stats.get('vector_count', 0)} vectors")

# Benchmark search performance
performance = vector_store.benchmark_search_performance(
    query_count=100,
    max_threads=4
)
print(f"Search QPS: {performance.get('parallel_qps', 0)}")

# Check quantization status
if vector_store.is_quantized():
    progress = vector_store.get_training_progress()
    print(f"Quantization training: {progress:.1f}% complete")
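
For context on what a queries-per-second figure means, here is a generic sketch of measuring QPS with a thread pool. It is not how `benchmark_search_performance` is implemented; `measure_qps` and `fake_search` are hypothetical names, and the sleep stands in for a real search call. Threads only give a speedup when the underlying search releases the GIL, which is an assumption you would need to verify for your setup.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(search_fn, queries, max_threads=4):
    """Run search_fn over all queries in a thread pool and return queries/sec."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(search_fn, queries))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed if elapsed > 0 else float("inf")


def fake_search(query):
    """Placeholder for a real search call."""
    time.sleep(0.001)  # pretend each query takes about 1 ms
    return [query]


qps = measure_qps(fake_search, [f"query {i}" for i in range(100)], max_threads=4)
print(f"Approximate QPS: {qps:.0f}")
```

To time a real store, pass something like `lambda q: vector_store.similarity_search(q, k=5)` as `search_fn`.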

Enterprise Logging

The integration includes structured logging for production monitoring:

import logging

# Configure logging to see performance metrics
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("langchain_zeusdb")

# Operations are automatically logged with:
# - Duration measurements
# - Error context
# - Operation metadata
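
If `basicConfig` is too coarse, the standard library lets you attach a dedicated handler to the `langchain_zeusdb` logger (the name used in the snippet above). This is a minimal sketch using only `logging` from the standard library; the format string is an arbitrary choice, and which levels the integration actually emits at is not stated here, so adjust the thresholds to what you observe.

```python
import logging

# Logger name taken from the snippet above
logger = logging.getLogger("langchain_zeusdb")
logger.setLevel(logging.DEBUG)  # let the handler decide what to show

handler = logging.StreamHandler()  # writes to stderr by default
handler.setLevel(logging.INFO)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)

# Avoid duplicate lines if the root logger also has a handler
logger.propagate = False
```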

Configuration Options

Index Parameters

vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",           # Index algorithm
    dim=1536,                    # Vector dimension
    space="cosine",              # Distance metric: cosine, l2, l1
    m=16,                        # HNSW connectivity
    ef_construction=200,         # Build-time search width
    expected_size=100000,        # Expected number of vectors
    quantization_config=None     # Optional quantization
)

Search Parameters

results = vector_store.similarity_search(
    query="search query",
    k=5,                         # Number of results
    ef_search=None,              # Runtime search width (auto if None)
    filter={"key": "value"}      # Metadata filter
)

Error Handling

The integration includes comprehensive error handling:

try:
    results = vector_store.similarity_search("query")
except Exception as e:
    # Graceful degradation with logging
    print(f"Search failed: {e}")
    # Fallback logic here
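
One way to fill in the fallback logic is a small wrapper that retries a search a few times and then returns a default value. The sketch below is generic Python, not part of the integration; `search_with_fallback` and `flaky_search` are hypothetical names, and catching `Exception` is deliberately broad for illustration. Narrow it to the exceptions you actually expect.

```python
import logging
import time

logger = logging.getLogger(__name__)


def search_with_fallback(search_fn, query, retries=2, delay=0.1, fallback=None):
    """Call search_fn(query); retry on failure, then return `fallback` (or [])."""
    for attempt in range(1, retries + 2):  # first try plus `retries` retries
        try:
            return search_fn(query)
        except Exception as exc:
            logger.warning("search attempt %d failed: %s", attempt, exc)
            if attempt <= retries:
                time.sleep(delay)
    return [] if fallback is None else fallback


# Demo with a function that fails once and then succeeds
calls = {"n": 0}


def flaky_search(query):
    calls["n"] += 1
    if calls["n"] < 2:
        raise RuntimeError("index temporarily unavailable")
    return [f"result for {query!r}"]


print(search_with_fallback(flaky_search, "fast database", delay=0))
```

With a real store, the call would look like `search_with_fallback(lambda q: vector_store.similarity_search(q, k=3), "query")`.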

Requirements

  • Python: 3.10 or higher
  • ZeusDB: 0.0.8 or higher
  • LangChain Core: 0.3.74 or higher

Installation from Source

git clone https://github.com/zeusdb/langchain-zeusdb.git
cd langchain-zeusdb/libs/zeusdb
pip install -e .

Development

# Install with test dependencies
pip install -e ".[test]"

# Run tests
pytest tests/

# Run with coverage
pytest tests/ --cov=langchain_zeusdb

# Lint code
ruff check .
ruff format .

# Type checking
mypy .

Performance Benchmarks

In internal benchmarks, ZeusDB has demonstrated exceptional performance for large-scale vector search workloads.

Operation        Performance        Notes
Index Creation   1M+ vectors/min    Depends on vector dimension
Search Latency   <1ms               Sub-millisecond for most queries
Memory Usage     50-90% reduction   With Product Quantization
Concurrent QPS   10,000+            Multi-threaded search

⚠️ Note: These figures represent internal benchmark results under specific test conditions. Actual performance may vary depending on hardware, vector dimensions, dataset size, and workload characteristics.

Use Cases

  • RAG Applications: High-performance retrieval for question answering
  • Semantic Search: Fast similarity search across large document collections
  • Recommendation Systems: Vector-based content and collaborative filtering
  • Embeddings Analytics: Analysis of high-dimensional embedding spaces
  • Real-time Applications: Low-latency vector search for production systems

Compatibility

LangChain Versions

  • LangChain Core: 0.3.74+
  • LangChain Community: Compatible with all versions
  • LangSmith: Full tracing and monitoring support

Distance Metrics

  • Cosine: Default, normalized similarity
  • Euclidean (L2): Geometric distance
  • Manhattan (L1): City-block distance
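
For reference, the three metrics can be written out in a few lines of plain Python. This sketch uses the textbook definitions; whether ZeusDB reports cosine as a distance (1 minus similarity, as below) or as a similarity score is an assumption to check against the scores you get back.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, regardless of length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.dist(a, b)


def l1_distance(a, b):
    """Manhattan (city-block) distance: sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 0.0]
b = [0.0, 1.0]  # orthogonal to a
c = [2.0, 0.0]  # same direction as a, twice the length

for name, fn in [("cosine", cosine_distance), ("l2", l2_distance), ("l1", l1_distance)]:
    print(f"{name}: d(a, b) = {fn(a, b):.3f}, d(a, c) = {fn(a, c):.3f}")
```

Note how `a` and `c` are at cosine distance 0 but L2 and L1 distance 1: cosine ignores vector length, while the other two do not.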

Embedding Models

Compatible with any embedding provider:

  • OpenAI (text-embedding-3-small, text-embedding-3-large)
  • Hugging Face Transformers
  • Cohere Embeddings
  • Custom embedding functions

Support

Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a detailed history of changes.


Making vector search fast, scalable, and developer-friendly.
