An integration package connecting ZeusDB and LangChain.
LangChain ZeusDB Integration
A high-performance LangChain integration for ZeusDB, bringing enterprise-grade vector search capabilities to your LangChain applications.
Features
🚀 High Performance
- Rust-powered vector database backend
- Advanced HNSW indexing for sub-millisecond search
- Product Quantization for 4x-256x memory compression
- Concurrent search with automatic parallelization
🎯 LangChain Native
- Full VectorStore API compliance
- Async/await support for all operations
- Seamless integration with LangChain retrievers
- Maximal Marginal Relevance (MMR) search
🏢 Enterprise Ready
- Structured logging with performance monitoring
- Index persistence with complete state preservation
- Advanced metadata filtering
- Graceful error handling and fallback mechanisms
Quick Start
Installation
pip install -qU langchain-zeusdb
Basic Usage
from langchain_zeusdb import ZeusDBVectorStore
from langchain_openai import OpenAIEmbeddings
from zeusdb import VectorDatabase
# Initialize embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Create ZeusDB index
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine"
)
# Create vector store
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
# Add documents
from langchain_core.documents import Document
docs = [
    Document(page_content="ZeusDB is fast", metadata={"source": "docs"}),
    Document(page_content="LangChain is powerful", metadata={"source": "docs"}),
]
vector_store.add_documents(docs)
# Search
results = vector_store.similarity_search("fast database", k=2)
print(f"Found {len(results)} results")
Factory Methods
# Create from texts
vector_store = ZeusDBVectorStore.from_texts(
    texts=["Hello world", "Goodbye world"],
    embedding=embeddings,
    metadatas=[{"source": "text1"}, {"source": "text2"}]
)
# Create from documents
vector_store = ZeusDBVectorStore.from_documents(
    documents=docs,
    embedding=embeddings
)
Advanced Features
Memory-Efficient Setup with Quantization
For large datasets, use Product Quantization to reduce memory usage:
# Create quantized index for memory efficiency
quantization_config = {
    'type': 'pq',
    'subvectors': 8,
    'bits': 8,
    'training_size': 10000
}
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",
    dim=1536,
    space="cosine",
    quantization_config=quantization_config
)
vector_store = ZeusDBVectorStore(
    zeusdb_index=index,
    embedding=embeddings
)
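To get a feel for what the `subvectors` and `bits` settings buy, the arithmetic of textbook Product Quantization can be sketched as below. This is a back-of-the-envelope model, not ZeusDB's actual storage accounting: it assumes float32 input vectors, counts only the per-vector codes, and ignores codebooks and any raw vectors the index may keep. The helper names are hypothetical.

```python
def pq_code_bytes(subvectors: int, bits: int) -> float:
    """Bytes for one PQ code: one `bits`-wide centroid id per subvector."""
    return subvectors * bits / 8


def pq_compression_ratio(dim: int, subvectors: int, bits: int,
                         float_bytes: int = 4) -> float:
    """Ratio of raw float storage to PQ code storage for a single vector."""
    if dim % subvectors != 0:
        # Textbook PQ splits the vector into equal-length chunks.
        raise ValueError("dim must be divisible by subvectors")
    raw_bytes = dim * float_bytes
    return raw_bytes / pq_code_bytes(subvectors, bits)


# A 128-dimensional float32 vector with 16 subvectors and 8-bit codes:
# 512 raw bytes shrink to 16 code bytes.
print(pq_compression_ratio(dim=128, subvectors=16, bits=8))  # 32.0
```

Fewer subvectors or fewer bits raise the ratio but coarsen the approximation, so recall should be checked on real data after changing either value.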
Persistence
Save and load your vector store:
# Save index
vector_store.save_index("my_index.zdb")
# Load index
loaded_store = ZeusDBVectorStore.load_index(
    path="my_index.zdb",
    embedding=embeddings
)
Advanced Search Options
# Similarity search with scores
results = vector_store.similarity_search_with_score(
    query="machine learning",
    k=5
)
# MMR search for diversity
results = vector_store.max_marginal_relevance_search(
    query="AI applications",
    k=5,
    fetch_k=20,
    lambda_mult=0.7  # Balance relevance vs diversity
)
# Search with metadata filtering
results = vector_store.similarity_search(
    query="database performance",
    k=3,
    filter={"source": "documentation"}
)
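The effect of `lambda_mult` in the MMR call above is easier to see in a small standalone sketch. The code below implements the standard greedy MMR rule in pure Python; it is an illustration of the technique, not the code path ZeusDB runs. The `candidates` list plays the role of the `fetch_k` nearest neighbours, and `cosine` and `mmr_select` are hypothetical helper names.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates chosen greedily by MMR."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query, candidates[i])
            # Penalise similarity to anything already picked.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(mmr_select(query, candidates, k=2, lambda_mult=1.0))  # [0, 1]
print(mmr_select(query, candidates, k=2, lambda_mult=0.3))  # [0, 2]
```

With `lambda_mult=1.0` the selection is pure relevance and returns the two near-duplicates; lowering it to 0.3 swaps the duplicate for the dissimilar third vector.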
As a Retriever
# Convert to retriever for use in chains
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "lambda_mult": 0.8}
)
# Use in a chain
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    retriever=retriever
)
# invoke() returns a dict; the answer text is under "result"
answer = qa_chain.invoke({"query": "What is ZeusDB?"})["result"]
Async Support
All operations support async/await:
# Async operations (call these from inside an async function)
await vector_store.aadd_documents(docs)
results = await vector_store.asimilarity_search("query", k=5)
await vector_store.adelete(ids=["doc1", "doc2"])
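The `await` calls above only work inside a coroutine, so a script needs an entry point such as `asyncio.run`. The sketch below shows that wiring and how independent queries can be issued concurrently with `asyncio.gather`. `StubStore` is a hypothetical stand-in that mimics two of the method names so the example runs without ZeusDB or an embedding provider; its substring matching is not how a vector store searches.

```python
import asyncio


class StubStore:
    """Hypothetical stand-in exposing the async method names used above."""

    def __init__(self):
        self._texts = []

    async def aadd_documents(self, documents):
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self._texts.extend(documents)

    async def asimilarity_search(self, query, k=4):
        await asyncio.sleep(0)
        hits = [t for t in self._texts if query.lower() in t.lower()]
        return hits[:k]


async def main():
    store = StubStore()
    await store.aadd_documents(["ZeusDB is fast", "LangChain is powerful"])
    # Run two independent queries concurrently on one event loop.
    return await asyncio.gather(
        store.asimilarity_search("fast", k=1),
        store.asimilarity_search("powerful", k=1),
    )


results = asyncio.run(main())
print(results)  # [['ZeusDB is fast'], ['LangChain is powerful']]
```

Swapping `StubStore()` for a real `ZeusDBVectorStore` and the strings for `Document` objects should leave the structure of `main` unchanged.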
Monitoring and Observability
Performance Monitoring
# Get index statistics
stats = vector_store.get_zeusdb_stats()
print(f"Index size: {stats.get('vector_count', 0)} vectors")
# Benchmark search performance
performance = vector_store.benchmark_search_performance(
    query_count=100,
    max_threads=4
)
print(f"Search QPS: {performance.get('parallel_qps', 0)}")
# Check quantization status
if vector_store.is_quantized():
    progress = vector_store.get_training_progress()
    print(f"Quantization training: {progress:.1f}% complete")
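For a cross-check that does not depend on the built-in benchmark, queries per second can be measured around any search callable. The sketch below is a generic timing harness, not a description of how `benchmark_search_performance` works internally. `measure_qps` and `fake_search` are hypothetical names, and the sleep stands in for a real search call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_qps(search_fn, queries, max_threads=4):
    """Run search_fn over queries on a thread pool; return queries per second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = list(pool.map(search_fn, queries))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed if elapsed > 0 else float("inf")


def fake_search(query):
    """Placeholder for something like vector_store.similarity_search."""
    time.sleep(0.001)
    return [query]


qps = measure_qps(fake_search, [f"q{i}" for i in range(100)], max_threads=4)
print(f"Search QPS: {qps:.0f}")
```

In practice `search_fn` could be a lambda that wraps `vector_store.similarity_search` with a fixed `k`, and the measurement includes embedding time for each query.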
Enterprise Logging
The integration includes structured logging for production monitoring:
import logging
# Configure logging to see performance metrics
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("langchain_zeusdb")
# Operations are automatically logged with:
# - Duration measurements
# - Error context
# - Operation metadata
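`logging.basicConfig` changes the root logger, which also raises the verbosity of every other library. A narrower setup, sketched below with the standard library only, attaches a handler to the `langchain_zeusdb` logger alone. The format string is an arbitrary choice, and which fields the integration puts in its records is not shown here.

```python
import logging

logger = logging.getLogger("langchain_zeusdb")
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
)
logger.addHandler(handler)

# Avoid duplicate lines if the root logger also has handlers configured.
logger.propagate = False
```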
Configuration Options
Index Parameters
vdb = VectorDatabase()
index = vdb.create(
    index_type="hnsw",        # Index algorithm
    dim=1536,                 # Vector dimension
    space="cosine",           # Distance metric: cosine, l2, l1
    m=16,                     # HNSW connectivity
    ef_construction=200,      # Build-time search width
    expected_size=100000,     # Expected number of vectors
    quantization_config=None  # Optional quantization
)
Search Parameters
results = vector_store.similarity_search(
    query="search query",
    k=5,                     # Number of results
    ef_search=None,          # Runtime search width (auto if None)
    filter={"key": "value"}  # Metadata filter
)
Error Handling
The integration includes comprehensive error handling:
try:
    results = vector_store.similarity_search("query")
except Exception as e:
    # Graceful degradation with logging
    print(f"Search failed: {e}")
    # Fallback logic here
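One way to fill in the fallback step is a small wrapper that logs the failure and returns a default result, so callers never see the exception. This is a sketch of application-side handling, not part of the integration. `search_with_fallback` and `broken_search` are hypothetical names.

```python
import logging

logger = logging.getLogger(__name__)


def search_with_fallback(search_fn, query, fallback=None, **kwargs):
    """Call search_fn(query, **kwargs); on failure, log it and return fallback."""
    try:
        return search_fn(query, **kwargs)
    except Exception:
        # logger.exception records the traceback along with the message.
        logger.exception("Search failed for query %r", query)
        return [] if fallback is None else fallback


def broken_search(query, k=4):
    """Simulates a search backend that is down."""
    raise RuntimeError("index unavailable")


print(search_with_fallback(broken_search, "query", k=2))  # []
```

With a real store, the call would look like `search_with_fallback(vector_store.similarity_search, "query", k=5)`. Catching a narrower exception type is preferable once the errors the store raises are known.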
Requirements
- Python: 3.10 or higher
- ZeusDB: 0.0.8 or higher
- LangChain Core: 0.3.74 or higher
Installation from Source
git clone https://github.com/zeusdb/langchain-zeusdb.git
cd langchain-zeusdb/libs/zeusdb
pip install -e .
Development
# Install with test dependencies
pip install -e ".[test]"
# Run tests
pytest tests/
# Run with coverage
pytest tests/ --cov=langchain_zeusdb
# Lint code
ruff check .
ruff format .
# Type checking
mypy .
Performance Benchmarks
In internal benchmarks, ZeusDB has demonstrated exceptional performance for large-scale vector search workloads.
| Operation | Performance | Notes |
|---|---|---|
| Index Creation | 1M+ vectors/min | Depends on vector dimension |
| Search Latency | <1ms | Sub-millisecond for most queries |
| Memory Usage | 50-90% reduction | With Product Quantization |
| Concurrent QPS | 10,000+ | Multi-threaded search |
⚠️ Note: These figures represent internal benchmark results under specific test conditions. Actual performance may vary depending on hardware, vector dimensions, dataset size, and workload characteristics.
Use Cases
- RAG Applications: High-performance retrieval for question answering
- Semantic Search: Fast similarity search across large document collections
- Recommendation Systems: Vector-based content and collaborative filtering
- Embeddings Analytics: Analysis of high-dimensional embedding spaces
- Real-time Applications: Low-latency vector search for production systems
Compatibility
LangChain Versions
- LangChain Core: 0.3.74+
- LangChain Community: Compatible with all versions
- LangSmith: Full tracing and monitoring support
Distance Metrics
- Cosine: Default, normalized similarity
- Euclidean (L2): Geometric distance
- Manhattan (L1): City-block distance
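For reference, the three metrics can be written out in a few lines of pure Python. These are the textbook definitions, expressed as distances where smaller means closer. Whether ZeusDB reports cosine as a similarity or as `1 - similarity` is an assumption to verify against its own documentation, and the function names are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; depends only on direction, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def l2_distance(a, b):
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a, b):
    """Manhattan (city-block) distance: sum of per-coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 0.0], [4.0, 4.0]
print(round(cosine_distance(a, b), 4))  # 0.2929
print(l2_distance(a, b))                # 5.0
print(l1_distance(a, b))                # 7.0
```

Cosine is the usual choice for text embeddings because it ignores vector length; L2 and L1 are sensitive to magnitude.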
Embedding Models
Compatible with any embedding provider:
- OpenAI (text-embedding-3-small, text-embedding-3-large)
- Hugging Face Transformers
- Cohere Embeddings
- Custom embedding functions
Support
- Documentation: docs.zeusdb.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: contact@zeusdb.com
Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a detailed history of changes.
Making vector search fast, scalable, and developer-friendly.
File details
Details for the file langchain_zeusdb-0.1.0.tar.gz.
File metadata
- Download URL: langchain_zeusdb-0.1.0.tar.gz
- Upload date:
- Size: 100.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7958633554b7827516645814889344dd1f1ba4d72e3c0f73e9e52f72075a88ec |
| MD5 | 36ed871b47ccd45be5929cc8127aadc2 |
| BLAKE2b-256 | 5c2e74dc06be7a2263151a670f805df937697673e52701a429f17cbdac353195 |
File details
Details for the file langchain_zeusdb-0.1.0-py3-none-any.whl.
File metadata
- Download URL: langchain_zeusdb-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cee6e6437546eaf840773bccc9f6552f9fdcc26b670df5f5bfd889f9894aabfe |
| MD5 | f75e9105521f62e05f3cc3642015e6d4 |
| BLAKE2b-256 | 966f80c8c275f8a4f8a0878d0071b5e81d2c738da98ae6c81b5b97cff53025c7 |