High-Speed Vector Database for Fast, Efficient ANN Search with LangChain
Endee LangChain Integration
This package provides an integration between Endee (a high-performance vector database) and LangChain, allowing you to use Endee as a vector store backend for LangChain applications.
Features
- 🔍 Hybrid Search: Combines dense (semantic) + sparse (keyword) embeddings for superior retrieval accuracy
- SPLADE (default): Neural sparse model for highest accuracy
- BM25: Classical sparse model for speed
- Multiple Distance Metrics: Support for cosine, L2, and inner product distance metrics
- Configurable Precision: Choose between different quantization levels using the Precision enum for optimal performance/accuracy trade-offs
- Metadata Filtering: Filter search results based on metadata using powerful query operators ($eq, $in, $range)
- Automatic Text Truncation: Smart text handling based on embedding model type
- High Performance: Optimized for speed and efficiency with the HNSW algorithm
- Batch Operations: Efficient batch processing for large-scale vector operations
- Production Ready: Comprehensive test suite (26 tests, 100% passing), examples, and documentation
Installation
pip install langchain_endee
This installs the langchain_endee package and its dependencies (endee, langchain, and langchain-core).
For Hybrid Search (Recommended)
pip install fastembed # For sparse embeddings
📚 Documentation
- Complete Guide - Everything about EndeeVectorStore (1,700+ lines)
- Hybrid Search Guide - Dense vs Sparse vs Hybrid explained
- Examples - Complete RAG implementation & examples
- Test Suite - 26 comprehensive tests
- Summary - Project overview and quick reference
Quick Start
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from endee import Precision
# Initialize embedding model
embedding_model = OpenAIEmbeddings()
# Initialize the vector store
vector_store = EndeeVectorStore(
embedding=embedding_model,
api_token="your-api-token", # Optional for local deployment
index_name="my_langchain_vectors",
dimension=1536,
space_type="cosine",
precision=Precision.INT8D # Use Precision enum
)
# Add documents
texts = [
"Endee is a high-performance vector database",
"LangChain is a framework for developing applications powered by language models",
"Vector databases store vector embeddings and enable fast similarity search"
]
metadatas = [
{"source": "product", "category": "database"},
{"source": "github", "category": "framework"},
{"source": "textbook", "category": "education"}
]
# Add texts to the vector store
ids = vector_store.add_texts(texts=texts, metadatas=metadatas)
# Search similar documents
results = vector_store.similarity_search("How do vector databases work?", k=2)
# Process results
for doc in results:
print(f"Content: {doc.page_content}")
print(f"Metadata: {doc.metadata}")
print()
🔥 Hybrid Search (Recommended for Production)
Hybrid search combines dense embeddings (semantic search) with sparse embeddings (keyword search) for superior retrieval accuracy. This is now the recommended approach for production RAG applications.
Why Hybrid Search?
| Search Type | Strengths | Weaknesses | Use Case |
|---|---|---|---|
| Dense Only | Semantic understanding, synonyms | May miss exact keywords | Conceptual queries |
| Sparse Only | Exact keyword matching | No semantic understanding | Keyword-based search |
| Hybrid ⭐ | Best of both worlds | Slightly slower (acceptable) | Production RAG |
Quick Start with Hybrid Search
from langchain_endee import EndeeVectorStore, FastEmbedSparse, RetrievalMode
from langchain_huggingface import HuggingFaceEmbeddings
# Dense embeddings (semantic)
dense_embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# Sparse embeddings (keyword) - SPLADE is now the default!
sparse_embeddings = FastEmbedSparse() # Uses prithivida/Splade_PP_en_v1 by default
# Create hybrid vector store
vector_store = EndeeVectorStore(
embedding=dense_embeddings,
sparse_embedding=sparse_embeddings,
retrieval_mode=RetrievalMode.HYBRID,
index_name="hybrid_index",
dimension=384,
api_token=None # Local deployment
)
# Use normally - hybrid search is automatic!
texts = ["Python is a programming language", "Machine learning uses neural networks"]
vector_store.add_texts(texts)
results = vector_store.similarity_search("programming with Python", k=2)
# Returns results using both semantic AND keyword matching!
Sparse Embedding Models
Default: SPLADE (Highest Accuracy) ⭐
from langchain_endee import FastEmbedSparse
# SPLADE - Neural sparse model (default)
sparse = FastEmbedSparse() # prithivida/Splade_PP_en_v1
# or explicitly:
sparse = FastEmbedSparse(model_name="prithivida/Splade_PP_en_v1", batch_size=128)
Alternative: BM25 (Faster)
# BM25 - Classical sparse model (faster, slightly less accurate)
sparse = FastEmbedSparse(model_name="Qdrant/bm25", batch_size=256)
When to Use What?
| Scenario | Recommendation |
|---|---|
| Production RAG | Hybrid with SPLADE (default) ⭐ |
| Speed Critical | Hybrid with BM25 or Dense-only |
| Maximum Accuracy | Hybrid with SPLADE + FLOAT16 precision |
| Research/Prototyping | Dense-only (simpler setup) |
| Large Scale (>1M docs) | Hybrid with BM25 + INT8D precision |
Learn more: See Hybrid Search Guide for detailed comparisons and best practices.
Understanding Precision Levels
Endee supports different precision levels (quantization) that let you balance memory usage, search speed, and accuracy. Use the Precision enum from the endee package for type safety:
from endee import Precision
# Available precision levels
Precision.BINARY2 # 1-bit binary quantization
Precision.INT8D # 8-bit integer quantization (default)
Precision.INT16D # 16-bit integer quantization
Precision.FLOAT16 # 16-bit floating point
Precision.FLOAT32 # 32-bit floating point
| Precision | Quantization | Data Type | Memory per Vector | Search Speed | Best For |
|---|---|---|---|---|---|
| Precision.BINARY2 | 1-bit | Binary | Smallest (~96.9% less) | Fastest | Extreme compression, large-scale deployments |
| Precision.INT8D | 8-bit | INT8 | Small (~75% less) | Very Fast | Default - great for most use cases |
| Precision.INT16D | 16-bit | INT16 | Medium (~50% less) | Fast | Balanced integer precision |
| Precision.FLOAT16 | 16-bit | FP16 | Medium (~50% less) | Fast | Balanced float precision |
| Precision.FLOAT32 | 32-bit | FP32 | Largest (baseline) | Slower | Maximum accuracy requirements |
Memory Usage Example: For a 1536-dimensional vector:
- Precision.BINARY2: ~0.2 KB per vector (extreme compression)
- Precision.INT8D: ~1.5 KB per vector (default)
- Precision.INT16D / Precision.FLOAT16: ~3 KB per vector
- Precision.FLOAT32: ~6 KB per vector
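These figures follow directly from the vector dimension and the storage width of each level. A quick back-of-the-envelope check in plain Python (independent of Endee, and ignoring index overhead such as HNSW links and metadata):

```python
# Approximate raw storage for one 1536-dimensional vector at each precision level.
DIMENSION = 1536

bytes_per_vector = {
    "BINARY2": DIMENSION / 8,   # 1 bit per dimension
    "INT8D":   DIMENSION * 1,   # 1 byte per dimension
    "INT16D":  DIMENSION * 2,   # 2 bytes per dimension
    "FLOAT16": DIMENSION * 2,   # 2 bytes per dimension
    "FLOAT32": DIMENSION * 4,   # 4 bytes per dimension
}

for name, size in bytes_per_vector.items():
    print(f"{name}: ~{size / 1024:.1f} KB per vector")
```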
Example: Choosing Precision Level
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from endee import Precision
# Default precision - balanced performance (recommended for most cases)
default_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="default_index",
dimension=1536,
precision=Precision.INT8D # Default - 8-bit integer quantization
)
# High accuracy with 16-bit precision
high_accuracy_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="high_accuracy_index",
dimension=1536,
precision=Precision.FLOAT16 # 16-bit floating point
)
# Maximum accuracy with full 32-bit precision
max_accuracy_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="max_accuracy_index",
dimension=1536,
precision=Precision.FLOAT32 # 32-bit floating point
)
# Extreme compression for very large datasets
compressed_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="compressed_index",
dimension=1536,
precision=Precision.BINARY2 # 1-bit binary quantization
)
Local Deployment
Endee can be run locally without requiring an API token. If you have a local Endee server running on http://127.0.0.1:8080, you can initialize the vector store without an API token:
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
# Initialize without API token for local deployment
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token=None, # No token needed for local deployment
index_name="local_index",
dimension=1536
)
Creating Vector Stores
From Texts
Create a vector store directly from a list of texts:
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from endee import Precision
texts = ["foo", "bar", "baz"]
metadatas = [{"key": "val1"}, {"key": "val2"}, {"key": "val3"}]
vector_store = EndeeVectorStore.from_texts(
texts=texts,
embedding=OpenAIEmbeddings(),
metadatas=metadatas,
api_token="your-api-token",
index_name="my-index",
dimension=1536,
space_type="cosine",
precision=Precision.INT8D
)
From Documents
Create a vector store from LangChain documents:
from langchain_core.documents import Document
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from endee import Precision
documents = [
Document(
page_content="Endee is a high-performance vector database",
metadata={"source": "product", "category": "database"}
),
Document(
page_content="LangChain is a framework for developing applications",
metadata={"source": "github", "category": "framework"}
)
]
vector_store = EndeeVectorStore.from_documents(
documents=documents,
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="doc-index",
dimension=1536,
precision=Precision.INT8D
)
From Existing Index
Connect to an existing Endee index:
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
vector_store = EndeeVectorStore.from_existing_index(
index_name="existing-index",
embedding=OpenAIEmbeddings(),
api_token="your-api-token"
)
Filtering Search Results
You can filter search results based on metadata using flexible query operators:
# Search with a filter
query = "Tell me about Endee"
filter_dict = [{"category": {"$eq": "database"}}]
filtered_results = vector_store.similarity_search(
query=query,
k=3,
filter=filter_dict
)
print(f"Query: '{query}' with filter: {filter_dict}")
print(f"\nFound {len(filtered_results)} filtered results:")
for i, doc in enumerate(filtered_results):
print(f"\nResult {i+1}:")
print(f"Content: {doc.page_content}")
print(f"Metadata: {doc.metadata}")
Supported Filter Operators
- $eq: Matches records with metadata values equal to a specified value
  {"category": {"$eq": "database"}}
- $in: Matches records with metadata values that are in a specified array
  {"category": {"$in": ["database", "framework"]}}
- $range: Matches numeric metadata fields within a given range [min, max]
  {"score": {"$range": [70, 95]}}
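As a sketch, a $range filter plugs into similarity_search the same way as the $eq filter shown earlier. The numeric score field here is hypothetical (it must have been stored as metadata when the documents were added), and the search call is commented out because it requires a live Endee deployment:

```python
# Hypothetical numeric metadata field "score", stored at ingest time, e.g.:
#   vector_store.add_texts(["..."], metadatas=[{"score": 88}])

# Match only documents whose score falls within [70, 95].
range_filter = [{"score": {"$range": [70, 95]}}]

# Requires a running Endee deployment and an initialized vector_store:
# results = vector_store.similarity_search("highly rated articles", k=5, filter=range_filter)

lo, hi = range_filter[0]["score"]["$range"]
print(f"Filtering scores between {lo} and {hi}")
```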
Multiple Filters (AND Logic)
Multiple filter conditions are combined with logical AND:
# Both conditions must be true
filter_dict = [
{"category": {"$eq": "database"}},
{"difficulty": {"$in": ["intermediate", "advanced"]}}
]
results = vector_store.similarity_search(
query="vector databases",
k=5,
filter=filter_dict
)
Advanced Search Operations
Similarity Search with Scores
Get similarity scores along with documents:
results = vector_store.similarity_search_with_score(
query="machine learning",
k=3
)
for doc, score in results:
print(f"Score: {score:.4f}")
print(f"Content: {doc.page_content}")
print()
Search by Vector
Search using a pre-computed embedding vector:
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
query_vector = embeddings.embed_query("What is a vector database?")
results = vector_store.similarity_search_by_vector(
embedding=query_vector,
k=5
)
Search by Vector with Scores
results = vector_store.similarity_search_by_vector_with_score(
embedding=query_vector,
k=5,
filter=[{"category": {"$eq": "database"}}]
)
for doc, score in results:
print(f"Score: {score:.4f} - {doc.page_content}")
Custom Search Parameters
Adjust the ef parameter for search quality:
# Higher ef = better recall but slower search
results = vector_store.similarity_search(
query="vector search",
k=10,
ef=256 # Default is 128, max is 1024
)
Using with LangChain
Endee can be used anywhere a LangChain vector store is needed:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_endee import EndeeVectorStore
from endee import Precision
# Initialize your vector store
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="rag-index",
dimension=1536,
precision=Precision.INT8D
)
# Create a retriever
retriever = vector_store.as_retriever(
search_kwargs={"k": 3}
)
# Create the RAG chain
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template(
"""Answer the following question based on the provided context:
Context: {context}
Question: {question}
"""
)
rag_chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| model
| StrOutputParser()
)
# Use the chain
response = rag_chain.invoke("What is Endee?")
print(response)
Retriever with Filters
# Create retriever with metadata filters
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={
"k": 5,
"filter": [{"category": {"$eq": "database"}}]
}
)
results = retriever.invoke("vector databases")
Document Management
Adding Documents
from langchain_core.documents import Document
documents = [
Document(page_content="text 1", metadata={"source": "doc1"}),
Document(page_content="text 2", metadata={"source": "doc2"})
]
# Add documents and get their IDs
ids = vector_store.add_documents(documents)
Deleting Documents
Delete by IDs:
# Delete specific documents by ID
vector_store.delete(ids=["id1", "id2", "id3"])
Delete by filter:
# Delete all documents matching a filter
vector_store.delete(filter=[{"status": {"$eq": "expired"}}])
Retrieving Documents by ID
# Get specific documents by their IDs
docs = vector_store.get_by_ids(["id1", "id2"])
for doc in docs:
print(doc.page_content)
print(doc.metadata)
Automatic Text Truncation
The vector store automatically detects your embedding model type and truncates text to fit within token limits:
from langchain_openai import OpenAIEmbeddings
from langchain_endee import EndeeVectorStore
# Auto-detects OpenAI embeddings (8191 token limit)
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="auto-truncate",
dimension=1536
)
# Or set custom limit
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="custom-truncate",
dimension=1536,
max_text_length=1000 # Custom token limit
)
Supported embedding models:
- OpenAI: 8191 tokens
- Cohere: 512 tokens
- HuggingFace: 512 tokens
- Default: 512 tokens
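Conceptually, truncation is a lookup of the model family's limit followed by a cut. The sketch below is illustrative only — the store performs this internally, and real tokenization is model-specific rather than a whitespace split:

```python
# Token limits as documented above; whitespace splitting approximates tokenization.
TOKEN_LIMITS = {"openai": 8191, "cohere": 512, "huggingface": 512, "default": 512}

def truncate_for_model(text: str, model_type: str = "default") -> str:
    """Keep at most the model family's token limit, using a naive word split."""
    limit = TOKEN_LIMITS.get(model_type, TOKEN_LIMITS["default"])
    words = text.split()
    return " ".join(words[:limit])

long_text = "word " * 1000
print(len(truncate_for_model(long_text, "cohere").split()))  # 512
```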
Configuration Options
EndeeVectorStore Constructor Parameters
- embedding (required): LangChain embedding function
- api_token: Endee API token (optional for local deployment)
- index_name (required): Name of the Endee index
- dimension: Vector dimension (required when creating a new index)
- space_type: Distance metric - "cosine" (default), "l2", or "ip"
- precision: Precision level using the Precision enum - Precision.INT8D (default), Precision.BINARY2, Precision.INT16D, Precision.FLOAT16, or Precision.FLOAT32
- M: HNSW graph connectivity parameter (default: 16)
- ef_con: HNSW construction parameter (default: 128)
- max_text_length: Maximum text length in tokens (auto-detected if not provided)
- embedding_model_type: Type of embedding model - "openai", "cohere", "huggingface", or "default" (auto-detected if not provided)
- force_recreate: Delete and recreate the index if it exists (default: False)
- validate_index_config: Validate index configuration on initialization (default: True)
- content_payload_key: Key for storing text content (default: "text")
- metadata_payload_key: Key for storing metadata (default: "metadata")
Example with All Options
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from endee import Precision
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="full-config-index",
dimension=1536,
space_type="cosine",
precision=Precision.INT8D,
M=16,
ef_con=128,
max_text_length=8191,
embedding_model_type="openai",
force_recreate=False,
validate_index_config=True,
content_payload_key="text",
metadata_payload_key="metadata"
)
Performance Tips
1. Choose the Right Precision
- Precision.INT8D: Default - excellent balance of speed, memory, and accuracy for most use cases
- Precision.FLOAT16 / Precision.INT16D: Better accuracy with a moderate memory increase
- Precision.FLOAT32: Maximum accuracy but highest memory usage
- Precision.BINARY2: Extreme compression for very large datasets where lower accuracy is acceptable
2. Batch Operations
Use larger batch sizes for better performance when adding many documents:
# Add texts in batches
ids = vector_store.add_texts(
texts=large_text_list,
metadatas=metadata_list,
batch_size=1000, # Endee batch size (max 1000)
embedding_chunk_size=100 # Embedding generation batch size
)
3. Use Metadata Filtering
Pre-filter your search space using metadata to improve both speed and relevance:
results = vector_store.similarity_search(
query="your query",
k=10,
filter=[{"category": {"$eq": "relevant_category"}}]
)
4. Tune Search Parameters
Adjust ef parameter based on your accuracy/speed requirements:
# Faster but potentially lower recall
results = vector_store.similarity_search(query="test", k=10, ef=64)
# Slower but potentially higher recall
results = vector_store.similarity_search(query="test", k=10, ef=256)
5. Index Management
Use force_recreate when you need a clean slate:
# Recreate index with new configuration
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="my-index",
dimension=1536,
force_recreate=True # Delete existing index and create new one
)
API Reference
Class Methods
- __init__(...): Initialize with an existing Endee index or parameters to create a new one
- from_texts(...): Create a vector store from a list of texts
- from_documents(...): Create a vector store from LangChain documents
- from_existing_index(...): Connect to an existing Endee index
Instance Methods
- add_texts(...): Add text documents with optional metadata
- add_documents(...): Add LangChain Document objects
- similarity_search(...): Search for similar documents
- similarity_search_with_score(...): Search and return similarity scores
- similarity_search_by_vector(...): Search using an embedding vector
- similarity_search_by_vector_with_score(...): Search by vector with scores
- delete(...): Delete documents by ID or filter
- get_by_ids(...): Retrieve documents by their IDs
- as_retriever(...): Create a LangChain retriever from the vector store
Properties
- embeddings: Get the embeddings instance being used
- client: Get the Endee client instance
- index: Get the Endee index instance
Examples
Example 1: RAG System with Filters
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from endee import Precision
# Create vector store
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="docs",
dimension=1536,
precision=Precision.INT8D
)
# Create retriever with category filter
retriever = vector_store.as_retriever(
search_kwargs={
"k": 5,
"filter": [{"category": {"$eq": "technical"}}]
}
)
# Build RAG chain
prompt = ChatPromptTemplate.from_template(
"Answer based on context:\n\nContext: {context}\n\nQuestion: {question}"
)
chain = (
{"context": retriever, "question": RunnablePassthrough()}
| prompt
| ChatOpenAI()
| StrOutputParser()
)
# Use the chain
answer = chain.invoke("How does vector search work?")
print(answer)
Example 2: Document Management
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from endee import Precision
vector_store = EndeeVectorStore(
embedding=OpenAIEmbeddings(),
api_token="your-api-token",
index_name="documents",
dimension=1536,
precision=Precision.INT8D
)
# Add documents
docs = [
Document(page_content="AI is transforming industries", metadata={"category": "ai"}),
Document(page_content="Python is a popular programming language", metadata={"category": "programming"})
]
ids = vector_store.add_documents(docs)
print(f"Added documents with IDs: {ids}")
# Search with filter
results = vector_store.similarity_search(
"programming languages",
k=5,
filter=[{"category": {"$eq": "programming"}}]
)
# Delete by filter
vector_store.delete(filter=[{"category": {"$eq": "outdated"}}])
Example 3: Multiple Precision Levels
from langchain_endee import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings
from endee import Precision
embeddings = OpenAIEmbeddings()
# Fast, memory-efficient index
fast_store = EndeeVectorStore(
embedding=embeddings,
api_token="your-api-token",
index_name="fast-index",
dimension=1536,
precision=Precision.INT8D
)
# High accuracy index
accurate_store = EndeeVectorStore(
embedding=embeddings,
api_token="your-api-token",
index_name="accurate-index",
dimension=1536,
precision=Precision.FLOAT32
)
# Extreme compression index
compressed_store = EndeeVectorStore(
embedding=embeddings,
api_token="your-api-token",
index_name="compressed-index",
dimension=1536,
precision=Precision.BINARY2
)
Updated Features
Precision Parameter: Now uses the Precision enum instead of strings:
from endee import Precision
# NEW (Precision enum)
precision=Precision.INT8D # ✅ Recommended
precision=Precision.FLOAT16 # ✅ Recommended
precision=Precision.INT16D # ✅ Recommended
precision=Precision.FLOAT32 # ✅ Recommended
precision=Precision.BINARY2 # ✅ Recommended
Method Name: from_params() is now the regular constructor:
# OLD
vector_store = EndeeVectorStore.from_params(...) # ❌ Method removed
# NEW
vector_store = EndeeVectorStore(...) # ✅ Use constructor
Troubleshooting
Common Issues
1. "Index not found" error
- Ensure the index name is correct
- Check that the index exists using client.list_indexes()
- If creating a new index, ensure the dimension parameter is provided
2. Dimension mismatch error
- Verify that the dimension parameter matches your embedding model's output
- Common dimensions: OpenAI (1536), Cohere (1024), sentence-transformers (384, 768)
3. Local deployment not working
- Ensure the Endee server is running on http://127.0.0.1:8080
- Check the server health endpoint
- Set api_token=None explicitly for local deployment
4. Text truncation warnings
- Text is automatically truncated to fit embedding model limits
- Adjust the max_text_length parameter if needed
- Consider chunking very long documents before adding them
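For the last point, a minimal character-based chunking sketch in plain Python; in practice a LangChain text splitter (e.g. RecursiveCharacterTextSplitter from langchain_text_splitters) is a better fit, but the idea is the same:

```python
# Naive fixed-size chunking with overlap; illustrative only.
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 1200, chunk_size=500, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 3 [500, 500, 300]

# Each chunk (plus shared metadata) can then be added on a live deployment:
# vector_store.add_texts(chunks, metadatas=[{"doc": "report"}] * len(chunks))
```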
License
MIT License