High-Speed Vector Database for Fast and Efficient ANN Search with LangChain
Endee LangChain Integration
This package provides an integration between Endee (a high-speed vector database) and LangChain, allowing you to use Endee as a vector store backend for LangChain.
Features
- Multiple Distance Metrics: Support for cosine, L2, and inner product distance metrics
- Configurable Precision: Choose among medium (INT8, the default), fp16 (FP16), high (INT16), and ultra-high (FP32) precision levels for the performance/accuracy trade-off you need
- Client-Side Encryption: Optional encryption support for secure vector storage
- Metadata Filtering: Filter search results based on metadata
- High Performance: Optimized for speed and efficiency with vector data
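As a refresher on what the three distance metrics measure, here is a plain-Python sketch of the underlying formulas. This is independent of Endee itself; cosine, L2, and inner product are the standard definitions:

```python
import math

def inner_product(a, b):
    # Raw dot product; larger means more similar ("ip" space type).
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean distance; smaller means more similar ("l2" space type).
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Angle-based similarity in [-1, 1]; 1 means same direction ("cosine").
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return inner_product(a, b) / (norm_a * norm_b)

a, b = [1.0, 0.0], [1.0, 1.0]
print(inner_product(a, b))                 # 1.0
print(l2_distance(a, b))                   # 1.0
print(round(cosine_similarity(a, b), 4))   # 0.7071
```

Cosine is the usual choice for text embeddings because it ignores vector magnitude; L2 and inner product can be preferable when magnitudes carry meaning.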
Installation
```shell
pip install endee-langchain
```
This will install both the endee-langchain package and its dependencies (endee, langchain, and langchain-core).
Quick Start
```python
import os
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from endee.endee_client import Endee
from endee_langchain import EndeeVectorStore

# Configure your Endee credentials
api_token = os.environ.get("ENDEE_API_TOKEN")
nd = Endee(token=api_token)

# Initialize embedding model
embedding_model = OpenAIEmbeddings()

# Initialize the vector store
vector_store = EndeeVectorStore.from_params(
    embedding=embedding_model,
    api_token=api_token,
    index_name="my_langchain_vectors",
    dimension=1536,
    space_type="cosine",
    precision="medium"  # Options: "medium", "fp16", "high", "ultra-high"
)

# Add documents
texts = [
    "Endee is the world's fastest vector database",
    "LangChain is a framework for developing applications powered by language models",
    "Vector databases store vector embeddings and enable fast similarity search."
]
metadatas = [
    {"source": "product", "category": "database"},
    {"source": "github", "category": "framework"},
    {"source": "textbook", "category": "security"}
]
vector_store.add_texts(texts=texts, metadatas=metadatas)

# Search similar documents
results = vector_store.similarity_search("How do vector databases work?", k=2)

# Process results
for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print()
```
Client-Side Encryption
Endee supports optional client-side encryption to protect your sensitive vector data. When enabled, vectors are encrypted before being sent to the database.
Enabling Encryption
```python
import os
from endee.endee_client import Endee
from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize Endee client
api_token = os.environ.get("ENDEE_API_TOKEN")
nd = Endee(token=api_token)

# Generate a secure encryption key
encryption_key = nd.generate_key()

# IMPORTANT: Store this key securely! You'll need it to access your data.
print(f"Encryption key: {encryption_key}")
# Save this key in a secure location (e.g., environment variable, secrets manager)

# Create an encrypted vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="encrypted_vectors",
    dimension=1536,
    space_type="cosine",
    precision="medium",
    encryption_key=encryption_key  # Enable encryption
)

# Add encrypted documents
texts = ["Sensitive information", "Confidential data"]
vector_store.add_texts(texts=texts)

# Search works transparently with encryption
results = vector_store.similarity_search("confidential", k=2)
```
Accessing an Existing Encrypted Index
When accessing an existing encrypted index, you must provide the same encryption key that was used to create it:
```python
# Retrieve your stored encryption key
encryption_key = os.environ.get("ENDEE_ENCRYPTION_KEY")

# Access the encrypted vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="encrypted_vectors",
    encryption_key=encryption_key  # Must match the key used during creation
)

# Now you can search and add documents
results = vector_store.similarity_search("query", k=5)
```
Encryption Best Practices
- Store keys securely: Never hardcode encryption keys in your code. Use environment variables, secrets managers (AWS Secrets Manager, Azure Key Vault, etc.), or secure key management systems.
- Key backup: Back up your encryption key in a secure location. If you lose the key, you cannot access your encrypted data.
- Key rotation: For enhanced security, consider implementing key rotation policies for your encrypted indexes.
- Access control: Limit access to encryption keys to only authorized personnel and applications.
Example with Environment Variables
```python
import os
from endee.endee_client import Endee
from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings

# Load credentials from environment
api_token = os.environ.get("ENDEE_API_TOKEN")
encryption_key = os.environ.get("ENDEE_ENCRYPTION_KEY")

# If no key exists, generate and store one
if not encryption_key:
    nd = Endee(token=api_token)
    encryption_key = nd.generate_key()
    print("Generated new encryption key. Store this securely:")
    print(f"export ENDEE_ENCRYPTION_KEY={encryption_key}")

# Create encrypted vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="secure_index",
    dimension=1536,
    encryption_key=encryption_key
)
```
Encryption vs. No Encryption
```python
# Without encryption (default)
unencrypted_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="public_index",
    dimension=1536
    # No encryption_key parameter
)

# With encryption
encrypted_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="secure_index",
    dimension=1536,
    encryption_key=encryption_key  # Encryption enabled
)
```
Note: Encryption is completely optional. If you don't provide an encryption_key, your data will be stored without encryption (which is fine for non-sensitive data).
Understanding Precision Levels
Endee supports different precision levels (quantization) that let you balance memory usage, search speed, and accuracy:
| Precision | Quantization | Data Type | Memory per Vector | Search Speed | Best For |
|---|---|---|---|---|---|
| medium (default) | 8-bit | INT8 | Smallest (1x) | Fastest | Large-scale applications, millions of vectors |
| fp16 | 16-bit | FP16 | Small (2x) | Very Fast | Balanced performance and accuracy |
| high | 16-bit | INT16 | Small (2x) | Very Fast | Production workloads |
| ultra-high | 32-bit | FP32 | Large (4x) | Slower | Maximum accuracy requirements |
Memory Usage Example: For a 1536-dimensional vector:
- medium (INT8): 1.5 KB per vector
- fp16 / high (16-bit): 3 KB per vector
- ultra-high (FP32): 6 KB per vector
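These sizes follow directly from dimension × bytes per component. A quick sanity check in plain Python (raw vector payload only; any index overhead is ignored):

```python
# Bytes per dimension at each precision level, per the table above.
BYTES_PER_DIM = {"medium": 1, "fp16": 2, "high": 2, "ultra-high": 4}

def memory_per_vector_kb(dimension, precision):
    # Raw vector payload in KB; does not account for index structures.
    return dimension * BYTES_PER_DIM[precision] / 1024

for p in ("medium", "fp16", "high", "ultra-high"):
    print(f"{p}: {memory_per_vector_kb(1536, p)} KB")
# medium: 1.5 KB, fp16: 3.0 KB, high: 3.0 KB, ultra-high: 6.0 KB
```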
Example: Choosing Precision Level
```python
# For maximum speed and memory efficiency with large datasets (default)
fast_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="fast_index",
    dimension=1536,
    precision="medium"  # 8-bit quantization (INT8) - this is the default
)

# For balanced float precision (recommended for most cases)
fp16_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="fp16_index",
    dimension=1536,
    precision="fp16"  # 16-bit floating point
)

# For balanced integer precision
balanced_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="balanced_index",
    dimension=1536,
    precision="high"  # 16-bit integer (INT16)
)

# For maximum accuracy
accurate_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="accurate_index",
    dimension=1536,
    precision="ultra-high"  # 32-bit floating point (FP32)
)
```
Filtering Search Results
You can filter search results based on metadata using flexible query operators. Here's an example using a filter:
Search with a filter
```python
query = "Tell me about Endee"
filter_dict = {"category": {"$eq": "database"}}

filtered_results = vector_store.similarity_search(
    query=query,
    k=3,
    filter=filter_dict
)

print(f"Query: '{query}' with filter: {filter_dict}")
print(f"\nFound {len(filtered_results)} filtered results:")
for i, doc in enumerate(filtered_results):
    print(f"\nResult {i+1}:")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
```
Supported Filter Operators
- $eq: Matches records with metadata values equal to a specified value.
  Example: `{ "category": { "$eq": "database" } }`
- $in: Matches records with metadata values that are in a specified array.
  Example: `{ "category": { "$in": ["database", "framework"] } }`
- $range: Matches numeric metadata fields within a given range.
  Format: `[min, max]`
  Example: `{ "score": { "$range": [0, 10] } }`
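To make the operator semantics concrete, here is an illustrative client-side model of how a filter matches a document's metadata. This sketch is ours, for explanation only; Endee evaluates filters server-side:

```python
def matches(metadata, filter_dict):
    # Return True if a metadata dict satisfies every condition in the filter.
    for field, condition in filter_dict.items():
        value = metadata.get(field)
        for op, operand in condition.items():
            if op == "$eq" and value != operand:
                return False
            if op == "$in" and value not in operand:
                return False
            if op == "$range":
                low, high = operand  # operand is [min, max]
                if value is None or not (low <= value <= high):
                    return False
    return True

doc_meta = {"category": "database", "score": 7}
print(matches(doc_meta, {"category": {"$eq": "database"}}))           # True
print(matches(doc_meta, {"category": {"$in": ["framework", "app"]}})) # False
print(matches(doc_meta, {"score": {"$range": [0, 10]}}))              # True
```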
Using with LangChain
Endee can be used anywhere a LangChain vector store is needed:
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from endee_langchain import EndeeVectorStore

# Initialize your vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token="your_api_token",
    index_name="your_index_name",
    dimension=1536,
    precision="medium"
)

# Create a retriever
retriever = vector_store.as_retriever()

# Create the RAG chain
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template(
    """Answer the following question based on the provided context:

Context: {context}

Question: {question}
"""
)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Use the chain
response = rag_chain.invoke("What is Endee?")
print(response)
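The retriever step passes a list of Document objects into the prompt's {context} slot. If you want explicit control over how the retrieved documents are joined into a single string, a common LangChain pattern is a small formatting helper. The helper below is our own illustration, with a stand-in Document class so the snippet runs standalone:

```python
def format_docs(docs):
    # Join retrieved documents into one context string, separated by blank lines.
    return "\n\n".join(doc.page_content for doc in docs)

class Doc:
    # Stand-in for langchain_core.documents.Document, for a self-contained demo.
    def __init__(self, page_content):
        self.page_content = page_content

print(format_docs([Doc("Endee is fast"), Doc("LangChain builds LLM apps")]))
```

In a chain you could then write `{"context": retriever | format_docs, "question": RunnablePassthrough()}`; LCEL coerces a plain function into a runnable step.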
Creating from Documents
You can also create a vector store directly from LangChain documents:
```python
from langchain_core.documents import Document

documents = [
    Document(
        page_content="Endee is the world's fastest vector database",
        metadata={"source": "product", "category": "database"}
    ),
    Document(
        page_content="LangChain is a framework for developing applications",
        metadata={"source": "github", "category": "framework"}
    )
]

vector_store = EndeeVectorStore.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    api_token="your_api_token",
    index_name="doc_index",
    dimension=1536,
    precision="medium"
)

# With encryption
encrypted_vector_store = EndeeVectorStore.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    api_token="your_api_token",
    index_name="encrypted_doc_index",
    dimension=1536,
    precision="medium",
    encryption_key=encryption_key  # Add encryption
)
```
API Reference
EndeeVectorStore
The main class for integrating with LangChain. Key methods include:
- `__init__`: Initialize with an existing Endee index, or with parameters to create a new one
- `from_params`: Create a vector store using an API token
- `from_texts`: Create a vector store from a list of texts
- `from_documents`: Create a vector store from LangChain documents
- `add_texts`: Add text documents with optional metadata
- `similarity_search`: Search for similar documents
- `similarity_search_with_score`: Search and return similarity scores
- `delete`: Delete documents by ID or filter
Configuration Options
The EndeeVectorStore constructor and from_params method accept the following parameters:
- `embedding`: LangChain embedding function to use
- `api_token`: Your Endee API token
- `index_name`: Name of the Endee index
- `dimension`: Vector dimension (required when creating a new index)
- `space_type`: Distance metric, one of `"cosine"`, `"l2"`, or `"ip"` (default: `"cosine"`)
- `precision`: Precision level, one of `"medium"` (INT8, default), `"fp16"` (FP16), `"high"` (INT16), or `"ultra-high"` (FP32)
- `encryption_key`: Optional encryption key for client-side encryption (default: `None`)
- `text_key`: Metadata key under which the document text is stored (default: `"text"`)
Performance Tips
- Choose the right precision: The default `"medium"` (INT8) works well for most large-scale applications. Use `"fp16"` (FP16) or `"high"` (INT16) for better accuracy, and `"ultra-high"` (FP32) only when maximum accuracy is required.

- Batch operations: When adding many documents, use larger batch sizes for better performance:

  ```python
  vector_store.add_texts(
      texts=large_text_list,
      metadatas=metadata_list,
      batch_size=1000  # Adjust based on your data
  )
  ```

- Use metadata filtering: Pre-filter your search space using metadata to improve both speed and relevance:

  ```python
  results = vector_store.similarity_search(
      query="your query",
      k=10,
      filter={"category": {"$eq": "relevant_category"}}
  )
  ```

- Encryption considerations: Encryption adds minimal overhead to operations. Use it for sensitive data without significant performance concerns. However, ensure you have a robust key management strategy in place.
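If you prefer to drive batching yourself (for example, to log progress between batches), a simple chunking helper works alongside add_texts. The helper below is our own illustration, not part of the package:

```python
def batched(items, batch_size):
    # Yield successive slices of at most batch_size items.
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

texts = [f"doc {i}" for i in range(2500)]
sizes = [len(chunk) for chunk in batched(texts, 1000)]
print(sizes)  # [1000, 1000, 500]
```

Each chunk can then be passed to `vector_store.add_texts(texts=chunk)` in turn.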
File details
Details for the file endee_langchain-0.1.1.tar.gz.
File metadata
- Download URL: endee_langchain-0.1.1.tar.gz
- Upload date:
- Size: 13.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `bfa2bce17db1d17d3fb4ef95ae3a752d60f31a2204c2fa803916961d988d8c9d` |
| MD5 | `d2a7cf6eaa6f51e22d8afa026cdab4b1` |
| BLAKE2b-256 | `03f96522639c01ff600e107d4dca559a4e4108f7b4bd54a1ba9227350e48b065` |
File details
Details for the file endee_langchain-0.1.1-py3-none-any.whl.
File metadata
- Download URL: endee_langchain-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `8cd49ad69cb7bad5711ad22b3eaeb8c841b82e21969ad915b6f51371f3ad599e` |
| MD5 | `9c8c4d652459acdd5da23ac182fe861e` |
| BLAKE2b-256 | `b3e03b79d03523cef26ee0c049eb86c180b292ed1e14a23d191370dc6cddb608` |