
High-Speed Vector Database for Fast and Efficient ANN Searches with LangChain

Project description

Endee LangChain Integration

This package provides an integration between Endee (a high-speed vector database) and LangChain, allowing you to use Endee as a vector store backend for LangChain.

Features

  • Multiple Distance Metrics: Support for cosine, L2, and inner product distance metrics
  • Configurable Precision: Choose from medium (INT8, default), fp16 (FP16), high (INT16), and ultra-high (FP32) precision levels for optimal performance/accuracy trade-offs
  • Client-Side Encryption: Optional encryption support for secure vector storage
  • Metadata Filtering: Filter search results based on metadata
  • High Performance: Optimized for speed and efficiency with vector data
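
For intuition about the metrics in the first bullet, all three can be computed by hand for a pair of small vectors. This sketch is plain Python and independent of Endee:

```python
import math

def inner_product(a, b):
    # Inner (dot) product: higher means more similar
    return sum(x * y for x, y in zip(a, b))

def l2_distance(a, b):
    # Euclidean (L2) distance: lower means more similar
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    # Cosine similarity: dot product of the length-normalized vectors
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return inner_product(a, b) / (norm_a * norm_b)

a, b = [1.0, 0.0], [1.0, 1.0]
print(inner_product(a, b))      # 1.0
print(l2_distance(a, b))        # 1.0
print(cosine_similarity(a, b))  # ~0.7071
```

Cosine ignores vector length and compares direction only, which is why it is the usual default for text embeddings.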

Installation

pip install endee-langchain

This will install both the endee-langchain package and its dependencies (endee, langchain, and langchain-core).

Quick Start

import os
from langchain_openai import OpenAIEmbeddings
from langchain_core.documents import Document
from endee_langchain import EndeeVectorStore

# Configure your Endee credentials
api_token = os.environ.get("ENDEE_API_TOKEN")

# Initialize embedding model
embedding_model = OpenAIEmbeddings()

# Initialize the vector store
vector_store = EndeeVectorStore.from_params(
    embedding=embedding_model,
    api_token=api_token,
    index_name="my_langchain_vectors",
    dimension=1536,
    space_type="cosine",
    precision="medium"  # Options: "medium", "fp16", "high", "ultra-high"
)

# Add documents
texts = [
    "Endee is the world's fastest vector database",
    "LangChain is a framework for developing applications powered by language models",
    "Vector databases store vector embeddings and enable fast similarity search."
]

metadatas = [
    {"source": "product", "category": "database"},
    {"source": "github", "category": "framework"},
    {"source": "textbook", "category": "database"}
]

vector_store.add_texts(texts=texts, metadatas=metadatas)

# Search similar documents
results = vector_store.similarity_search("How do vector databases work?", k=2)

# Process results
for doc in results:
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")
    print()

Client-Side Encryption

Endee supports optional client-side encryption to protect your sensitive vector data. When enabled, vectors are encrypted before being sent to the database.
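
Conceptually, "client-side" means the plaintext never leaves your process: data is encrypted before upload and decrypted after download, so the server only ever stores ciphertext. A toy round-trip illustrates the idea (an XOR cipher for illustration only; it says nothing about Endee's actual cryptography and must never be used for real security):

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Symmetric toy cipher: applying it twice with the same key
    # restores the original bytes. NOT real cryptography.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

key = b"secret-key"
plaintext = b"sensitive vector payload"
ciphertext = xor_cipher(plaintext, key)   # what the server would store
recovered = xor_cipher(ciphertext, key)   # what the client sees again

assert ciphertext != plaintext
assert recovered == plaintext
```

As with any symmetric scheme, losing the key means losing the data, which is why the key-management advice below matters.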

Enabling Encryption

import os
from endee.endee_client import Endee
from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize Endee client
api_token = os.environ.get("ENDEE_API_TOKEN")
nd = Endee(token=api_token)

# Generate a secure encryption key
encryption_key = nd.generate_key()

# IMPORTANT: Store this key securely! You'll need it to access your data
print(f"Encryption key: {encryption_key}")
# Save this key in a secure location (e.g., environment variable, secrets manager)

# Create an encrypted vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="encrypted_vectors",
    dimension=1536,
    space_type="cosine",
    precision="medium",
    encryption_key=encryption_key  # Enable encryption
)

# Add encrypted documents
texts = ["Sensitive information", "Confidential data"]
vector_store.add_texts(texts=texts)

# Search works transparently with encryption
results = vector_store.similarity_search("confidential", k=2)

Accessing Existing Encrypted Index

When accessing an existing encrypted index, you must provide the same encryption key that was used to create it:

# Retrieve your stored encryption key
encryption_key = os.environ.get("ENDEE_ENCRYPTION_KEY")

# Access the encrypted vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="encrypted_vectors",
    encryption_key=encryption_key  # Must match the key used during creation
)

# Now you can search and add documents
results = vector_store.similarity_search("query", k=5)

Encryption Best Practices

  1. Store keys securely: Never hardcode encryption keys in your code. Use environment variables, secrets managers (AWS Secrets Manager, Azure Key Vault, etc.), or secure key management systems.

  2. Key backup: Make sure to backup your encryption key in a secure location. If you lose the key, you cannot access your encrypted data.

  3. Key rotation: For enhanced security, consider implementing key rotation policies for your encrypted indexes.

  4. Access control: Limit access to encryption keys to only authorized personnel and applications.

Example with Environment Variables

import os
from endee.endee_client import Endee
from endee_langchain import EndeeVectorStore
from langchain_openai import OpenAIEmbeddings

# Load credentials from environment
api_token = os.environ.get("ENDEE_API_TOKEN")
encryption_key = os.environ.get("ENDEE_ENCRYPTION_KEY")

# If no key exists, generate and store one
if not encryption_key:
    nd = Endee(token=api_token)
    encryption_key = nd.generate_key()
    print("Generated new encryption key. Store this securely:")
    print(f"export ENDEE_ENCRYPTION_KEY={encryption_key}")

# Create encrypted vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="secure_index",
    dimension=1536,
    encryption_key=encryption_key
)

Encryption vs Non-Encryption

# Without encryption (default)
unencrypted_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="public_index",
    dimension=1536
    # No encryption_key parameter
)

# With encryption
encrypted_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="secure_index",
    dimension=1536,
    encryption_key=encryption_key  # Encryption enabled
)

Note: Encryption is completely optional. If you don't provide an encryption_key, your data will be stored without encryption (which is fine for non-sensitive data).

Understanding Precision Levels

Endee supports different precision levels (quantization) that allow you to balance between memory usage, search speed, and accuracy:

| Precision  | Quantization | Data Type | Memory per Vector | Search Speed | Best For |
|------------|--------------|-----------|-------------------|--------------|----------|
| medium     | 8-bit        | INT8      | Smallest (1x)     | Fastest      | Large-scale applications, millions of vectors (default) |
| fp16       | 16-bit       | FP16      | Small (2x)        | Very Fast    | Balanced performance and accuracy |
| high       | 16-bit       | INT16     | Small (2x)        | Very Fast    | Production workloads |
| ultra-high | 32-bit       | FP32      | Large (4x)        | Slower       | Maximum accuracy requirements |

Memory Usage Example: For a 1536-dimensional vector:

  • medium (INT8): 1.5 KB per vector
  • fp16 / high (16-bit): 3 KB per vector
  • ultra-high (FP32): 6 KB per vector
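
These figures follow directly from dimension × bytes-per-component, which also makes it easy to budget whole indexes:

```python
def index_memory_bytes(dimension: int, bytes_per_component: int,
                       num_vectors: int = 1) -> int:
    # Raw vector storage only; index structures add overhead on top.
    return dimension * bytes_per_component * num_vectors

dim = 1536
print(index_memory_bytes(dim, 1) / 1024)  # medium (INT8): 1.5 KB
print(index_memory_bytes(dim, 2) / 1024)  # fp16 / high: 3.0 KB
print(index_memory_bytes(dim, 4) / 1024)  # ultra-high (FP32): 6.0 KB

# One million vectors at each precision, in GB:
for name, width in [("medium", 1), ("fp16/high", 2), ("ultra-high", 4)]:
    gb = index_memory_bytes(dim, width, 1_000_000) / 1024**3
    print(f"{name}: {gb:.2f} GB")
```

At a million 1536-dimensional vectors, the gap between INT8 and FP32 is several gigabytes of raw vector data.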

Example: Choosing Precision Level

# For maximum speed and memory efficiency with large datasets (default)
fast_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="fast_index",
    dimension=1536,
    precision="medium"  # 8-bit quantization (INT8) - This is the default
)

# For balanced floating-point precision
fp16_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="fp16_index",
    dimension=1536,
    precision="fp16"  # 16-bit floating point
)

# For balanced integer precision
balanced_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="balanced_index",
    dimension=1536,
    precision="high"  # 16-bit integer (INT16)
)

# For maximum accuracy
accurate_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token=api_token,
    index_name="accurate_index",
    dimension=1536,
    precision="ultra-high"  # 32-bit floating point (FP32)
)
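
The accuracy cost of lower precision can be felt with a toy symmetric INT8 quantizer. This is a simplification for intuition only, not Endee's internal scheme:

```python
def quantize_int8(vec):
    # Symmetric quantization: map [-max_abs, max_abs] onto [-127, 127].
    scale = max(abs(x) for x in vec) / 127.0
    q = [round(x / scale) for x in vec]
    return q, scale

def dequantize(q, scale):
    return [x * scale for x in q]

vec = [0.12, -0.98, 0.45, 0.07]
q, scale = quantize_int8(vec)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(vec, restored))
print(q)        # small integers, 1 byte each instead of 4
print(max_err)  # worst-case rounding error, bounded by scale / 2
```

Each component shrinks from 4 bytes to 1 at the cost of a rounding error of at most half the scale factor, which is usually negligible for high-dimensional similarity search but not for exact reconstruction.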

Filtering Search Results

You can filter search results based on metadata using flexible query operators. Here's an example using a filter:

Search with a filter

query = "Tell me about Endee"
filter_dict = {"category": {"$eq": "database"}}
 
filtered_results = vector_store.similarity_search(
    query=query,
    k=3,
    filter=filter_dict
)

print(f"Query: '{query}' with filter: {filter_dict}")
print(f"\nFound {len(filtered_results)} filtered results:")
for i, doc in enumerate(filtered_results):
    print(f"\nResult {i+1}:")
    print(f"Content: {doc.page_content}")
    print(f"Metadata: {doc.metadata}")

Supported Filter Operators

  • $eq: Matches records with metadata values equal to a specified value
    Example:

    {
      "category": { "$eq": "database" }
    }
    
  • $in: Matches records with metadata values that are in a specified array
    Example:

    {
      "category": { "$in": ["database", "framework"] }
    }
    
  • $range: Matches numeric metadata fields within a given range
    Format: [min, max]
    Example:

    {
      "score": { "$range": [0, 10] }
    }
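
Filters are plain dictionaries, so several fields can appear in one filter. Whether multiple fields are combined with AND is an assumption based on common vector-store conventions, not something this README confirms; check the Endee documentation for your version:

```python
# Hypothetical combined filter: category must be one of two values
# AND score must fall in [5, 10]. The AND semantics across fields
# is an assumption, not confirmed by this README.
combined_filter = {
    "category": {"$in": ["database", "framework"]},
    "score": {"$range": [5, 10]},
}

# Passed to search exactly like a single-field filter:
# results = vector_store.similarity_search("query", k=5, filter=combined_filter)
print(combined_filter)
```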
    

Using with LangChain

Endee can be used anywhere a LangChain vector store is needed:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from endee_langchain import EndeeVectorStore

# Initialize your vector store
vector_store = EndeeVectorStore.from_params(
    embedding=OpenAIEmbeddings(),
    api_token="your_api_token",
    index_name="your_index_name",
    dimension=1536,
    precision="medium"
)

# Create a retriever
retriever = vector_store.as_retriever()

# Create the RAG chain
model = ChatOpenAI()
prompt = ChatPromptTemplate.from_template(
    """Answer the following question based on the provided context:
    
    Context: {context}
    Question: {question}
    """
)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

# Use the chain
response = rag_chain.invoke("What is Endee?")
print(response)

Creating from Documents

You can also create a vector store directly from LangChain documents:

from langchain_core.documents import Document

documents = [
    Document(
        page_content="Endee is the world's fastest vector database",
        metadata={"source": "product", "category": "database"}
    ),
    Document(
        page_content="LangChain is a framework for developing applications",
        metadata={"source": "github", "category": "framework"}
    )
]

vector_store = EndeeVectorStore.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    api_token="your_api_token",
    index_name="doc_index",
    dimension=1536,
    precision="medium"
)

# With encryption
encrypted_vector_store = EndeeVectorStore.from_documents(
    documents=documents,
    embedding=OpenAIEmbeddings(),
    api_token="your_api_token",
    index_name="encrypted_doc_index",
    dimension=1536,
    precision="medium",
    encryption_key=encryption_key  # Add encryption
)

API Reference

EndeeVectorStore

The main class for integrating with LangChain. Key methods include:

  • __init__: Initialize with an Endee index or parameters to create a new one
  • from_params: Create a vector store using an API token
  • from_texts: Create a vector store from a list of texts
  • from_documents: Create a vector store from LangChain documents
  • add_texts: Add text documents with optional metadata
  • similarity_search: Search for similar documents
  • similarity_search_with_score: Search and return similarity scores
  • delete: Delete documents by ID or filter

Configuration Options

The EndeeVectorStore constructor and from_params method accept the following parameters:

  • embedding: LangChain embedding function to use
  • api_token: Your Endee API token
  • index_name: Name of the Endee index
  • dimension: Vector dimension (required when creating a new index)
  • space_type: Distance metric, one of "cosine", "l2", or "ip" (default: "cosine")
  • precision: Precision level, one of "medium" (INT8, default), "fp16" (FP16), "high" (INT16), or "ultra-high" (FP32)
  • encryption_key: Optional encryption key for client-side encryption (default: None)
  • text_key: Key to use for storing text in metadata (default: "text")

Performance Tips

  1. Choose the right precision: The default "medium" (INT8) works well for most large-scale applications. Use "fp16" (FP16) or "high" (INT16) for better accuracy, and "ultra-high" (FP32) only when maximum accuracy is required.

  2. Batch operations: When adding many documents, use larger batch sizes for better performance:

    vector_store.add_texts(
        texts=large_text_list,
        metadatas=metadata_list,
        batch_size=1000  # Adjust based on your data
    )
    
  3. Use metadata filtering: Pre-filter your search space using metadata to improve both speed and relevance:

    results = vector_store.similarity_search(
        query="your query",
        k=10,
        filter={"category": {"$eq": "relevant_category"}}
    )
    
  4. Encryption considerations: Encryption adds minimal overhead to operations. Use it for sensitive data without significant performance concerns. However, ensure you have a robust key management strategy in place.
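
The batching pattern from tip 2 can be sketched generically; `chunked` below is a helper written for illustration, not part of endee-langchain:

```python
def chunked(items, batch_size):
    # Yield successive slices of at most batch_size items.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

texts = [f"document {i}" for i in range(2500)]
batches = list(chunked(texts, 1000))
print([len(b) for b in batches])  # [1000, 1000, 500]

# Each batch would then be sent in one call:
# for batch in chunked(texts, 1000):
#     vector_store.add_texts(texts=batch)
```

Fewer, larger requests amortize per-call overhead; tune the batch size to your document sizes and network limits.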
