
LangChain VectorStore integration for Envector


LangChain Envector Integration

Encrypted vector search for LangChain using Envector, powered by homomorphic encryption (CKKS). This repo ships a LangChain-compatible VectorStore and retriever utilities built on the high-level pyenvector Python SDK.

Features

  • LangChain VectorStore interface with similarity_search, from_texts, etc.
  • Optional VectorStoreRetriever helper for quick RAG integrations.
  • Client-side encryption handled transparently by the SDK, with support for score thresholds and filtering.

Installation

  • Python 3.9–3.13 (3.11 recommended)
  • Create and activate a virtualenv:
    • python3.11 -m venv .venv && source .venv/bin/activate
  • Install runtime dependencies:
    • pip install -U pip setuptools wheel
    • pip install pyenvector langchain sentence-transformers

Usage Overview

  1. Configure Envector using EnvectorConfig, pointing to your EnVector endpoint and keys.
  2. Initialize embeddings (or provide pre-computed vectors).
  3. Instantiate Envector(config=cfg, embeddings=emb) and call add_texts, add_documents, or use as_retriever.
  4. Run similarity_search or plug the retriever into your LangChain pipeline.

See notebooks/ for end-to-end walkthroughs and the libs/envector package for implementation details.

Configuration

Key dataclasses live in libs/envector/config.py:

  • ConnectionConfig: address or host/port for EnVector.
  • KeyConfig: key path, key ID, optional preset/eval mode.
  • IndexSettings: index name, dimension (32–4096), query encryption mode, optional output fields and fetch parameters.
  • EnvectorConfig: wraps the above and enables auto-creation via create_if_missing.

Data Model

  • Each vector stores a single metadata string in EnVector.
  • To align with LangChain’s Document, inserts wrap data as JSON: {"text": ..., "metadata": ...}.
  • Retrieval unwraps JSON, returning Document(page_content=text, metadata={...}).
  • Client-side filtering requires the JSON envelope to include an object under metadata.
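
The envelope described above can be sketched with the standard library alone. The helper names wrap_envelope and unwrap_envelope are hypothetical and chosen for illustration; the package's own helpers may differ, and the fallback for strings that are not an envelope is an assumption.

```python
import json
from typing import Any, Dict, Optional, Tuple


def wrap_envelope(text: str, metadata: Optional[Dict[str, Any]] = None) -> str:
    """Serialize text and metadata into the single string stored per vector."""
    return json.dumps({"text": text, "metadata": metadata or {}})


def unwrap_envelope(raw: str) -> Tuple[str, Dict[str, Any]]:
    """Recover (text, metadata); return the raw string if it is not an envelope."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        # Assumption: non-JSON strings are passed through with empty metadata.
        return raw, {}
    if not isinstance(payload, dict) or "text" not in payload:
        return raw, {}
    metadata = payload.get("metadata")
    return payload["text"], metadata if isinstance(metadata, dict) else {}


raw = wrap_envelope("chunk-1", {"source": "paper.pdf", "page": 1})
text, metadata = unwrap_envelope(raw)
```

Here text and metadata map directly onto Document(page_content=text, metadata=metadata).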

Limitations

  • Item-level delete/update is unsupported (drop the index to reset).
  • Manual item IDs are not accepted; returned IDs from add_texts are ephemeral.
  • Filtering happens client-side; ensure metadata is JSON for structured filters.
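
Because filtering runs on the client after results come back, it amounts to a pass over the unwrapped metadata. The following is a minimal sketch under that assumption; filter_hits and the (text, metadata, score) tuple shape are hypothetical and are not the package's API.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Each hit is (text, metadata, score), as it might look after unwrapping the envelope.
Hit = Tuple[str, Dict[str, Any], float]


def filter_hits(
    hits: Iterable[Hit],
    where: Dict[str, Any],
    min_score: float = float("-inf"),
) -> List[Hit]:
    """Keep hits that match every key in `where` and score at least min_score."""
    return [
        hit
        for hit in hits
        if hit[2] >= min_score
        and all(hit[1].get(key) == value for key, value in where.items())
    ]


hits = [
    ("chunk-1", {"source": "paper.pdf", "page": 1}, 0.91),
    ("chunk-2", {"source": "notes.txt", "page": 1}, 0.88),
    ("chunk-3", {"source": "paper.pdf", "page": 2}, 0.40),
]
kept = filter_hits(hits, {"source": "paper.pdf"}, min_score=0.5)
```

Since a filter of this kind can only discard results that were already retrieved, requesting a larger k than the number of documents ultimately needed may help.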

Examples

  • Configuration

    from langchain_envector.config import ConnectionConfig, EnvectorConfig, IndexSettings, KeyConfig

    # ENVECTOR_*, INDEX_NAME, and vector_dim are placeholders to define for your deployment.
    cfg = EnvectorConfig(
        connection=ConnectionConfig(
            address=ENVECTOR_ADDRESS,
            access_token=ENVECTOR_ACCESS_TOKEN,
        ),
        key=KeyConfig(
            key_path=ENVECTOR_KEY_PATH,
            key_id=ENVECTOR_KEY_ID,
            preset="ip",
            eval_mode="rmp",
        ),
        index=IndexSettings(
            index_name=INDEX_NAME,
            dim=vector_dim,
            query_encryption="cipher",
        ),
        create_if_missing=True,
    )
    
  • Add documents (from LangChain Documents):

    from langchain_core.documents import Document
    from langchain_envector.vectorstore import Envector

    docs = [
        Document(
            page_content="chunk-1",
            metadata={"source": "paper.pdf", "page": 1, "chunk": 0},
        ),
        Document(
            page_content="chunk-2",
            metadata={"source": "paper.pdf", "page": 1, "chunk": 1},
        ),
    ]

    # emb is the embeddings object initialized beforehand (Usage Overview, step 2).
    store = Envector(config=cfg, embeddings=emb)
    store.add_documents(docs)
    

    add_texts is also available for storing plain strings.

  • Similarity search

    results = store.similarity_search_with_score(query, k=3)
    for doc, score in results:
        print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")
    

    similarity_search and similarity_search_with_vector are also available; the latter takes a query vector, for example one produced by embeddings.embed_query().

Troubleshooting

  • Connection issues: verify EnVector address and registered keys.
  • Embeddings mismatch: ensure embedding dimension equals index.dim when supplying vectors.
  • Unexpected raw strings: confirm inserts used the JSON envelope.
  • Key issues: check that the local key's metadata matches the registered key.
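
For the dimension mismatch case, a quick check before inserting pre-computed vectors can pinpoint the offending rows. This is a minimal sketch; find_dim_mismatches is a hypothetical helper, and 384 is only an example dimension.

```python
from typing import List, Sequence


def find_dim_mismatches(
    vectors: Sequence[Sequence[float]], expected_dim: int
) -> List[int]:
    """Return the positions of vectors whose length differs from expected_dim."""
    return [i for i, vec in enumerate(vectors) if len(vec) != expected_dim]


# Example: the third vector came from a different embedding model.
vectors = [[0.0] * 384, [0.0] * 384, [0.0] * 768]
bad = find_dim_mismatches(vectors, expected_dim=384)
```

An empty result means every vector matches the dimension configured for the index.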

Testing Without EnVector

  • Run unit tests offline (no EnVector or SDK required):
    • python -m pytest -q -m "not integration"
    • or python scripts/run_unit_tests.py
  • Run integration tests (requires server and keys):
    • Export ENVECTOR_ADDRESS, ENVECTOR_KEY_PATH, ENVECTOR_KEY_ID
    • Optional: ENVECTOR_USE_EMBEDDINGS=1, ENVECTOR_EMB_MODEL, ENVECTOR_USE_HF_DATASET=1
    • python -m pytest -q -m integration -s
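
Before launching the integration run, it can be useful to confirm that the required variables are set. The sketch below uses only the variable names listed above; missing_integration_vars is a hypothetical helper, and the example values are placeholders.

```python
import os
from typing import List, Mapping

REQUIRED_VARS = ("ENVECTOR_ADDRESS", "ENVECTOR_KEY_PATH", "ENVECTOR_KEY_ID")


def missing_integration_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Placeholder values for illustration only.
example_env = {"ENVECTOR_ADDRESS": "localhost:50050", "ENVECTOR_KEY_ID": "demo-key"}
missing = missing_integration_vars(example_env)
```

Calling missing_integration_vars() with no argument inspects the current process environment.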

Contributing

See CONTRIBUTE.md for development, testing, and PR guidelines.
