LangChain VectorStore for VelesDB: The Local AI Memory Database. Microsecond RAG retrieval.

langchain-velesdb

LangChain integration for the VelesDB vector database.

Installation

pip install langchain-velesdb

Quick Start

from langchain_velesdb import VelesDBVectorStore
from langchain_openai import OpenAIEmbeddings

# Initialize vector store
vectorstore = VelesDBVectorStore(
    path="./my_vectors",
    collection_name="documents",
    embedding=OpenAIEmbeddings()
)

# Add documents
vectorstore.add_texts([
    "VelesDB is a high-performance vector database",
    "Built entirely in Rust for speed and safety",
    "Perfect for RAG applications and semantic search"
])

# Search
results = vectorstore.similarity_search("fast database", k=2)
for doc in results:
    print(doc.page_content)

Usage with RAG

from langchain_velesdb import VelesDBVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.chains import RetrievalQA

# Create vector store with documents
vectorstore = VelesDBVectorStore.from_texts(
    texts=["Document 1 content", "Document 2 content"],
    embedding=OpenAIEmbeddings(),
    path="./rag_data",
    collection_name="knowledge_base"
)

# Create RAG chain
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(),
    chain_type="stuff",
    retriever=retriever
)

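# Ask questions
# Chain.run() is deprecated in recent LangChain releases; invoke() takes a
# dict keyed by "query" and returns a dict whose "result" key holds the answer.
answer = qa_chain.invoke({"query": "What is VelesDB?"})["result"]
print(answer)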
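
To make `chain_type="stuff"` less opaque: the "stuff" strategy places every retrieved document into a single prompt before calling the LLM. The following is a minimal plain-Python sketch of that idea, not LangChain's actual code; the helper name `build_stuff_prompt` and the prompt wording are hypothetical.

```python
def build_stuff_prompt(question: str, documents: list[str]) -> str:
    """Concatenate all retrieved documents into one prompt (the "stuff" idea)."""
    # Hypothetical prompt wording; LangChain's real template differs.
    context = "\n\n".join(documents)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_stuff_prompt(
    "What is VelesDB?",
    ["Document 1 content", "Document 2 content"],
)
print(prompt)
```

Because every document ends up in one prompt, the `k` passed to the retriever directly controls how much context the LLM receives.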

API Reference

VelesDBVectorStore

VelesDBVectorStore(
    embedding: Embeddings,
    path: str = "./velesdb_data",
    collection_name: str = "langchain",
    # Distance metric: "cosine", "euclidean", "dot" (aliases: "dotproduct",
    # "inner", "ip"), "hamming", or "jaccard".
    metric: str = "cosine",
    # Storage mode:
    #   "full" / "f32"
    #   "sq8" / "int8"     4× compression
    #   "binary" / "bit"   32× compression
    #   "pq"               8-32× compression
    #   "rabitq"           32× with scalar correction
    storage_mode: str = "full",
)
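
The compression factors quoted for `storage_mode` follow from the bits stored per vector component. The sketch below checks that arithmetic for an assumed 768-dimensional embedding. It counts raw vector bytes only and ignores index and metadata overhead. `vector_bytes` is a hypothetical helper, not part of the package, and "pq" and "rabitq" are left out because their size depends on parameters not described here.

```python
def vector_bytes(dimension: int, storage_mode: str) -> int:
    """Raw bytes per stored vector, ignoring index and metadata overhead."""
    bits_per_component = {
        "full": 32,   # one float32 per component
        "sq8": 8,     # one 8-bit integer per component
        "binary": 1,  # one bit per component
    }[storage_mode]
    # Round up to whole bytes.
    return (dimension * bits_per_component + 7) // 8


DIMENSION = 768  # assumed embedding size, for illustration only
full = vector_bytes(DIMENSION, "full")
for mode in ("full", "sq8", "binary"):
    size = vector_bytes(DIMENSION, mode)
    print(f"{mode:>6}: {size:>4} bytes per vector ({full // size}x smaller than full)")
```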

Methods

Core Operations:

  • add_texts(texts, metadatas=None, ids=None) - Add texts to the store
  • add_texts_bulk(texts, metadatas=None, ids=None) - Bulk insert (2-3x faster for large batches)
  • delete(ids) - Delete documents by ID
  • get_by_ids(ids) - Retrieve documents by their IDs
  • flush() - Flush pending changes to disk

Search:

  • similarity_search(query, k=4) - Search for similar documents
  • similarity_search_with_score(query, k=4) - Search with similarity scores
  • similarity_search_with_filter(query, k=4, filter=None) - Search with metadata filtering
  • batch_search(queries, k=4) - Batch search multiple queries in parallel
  • batch_search_with_score(queries, k=4) - Batch search with scores
  • multi_query_search(queries, k=4, fusion="rrf", ...) - Multi-query fusion search
  • multi_query_search_with_score(queries, k=4, ...) - Multi-query search with fused scores
  • hybrid_search(query, k=4, vector_weight=0.5, filter=None) - Hybrid vector+BM25 search
  • text_search(query, k=4, filter=None) - Full-text BM25 search
  • query(query_str, params=None) - Execute VelesQL query

Utilities:

  • as_retriever(**kwargs) - Convert to LangChain retriever
  • from_texts(texts, embedding, ...) - Create store from texts (class method)
  • get_collection_info() - Get collection metadata (name, dimension, point_count)
  • is_empty() - Check if collection is empty

Advanced Features

Multi-Query Fusion (MQG)

Search with several reformulations of the same query and fuse the per-query results into a single ranked list. This suits RAG pipelines that use Multiple Query Generation (MQG).

# Basic usage with RRF (Reciprocal Rank Fusion)
results = vectorstore.multi_query_search(
    queries=["travel to Greece", "Greek vacation", "Athens trip"],
    k=10,
)

# With weighted fusion (like SearchXP's scoring)
results = vectorstore.multi_query_search(
    queries=["travel Greece", "vacation Mediterranean"],
    k=10,
    fusion="weighted",
    fusion_params={
        "avg_weight": 0.6,   # Average score weight
        "max_weight": 0.3,   # Maximum score weight  
        "hit_weight": 0.1,   # Hit ratio weight
    }
)

# Get fused scores
results_with_scores = vectorstore.multi_query_search_with_score(
    queries=["query1", "query2", "query3"],
    k=5,
    fusion="rrf",
    fusion_params={"k": 60}  # RRF parameter
)
for doc, score in results_with_scores:
    print(f"{score:.3f}: {doc.page_content}")
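
For intuition about the `k` parameter passed to `"rrf"` above, here is a minimal standalone sketch of Reciprocal Rank Fusion in plain Python. It uses the textbook formula, where each document receives `1 / (k + rank)` from every ranking it appears in, with ranks starting at 1. Whether VelesDB uses exactly this variant is an assumption; `rrf_fuse` and the document IDs are hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Higher-ranked documents contribute more; k dampens the gap.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), breaking ties by document ID.
    fused = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return fused[:top_n]


# One ranked list per query reformulation (hypothetical document IDs).
rankings = [
    ["athens", "crete", "rome"],
    ["crete", "athens", "santorini"],
    ["athens", "santorini", "crete"],
]
for doc_id, score in rrf_fuse(rankings):
    print(f"{score:.4f}: {doc_id}")
```

Only ranks enter the formula, so the per-query scores do not need to share a scale.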

Fusion Strategies:

  • "rrf" - Reciprocal Rank Fusion (default, robust to score scale differences)
  • "average" - Mean score across all queries
  • "maximum" - Maximum score from any query
  • "weighted" - Custom combination of avg, max, and hit ratio
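
The score-based strategies can be sketched in plain Python as well. This illustrates one plausible reading of the descriptions above, not VelesDB's implementation. It assumes the mean is taken over the queries that returned a document, that the hit ratio is the fraction of queries returning it, and that `"weighted"` is a linear combination of the three. `fuse_scores` is a hypothetical helper.

```python
def fuse_scores(
    per_query: list[dict[str, float]],
    strategy: str = "average",
    avg_weight: float = 0.6,
    max_weight: float = 0.3,
    hit_weight: float = 0.1,
) -> list[tuple[str, float]]:
    """Fuse per-query {doc_id: score} results into one ranked list."""
    # Collect every score each document received across the queries.
    hits: dict[str, list[float]] = {}
    for result in per_query:
        for doc_id, score in result.items():
            hits.setdefault(doc_id, []).append(score)

    fused: dict[str, float] = {}
    for doc_id, scores in hits.items():
        avg = sum(scores) / len(scores)            # assumed: mean over hits only
        best = max(scores)
        hit_ratio = len(scores) / len(per_query)   # share of queries that hit
        if strategy == "average":
            fused[doc_id] = avg
        elif strategy == "maximum":
            fused[doc_id] = best
        elif strategy == "weighted":
            fused[doc_id] = avg_weight * avg + max_weight * best + hit_weight * hit_ratio
        else:
            raise ValueError(f"unknown strategy: {strategy!r}")
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


per_query = [{"a": 0.9, "b": 0.5}, {"a": 0.7, "c": 0.6}]
for strategy in ("average", "maximum", "weighted"):
    print(strategy, fuse_scores(per_query, strategy))
```

Unlike RRF, these strategies use the raw scores, so they are only meaningful when all queries produce scores on the same scale.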

Hybrid Search (Vector + BM25)

# Combine vector similarity with keyword matching
results = vectorstore.hybrid_search(
    query="machine learning performance",
    k=5,
    vector_weight=0.7  # 70% vector, 30% BM25
)
for doc, score in results:
    print(f"{score:.3f}: {doc.page_content}")
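
The "70% vector, 30% BM25" comment suggests a linear blend of the two scores. The sketch below shows such a blend in plain Python. It assumes both score sets are already normalized to [0, 1] and that a document missing from one side scores 0 there. How VelesDB actually normalizes and combines scores is not stated here, and `blend_scores` is a hypothetical helper.

```python
def blend_scores(
    vector_scores: dict[str, float],
    bm25_scores: dict[str, float],
    vector_weight: float = 0.5,
) -> list[tuple[str, float]]:
    """Linearly blend two {doc_id: score} maps, both assumed to lie in [0, 1]."""
    if not 0.0 <= vector_weight <= 1.0:
        raise ValueError("vector_weight must be between 0 and 1")
    doc_ids = set(vector_scores) | set(bm25_scores)
    blended = {
        doc_id: vector_weight * vector_scores.get(doc_id, 0.0)
        + (1.0 - vector_weight) * bm25_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda item: (-item[1], item[0]))


vector_scores = {"doc1": 0.9, "doc2": 0.4}
bm25_scores = {"doc2": 1.0, "doc3": 0.5}
for doc_id, score in blend_scores(vector_scores, bm25_scores, vector_weight=0.7):
    print(f"{score:.3f}: {doc_id}")
```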

Full-Text Search (BM25)

# Pure keyword-based search
results = vectorstore.text_search("VelesDB Rust", k=5)
for doc, score in results:
    print(f"{score:.3f}: {doc.page_content}")
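
For readers unfamiliar with BM25, the following standalone sketch scores documents with the standard Okapi BM25 formula, using the non-negative IDF variant popularized by Lucene. Whitespace tokenization and the defaults `k1=1.2`, `b=0.75` are assumptions made for illustration. VelesDB's tokenizer and parameters may differ, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(query: str, documents: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Return one Okapi BM25 score per document for a whitespace-tokenized query."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = term_freq[term] + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * term_freq[term] * (k1 + 1.0) / norm
        scores.append(score)
    return scores


documents = [
    "VelesDB is a high-performance vector database",
    "Built entirely in Rust for speed and safety",
    "Perfect for RAG applications and semantic search",
]
for doc, score in zip(documents, bm25_scores("VelesDB Rust", documents)):
    print(f"{score:.3f}: {doc}")
```

Documents that contain none of the query terms score exactly 0, which is why pure keyword search can miss semantically related text that vector search would find.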

Metadata Filtering

# Search with filters
results = vectorstore.similarity_search_with_filter(
    query="database",
    k=5,
    filter={"condition": {"type": "eq", "field": "category", "value": "tech"}}
)
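
To show what the filter dictionary expresses, here is a small plain-Python sketch that evaluates it against document metadata. Only the `"eq"` condition type appears above, so the sketch handles that one and rejects anything else. `matches_filter` is a hypothetical helper, and in the real store the filtering happens inside VelesDB.

```python
def matches_filter(metadata: dict, filter_spec: dict) -> bool:
    """Evaluate an {"condition": {"type": "eq", ...}} filter against metadata."""
    condition = filter_spec["condition"]
    if condition["type"] == "eq":
        field = condition["field"]
        # A document without the field never matches.
        return field in metadata and metadata[field] == condition["value"]
    raise NotImplementedError(f"condition type {condition['type']!r} is not covered by this sketch")


documents = [
    {"text": "Rust vector database", "metadata": {"category": "tech"}},
    {"text": "Greek island travel guide", "metadata": {"category": "travel"}},
    {"text": "Untagged note", "metadata": {}},
]
tech_filter = {"condition": {"type": "eq", "field": "category", "value": "tech"}}
kept = [doc["text"] for doc in documents if matches_filter(doc["metadata"], tech_filter)]
print(kept)
```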

Features

  • High Performance: VelesDB's Rust backend delivers sub-millisecond latencies
  • SIMD Optimized: Hardware-accelerated vector operations
  • Multi-Query Fusion: Native support for MQG pipelines with RRF/Weighted fusion
  • Hybrid Search: Combine vector similarity with BM25 text matching
  • Full-Text Search: BM25 ranking for keyword queries
  • Metadata Filtering: Filter results by document attributes
  • Simple Setup: Self-contained single binary, no external services required
  • LangChain Compatibility: Exposes the LangChain VectorStore interface, so it can be used wherever a vector store or retriever is expected in chains and agents

License

MIT License (this integration). See LICENSE for details.

VelesDB Core itself is licensed under the VelesDB Core License 1.0 (based on ELv2).
