
Haystack 2.x DocumentStore for VelesDB: The Local AI Memory Database.


haystack-velesdb

A Haystack 2.x DocumentStore backed by VelesDB — the local-first, microsecond-latency vector database.

This integration joins the existing LangChain and LlamaIndex connectors, completing the trio of major Python RAG frameworks supported by VelesDB.

Installation

pip install haystack-velesdb

For development:

pip install -e "integrations/haystack[dev]"

Quick start

from haystack_velesdb import VelesDBDocumentStore
from haystack.dataclasses import Document

store = VelesDBDocumentStore(
    path="./my_docs",
    collection_name="knowledge_base",
    embedding_dim=768,
    metric="cosine",
)

# Write pre-embedded documents (embeddings truncated for brevity; each must have embedding_dim entries)
documents = [
    Document(id="doc1", content="VelesDB is fast.", embedding=[0.1, 0.2, ...]),
    Document(id="doc2", content="Local-first AI memory.", embedding=[0.3, 0.4, ...]),
]
store.write_documents(documents)

# Retrieve by vector
results = store.embedding_retrieval(query_embedding=[0.1, 0.2, ...], top_k=5)
for doc in results:
    print(doc.content, doc.score)

Full RAG pipeline

See examples/rag_pipeline.py for a complete PDF ingestion and semantic search example using SentenceTransformersDocumentEmbedder.

from haystack import Pipeline
from haystack.components.converters import PyPDFToDocument
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack_velesdb import VelesDBDocumentStore

store = VelesDBDocumentStore(path="./rag_store", embedding_dim=384)

# Indexing pipeline
indexer = Pipeline()
indexer.add_component("converter", PyPDFToDocument())
indexer.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
indexer.add_component("embedder", SentenceTransformersDocumentEmbedder(model="all-MiniLM-L6-v2"))
indexer.add_component("writer", DocumentWriter(document_store=store))
indexer.connect("converter", "splitter")
indexer.connect("splitter", "embedder")
indexer.connect("embedder", "writer")
indexer.run({"converter": {"sources": ["paper.pdf"]}})

# Query: embed the question, then search the store directly.
# (InMemoryEmbeddingRetriever only accepts an InMemoryDocumentStore,
# so we call VelesDBDocumentStore.embedding_retrieval ourselves.)
query_embedder = SentenceTransformersTextEmbedder(model="all-MiniLM-L6-v2")
query_embedder.warm_up()
query_embedding = query_embedder.run(text="What is VelesDB?")["embedding"]
for doc in store.embedding_retrieval(query_embedding=query_embedding, top_k=5):
    print(doc.content, doc.score)

API reference

VelesDBDocumentStore

| Parameter         | Default                 | Description                                          |
|-------------------|-------------------------|------------------------------------------------------|
| `path`            | `"./velesdb_haystack"`  | Directory where VelesDB persists data                |
| `collection_name` | `"haystack_documents"`  | VelesDB collection name                              |
| `embedding_dim`   | `768`                   | Embedding vector dimension                           |
| `metric`          | `"cosine"`              | Distance metric: `"cosine"`, `"euclidean"`, or `"dot"` |

Methods

| Method                                                             | Description                                      |
|--------------------------------------------------------------------|--------------------------------------------------|
| `write_documents(documents, policy)`                               | Upsert documents; returns the count written      |
| `filter_documents(filters)`                                        | Scroll documents matching a VelesDB filter dict  |
| `embedding_retrieval(query_embedding, top_k, filters, scale_score)`| Vector similarity search                         |
| `count_documents()`                                                | Total document count                             |
| `delete_documents(document_ids)`                                   | Delete by Haystack string IDs                    |
| `to_dict()` / `from_dict()`                                        | Haystack pipeline serialisation                  |
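The `filters` argument follows the standard Haystack 2.x filter syntax, which the store translates into a VelesDB filter dict. A small illustrative example (the `meta.source` and `meta.year` field names are hypothetical):

```python
# Haystack 2.x logical/comparison filter:
# only chunks from one source, published 2024 or later
filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.source", "operator": "==", "value": "paper.pdf"},
        {"field": "meta.year", "operator": ">=", "value": 2024},
    ],
}
# docs = store.filter_documents(filters=filters)
```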

Note on DuplicatePolicy: NONE and OVERWRITE use VelesDB upsert semantics and always overwrite on collision. FAIL is fully enforced: a pre-scan is performed before writing and DuplicateDocumentError is raised if any document already exists (prefer OVERWRITE or NONE for bulk loads to skip the scan cost).
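The FAIL pre-scan amounts to checking every incoming ID against the collection before any write happens. A minimal sketch of that logic (hypothetical helper names, not the integration's actual implementation):

```python
class DuplicateDocumentError(ValueError):
    """Raised when DuplicatePolicy.FAIL finds an already-stored document ID."""

def write_with_fail_policy(existing_ids: set, incoming_ids: list) -> int:
    # Pre-scan: reject the whole batch if any incoming ID already exists
    duplicates = [i for i in incoming_ids if i in existing_ids]
    if duplicates:
        raise DuplicateDocumentError(f"Documents already exist: {duplicates}")
    # ...the actual upsert would happen here; return the number written
    return len(incoming_ids)
```

Because the pre-scan touches the store once per batch before writing, skipping it with OVERWRITE or NONE is what saves time on bulk loads.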

Note on document IDs and SHA-256: Haystack string IDs are mapped to 63-bit integers using the first 8 bytes of SHA-256 (~9.2 × 10¹⁸ slots). For a 1 M-document collection the birthday-bound collision probability is roughly 5 × 10⁻⁸, which is negligible for typical RAG workloads. A ValueError is raised at write time if a collision is detected between a new document and an existing one.
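The mapping and the collision estimate can be sketched in a few lines. The function name and exact byte order below are assumptions; only the first-8-bytes-of-SHA-256, 63-bit scheme comes from the note above:

```python
import hashlib

def to_velesdb_id(doc_id: str) -> int:
    # First 8 bytes of SHA-256, sign bit cleared -> 63-bit integer
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << 63) - 1)

def collision_probability(n: int) -> float:
    # Birthday bound: n*(n-1)/2 pairs over 2**63 slots
    return n * (n - 1) / (2 * 2**63)

print(f"{collision_probability(1_000_000):.1e}")  # 5.4e-08
```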

Note on scale_score: When True (default), cosine similarity scores are normalised from [-1, 1] to [0, 1] so they behave like probabilities in downstream re-ranking.
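The normalisation is a simple affine map (a sketch of the formula, not the library source):

```python
def scale_cosine_score(score: float) -> float:
    # Map cosine similarity from [-1, 1] onto [0, 1]
    return (score + 1.0) / 2.0

print(scale_cosine_score(0.8))  # 0.9
```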

Running tests

cd integrations/haystack
pip install -e ".[dev]"
pytest tests/ -v

Tests use lightweight fake VelesDB objects — no running server required.
