# haystack-velesdb

A Haystack 2.x DocumentStore backed by VelesDB, the local-first, microsecond-latency AI memory database.
This integration joins the existing LangChain and LlamaIndex connectors, completing the trio of major Python RAG frameworks supported by VelesDB.
## Installation

```shell
pip install haystack-velesdb
```

For development:

```shell
pip install -e "integrations/haystack[dev]"
```
## Quick start

```python
from haystack.dataclasses import Document

from haystack_velesdb import VelesDBDocumentStore

store = VelesDBDocumentStore(
    path="./my_docs",
    collection_name="knowledge_base",
    embedding_dim=768,
    metric="cosine",
)

# Write pre-embedded documents
documents = [
    Document(id="doc1", content="VelesDB is fast.", embedding=[0.1, 0.2, ...]),
    Document(id="doc2", content="Local-first AI memory.", embedding=[0.3, 0.4, ...]),
]
store.write_documents(documents)

# Retrieve by vector
results = store.embedding_retrieval(query_embedding=[0.1, 0.2, ...], top_k=5)
for doc in results:
    print(doc.content, doc.score)
```
## Full RAG pipeline

See `examples/rag_pipeline.py` for a complete PDF ingestion and semantic search example using `SentenceTransformersDocumentEmbedder`.

```python
from typing import List

from haystack import Pipeline, component
from haystack.components.converters import PyPDFToDocument
from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter
from haystack.dataclasses import Document

from haystack_velesdb import VelesDBDocumentStore

store = VelesDBDocumentStore(path="./rag_store", embedding_dim=384)

# Indexing pipeline
indexer = Pipeline()
indexer.add_component("converter", PyPDFToDocument())
indexer.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=3))
indexer.add_component("embedder", SentenceTransformersDocumentEmbedder(model="all-MiniLM-L6-v2"))
indexer.add_component("writer", DocumentWriter(document_store=store))
indexer.connect("converter", "splitter")
indexer.connect("splitter", "embedder")
indexer.connect("embedder", "writer")
indexer.run({"converter": {"sources": ["paper.pdf"]}})

# Query pipeline. `InMemoryEmbeddingRetriever` is bound to `InMemoryDocumentStore`
# and would NOT work against a custom DocumentStore — wrap `embedding_retrieval`
# in a thin Haystack component that forwards the call. Full working example in
# `integrations/haystack/examples/rag_pipeline.py` (`_VelesRetriever`).
@component
class VelesRetriever:
    def __init__(self, document_store, top_k: int = 10):
        self._store = document_store
        self._top_k = top_k

    @component.output_types(documents=List[Document])
    def run(self, query_embedding: List[float]):
        return {"documents": self._store.embedding_retrieval(query_embedding, top_k=self._top_k)}


querier = Pipeline()
querier.add_component("embedder", SentenceTransformersTextEmbedder(model="all-MiniLM-L6-v2"))
querier.add_component("retriever", VelesRetriever(document_store=store))
querier.connect("embedder.embedding", "retriever.query_embedding")

result = querier.run({"embedder": {"text": "What is VelesDB?"}})
print(result["retriever"]["documents"])
```
## API reference

### VelesDBDocumentStore

| Parameter | Default | Description |
|---|---|---|
| `path` | `"./velesdb_haystack"` | Directory where VelesDB persists data |
| `collection_name` | `"haystack_documents"` | VelesDB collection name |
| `embedding_dim` | `768` | Embedding vector dimension |
| `metric` | `"cosine"` | Distance metric: `"cosine"`, `"euclidean"`, or `"dot"` |
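For intuition about the three metrics, here are plain-Python definitions (a sketch only; VelesDB's internal implementation is its own):

```python
import math


def dot(a, b):
    # Dot-product similarity: larger is more similar.
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Cosine similarity: dot product of the normalised vectors, in [-1, 1].
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot(a, b) / (na * nb)


def euclidean(a, b):
    # Euclidean distance: smaller is more similar.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a, b = [1.0, 0.0], [0.6, 0.8]
print(cosine(a, b), dot(a, b), euclidean(a, b))
```

Use `"cosine"` when only the direction of the embeddings matters (the usual choice for sentence-transformer models), `"dot"` when the embedder produces normalised vectors or magnitude is meaningful, and `"euclidean"` when absolute distances matter.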
### Methods

| Method | Description |
|---|---|
| `write_documents(documents, policy)` | Upsert documents; returns count written |
| `filter_documents(filters)` | Scroll documents matching a VelesDB filter dict |
| `embedding_retrieval(query_embedding, top_k, filters, scale_score)` | Vector similarity search |
| `count_documents()` | Total document count |
| `delete_documents(document_ids)` | Delete by Haystack string IDs |
| `to_dict()` / `from_dict()` | Haystack pipeline serialisation |
**Note on `DuplicatePolicy`:** `NONE` and `OVERWRITE` use VelesDB upsert semantics and always overwrite on collision. `FAIL` is fully enforced: a pre-scan is performed before writing and `DuplicateDocumentError` is raised if any document already exists (prefer `OVERWRITE` or `NONE` for bulk loads to skip the scan cost).
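The `FAIL` pre-scan amounts to checking the whole incoming batch against the collection before anything is written. A minimal sketch of that logic (the names `prescan_fail` and `existing_ids` are illustrative, and the exception class stands in for Haystack's `DuplicateDocumentError`):

```python
class DuplicateDocumentError(Exception):
    """Stand-in for Haystack's DuplicateDocumentError."""


def prescan_fail(existing_ids: set, incoming_ids: list) -> None:
    # Reject the entire batch if any incoming ID already exists,
    # before a single document is written.
    clashes = [i for i in incoming_ids if i in existing_ids]
    if clashes:
        raise DuplicateDocumentError(f"IDs already present: {clashes}")
```

Because the check happens up front, a batch with one duplicate writes nothing at all, rather than failing halfway through.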
**Note on document IDs and SHA-256:** Haystack string IDs are mapped to 63-bit integers using the first 8 bytes of SHA-256 (~9.2 × 10¹⁸ slots). For a 1 M-document collection the probability of any collision is roughly 5 × 10⁻⁸ (birthday bound), which is negligible for typical RAG workloads. A `ValueError` is raised at write time if a collision is detected between a new document and an existing one.
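The mapping can be reproduced with the standard library. This is a sketch of the documented scheme (the function name, big-endian byte order, and bit mask are assumptions; the integration's actual helper may differ), together with the birthday approximation n(n−1)/2N for the collision odds:

```python
import hashlib


def id_to_int63(doc_id: str) -> int:
    # First 8 bytes of the SHA-256 digest, masked to 63 bits so the
    # result always fits in a signed 64-bit integer.
    digest = hashlib.sha256(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << 63) - 1)


# Birthday approximation: probability of any collision among n random
# 63-bit values is roughly n * (n - 1) / (2 * 2**63).
n = 1_000_000
p_any_collision = n * (n - 1) / (2 * 2 ** 63)  # ≈ 5.4e-8
```

The mapping is deterministic, so re-writing a document with the same Haystack ID always lands on the same VelesDB integer key.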
**Note on `scale_score`:** when `True` (the default), cosine similarity scores are normalised from [-1, 1] to [0, 1] so they behave like probabilities in downstream re-ranking.
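Assuming the conventional affine rescale, which matches the documented [-1, 1] → [0, 1] range, the normalisation is one line:

```python
def scale_cosine_score(raw: float) -> float:
    # Affine map from the cosine range [-1, 1] onto [0, 1]:
    # -1 -> 0.0 (opposite), 0 -> 0.5 (orthogonal), 1 -> 1.0 (identical).
    return (raw + 1.0) / 2.0
```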
## Running tests

```shell
cd integrations/haystack
pip install -e ".[dev]"
pytest tests/ -v
```

Tests use lightweight fake VelesDB objects — no running server required.
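As an illustration of that approach (hypothetical names; the real fakes in `tests/` may look different), a fake collection only needs enough surface for the store to call into:

```python
class FakeCollection:
    """In-memory stand-in for a VelesDB collection, for unit tests."""

    def __init__(self):
        self.points = {}  # point_id -> (vector, payload)

    def upsert(self, point_id, vector, payload):
        # Mirror upsert semantics: insert or overwrite by ID.
        self.points[point_id] = (vector, payload)

    def count(self):
        return len(self.points)

    def delete(self, point_ids):
        for pid in point_ids:
            self.points.pop(pid, None)
```

Injecting an object like this in place of a real client keeps the test suite fast and hermetic.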