Haystack document store and retriever for DuckDB VSS
DuckDB Document Store for Haystack
[!NOTE] This project is a proof of concept; use it at your own risk. The code may contain bugs and security issues (such as SQL injection), so proceed with caution.
A DuckDB-backed document store for Haystack with HNSW vector search via DuckDB's VSS extension. It supports:
- Dense embedding storage with HNSW indexing (cosine similarity, Euclidean distance, or inner product distance)
- Filtering with Haystack-style filter dictionaries
- In-memory operation or persistence via a DuckDB database file on disk
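To make the three metrics listed above concrete, here is a minimal pure-Python sketch of how each one scores a query against a stored vector. It is only an illustration of the definitions, not the store's implementation (DuckDB computes these internally), and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Plain dot product; higher means more similar and depends on vector length."""
    return sum(x * y for x, y in zip(a, b))


# Illustrative 3-dimensional vectors (made up for this sketch).
query = [0.0, 0.0, 1.0]
doc_1 = [0.1, 0.0, 0.9]
doc_2 = [0.2, 0.1, 0.8]

for name, vec in [("doc-1", doc_1), ("doc-2", doc_2)]:
    print(
        name,
        round(cosine_similarity(query, vec), 4),
        round(euclidean_distance(query, vec), 4),
        round(inner_product(query, vec), 4),
    )
```

Cosine similarity ignores vector length, while the inner product does not, so the two can rank documents differently unless the embeddings are normalized.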
Installation (GitHub)
Use uv to install directly from the repository:
uv pip install "duckdb-haystack @ git+https://github.com/AdrianoKF/duckdb-haystack.git"
Usage
1) DocumentStore CRUD example
from haystack import Document
from haystack_integrations.document_stores.duckdb import DuckDBDocumentStore
store = DuckDBDocumentStore(
    database=":memory:",
    embedding_dim=3,
    similarity_metric="cosine",
)
store.write_documents(
    [
        Document(id="doc-1", content="DuckDB is fast.", embedding=[0.1, 0.0, 0.9], meta={"source": "notes"}),
        Document(id="doc-2", content="Haystack pipelines are modular.", embedding=[0.2, 0.1, 0.8]),
    ]
)
print("Total document count:", store.count_documents())
filters = {"field": "meta.source", "operator": "==", "value": "notes"}
filtered = store.filter_documents(filters=filters)
print("Filtered documents:", [doc.id for doc in filtered])
store.delete_documents(document_ids=["doc-2"])
print("After deletion:", store.filter_documents())
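The filter in the example above is a single comparison. Haystack-style filters can also nest comparisons under a logical operator with a `conditions` list. The sketch below evaluates such a dictionary against plain Python dicts to show the intended semantics. It is an illustration only: it does not reflect how this store translates filters to SQL, the `lookup` and `matches` helpers and the `meta.year` field are hypothetical, and only `AND`/`OR` plus the basic comparison operators are covered.

```python
import operator

# Comparison operators supported by this sketch.
COMPARISONS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def lookup(record, dotted_field):
    """Resolve a dotted path such as 'meta.source'; return None if a key is missing."""
    current = record
    for key in dotted_field.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def matches(record, filters):
    """Return True if the record satisfies the filter dictionary."""
    op = filters["operator"]
    if op in ("AND", "OR"):
        results = [matches(record, cond) for cond in filters["conditions"]]
        return all(results) if op == "AND" else any(results)
    value = lookup(record, filters["field"])
    # A missing field never satisfies an ordering comparison.
    if value is None and op in (">", ">=", "<", "<="):
        return False
    return COMPARISONS[op](value, filters["value"])


records = [
    {"id": "doc-1", "meta": {"source": "notes", "year": 2024}},
    {"id": "doc-2", "meta": {}},
    {"id": "doc-3", "meta": {"source": "wiki", "year": 2021}},
]

filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.source", "operator": "in", "value": ["notes", "wiki"]},
        {"field": "meta.year", "operator": ">=", "value": 2023},
    ],
}

matching_ids = [r["id"] for r in records if matches(r, filters)]
print(matching_ids)
```

The same `filters` dictionary shape can be passed to `store.filter_documents(filters=...)`, assuming the store supports the operators involved.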
2) Retrieval with DuckDBRetriever in a pipeline
from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder
from haystack_integrations.document_stores.duckdb import DuckDBDocumentStore
from haystack_integrations.retrievers.duckdb import DuckDBRetriever
store = DuckDBDocumentStore(database=":memory:", embedding_dim=384)
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
documents = [
    Document(content="DuckDB stores vectors in Float arrays backed by an HNSW index."),
    Document(content="DuckDB is an analytical in-process SQL database management system."),
    Document(content="Haystack offers composable pipelines."),
]
documents = doc_embedder.run(documents=documents)["documents"]
store.write_documents(documents)
query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
retriever = DuckDBRetriever(document_store=store)
pipeline = Pipeline()
pipeline.add_component("query_embedder", query_embedder)
pipeline.add_component("retriever", retriever)
pipeline.connect("query_embedder.embedding", "retriever.query_embedding")
result = pipeline.run(data={"query_embedder": {"text": "How does DuckDB store vectors?"}})
print(result["retriever"]["documents"][0].content)
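HNSW is an approximate nearest-neighbor index, so on a small corpus it can be useful to compare the retriever's results against an exact ranking. The following is a minimal brute-force baseline in pure Python, assuming cosine similarity. The `exact_top_k` helper and the sample vectors are hypothetical and not part of this package.

```python
import heapq
import math


def normalize(vector):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def exact_top_k(query, embeddings, k):
    """Return the k (doc_id, score) pairs with the highest cosine similarity."""
    q = normalize(query)
    scored = (
        (doc_id, sum(a * b for a, b in zip(q, normalize(vec))))
        for doc_id, vec in embeddings.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Illustrative embeddings keyed by document id (made up for this sketch).
embeddings = {
    "doc-1": [0.1, 0.0, 0.9],
    "doc-2": [0.2, 0.1, 0.8],
    "doc-3": [0.9, 0.1, 0.0],
}

top = exact_top_k([0.0, 0.0, 1.0], embeddings, k=2)
print([doc_id for doc_id, _ in top])
```

Comparing these ids with the ids returned by the retriever for the same query embedding gives a rough recall check. The scan is linear in the number of documents, so it is only practical for small collections.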
License
duckdb-haystack is distributed under the terms of the Apache-2.0 license.