DuckDB Document Store for Haystack

[!NOTE] This project is a proof of concept; use it at your own risk. The code may contain bugs and security issues (such as SQL injection), so proceed with caution.

A DuckDB-backed document store for Haystack with HNSW vector search via DuckDB's VSS extension. It supports:

  • Dense embedding storage with HNSW indexing (cosine similarity, Euclidean distance, or inner product distance)
  • Filtering with Haystack-style filter dictionaries
  • In-memory operation or persistence via a DuckDB database file on disk
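The three metrics can rank the same documents differently. The sketch below is a pure-Python illustration, not the store's code: it performs exact, brute-force nearest-neighbor search under each metric, which is the ranking an HNSW index approximates without comparing the query against every stored vector. The metric keys (`cosine`, `euclidean`, `inner_product`) and the function names are hypothetical labels chosen for this example, not necessarily values accepted by the store's `similarity_metric` parameter.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity: 0.0 for vectors pointing in the same direction (undefined for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """Negated dot product, so that (as for the other two) smaller means closer."""
    return -sum(x * y for x, y in zip(a, b))


# Hypothetical metric labels for this sketch only.
METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "cosine": cosine_distance,
    "euclidean": euclidean_distance,
    "inner_product": negative_inner_product,
}


def exact_top_k(query: Vector, vectors: Dict[str, Vector], metric: str, k: int = 2) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and return the k closest (id, distance) pairs."""
    distance = METRICS[metric]
    scored = [(doc_id, distance(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    vectors = {"a": [1.0, 0.0], "b": [3.0, 3.0]}
    query = [1.0, 0.0]
    for name in METRICS:
        # cosine and euclidean rank "a" first; inner_product ranks "b" first
        print(name, exact_top_k(query, vectors, name, k=1))
```

Cosine distance ignores vector length, while the inner product rewards it, which is why `b` wins under `inner_product` here. For unit-length embeddings the three orderings coincide, because the squared Euclidean distance between unit vectors equals 2 minus twice their cosine similarity.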

Installation (GitHub)

Use uv to install directly from the repository:

uv pip install "duckdb-haystack @ git+https://github.com/AdrianoKF/duckdb-haystack.git"

Usage

1) DocumentStore CRUD example

from haystack import Document

from haystack_integrations.document_stores.duckdb import DuckDBDocumentStore

store = DuckDBDocumentStore(
    database=":memory:",
    embedding_dim=3,
    similarity_metric="cosine",
)

store.write_documents(
    [
        Document(id="doc-1", content="DuckDB is fast.", embedding=[0.1, 0.0, 0.9], meta={"source": "notes"}),
        Document(id="doc-2", content="Haystack pipelines are modular.", embedding=[0.2, 0.1, 0.8]),
    ]
)

print("Total document count:", store.count_documents())

filters = {"field": "meta.source", "operator": "==", "value": "notes"}
filtered = store.filter_documents(filters=filters)
print("Filtered documents:", [doc.id for doc in filtered])

store.delete_documents(document_ids=["doc-2"])
print("After deletion:", store.filter_documents())
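The `filters` dictionary above follows Haystack's filter syntax: a comparison has `field`, `operator` and `value` keys, and comparisons can be combined under the logical operators `AND`, `OR` and `NOT` through a `conditions` list. As a rough, pure-Python sketch of those semantics (not the store's implementation, which presumably evaluates filters in SQL), the helper below checks a filter against a document represented as a plain dict. The function names and the dict-shaped documents are hypothetical, and treating `NOT` as the negation of the `AND` of its conditions is an assumption about the intended semantics.

```python
import operator
from typing import Any, Callable, Dict

# Comparison operators from Haystack's filter syntax, mapped to Python callables.
_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda field_value, value: field_value in value,
    "not in": lambda field_value, value: field_value not in value,
}


def resolve_field(document: Dict[str, Any], field: str) -> Any:
    """Follow a dotted path such as "meta.source"; missing keys resolve to None."""
    current: Any = document
    for part in field.split("."):
        if not isinstance(current, dict):
            return None
        current = current.get(part)
    return current


def matches(filters: Dict[str, Any], document: Dict[str, Any]) -> bool:
    """Return True if the document satisfies the (possibly nested) filter."""
    op = filters["operator"]
    if "conditions" in filters:
        results = [matches(condition, document) for condition in filters["conditions"]]
        if op == "AND":
            return all(results)
        if op == "OR":
            return any(results)
        if op == "NOT":
            return not all(results)  # assumption: NOT negates the AND of its conditions
        raise ValueError(f"Unknown logical operator: {op}")
    if op not in _COMPARISONS:
        raise ValueError(f"Unknown comparison operator: {op}")
    field_value = resolve_field(document, filters["field"])
    # Ordering comparisons against a missing field never match.
    if field_value is None and op in (">", ">=", "<", "<="):
        return False
    return _COMPARISONS[op](field_value, filters["value"])


if __name__ == "__main__":
    docs = [
        {"id": "doc-1", "content": "DuckDB is fast.", "meta": {"source": "notes"}},
        {"id": "doc-2", "content": "Haystack pipelines are modular.", "meta": {}},
    ]
    by_source = {"field": "meta.source", "operator": "==", "value": "notes"}
    print([d["id"] for d in docs if matches(by_source, d)])  # ['doc-1']
    negated = {"operator": "NOT", "conditions": [by_source]}
    print([d["id"] for d in docs if matches(negated, d)])  # ['doc-2']
```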

2) Retrieval with DuckDBRetriever in a pipeline

from haystack import Document, Pipeline
from haystack.components.embedders import SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder

from haystack_integrations.document_stores.duckdb import DuckDBDocumentStore
from haystack_integrations.retrievers.duckdb import DuckDBRetriever

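# all-MiniLM-L6-v2 produces 384-dimensional embeddings, so embedding_dim must match
store = DuckDBDocumentStore(database=":memory:", embedding_dim=384)

doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()  # load the model before calling run() outside a pipeline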
documents = [
    Document(content="DuckDB stores vectors in Float arrays backed by an HNSW index."),
    Document(content="DuckDB is an analytical in-process SQL database management system."),
    Document(content="Haystack offers composable pipelines."),
]
documents = doc_embedder.run(documents=documents)["documents"]
store.write_documents(documents)

query_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
retriever = DuckDBRetriever(document_store=store)

pipeline = Pipeline()
pipeline.add_component("query_embedder", query_embedder)
pipeline.add_component("retriever", retriever)
pipeline.connect("query_embedder.embedding", "retriever.query_embedding")

result = pipeline.run(data={"query_embedder": {"text": "How does DuckDB store vectors?"}})

print(result["retriever"]["documents"][0].content)

License

duckdb-haystack is distributed under the terms of the Apache-2.0 license.

Download files

Source Distribution

duckdb_haystack-0.0.1.tar.gz (125.5 kB)

Uploaded Source

Built Distribution

duckdb_haystack-0.0.1-py3-none-any.whl (19.0 kB)

Uploaded Python 3

File details

Details for the file duckdb_haystack-0.0.1.tar.gz.

File metadata

  • Download URL: duckdb_haystack-0.0.1.tar.gz
  • Size: 125.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.26

File hashes

Hashes for duckdb_haystack-0.0.1.tar.gz
Algorithm Hash digest
SHA256 c2a8170ef1498b635ced52f6d21fb206a81f835a0fd0b4c966245be0f8b40dd1
MD5 a10174de95eead5b924f9c34870e97db
BLAKE2b-256 5b047d40b0e0c1a5f1a939d7ec064efd8c2587311b54d9adcc2d4301797bad10

File details

Details for the file duckdb_haystack-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: duckdb_haystack-0.0.1-py3-none-any.whl
  • Size: 19.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.26

File hashes

Hashes for duckdb_haystack-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a3430e6d7b24e3bac0d149dbfb266d8db9c16cb54bd34ed3cb0e1e9082373b2a
MD5 91a83bc7893c6e28f49e00e5902d0274
BLAKE2b-256 99eb09bbae8fbf6f87c47ac7aca8024f40c1adedc3201fec40877c18913fefc6
