TopK document store integration for Haystack with vector search, semantic search, keyword search, hybrid search and metadata retrievers

These details have not been verified by PyPI

Project links

Project description

topk-haystack

Build RAG pipelines in a few lines of code with TopK and Haystack.

Ships with retrievers for every search mode — semantic (embeddings handled server-side, no embedder component needed), dense vector, BM25 keyword, hybrid, and metadata filtering. Scales to billions of documents with native partition support for multi-tenant workloads.

Installation

pip install topk-haystack

Quick start

import os
from haystack import Document, Pipeline
from haystack.components.writers import DocumentWriter
from haystack.utils import Secret

from haystack_integrations.components.topk import TopKSemanticRetriever
from haystack_integrations.document_stores.topk import TopKDocumentStore

store = TopKDocumentStore(
    api_key=Secret.from_env_var("TOPK_API_KEY"), # Get your API key: https://console.topk.io/api-key
    region="aws-us-east-1-elastica", # See available regions: https://docs.topk.io/regions
    collection_name="my-docs",
)

# Index documents
indexing = Pipeline()
indexing.add_component("writer", DocumentWriter(document_store=store))
indexing.run({"writer": {"documents": [
    Document(content="Rust guarantees memory safety without a garbage collector."),
    Document(content="Python is known for readable syntax and scientific libraries."),
]}})

# Query
retriever = TopKSemanticRetriever(document_store=store, top_k=2)
pipeline = Pipeline()
pipeline.add_component("retriever", retriever)
result = pipeline.run({"retriever": {"query": "memory safe systems programming"}})
for doc in result["retriever"]["documents"]:
    print(f"[{doc.score:.3f}] {doc.content}")

Set TOPK_API_KEY in your environment. Get your API key from the TopK console.

Document store

TopKDocumentStore(
    region="aws-us-east-1-elastica",   # required — see https://topk.io/docs/regions
    api_key=Secret.from_env_var("TOPK_API_KEY"),
    collection_name="haystack",        # collection to create or reuse
    embedding_dim=768,                 # vector dimension (must match your embedder)
    similarity="cosine",               # "cosine" | "euclidean" | "dot_product"
    recreate_collection=False,         # drop and recreate on init
    filter_documents_limit=10_000,     # cap for filter_documents()
    partition=None,                    # optional partition for multi-tenant use
)

TopK uses upsert semantics — documents with the same ID are overwritten when using DuplicatePolicy.NONE or DuplicatePolicy.OVERWRITE. DuplicatePolicy.SKIP and DuplicatePolicy.FAIL are not supported and raise a ValueError.

TopK can only return metadata fields that are explicitly selected. This integration automatically returns meta.* fields referenced in filters; unfiltered queries return documents without metadata.

Retrievers

Semantic (server-side embedding)

TopK embeds documents and queries server-side. No embedder component needed.

from haystack_integrations.components.topk import TopKSemanticRetriever

retriever = TopKSemanticRetriever(document_store=store, top_k=5)
pipeline.add_component("retriever", retriever)
result = pipeline.run({"retriever": {"query": "your question here"}})

Dense vector (bring your own embedder)

Embed documents and queries with your own model (e.g. SentenceTransformers). embedding_dim in TopKDocumentStore must match the model's output dimension.

from haystack.components.embedders import (
    SentenceTransformersDocumentEmbedder,
    SentenceTransformersTextEmbedder,
)
from haystack_integrations.components.topk import TopKEmbeddingRetriever

MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# Indexing — embed before writing
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(model=MODEL))
indexing.add_component("writer", DocumentWriter(document_store=store))
indexing.connect("embedder.documents", "writer.documents")

# Querying
query_pipeline = Pipeline()
query_pipeline.add_component("embedder", SentenceTransformersTextEmbedder(model=MODEL))
query_pipeline.add_component("retriever", TopKEmbeddingRetriever(document_store=store, top_k=5))
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({"embedder": {"text": "your question here"}})

BM25 keyword

from haystack_integrations.components.topk import TopKBM25Retriever

retriever = TopKBM25Retriever(document_store=store, top_k=5)
pipeline.add_component("retriever", retriever)
result = pipeline.run({"retriever": {"query": "keyword search terms"}})

Hybrid (vector + BM25)

Combines dense vector similarity with BM25 keyword scoring in a single query. Takes both a text embedding and a keyword query string.

from haystack_integrations.components.topk import TopKHybridRetriever

retriever = TopKHybridRetriever(document_store=store, top_k=5)
query_pipeline = Pipeline()
query_pipeline.add_component("embedder", SentenceTransformersTextEmbedder(model=MODEL))
query_pipeline.add_component("retriever", retriever)
query_pipeline.connect("embedder.embedding", "retriever.query_embedding")
result = query_pipeline.run({
    "embedder": {"text": "your natural language question"},
    "retriever": {"query": "keyword terms"},
})

Metadata filter

Retrieve documents by metadata filters only.

from haystack_integrations.components.topk import TopKMetadataRetriever

retriever = TopKMetadataRetriever(document_store=store, top_k=5)
pipeline.add_component("retriever", retriever)
result = pipeline.run({"retriever": {"filters": {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {"field": "meta.year", "operator": ">=", "value": 2020},
    ],
}}})

Metadata filters

All retrievers accept Haystack-style filter dicts. Supported operators:

Operator	Description
`==`, `!=`	Equality / inequality
`>`, `>=`, `<`, `<=`	Numeric comparison
`in`	Field value is in a list
`not in`	Field value is not in a list
`AND`, `OR`, `NOT`	Logical combinators

filters = {
    "operator": "AND",
    "conditions": [
        {"field": "meta.language", "operator": "==", "value": "en"},
        {
            "operator": "OR",
            "conditions": [
                {"field": "meta.year", "operator": "==", "value": 2024},
                {"field": "meta.year", "operator": "==", "value": 2025},
            ],
        },
    ],
}

Multi-tenant (partitions)

Use the partition parameter to scope all reads and writes to a logical partition. Different partitions in the same collection are fully isolated.

store_a = TopKDocumentStore(region="...", collection_name="shared", partition="tenant-a")
store_b = TopKDocumentStore(region="...", collection_name="shared", partition="tenant-b")

Development

git clone https://github.com/topk-io/topk-haystack
cd topk-haystack
uv sync --group dev

export TOPK_API_KEY=your-api-key   # https://console.topk.io/api-key
export TOPK_REGION=aws-us-east-1-elastica

uv run pytest -m "not integration" tests/   # unit tests
uv run pytest -m "integration" tests/       # integration tests
uv run pytest tests/                        # all tests
uv run pytest --cov=haystack_integrations tests/  # with coverage

Lint and format:

uv run ruff check --fix . && uv run ruff format .   # auto-fix
uv run ruff check . && uv run ruff format --check . # check only

Or with Hatch:

hatch run fmt            # auto-fix
hatch run fmt-check      # check only
hatch run test:unit
hatch run test:integration
hatch run test:all
hatch run test:cov

License

Apache-2.0 — see LICENSE.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Jun 8, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

topk_haystack-0.1.0.tar.gz (185.2 kB view details)

Uploaded Jun 8, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

topk_haystack-0.1.0-py3-none-any.whl (21.0 kB view details)

Uploaded Jun 8, 2026 Python 3

File details

Details for the file topk_haystack-0.1.0.tar.gz.

File metadata

Download URL: topk_haystack-0.1.0.tar.gz
Upload date: Jun 8, 2026
Size: 185.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: Hatch/1.17.0 {"ci":true,"cpu":"x86_64","distro":{"id":"noble","libc":{"lib":"glibc","version":"2.39"},"name":"Ubuntu","version":"24.04"},"implementation":{"name":"CPython","version":"3.13.13"},"installer":{"name":"hatch","version":"1.17.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.13.13","system":{"name":"Linux","release":"6.17.0-1015-azure"}} HTTPX2/2.3.0

File hashes

Hashes for topk_haystack-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`af06cc5ee1b73889b356b4a8de9258e181d195b825696c738716f50192336850`
MD5	`2b07eab6656762e8759bfafb32095319`
BLAKE2b-256	`cdef5ede9ea550d11c1a2a6911d510fe4ebd089a94c618281d0016a2d4949339`

See more details on using hashes here.

File details

Details for the file topk_haystack-0.1.0-py3-none-any.whl.

File metadata

Download URL: topk_haystack-0.1.0-py3-none-any.whl
Upload date: Jun 8, 2026
Size: 21.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: Hatch/1.17.0 {"ci":true,"cpu":"x86_64","distro":{"id":"noble","libc":{"lib":"glibc","version":"2.39"},"name":"Ubuntu","version":"24.04"},"implementation":{"name":"CPython","version":"3.13.13"},"installer":{"name":"hatch","version":"1.17.0"},"openssl_version":"OpenSSL 3.0.13 30 Jan 2024","python":"3.13.13","system":{"name":"Linux","release":"6.17.0-1015-azure"}} HTTPX2/2.3.0

File hashes

Hashes for topk_haystack-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`419c6c33fa43179c16057800585fc6e79d6eb4653af7d28b36cb6b8e8a994fa1`
MD5	`72ed6d161e1d2bd1ab99680d233ea418`
BLAKE2b-256	`15042a4472b1fe791be1199ae00a7bfa2e8b8b474f10a00f5a5da890ed3dae1c`

See more details on using hashes here.

topk-haystack 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

topk-haystack

Installation

Quick start

Document store

Retrievers

Semantic (server-side embedding)

Dense vector (bring your own embedder)

BM25 keyword

Hybrid (vector + BM25)

Metadata filter

Metadata filters

Multi-tenant (partitions)

Development

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes