
Haystack Document Store and Retriever backed by Pixeltable multimodal data infrastructure.


pixeltable-haystack


Haystack Document Store and Retriever backed by Pixeltable — persistent, versioned, multimodal data infrastructure for AI applications.

Installation

pip install pixeltable-haystack

Quick Start

Document Store

from haystack import Document
from haystack_pixeltable import PixeltableDocumentStore

store = PixeltableDocumentStore(
    table_name="myproject.docs",
    embedding_dimension=1536,
)

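# Write documents ([...] stands in for a real 1536-dimensional embedding vector).
# The meta field is what the filter below matches on.
store.write_documents([
    Document(content="Pixeltable is multimodal data infrastructure.", meta={"category": "docs"}, embedding=[...]),
    Document(content="Haystack is a framework for building RAG pipelines.", meta={"category": "docs"}, embedding=[...]),
])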

# Filter documents
results = store.filter_documents(
    filters={"field": "meta.category", "operator": "==", "value": "docs"}
)

# Count
print(store.count_documents())

Retriever (Similarity Search)

from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever

store = PixeltableDocumentStore(
    table_name="myproject.docs",
    embedding_dimension=1536,
)
retriever = PixeltableRetriever(document_store=store, top_k=5)

# Search by embedding vector (truncated here; its length should match embedding_dimension)
result = retriever.run(query_embedding=[0.1, 0.2, ...])
for doc in result["documents"]:
    print(f"{doc.content} (score: {doc.score:.3f})")
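
To make the idea of a similarity score concrete, here is a minimal pure-Python sketch of top-k retrieval. It assumes cosine similarity as the metric; this README does not state which metric PixeltableRetriever uses, and the names cosine_similarity and top_k are hypothetical helpers, not part of the package.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, candidates, k=5):
    """Return the k (content, score) pairs most similar to the query.

    candidates is a list of (content, embedding) pairs.
    Assumption: cosine similarity; the real retriever may use another metric.
    """
    scored = [
        (content, cosine_similarity(query_embedding, embedding))
        for content, embedding in candidates
    ]
    # Highest score first, then keep the first k entries.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

In this sketch a higher score means a closer match, which is how the score printed in the loop above would typically be read.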

In a Haystack Pipeline

from haystack import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder, SentenceTransformersDocumentEmbedder
from haystack.components.writers import DocumentWriter
from haystack_pixeltable import PixeltableDocumentStore, PixeltableRetriever

store = PixeltableDocumentStore(
    table_name="rag.knowledge",
    embedding_dimension=384,
)

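# Use a model that produces 384-dimensional vectors to match embedding_dimension above.
# Haystack's default Sentence Transformers model (all-mpnet-base-v2) produces 768 dimensions.
MODEL = "sentence-transformers/all-MiniLM-L6-v2"

# Indexing pipeline
indexing = Pipeline()
indexing.add_component("embedder", SentenceTransformersDocumentEmbedder(model=MODEL))
indexing.add_component("writer", DocumentWriter(document_store=store))
indexing.connect("embedder.documents", "writer.documents")

# Query pipeline
query = Pipeline()
query.add_component("embedder", SentenceTransformersTextEmbedder(model=MODEL))
query.add_component("retriever", PixeltableRetriever(document_store=store, top_k=5))
query.connect("embedder.embedding", "retriever.query_embedding")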
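
The pipelines above are only constructed, not run. A minimal sketch of running them follows, assuming the component names "embedder" and "retriever" used above and Haystack's convention of passing inputs as {component_name: {input_name: value}}. The helper name index_then_query is hypothetical.

```python
from typing import Any


def index_then_query(indexing: Any, query: Any, documents: list, question: str) -> list:
    """Index the documents once, then return the documents retrieved for the question.

    Both pipeline arguments are expected to behave like haystack.Pipeline objects
    wired as in the README (components named "embedder" and "retriever").
    """
    # The document embedder receives the raw documents; the writer stores them.
    indexing.run({"embedder": {"documents": documents}})

    # The text embedder receives the question; its embedding flows to the retriever.
    result = query.run({"embedder": {"text": question}})

    # Pipeline outputs are keyed by component name, then by output name.
    return result["retriever"]["documents"]
```

With the pipelines defined above, a call might look like index_then_query(indexing, query, [Document(content="...")], "What is Pixeltable?").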

Filtering

The Document Store supports the Haystack filter specification:

# Simple equality
store.filter_documents(filters={"field": "meta.category", "operator": "==", "value": "science"})

# Comparison operators: ==, !=, >, >=, <, <=
store.filter_documents(filters={"field": "meta.score", "operator": ">", "value": 0.8})

# Compound AND
store.filter_documents(filters={
    "operator": "AND",
    "conditions": [
        {"field": "meta.category", "operator": "==", "value": "science"},
        {"field": "meta.score", "operator": ">", "value": 0.5},
    ],
})

# Compound OR
store.filter_documents(filters={
    "operator": "OR",
    "conditions": [
        {"field": "meta.source", "operator": "==", "value": "arxiv"},
        {"field": "meta.source", "operator": "==", "value": "pubmed"},
    ],
})
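
To make the semantics of these filter dictionaries concrete, here is a minimal pure-Python sketch that evaluates one against a plain nested dict. It illustrates the filter shapes shown above only. It is not how PixeltableDocumentStore implements filtering, and the names matches and _lookup are hypothetical. Treating a missing field as a non-match for ordering comparisons is an assumption of this sketch.

```python
import operator
from typing import Any

# Comparison operators listed above, mapped to their Python equivalents.
_COMPARATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def _lookup(doc: dict, dotted_field: str) -> Any:
    """Resolve a dotted path such as 'meta.category'; return None if any key is missing."""
    value: Any = doc
    for key in dotted_field.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(doc: dict, filters: dict) -> bool:
    """Return True if doc satisfies the filter dictionary."""
    op = filters["operator"]
    # Compound filters recurse into their conditions.
    if op == "AND":
        return all(matches(doc, cond) for cond in filters["conditions"])
    if op == "OR":
        return any(matches(doc, cond) for cond in filters["conditions"])

    value = _lookup(doc, filters["field"])
    # Assumption: a missing field never satisfies an ordering comparison.
    if value is None and op not in ("==", "!="):
        return False
    return _COMPARATORS[op](value, filters["value"])
```

For example, a document whose meta is {"category": "science", "score": 0.9} satisfies the compound AND filter above, while one with a score of 0.4 does not.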

Pixeltable Escape Hatch: .table

The .table property gives direct access to the underlying Pixeltable table for operations beyond the Haystack interface:

store = PixeltableDocumentStore(table_name="myproject.docs", embedding_dimension=1536)
t = store.table

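# Add a computed column.
# Note: chat_completions stores the full API response as JSON; the generated
# text is at summary["choices"][0]["message"]["content"].
import pixeltable.functions.openai as openai
t.add_computed_column(
    summary=openai.chat_completions(
        messages=[{"role": "user", "content": t.content}],
        model="gpt-4o-mini",
    )
)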

# Use arbitrary Pixeltable queries
results = t.where(t.meta["category"] == "science").select(t.content, t.summary).collect()

# Version history
print(t.count(version=-1))  # row count at previous version

Why Pixeltable?

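| Feature                | Pixeltable                    | Chroma    | Qdrant    | pgvector  |
|------------------------|-------------------------------|-----------|-----------|-----------|
| Persistent storage     | Built-in                      | Opt-in    | Opt-in    | Built-in  |
| Computed columns       | Native                        | No        | No        | No        |
| Version history        | Native                        | No        | No        | No        |
| Multimodal types       | Image, Video, Audio, Document | Text only | Text only | Text only |
| Metadata filtering     | JSON + SQL predicates         | Limited   | Rich      | SQL       |
| Embedding auto-compute | Via computed columns          | Manual    | Manual    | Manual    |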

Development

pip install -e ".[dev]"
pytest tests/ -v
ruff check . && ruff format --check .

License

Apache 2.0
