
LangChain VectorStore integration for Pixeltable multimodal data infrastructure.


langchain-pixeltable

LangChain VectorStore integration backed by Pixeltable -- multimodal data infrastructure with built-in embedding indexes, metadata filtering, computed column lineage, and incremental computation.

Installation

pip install langchain-pixeltable

Quick Start

Works with any LangChain Embeddings model -- cloud or local:

from langchain_pixeltable import PixeltableVectorStore
from langchain_huggingface import HuggingFaceEmbeddings  # no API key needed

vs = PixeltableVectorStore.from_texts(
    texts=[
        "Pixeltable handles multimodal data",
        "LangChain builds LLM applications",
        "Vector databases store embeddings",
    ],
    embedding=HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2"),
    metadatas=[
        {"category": "infra"},
        {"category": "framework"},
        {"category": "infra"},
    ],
    table_name="mydir.docs",
)

# Similarity search
results = vs.similarity_search("multimodal data management", k=2)
for doc in results:
    print(doc.page_content)
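
For offline experiments it can help to swap in an embedding model that needs no download at all. The sketch below is a hypothetical helper, not part of langchain-pixeltable: `HashEmbeddings` is an invented name, and its vectors carry no semantic meaning. It exposes `embed_documents` and `embed_query`, the two methods of LangChain's `Embeddings` interface, on the assumption that these are the only methods the store calls.

```python
import hashlib
import math


class HashEmbeddings:
    """Toy, deterministic embeddings for offline experiments (hypothetical helper).

    Mirrors the two methods of LangChain's Embeddings interface.
    The resulting vectors are NOT semantically meaningful.
    """

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            # Hash each token to a stable bucket index.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            vec[digest[0] % self.dim] += 1.0
        # Normalize to unit length so a dot product equals cosine similarity.
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)
```

In principle this could be passed as `embedding=HashEmbeddings()` in the Quick Start call. If the store checks the argument's type, subclassing `langchain_core.embeddings.Embeddings` may be needed instead.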

Filtered Similarity Search

The filter parameter maps to Pixeltable's .where() clause -- predicates are evaluated before ranking, so only matching rows participate in the similarity sort:

# Only search within "infra" documents
results = vs.similarity_search(
    "data storage", k=5, filter={"category": "infra"},
)

# With scores
results = vs.similarity_search_with_score(
    "embeddings", k=3, filter={"category": "infra"},
)
for doc, score in results:
    print(f"[{score:.3f}] {doc.page_content}")
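
The difference between filtering before ranking and filtering after ranking can be illustrated without any database. The following is a minimal pure-Python sketch with made-up vectors and hypothetical helper names (`pre_filter_search`, `post_filter_search`); it shows the general idea only and is not how Pixeltable executes queries internally.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


# Toy rows: (text, metadata, embedding). Vectors are invented for illustration.
rows = [
    ("doc A", {"category": "framework"}, [1.0, 0.0]),
    ("doc B", {"category": "framework"}, [0.9, 0.1]),
    ("doc C", {"category": "infra"}, [0.5, 0.5]),
    ("doc D", {"category": "infra"}, [0.0, 1.0]),
]
query = [1.0, 0.0]


def matches(meta, flt):
    """True if every key/value pair in the filter is present in the metadata."""
    return all(meta.get(k) == v for k, v in flt.items())


def pre_filter_search(rows, query, k, flt):
    # Filter first, then rank only the surviving rows.
    candidates = [r for r in rows if matches(r[1], flt)]
    ranked = sorted(candidates, key=lambda r: cosine(r[2], query), reverse=True)
    return [r[0] for r in ranked[:k]]


def post_filter_search(rows, query, k, flt):
    # Rank everything, keep the top k, then filter: may return fewer than k.
    ranked = sorted(rows, key=lambda r: cosine(r[2], query), reverse=True)
    return [r[0] for r in ranked[:k] if matches(r[1], flt)]


pre = pre_filter_search(rows, query, k=2, flt={"category": "infra"})
post = post_filter_search(rows, query, k=2, flt={"category": "infra"})
print(pre)   # ['doc C', 'doc D']
print(post)  # []
```

Here the two rows closest to the query are both "framework" documents, so post-hoc filtering returns nothing, while pre-filtering still returns two "infra" results.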

Access the Underlying Pixeltable Table

The .table property gives direct access to the Pixeltable table for operations beyond the VectorStore interface -- computed columns, lineage, version history, and arbitrary predicates:

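import numpy as np
import pixeltable as pxt
from langchain_huggingface import HuggingFaceEmbeddings

t = vs.table

# Inspect all data
t.select(t.text, t.metadata, t.embedding).collect()

# Example UDF for the computed column below (illustrative definition)
@pxt.udf
def my_word_counter(text: str) -> int:
    return len(text.split())

# Add a computed column -- auto-backfills all existing rows
t.add_computed_column(word_count=my_word_counter(t.text))

# New inserts via the wrapper also populate the computed columns
vs.add_texts(["New document"], metadatas=[{"category": "infra"}])

# Embed the query with the same model the store was built with
# (example query text; model name taken from the Quick Start)
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
query_vec = embeddings.embed_query("multimodal data")

# WHERE on computed columns + similarity
sim = t.embedding.similarity(vector=np.array(query_vec, dtype=np.float32))
results = (
    t.where(t.word_count > 5)
    .order_by(sim, asc=False)
    .limit(3)
    .select(t.text, t.word_count, sim=sim)
    .collect()
)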
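
The backfill and auto-compute behavior of a computed column can be modeled in a few lines of plain Python. The `ToyTable` class below is a hypothetical in-memory stand-in written for this illustration; it does not use or reproduce Pixeltable's API, and it ignores persistence and versioning entirely.

```python
from typing import Any, Callable


class ToyTable:
    """In-memory stand-in for a table with computed columns (not Pixeltable's API)."""

    def __init__(self) -> None:
        self.rows: list[dict[str, Any]] = []
        self.computed: dict[str, Callable[[dict[str, Any]], Any]] = {}

    def insert(self, **values: Any) -> None:
        row = dict(values)
        # New rows get every registered computed column filled in.
        for name, fn in self.computed.items():
            row[name] = fn(row)
        self.rows.append(row)

    def add_computed_column(self, name: str, fn: Callable[[dict[str, Any]], Any]) -> None:
        self.computed[name] = fn
        # Backfill: evaluate the new column for rows that already exist.
        for row in self.rows:
            row[name] = fn(row)


toy = ToyTable()
toy.insert(text="Pixeltable handles multimodal data")
toy.add_computed_column("word_count", lambda r: len(r["text"].split()))
toy.insert(text="New document")

word_counts = [r["word_count"] for r in toy.rows]
print(word_counts)  # [4, 2]
```

The first row existed before the column was added and is backfilled; the second row is inserted afterwards and gets its value at insert time.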

Connect to an Existing Pixeltable Table

Connect to any existing Pixeltable table -- including tables with multimodal columns like images or video:

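from langchain_openai import OpenAIEmbeddings  # requires OPENAI_API_KEY to be set
from langchain_pixeltable import PixeltableVectorStore

vs = PixeltableVectorStore.from_existing_table(
    table_name="mydir.existing_docs",
    embedding=OpenAIEmbeddings(),
    text_column="content",
    embedding_column="content_embedding",
)
results = vs.similarity_search("search query", filter={"source": "arxiv"})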

Use as a LangChain Retriever

retriever = vs.as_retriever(search_kwargs={"k": 5})
docs = retriever.invoke("What is Pixeltable?")

Why Pixeltable as a Vector Backend?

  • Metadata filtering via .where(): Filter on metadata fields before ranking, not post-hoc
  • Computed column lineage: Add derived columns that auto-backfill and auto-compute on new inserts
  • Persistent and versioned: Data survives restarts; every change is tracked
  • Incremental: Only new/changed rows get re-embedded
  • Multimodal native: Images, video, audio, and documents alongside text
  • Any embedding model: Works with OpenAI, Hugging Face, or any local model
  • No external services: Embedded PostgreSQL, no Docker required


