Skip to main content

Typed SQLAlchemy helpers for ParadeDB

Project description

ParadeDB

Simple, Elastic-quality search for Postgres

WebsiteDocsCommunityBlogChangelog


sqlalchemy-paradedb

PyPI Python Versions Downloads Codecov License Slack URL X URL

ParadeDB integration for SQLAlchemy: typed helpers for BM25 indexes, search predicates, scoring, snippets, facets, and migration ergonomics.

Requirements & Compatibility

Component Supported
Python 3.10+
SQLAlchemy 2.0.32+
ParadeDB 0.22.0+
PostgreSQL 15+ (with ParadeDB extension)

Installation

uv add sqlalchemy-paradedb

For local development:

uv sync --extra test --extra dev

Quick Start

Prerequisites

Install pg_search in your Postgres database and connect SQLAlchemy to that database.

Create a BM25 Index

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

products_bm25_idx = Index(
    "products_bm25_idx",
    indexing.BM25Field(Product.id),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.unicode(lowercase=True),
    ),
    indexing.BM25Field(
        Product.category,
        tokenizer=indexing.tokenize.literal(),
    ),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

For JSON columns named metadata, use metadata_ as the ORM attribute name.

Query with ParadeDB Predicates

from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(Product.id, Product.description)
    .where(search.match_any(Product.description, "running", "shoes"))
    .order_by(pdb.score(Product.id).desc())
    .limit(10)
)

with Session(engine) as session:
    rows = session.execute(stmt).all()

Rows + Facets in a Single Query

from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import facets, search

base = (
    select(Product.id, Product.description)
    .where(search.match_all(Product.description, "running"))
    .order_by(Product.id)
    .limit(10)
)

stmt = facets.with_rows(
    base,
    agg=facets.multi(
        facets.value_count(field="id"),
        facets.terms(field="category", size=10),
    ),
    key_field=Product.id,
)

with Session(engine) as session:
    rows = session.execute(stmt).all()
    facet_payload = facets.extract(rows)

Search Patterns

Fuzzy Matching

from paradedb.sqlalchemy import search

search.term(Product.description, "shose", distance=1)
search.match_any(Product.description, "wirless", distance=1, prefix=True)
search.term(Product.description, "rnnuing", distance=1, transpose_cost_one=True)

Use fuzzy options on term, match_any, or match_all; there is no separate search.fuzzy(...) helper.

Phrase Prefix and More-Like-This

from paradedb.sqlalchemy import search

search.phrase_prefix(Product.description, ["running", "sh"])
search.more_like_this(Product.id, document_id=1, fields=["description"])

Proximity Composition

from sqlalchemy import select
from paradedb.sqlalchemy import search

prox = search.prox_array("running").within(1, search.prox_regex("sho.*"), ordered=True)
stmt = select(Product.id).where(search.proximity(Product.description, prox))

Indexing and Tokenizers

Tokenizer config can be expressed as a structured mapping:

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

products_bm25_idx = Index(
    "products_bm25_idx",
    indexing.BM25Field(Product.id),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "simple",
                "filters": ["lowercase", "stemmer"],
                "stemmer": "english",
                "alias": "description_simple",
            }
        ),
    ),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "ngram",
                "args": [3, 8],
                "named_args": {"prefix_only": True},
                "alias": "description_ngram",
            }
        ),
    ),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

Validate that a field is indexed with the expected tokenizer:

from paradedb.sqlalchemy import indexing

indexing.assert_indexed(engine, Product.category, tokenizer="literal")

Inspect BM25 metadata for a mapped table:

from paradedb.sqlalchemy import indexing

meta = indexing.describe(engine, Product.__table__)

Alembic Operations

Import once in migration environment startup so Alembic registers ParadeDB operations:

import paradedb.sqlalchemy.alembic  # noqa: F401

Use custom operations in migrations:

op.create_bm25_index(
    "products_bm25_idx",
    "products",
    ["id", "description"],
    key_field="id",
    table_schema="public",
)
op.reindex_bm25("products_bm25_idx", concurrently=True, schema="public")
op.drop_bm25_index("products_bm25_idx", if_exists=True, schema="public")

op.reindex_bm25(..., concurrently=True) must run outside a transaction (autocommit block).

Diagnostics Helpers

paradedb.sqlalchemy.diagnostics exposes wrapper functions for ParadeDB diagnostics:

from paradedb.sqlalchemy import diagnostics

indexes = diagnostics.paradedb_indexes(engine)
segments = diagnostics.paradedb_index_segments(engine, "products_bm25_idx")
check = diagnostics.paradedb_verify_index(engine, "products_bm25_idx", sample_rate=0.1)
all_checks = diagnostics.paradedb_verify_all_indexes(engine, schema_pattern="public")

Common Errors

with_rows requires ORDER BY

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# Missing order_by(...)
base = select(Product.id).limit(10)
facets.with_rows(base, agg=facets.value_count(field="id"), key_field=Product.id)

with_rows requires LIMIT

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# Missing limit(...)
base = select(Product.id).order_by(Product.id)
facets.with_rows(base, agg=facets.value_count(field="id"), key_field=Product.id)

with_rows requires a ParadeDB predicate

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# ensure_predicate=False disables automatic search.all(...) injection
facets.with_rows(
    select(Product.id).order_by(Product.id).limit(10),
    agg=facets.value_count(field="id"),
    key_field=Product.id,
    ensure_predicate=False,
)

tokenizer config requires 'tokenizer'

from paradedb.sqlalchemy import indexing

indexing.tokenize.from_config({"filters": ["lowercase"]})

Examples

Documentation

Contributing

See CONTRIBUTING.md for development setup, running tests, linting, and the PR workflow.

Support

If you're missing a feature or have found a bug, please open a GitHub Issue.

To get community support, you can:

If you need commercial support, please contact the ParadeDB team.

License

sqlalchemy-paradedb is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlalchemy_paradedb-0.4.0.tar.gz (64.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlalchemy_paradedb-0.4.0-py3-none-any.whl (31.0 kB view details)

Uploaded Python 3

File details

Details for the file sqlalchemy_paradedb-0.4.0.tar.gz.

File metadata

  • Download URL: sqlalchemy_paradedb-0.4.0.tar.gz
  • Upload date:
  • Size: 64.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for sqlalchemy_paradedb-0.4.0.tar.gz
Algorithm Hash digest
SHA256 11812b55bc7ea4e39ecb3c5658ce9528611412761e4fd219bdb895d483425dd7
MD5 d89e1a24d44db697caf564246fdff38d
BLAKE2b-256 520bc0dd924b2652f28adead7fbda56e54a158c5dbc321ba768ac010201953d1

See more details on using hashes here.

File details

Details for the file sqlalchemy_paradedb-0.4.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sqlalchemy_paradedb-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6fd806fadaf0da680e6f4ed2c310fbe572617d6dcaf76e8e1b089a87f4183ce2
MD5 4b503aa0615903b24b8d7499be7dcdba
BLAKE2b-256 738b40c7064204b72646e3ffb74dc495a8993f92f46445ffa7091f864b0932de

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page