Skip to main content

Typed SQLAlchemy helpers for ParadeDB

Project description

ParadeDB

Simple, Elastic-quality search for Postgres

WebsiteDocsCommunityBlogChangelog


sqlalchemy-paradedb

PyPI Python Versions Downloads Codecov License Slack URL X URL

ParadeDB integration for SQLAlchemy: typed helpers for BM25 indexes, search predicates, scoring, snippets, facets, and migration ergonomics.

Requirements & Compatibility

Component Supported
Python 3.10+
SQLAlchemy 2.0.32+
ParadeDB 0.22.0+
PostgreSQL 15+ (with ParadeDB extension)

Installation

uv add sqlalchemy-paradedb

For local development:

uv sync --extra test --extra dev

Quick Start

Prerequisites

Install pg_search in your Postgres database and connect SQLAlchemy to that database.

Create a BM25 Index

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

products_bm25_idx = Index(
    "products_bm25_idx",
    indexing.BM25Field(Product.id),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.unicode(lowercase=True),
    ),
    indexing.BM25Field(
        Product.category,
        tokenizer=indexing.tokenize.literal(),
    ),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

For JSON columns named metadata, use metadata_ as the ORM attribute name.

Query with ParadeDB Predicates

from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(Product.id, Product.description)
    .where(search.match_any(Product.description, "running", "shoes"))
    .order_by(pdb.score(Product.id).desc())
    .limit(10)
)

with Session(engine) as session:
    rows = session.execute(stmt).all()

Rows + Facets in a Single Query

from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import facets, search

base = (
    select(Product.id, Product.description)
    .where(search.match_all(Product.description, "running"))
    .order_by(Product.id)
    .limit(10)
)

stmt = facets.with_rows(
    base,
    agg=facets.multi(
        facets.value_count(field="id"),
        facets.terms(field="category", size=10),
    ),
    key_field=Product.id,
)

with Session(engine) as session:
    rows = session.execute(stmt).all()
    facet_payload = facets.extract(rows)

Search Patterns

Fuzzy Matching

from paradedb.sqlalchemy import search

search.term(Product.description, "shose", distance=1)
search.match_any(Product.description, "wirless", distance=1, prefix=True)
search.term(Product.description, "rnnuing", distance=1, transpose_cost_one=True)

Use fuzzy options on term, match_any, or match_all; there is no separate search.fuzzy(...) helper.

Phrase Prefix and More-Like-This

from paradedb.sqlalchemy import search

search.phrase_prefix(Product.description, ["running", "sh"])
search.more_like_this(Product.id, document_id=1, fields=["description"])

Proximity Composition

from sqlalchemy import select
from paradedb.sqlalchemy import search

prox = search.prox_array("running").within(1, search.prox_regex("sho.*"), ordered=True)
stmt = select(Product.id).where(search.proximity(Product.description, prox))

Indexing and Tokenizers

Tokenizer config can be expressed as a structured mapping:

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

products_bm25_idx = Index(
    "products_bm25_idx",
    indexing.BM25Field(Product.id),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "simple",
                "filters": ["lowercase", "stemmer"],
                "stemmer": "english",
                "alias": "description_simple",
            }
        ),
    ),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "ngram",
                "args": [3, 8],
                "named_args": {"prefix_only": True},
                "alias": "description_ngram",
            }
        ),
    ),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

Validate that a field is indexed with the expected tokenizer:

from paradedb.sqlalchemy import indexing

indexing.assert_indexed(engine, Product.category, tokenizer="literal")

Inspect BM25 metadata for a mapped table:

from paradedb.sqlalchemy import indexing

meta = indexing.describe(engine, Product.__table__)

Alembic Operations

Import once in migration environment startup so Alembic registers ParadeDB operations:

import paradedb.sqlalchemy.alembic  # noqa: F401

Use custom operations in migrations:

op.create_bm25_index(
    "products_bm25_idx",
    "products",
    ["id", "description"],
    key_field="id",
    table_schema="public",
)
op.reindex_bm25("products_bm25_idx", concurrently=True, schema="public")
op.drop_bm25_index("products_bm25_idx", if_exists=True, schema="public")

op.reindex_bm25(..., concurrently=True) must run outside a transaction (autocommit block).

Diagnostics Helpers

paradedb.sqlalchemy.diagnostics exposes wrapper functions for ParadeDB diagnostics:

from paradedb.sqlalchemy import diagnostics

indexes = diagnostics.paradedb_indexes(engine)
segments = diagnostics.paradedb_index_segments(engine, "products_bm25_idx")
check = diagnostics.paradedb_verify_index(engine, "products_bm25_idx", sample_rate=0.1)
all_checks = diagnostics.paradedb_verify_all_indexes(engine, schema_pattern="public")

Common Errors

with_rows requires ORDER BY

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# Missing order_by(...)
base = select(Product.id).limit(10)
facets.with_rows(base, agg=facets.value_count(field="id"), key_field=Product.id)

with_rows requires LIMIT

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# Missing limit(...)
base = select(Product.id).order_by(Product.id)
facets.with_rows(base, agg=facets.value_count(field="id"), key_field=Product.id)

with_rows requires a ParadeDB predicate

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# ensure_predicate=False disables automatic search.all(...) injection
facets.with_rows(
    select(Product.id).order_by(Product.id).limit(10),
    agg=facets.value_count(field="id"),
    key_field=Product.id,
    ensure_predicate=False,
)

tokenizer config requires 'tokenizer'

from paradedb.sqlalchemy import indexing

indexing.tokenize.from_config({"filters": ["lowercase"]})

Examples

Documentation

Contributing

See CONTRIBUTING.md for development setup, running tests, linting, and the PR workflow.

Support

If you're missing a feature or have found a bug, please open a GitHub Issue.

To get community support, you can:

If you need commercial support, please contact the ParadeDB team.

License

sqlalchemy-paradedb is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlalchemy_paradedb-0.3.0.tar.gz (64.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlalchemy_paradedb-0.3.0-py3-none-any.whl (31.1 kB view details)

Uploaded Python 3

File details

Details for the file sqlalchemy_paradedb-0.3.0.tar.gz.

File metadata

  • Download URL: sqlalchemy_paradedb-0.3.0.tar.gz
  • Upload date:
  • Size: 64.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for sqlalchemy_paradedb-0.3.0.tar.gz
Algorithm Hash digest
SHA256 87dd7a23b1d416c463e982095cc76daae7652bae1d97f1b04cb7449ad1e7496d
MD5 8ff95ac54af69528499cecd5f8bc7622
BLAKE2b-256 e44a92dc0b398b2fd8631e526d5322eb31c6039a3b88b233952fc3bb111ac5a8

See more details on using hashes here.

File details

Details for the file sqlalchemy_paradedb-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sqlalchemy_paradedb-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 48f8ec30cef5ee373ac52f4c0c58c98f7124fa6da19480355628e7c69f2605f4
MD5 856c5c59d20ab5647dd76fcba3814d0d
BLAKE2b-256 c247676c395f442562959c48c12e624d1388c715c8d72f144ecb8b578a7ee3ed

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page