Skip to main content

Typed SQLAlchemy helpers for ParadeDB

Project description

ParadeDB

Simple, Elastic-quality search for Postgres

WebsiteDocsCommunityBlogChangelog


sqlalchemy-paradedb

PyPI Python Versions Downloads Codecov License Slack URL X URL

ParadeDB integration for SQLAlchemy: typed helpers for BM25 indexes, search predicates, scoring, snippets, facets, and migration ergonomics.

Requirements & Compatibility

Component Supported
Python 3.10+
SQLAlchemy 2.0.32+
ParadeDB 0.22.0+
PostgreSQL 15+ (with ParadeDB extension)

Installation

uv add sqlalchemy-paradedb

For local development:

uv sync --extra test --extra dev

Quick Start

Prerequisites

Install pg_search in your Postgres database and connect SQLAlchemy to that database.

Create a BM25 Index

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

products_bm25_idx = Index(
    "products_bm25_idx",
    indexing.BM25Field(Product.id),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.unicode(lowercase=True),
    ),
    indexing.BM25Field(
        Product.category,
        tokenizer=indexing.tokenize.literal(),
    ),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

For JSON columns named metadata, use metadata_ as the ORM attribute name.

Query with ParadeDB Predicates

from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import pdb, search

stmt = (
    select(Product.id, Product.description)
    .where(search.match_any(Product.description, "running", "shoes"))
    .order_by(pdb.score(Product.id).desc())
    .limit(10)
)

with Session(engine) as session:
    rows = session.execute(stmt).all()

Rows + Facets in a Single Query

from sqlalchemy import select
from sqlalchemy.orm import Session
from paradedb.sqlalchemy import facets, search

base = (
    select(Product.id, Product.description)
    .where(search.match_all(Product.description, "running"))
    .order_by(Product.id)
    .limit(10)
)

stmt = facets.with_rows(
    base,
    agg=facets.multi(
        facets.value_count(field="id"),
        facets.terms(field="category", size=10),
    ),
    key_field=Product.id,
)

with Session(engine) as session:
    rows = session.execute(stmt).all()
    facet_payload = facets.extract(rows)

Search Patterns

Fuzzy Matching

from paradedb.sqlalchemy import search

search.term(Product.description, "shose", distance=1)
search.match_any(Product.description, "wirless", distance=1, prefix=True)
search.term(Product.description, "rnnuing", distance=1, transpose_cost_one=True)

Use fuzzy options on term, match_any, or match_all; there is no separate search.fuzzy(...) helper.

Phrase Prefix and More-Like-This

from paradedb.sqlalchemy import search

search.phrase_prefix(Product.description, ["running", "sh"])
search.more_like_this(Product.id, document_id=1, fields=["description"])

Proximity Composition

from sqlalchemy import select
from paradedb.sqlalchemy import search

prox = search.prox_array("running").within(1, search.prox_regex("sho.*"), ordered=True)
stmt = select(Product.id).where(search.proximity(Product.description, prox))

Indexing and Tokenizers

Tokenizer config can be expressed as a structured mapping:

from sqlalchemy import Index
from paradedb.sqlalchemy import indexing

products_bm25_idx = Index(
    "products_bm25_idx",
    indexing.BM25Field(Product.id),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "simple",
                "filters": ["lowercase", "stemmer"],
                "stemmer": "english",
                "alias": "description_simple",
            }
        ),
    ),
    indexing.BM25Field(
        Product.description,
        tokenizer=indexing.tokenize.from_config(
            {
                "tokenizer": "ngram",
                "args": [3, 8],
                "named_args": {"prefix_only": True},
                "alias": "description_ngram",
            }
        ),
    ),
    postgresql_using="bm25",
    postgresql_with={"key_field": "id"},
)

Validate that a field is indexed with the expected tokenizer:

from paradedb.sqlalchemy import indexing

indexing.assert_indexed(engine, Product.category, tokenizer="literal")

Inspect BM25 metadata for a mapped table:

from paradedb.sqlalchemy import indexing

meta = indexing.describe(engine, Product.__table__)

Alembic Operations

Import once in migration environment startup so Alembic registers ParadeDB operations:

import paradedb.sqlalchemy.alembic  # noqa: F401

Use custom operations in migrations:

op.create_bm25_index(
    "products_bm25_idx",
    "products",
    ["id", "description"],
    key_field="id",
    table_schema="public",
)
op.reindex_bm25("products_bm25_idx", concurrently=True, schema="public")
op.drop_bm25_index("products_bm25_idx", if_exists=True, schema="public")

op.reindex_bm25(..., concurrently=True) must run outside a transaction (autocommit block).

Diagnostics Helpers

paradedb.sqlalchemy.diagnostics exposes wrapper functions for ParadeDB diagnostics:

from paradedb.sqlalchemy import diagnostics

indexes = diagnostics.paradedb_indexes(engine)
segments = diagnostics.paradedb_index_segments(engine, "products_bm25_idx")
check = diagnostics.paradedb_verify_index(engine, "products_bm25_idx", sample_rate=0.1)
all_checks = diagnostics.paradedb_verify_all_indexes(engine, schema_pattern="public")

Common Errors

with_rows requires ORDER BY

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# Missing order_by(...)
base = select(Product.id).limit(10)
facets.with_rows(base, agg=facets.value_count(field="id"), key_field=Product.id)

with_rows requires LIMIT

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# Missing limit(...)
base = select(Product.id).order_by(Product.id)
facets.with_rows(base, agg=facets.value_count(field="id"), key_field=Product.id)

with_rows requires a ParadeDB predicate

from sqlalchemy import select
from paradedb.sqlalchemy import facets

# ensure_predicate=False disables automatic search.all(...) injection
facets.with_rows(
    select(Product.id).order_by(Product.id).limit(10),
    agg=facets.value_count(field="id"),
    key_field=Product.id,
    ensure_predicate=False,
)

tokenizer config requires 'tokenizer'

from paradedb.sqlalchemy import indexing

indexing.tokenize.from_config({"filters": ["lowercase"]})

Examples

Documentation

Contributing

See CONTRIBUTING.md for development setup, running tests, linting, and the PR workflow.

Support

If you're missing a feature or have found a bug, please open a GitHub Issue.

To get community support, you can:

If you need commercial support, please contact the ParadeDB team.

License

sqlalchemy-paradedb is licensed under the MIT License.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sqlalchemy_paradedb-0.5.0.tar.gz (64.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sqlalchemy_paradedb-0.5.0-py3-none-any.whl (31.1 kB view details)

Uploaded Python 3

File details

Details for the file sqlalchemy_paradedb-0.5.0.tar.gz.

File metadata

  • Download URL: sqlalchemy_paradedb-0.5.0.tar.gz
  • Upload date:
  • Size: 64.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for sqlalchemy_paradedb-0.5.0.tar.gz
Algorithm Hash digest
SHA256 c1ae1a770f4c3f74c1fe048c5da2f0bf1894a6505692d0e0c23c8b81229f9ea9
MD5 dfbc5caea3d048474b5efc38ce925713
BLAKE2b-256 441f7f393d6960fc895c4b822893785466dedbdf6d270dbff3e8f12f355b6326

See more details on using hashes here.

File details

Details for the file sqlalchemy_paradedb-0.5.0-py3-none-any.whl.

File metadata

File hashes

Hashes for sqlalchemy_paradedb-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d18bc09a2a64b5b64d07957dd05dcb0ead0e36667660813f0e0149263104ad52
MD5 a925fefc8b577601955cd1ec699c346a
BLAKE2b-256 6370c01412a2f060a56aab8201a69022ed56689e74adeeb44e3aa0143f5689bb

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page