ParadeDB integration for Django

django-paradedb

Django ORM integration for ParadeDB: simple, Elastic-quality search for Postgres.

Requirements & Compatibility

Component	Supported
Python	3.10, 3.11, 3.12, 3.13
Django	5.2, 6.0
ParadeDB	0.21.*
PostgreSQL	17, 18 (with ParadeDB extension)

Installation

pip install django-paradedb

Quick Start

Add a BM25 index to your model

from django.db import models
from paradedb.indexes import BM25Index
from paradedb.queryset import ParadeDBManager

class Product(models.Model):
    description = models.TextField()
    category = models.CharField(max_length=100)
    rating = models.IntegerField(default=0)

    objects = ParadeDBManager()

    class Meta:
        indexes = [
            BM25Index(
                fields={
                    'id': {},
                    'description': {'tokenizer': 'unicode_words'},
                    'category': {'tokenizer': 'literal'},  # exact match, no tokenization
                    'rating': {},
                },
                key_field='id',
                name='product_search_idx',
            ),
        ]

Search with a simple query

from paradedb.search import ParadeDB

Product.objects.filter(description=ParadeDB('shoes'))

BM25 Index

Define a BM25 index on your model fields. For more advanced indexing options like JSON indexing or indexing expressions, see the ParadeDB Indexing Documentation.

from paradedb.indexes import BM25Index

class Meta:
    indexes = [
        BM25Index(
            fields={
                'id': {},
                'title': {'tokenizer': 'unicode_words'},
                'body': {'tokenizer': 'unicode_words', 'stemmer': 'English'},
                'category': {'tokenizer': 'literal'},
            },
            key_field='id',
            name='article_idx',
        ),
    ]

For a full list of supported tokenizers and their configurations, please refer to the ParadeDB Tokenizer Documentation.

Note: If no tokenizer is specified for a field (e.g. 'rating': {}), ParadeDB applies its own default (unicode_words for text fields). If you provide filters or stemmer, you must also set an explicit tokenizer — omitting it will raise a ValueError.

'body': {
    'tokenizer': 'unicode_words',
    'stemmer': 'English',        # Stemming language
    'filters': ['lowercase'],    # Token filters
}
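The rule in the note above can be sketched as a small check. This is only an illustration of the described behaviour, not the package's code: the helper name validate_field_config and the wording of the error message are assumptions.

```python
def validate_field_config(name: str, config: dict) -> None:
    """Raise ValueError if filters/stemmer are given without a tokenizer.

    Hypothetical helper that mirrors the rule described above; the real
    package may structure and word this check differently.
    """
    needs_tokenizer = [key for key in ("filters", "stemmer") if key in config]
    if needs_tokenizer and "tokenizer" not in config:
        raise ValueError(
            f"Field {name!r}: {', '.join(needs_tokenizer)} requires an explicit tokenizer"
        )


# An empty config is accepted: ParadeDB applies its own default.
validate_field_config("rating", {})

# A stemmer together with an explicit tokenizer is accepted.
validate_field_config("body", {"tokenizer": "unicode_words", "stemmer": "English"})

# A stemmer without a tokenizer is rejected.
try:
    validate_field_config("body", {"stemmer": "English"})
except ValueError as exc:
    print(exc)
```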

JSON Field Keys

Index specific keys within a JSONField

'metadata': {
    'json_keys': {
        'author': {'tokenizer': 'literal'},
        'tags': {'tokenizer': 'unicode_words'},
    }
}

Multiple Tokenizers Per Field

Index the same text field multiple ways by using tokenizers and aliases.

'description': {
    'tokenizers': [
        {'tokenizer': 'literal'},
        {'tokenizer': 'simple', 'alias': 'description_simple'},
    ],
}

Tokenizer Args and Named Args

BM25Index now supports a thin-wrapper tokenizer DSL:

tokenizer: tokenizer name (for structured args/named args) or raw tokenizer SQL function string
args: positional tokenizer arguments (list)
named_args: named tokenizer/filter arguments (dict), rendered as key=value config
alias: optional alias for query-time targeting
filters / stemmer: legacy convenience keys (merged into named_args)

'description': {
    'tokenizers': [
        {'tokenizer': 'unicode_words'},
        {
            'tokenizer': 'ngram',
            'args': [3, 8],
            'named_args': {'prefix_only': True, 'positions': True},
            'alias': 'description_ngram',
        },
        {
            'tokenizer': 'regex_pattern',
            'args': [r'(?i)\bshoe\w*'],
            'alias': 'description_regex',
        },
    ]
}
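To make the relationship between the legacy keys and named_args concrete, here is a minimal sketch of how such a spec might be normalized and rendered. Everything in it is an assumption for illustration: the function names are hypothetical, each filter is treated as a boolean flag, and the key=value strings are not guaranteed to match the SQL the package actually emits.

```python
def normalize_tokenizer_spec(spec: dict) -> dict:
    """Fold legacy 'filters'/'stemmer' keys into 'named_args'.

    Assumption: each filter name becomes a boolean flag and 'stemmer'
    keeps its value. Explicit named_args win over legacy keys.
    """
    named = dict(spec.get("named_args", {}))
    for flag in spec.get("filters", []):
        named.setdefault(flag, True)
    if "stemmer" in spec:
        named.setdefault("stemmer", spec["stemmer"])
    normalized = {
        "tokenizer": spec["tokenizer"],
        "args": list(spec.get("args", [])),
        "named_args": named,
    }
    if "alias" in spec:
        normalized["alias"] = spec["alias"]
    return normalized


def render_named_args(named_args: dict) -> list:
    """Render named args as key=value strings (booleans lowercased)."""
    rendered = []
    for key, value in named_args.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        rendered.append(f"{key}={value}")
    return rendered


legacy = {"tokenizer": "unicode_words", "filters": ["lowercase"], "stemmer": "English"}
spec = normalize_tokenizer_spec(legacy)
print(render_named_args(spec["named_args"]))
```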

Common positional-argument tokenizers from ParadeDB docs:

ngram(min_gram, max_gram, ...)
regex_pattern(pattern, ...)
lindera(dictionary, ...)

Common named args from ParadeDB docs:

tokenizer options: prefix_only, positions, remove_emojis, lowercase, alias
token filter options: remove_long, remove_short, stopwords, stemmer

Query a specific tokenizer alias when needed:

SELECT *
FROM products
WHERE (description::pdb.alias('description_simple')) ||| 'running';

In Django ORM, use RawSQL for alias-targeted queries:

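from django.db.models import BooleanField
from django.db.models.expressions import RawSQL

# filter() only accepts conditional expressions, so declare a boolean output_field
queryset = Product.objects.filter(
    RawSQL(
        "(description::pdb.alias('description_simple')) ||| %s",
        ["running"],
        output_field=BooleanField(),
    )
)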
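In the RawSQL query above, the search term is passed as a parameter, but the column and alias names are part of the SQL string itself. If those names ever come from configuration rather than literals, it may be worth validating them first. The helper below is a hypothetical sketch (alias_match_sql is not part of django-paradedb) that builds the same fragment and rejects anything that is not a plain identifier.

```python
import re

# Conservative identifier pattern: letters, digits, underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def alias_match_sql(column: str, alias: str, term: str):
    """Return (sql, params) for an alias-targeted ||| match.

    Hypothetical helper: column and alias are validated because they are
    interpolated into the SQL text; the term stays a bound parameter.
    """
    for name in (column, alias):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"Unsafe identifier: {name!r}")
    sql = f"({column}::pdb.alias('{alias}')) ||| %s"
    return sql, [term]


sql, params = alias_match_sql("description", "description_simple", "running")
print(sql, params)
```

The returned pair can be passed straight to RawSQL as its first two arguments.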

Migrations

BM25Index works seamlessly with Django's migration system. You can add indexes to existing models or new models - Django will automatically generate and apply the necessary migrations.

Adding an index to an existing model:

# Simply add BM25Index to your existing model's Meta.indexes
class Article(models.Model):
    title = models.TextField()
    body = models.TextField()

    class Meta:
        indexes = [
            BM25Index(
                fields={'id': {}, 'title': {}, 'body': {}},
                key_field='id',
                name='article_idx',
            ),
        ]

Then run Django's standard migration commands:

python manage.py makemigrations
python manage.py migrate

Modifying an existing index:

To change index configuration (e.g., tokenizer settings), remove the old index and add a new one with a different name. Django will drop and recreate the index during migration.

Important notes:

The table can contain existing data when adding a BM25Index - the index will be built from the existing rows
Index creation may take time on large tables (millions of rows)
Django automatically handles index cleanup when reverting migrations

Query Types

For a full list of supported query types and advanced options, please refer to the ParadeDB Query Builder Documentation.

Basic Search

Simple full-text search with &&& (AND) operator

from paradedb.search import ParadeDB

# Single term
Product.objects.filter(description=ParadeDB('shoes'))

# Multiple terms (AND)
Product.objects.filter(description=ParadeDB('running', 'shoes'))

Boolean Composition with PQ

ParadeDB provides two ways to perform AND operations:

Simple AND - Multiple Terms (Recommended for most cases)

from paradedb.search import ParadeDB

# Simple syntax - terms are automatically combined with AND
Product.objects.filter(description=ParadeDB('running', 'shoes'))
# SQL: WHERE description &&& ARRAY['running', 'shoes']

Use this when: You have a simple list of terms that must all match.

Explicit PQ Objects - Complex Boolean Logic

from paradedb.search import ParadeDB, PQ

# OR query - find documents matching ANY term
Product.objects.filter(description=ParadeDB(PQ('shoes') | PQ('boots')))
# SQL: WHERE description ||| ARRAY['shoes', 'boots']

# AND query - explicit boolean combination
Product.objects.filter(description=ParadeDB(PQ('running') & PQ('shoes')))
# SQL: WHERE description &&& ARRAY['running', 'shoes']

# Combine multiple terms with OR
Product.objects.filter(
    description=ParadeDB(PQ('shoes') | PQ('boots') | PQ('sandals'))
)

Use PQ when:

You need OR logic (must use PQ with |)
You're building dynamic queries where the operator might vary
You want explicit control over boolean operators

Combining with Django Q Objects

Mix ParadeDB search with Django's Q objects for complex filtering:

from django.db.models import Q
from paradedb.search import ParadeDB, PQ

# (ParadeDB search AND standard filter) OR (different search AND filter)
# Assumes Product also defines an in_stock BooleanField (not shown in the Quick Start model)
Product.objects.filter(
    Q(description=ParadeDB('running', 'shoes'), rating__gte=4) |
    Q(description=ParadeDB(PQ('boots') | PQ('sandals')), in_stock=True)
)

Note: The simple comma-separated syntax ParadeDB('a', 'b') is equivalent to ParadeDB(PQ('a') & PQ('b')) but more concise. Use the simple syntax unless you need OR operations or explicit boolean control.
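The equivalence described in the note can be illustrated with a toy model of how terms compose into a single operator and array. MiniPQ and pq below are hypothetical stand-ins written for this example; they are not the package's PQ class, and unlike a real implementation this sketch simply refuses to mix AND with OR.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MiniPQ:
    """Toy stand-in for PQ: a tuple of terms joined by one operator."""

    terms: Tuple[str, ...]
    operator: Optional[str] = None  # None for a single term, else "&&&" or "|||"

    def _combine(self, other: "MiniPQ", operator: str) -> "MiniPQ":
        for side in (self, other):
            if side.operator not in (None, operator):
                raise ValueError("this sketch does not support mixing AND and OR")
        return MiniPQ(self.terms + other.terms, operator)

    def __and__(self, other: "MiniPQ") -> "MiniPQ":
        return self._combine(other, "&&&")

    def __or__(self, other: "MiniPQ") -> "MiniPQ":
        return self._combine(other, "|||")

    def render(self, column: str) -> str:
        # Same single-quote doubling the README describes for search terms.
        quoted = ["'" + term.replace("'", "''") + "'" for term in self.terms]
        if len(quoted) == 1:
            return f"{column} &&& {quoted[0]}"
        return f"{column} {self.operator} ARRAY[{', '.join(quoted)}]"


def pq(term: str) -> MiniPQ:
    """Wrap a single term, like PQ('term') in the examples above."""
    return MiniPQ((term,))


print((pq("running") & pq("shoes")).render("description"))
print((pq("shoes") | pq("boots")).render("description"))
```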

Phrase Search

Match exact phrases with optional slop (word distance)

from paradedb.search import ParadeDB, Phrase

# Exact phrase
Product.objects.filter(description=ParadeDB(Phrase('running shoes')))

# Phrase with slop (allow up to 2 words between)
Product.objects.filter(description=ParadeDB(Phrase('running shoes', slop=2)))
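To build intuition for slop, the sketch below checks whether phrase terms occur in order with at most slop extra tokens between them. It is a deliberately simplified, in-order model written for this README; the function name is hypothetical, and the search engine's actual slop semantics may differ (for example, in how reordered terms are handled).

```python
def phrase_matches(tokens, phrase, slop=0):
    """Return True if phrase terms appear in order within the slop budget.

    Simplified model: slop counts the extra tokens sitting between the
    first and last phrase term. Reordering is not considered.
    """
    if not phrase:
        return False
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        position = start
        found_all = True
        # Greedily take the earliest occurrence of each following term.
        for term in phrase[1:]:
            try:
                position = tokens.index(term, position + 1)
            except ValueError:
                found_all = False
                break
        extra_tokens = position - start - (len(phrase) - 1)
        if found_all and extra_tokens <= slop:
            return True
    return False


doc = "running and trail shoes".split()
print(phrase_matches(doc, ["running", "shoes"], slop=2))
print(phrase_matches(doc, ["running", "shoes"], slop=1))
```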

Fuzzy Search

Match terms with typo tolerance (Levenshtein distance)

from paradedb.search import ParadeDB, Fuzzy

# Fuzzy match with distance 1 (default)
Product.objects.filter(description=ParadeDB(Fuzzy('shoez')))

# Fuzzy match with distance 2 (max)
Product.objects.filter(description=ParadeDB(Fuzzy('runing', distance=2)))
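For reference, Levenshtein distance counts the single-character insertions, deletions, and substitutions needed to turn one string into another. The function below is a plain textbook implementation included only to show why 'shoez' and 'runing' fall within the distances used above; it is not how ParadeDB evaluates fuzzy queries internally.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete char_a
                    current[j - 1] + 1,      # insert char_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("shoez", "shoes"))
print(levenshtein("runing", "running"))
```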

Term Query

Match exact terms without tokenization

from paradedb.search import ParadeDB, Term

Product.objects.filter(category=ParadeDB(Term('electronics')))

Regex Query

Match terms using a regular expression

from paradedb.search import ParadeDB, Regex

Product.objects.filter(description=ParadeDB(Regex('run.*')))

Match All

Return all documents (useful with facets)

from paradedb.search import ParadeDB, All

Product.objects.filter(id=ParadeDB(All()))

More Like This

Find similar documents based on term frequency analysis.

Note: Unlike other search expressions, MoreLikeThis is a filter Expression (not a lookup), so it's used directly in .filter() without wrapping in ParadeDB(). This is because MLT operates on the entire indexed document (typically multiple fields) rather than a single field.

from paradedb.search import MoreLikeThis

# Similar to a specific document by ID
Product.objects.filter(MoreLikeThis(product_id=42))

# Similar to multiple documents
Product.objects.filter(MoreLikeThis(product_ids=[1, 2, 3]))

# Similar to a custom document
Product.objects.filter(
    MoreLikeThis(document={"description": "comfortable running shoes"})
)

# With tuning parameters
Product.objects.filter(
    MoreLikeThis(
        product_id=42,
        min_term_freq=2,
        max_query_terms=25,
        min_doc_freq=5,
    )
)
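The tuning parameters above control which terms from the source document end up in the similarity query. A rough sketch of that selection step follows, assuming a hypothetical select_query_terms helper and a precomputed document-frequency table. It ranks candidates by raw term frequency for brevity; real engines typically use a TF-IDF style weight, so treat this as an illustration of the parameters rather than of ParadeDB's algorithm.

```python
from collections import Counter


def select_query_terms(source_tokens, doc_freq, min_term_freq=1,
                       min_doc_freq=1, max_query_terms=25):
    """Pick the terms of a source document to search for.

    source_tokens: tokens of the source document
    doc_freq: mapping of term -> number of indexed documents containing it
    """
    counts = Counter(source_tokens)
    candidates = [
        (term, tf)
        for term, tf in counts.items()
        if tf >= min_term_freq and doc_freq.get(term, 0) >= min_doc_freq
    ]
    # Most frequent first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [term for term, _ in candidates[:max_query_terms]]


tokens = "running shoes for running on trails with trail running shoes".split()
frequencies = {"running": 40, "shoes": 120, "for": 5000, "trail": 3}
print(select_query_terms(tokens, frequencies, min_term_freq=2, min_doc_freq=5))
```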

Combining with other filters:

Since MoreLikeThis is an Expression, it composes naturally with Django's ORM:

from django.db.models import Q

# Combine with standard filters
Product.objects.filter(
    MoreLikeThis(product_id=42),
    in_stock=True,
    rating__gte=4
)

# Use with Q objects for complex logic
Product.objects.filter(
    Q(MoreLikeThis(product_id=42)) | Q(category='featured')
)

# Chain with other querysets
Product.objects.filter(
    MoreLikeThis(product_id=42)
).exclude(
    id=42  # Exclude the source document itself
).order_by('-rating')[:10]

Annotations

BM25 Score

Get the relevance score for each result. For more information on how scores are calculated, see BM25 Scoring.

from paradedb.functions import Score
from paradedb.search import ParadeDB

Product.objects.filter(
    description=ParadeDB('shoes')
).annotate(
    score=Score()
).order_by('-score')
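As background for the score annotation, the sketch below computes the contribution of a single term under the widely used BM25 formula. The parameter values k1=1.2 and b=0.75 are common defaults and an assumption here; refer to the ParadeDB documentation for the exact variant and settings it uses.

```python
import math


def bm25_term_score(term_freq, doc_len, avg_doc_len, num_docs, docs_with_term,
                    k1=1.2, b=0.75):
    """Score contribution of one query term for one document.

    term_freq: occurrences of the term in the document
    doc_len / avg_doc_len: document length and corpus average, in tokens
    num_docs / docs_with_term: corpus size and documents containing the term
    """
    # Rare terms get a larger inverse document frequency.
    idf = math.log(1 + (num_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))
    # Long documents are penalised through the length normalisation term.
    norm = term_freq + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * term_freq * (k1 + 1) / norm


short_doc = bm25_term_score(2, doc_len=50, avg_doc_len=100, num_docs=1000, docs_with_term=10)
long_doc = bm25_term_score(2, doc_len=400, avg_doc_len=100, num_docs=1000, docs_with_term=10)
print(short_doc > long_doc)
```

A document's total score for a multi-term query is the sum of these per-term contributions.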

Snippet

Get highlighted text snippets. For more details on snippet configuration, see Highlighting.

from paradedb.functions import Snippet
from paradedb.search import ParadeDB

Product.objects.filter(
    description=ParadeDB('shoes')
).annotate(
    highlight=Snippet('description', start_sel='<b>', stop_sel='</b>')
)

Snippet options:

Option	Description
`start_sel`	Opening highlight tag
`stop_sel`	Closing highlight tag
`max_num_chars`	Maximum snippet length
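The three options can be pictured with a toy highlighter. This is a hypothetical pure-Python sketch: make_snippet is not part of the package, the default values shown are assumptions, and a real snippet function would choose the best-matching fragment rather than always taking the start of the text.

```python
import re


def make_snippet(text, term, start_sel="<b>", stop_sel="</b>", max_num_chars=150):
    """Wrap whole-word matches of term in the given markers.

    Toy model: truncates to the first max_num_chars characters, then
    highlights case-insensitive matches inside that fragment.
    """
    fragment = text[:max_num_chars]
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{start_sel}{m.group(0)}{stop_sel}", fragment)


print(make_snippet("Comfortable running shoes", "shoes"))
print(make_snippet("Red Shoes", "shoes", start_sel="<em>", stop_sel="</em>"))
```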

Faceted Search

For a full list of supported aggregations and advanced options, please refer to the ParadeDB Aggregations Documentation.

Requirements

The .facets() method has specific requirements based on how you use it:

When using include_rows=True (default):

✅ MUST have a ParadeDB search filter (e.g., ParadeDB() or MoreLikeThis())
✅ MUST call .order_by() on the queryset
✅ MUST slice the queryset (e.g., [:10])

When using include_rows=False:

✅ MUST have a ParadeDB search filter
❌ No ordering or slicing required

Why these requirements?

ParadeDB's aggregation uses window functions (pdb.agg() OVER ()) which require ordered, limited result sets when combined with row data. Without ordering and limits, PostgreSQL cannot efficiently compute the aggregations.
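The requirements above amount to a small decision table. The sketch below expresses it as a standalone check; the function and its boolean arguments are hypothetical (the package inspects the queryset itself), and the error messages are the ones quoted in this README.

```python
def check_facets_preconditions(*, has_search_filter, is_ordered, is_sliced,
                               include_rows=True):
    """Raise ValueError when a facets() call would be rejected.

    Hypothetical helper that restates the documented rules; it does not
    inspect a real queryset.
    """
    if not has_search_filter:
        raise ValueError("facets() requires a ParadeDB operator in the WHERE clause")
    if include_rows and not (is_ordered and is_sliced):
        raise ValueError("facets(include_rows=True) requires order_by() and a LIMIT.")


# Rows plus facets: filter, ordering, and slice are all needed.
check_facets_preconditions(has_search_filter=True, is_ordered=True, is_sliced=True)

# Facets only: the search filter alone is enough.
check_facets_preconditions(
    has_search_filter=True, is_ordered=False, is_sliced=False, include_rows=False
)
```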

Basic Usage

Get aggregated counts alongside results

from paradedb.search import ParadeDB

# ✅ Correct: Has filter, ordering, and limit
rows, facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .order_by('id')[:10]  # REQUIRED when include_rows=True
    .facets('category')
)
# facets = {'buckets': [{'key': 'footwear', 'doc_count': 5}, ...]}

# ❌ This will raise ValueError
rows, facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .facets('category')  # Missing order_by() and slice!
)
# ValueError: facets(include_rows=True) requires order_by() and a LIMIT.

Facets-only (no rows)

# ✅ No ordering/limit needed when include_rows=False
facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .facets('category', include_rows=False)
)

Multiple Facet Fields

rows, facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .order_by('id')[:10]
    .facets('category', 'rating')
)
# facets = {'category_terms': {...}, 'rating_terms': {...}}

Facet Options

rows, facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .order_by('rating')[:20]
    .facets(
        'category',
        size=20,           # Number of buckets (default: 10)
        order='-count',    # Sort order: count, -count, key, -key
        missing='Unknown', # Value for documents without the field
    )
)
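To show what size, order, and missing mean for the resulting buckets, here is a pure-Python sketch of a terms aggregation over a list of field values. The terms_facet function is hypothetical and runs in memory; in practice ParadeDB computes the aggregation in the database. The bucket shape follows the example output shown under Basic Usage.

```python
from collections import Counter


def terms_facet(values, size=10, order="-count", missing=None):
    """Count values into buckets, mimicking the facet options above.

    values: one field value per matching document (None if absent)
    """
    counts = Counter()
    for value in values:
        if value is None:
            if missing is None:
                continue  # documents without the field are skipped
            value = missing
        counts[value] += 1

    sort_keys = {
        "count": lambda item: (item[1], item[0]),
        "-count": lambda item: (-item[1], item[0]),
        "key": lambda item: item[0],
        "-key": lambda item: item[0],
    }
    if order not in sort_keys:
        raise ValueError(f"Unsupported order: {order!r}")
    buckets = sorted(counts.items(), key=sort_keys[order], reverse=(order == "-key"))
    return {"buckets": [{"key": key, "doc_count": n} for key, n in buckets[:size]]}


categories = ["footwear", "footwear", "apparel", None]
print(terms_facet(categories, missing="Unknown"))
```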

Custom Aggregation JSON

rows, facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .order_by('id')[:10]
    .facets(agg={'value_count': {'field': 'id'}})
)

Combining with Other QuerySet Methods

# Filter, annotate, order, limit, then facet
# (assumes Product also defines price and brand fields)
from paradedb.functions import Score
from paradedb.search import ParadeDB

rows, facets = (
    Product.objects
    .filter(description=ParadeDB('running', 'shoes'), price__lt=100)
    .annotate(score=Score())
    .order_by('-score')[:20]
    .facets('category', 'brand')
)

# Works with prefetch_related
rows, facets = (
    Product.objects.filter(description=ParadeDB('shoes'))
    .prefetch_related('reviews')
    .order_by('id')[:10]
    .facets('category')
)

Common Errors and Solutions

Error: "facets() requires a ParadeDB operator in the WHERE clause"

# ❌ Missing ParadeDB filter
Product.objects.filter(price__lt=100).order_by('id')[:10].facets('category')

# ✅ Add a ParadeDB search filter
Product.objects.filter(
    price__lt=100,
    description=ParadeDB('shoes')  # Add this!
).order_by('id')[:10].facets('category')

Error: "facets(include_rows=True) requires order_by() and a LIMIT"

# ❌ Missing ordering
Product.objects.filter(description=ParadeDB('shoes'))[:10].facets('category')

# ❌ Missing limit
Product.objects.filter(description=ParadeDB('shoes')).order_by('id').facets('category')

# ✅ Both ordering and limit
Product.objects.filter(description=ParadeDB('shoes')).order_by('id')[:10].facets('category')

# ✅ Or use include_rows=False
Product.objects.filter(description=ParadeDB('shoes')).facets('category', include_rows=False)

Error: "Facet field names must be unique"

# ❌ Duplicate fields
.facets('category', 'category')

# ✅ Each field only once
.facets('category', 'brand')

Custom Manager

If you have a custom manager, compose it with ParadeDBQuerySet

from django.db import models
from paradedb.queryset import ParadeDBQuerySet

class CustomManager(models.Manager):
    def active(self):
        return self.filter(is_active=True)

CustomManagerWithFacets = CustomManager.from_queryset(ParadeDBQuerySet)

class Product(models.Model):
    objects = CustomManagerWithFacets()

Django ORM Integration

Works seamlessly with Django's ORM features

from django.db.models import Q
from paradedb.search import ParadeDB

# Combine with Q objects
Product.objects.filter(
    Q(description=ParadeDB('shoes')) & Q(rating__gte=4)
)

# Chain with standard filters
Product.objects.filter(
    description=ParadeDB('shoes')
).filter(
    category='footwear'
).exclude(
    rating__lt=3
)

# Select related
Product.objects.filter(
    description=ParadeDB('shoes')
).select_related('brand')

# Prefetch related
Product.objects.filter(
    description=ParadeDB('shoes')
).prefetch_related('reviews')

Security

SQL Injection Protection

django-paradedb uses SQL literal escaping for search terms rather than parameterized queries. This design choice is intentional and safe:

Escaping Strategy:

All user input is escaped using PostgreSQL's single-quote escaping (' → '')
Search terms are wrapped in SQL string literals: 'user input'
This prevents SQL injection while maintaining compatibility with ParadeDB's full-text operators

Implementation Details:

# All search terms are escaped via _quote_term()
def _quote_term(term: str) -> str:
    escaped = term.replace("'", "''")  # PostgreSQL standard escaping
    return f"'{escaped}'"

Which features use escaping:

ParadeDB() - All search terms (strings, Phrase, Fuzzy, Parse, Term, Regex)
Snippet() - HTML tag markers (start_sel, stop_sel)
Agg() - JSON aggregation specs

Which features use parameterization:

MoreLikeThis() - Uses %s placeholders for IDs, documents, and options
Standard Django filters - Use Django's native parameterization

Why literals instead of parameters?

ParadeDB's full-text operators (&&&, |||, ###, @@@) work with:

Single string literals: description &&& 'shoes'
Array literals: description &&& ARRAY['running', 'shoes']
Function calls with type casts: description ### 'exact phrase'::pdb.slop(2)

Parameterized queries would require PostgreSQL to parse the search syntax at execution time, which is incompatible with ParadeDB's operator design. The literal approach allows the query planner to optimize full-text searches effectively.

Safety Guarantee:

All escaping follows PostgreSQL's standard string literal rules. The implementation has been reviewed by Django Software Foundation (DSF) members and is protected by:

Comprehensive test coverage (103 tests including special character escaping)
Input validation at the ORM layer
PostgreSQL's built-in literal escaping semantics

Example - User Input is Safe:

# Even malicious input is safely escaped
user_query = "'; DROP TABLE products; --"
Product.objects.filter(description=ParadeDB(user_query))
# Generates: WHERE description &&& '''; DROP TABLE products; --'
# The query is escaped and treated as a literal search term

Documentation

Package Documentation: https://paradedb.github.io/django-paradedb
ParadeDB Official Docs: https://docs.paradedb.com
ParadeDB Website: https://paradedb.com

Development

Setup

# Install dev dependencies
pip install -e ".[dev]"

# Set up prek hooks
prek install

Testing

Unit tests verify individual components and logic without requiring a database connection.

Integration tests validate the full workflow against a real ParadeDB instance to ensure everything works end-to-end.

# Run unit tests only
pytest

# Run integration tests (requires Docker)
# This script automatically starts ParadeDB in Docker and runs the integration suite
bash scripts/run_integration_tests.sh

# Or manually start ParadeDB and run integration tests
bash scripts/run_paradedb.sh  # Starts ParadeDB container
export PARADEDB_INTEGRATION=1
export PARADEDB_TEST_DSN="postgresql://postgres:postgres@localhost:5432/postgres"
pytest -m integration

Linting & Type Checking

# Run linting
ruff check .
ruff format .

# Run type checking
mypy src/paradedb

For more details on contributing, development workflow, and PR conventions, see our Contributing Guide.

Support

If you're missing a feature or have found a bug, please open a GitHub Issue.

To get community support, you can:

Post a question in the ParadeDB Slack Community
Ask for help on our GitHub Discussions

If you need commercial support, please contact the ParadeDB team.

Acknowledgments

We would like to thank the following members of the Django community for their valuable feedback and reviews during the development of this package:

Timothy Allen - Principal Engineer at The Wharton School, PSF and DSF member
Frank Wiles - President & Founder of REVSYS

License

django-paradedb is licensed under the MIT License.

Release history

Version	Released
0.6.0	Apr 21, 2026
0.5.0	Mar 23, 2026
0.4.0	Mar 3, 2026
0.3.0	Feb 19, 2026
0.2.0 (this version)	Feb 13, 2026
0.1.1	Feb 4, 2026

