Django-hawkeye full-text search using PostgreSQL pg_textsearch - a lightweight Elasticsearch alternative

django-hawkeye 🎯

Django BM25 full-text search using PostgreSQL pg_textsearch - a lightweight Elasticsearch alternative.

Features

  • Simple API - Just add a mixin and search with Article.search("query")
  • BM25 ranking - Industry-standard relevance scoring (same as Elasticsearch)
  • No external services - Runs inside PostgreSQL 17+ via the pg_textsearch extension
  • RAG-ready - Use as the retrieval layer for Retrieval Augmented Generation

Requirements

  • PostgreSQL 17+
  • pg_textsearch extension
  • Django 4.2+
  • Python 3.10+

Installation

pip install django-hawkeye

PostgreSQL Extension Setup

This library requires the pg_textsearch extension to be installed on your PostgreSQL server:

# Install build dependencies
apt-get install build-essential git postgresql-server-dev-17

# Clone and build
git clone https://github.com/timescale/pg_textsearch.git
cd pg_textsearch
make && make install

The extension is automatically enabled via Django migrations when you run python manage.py migrate.

See the pg_textsearch repository for detailed installation instructions.

Then add django_hawkeye to INSTALLED_APPS in your Django settings:

INSTALLED_APPS = [
    ...
    'django_hawkeye',
]

Quick Start

1. Define your model

from django.db import models
from django_hawkeye import BM25Index, BM25Searchable

class Article(BM25Searchable, models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

2. Run migrations

python manage.py makemigrations
python manage.py migrate

3. Search

# Basic search
Article.search("django tutorial")

# With filters (assumes the model also has a `published` field)
Article.search("web framework").filter(published=True)[:10]

# With score threshold (lower = better match)
Article.search("django").filter(bm25_score__lt=-1.0)

API

BM25Searchable Mixin

Add the mixin to any model to enable the .search() class method:

class Article(BM25Searchable, models.Model):
    ...

BM25Index

BM25Index(
    fields=['content'],
    name='article_bm25_idx',
    text_config='english',  # PostgreSQL text search config
    k1=1.2,                 # Term frequency saturation (0.1-10.0)
    b=0.75,                 # Length normalization (0.0-1.0)
)
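To give a feel for what k1 and b do, here is a minimal pure-Python sketch of the textbook Okapi BM25 formula. It is an illustration only: the function name bm25_score, the toy corpus, and the IDF variant are assumptions, and pg_textsearch may differ in details (it also returns the score negated).

```python
import math
from collections import Counter


def bm25_score(query_terms, doc_terms, corpus, k1=1.2, b=0.75):
    """Score one tokenized document against a query with textbook Okapi BM25."""
    n_docs = len(corpus)
    avg_len = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # documents containing the term
        if df == 0:
            continue
        # One common IDF variant that is always positive (an assumption here)
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # b scales the penalty for documents longer than average
        norm = 1 - b + b * len(doc_terms) / avg_len
        # k1 controls how quickly repeated occurrences stop adding to the score
        score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * norm)
    return score


corpus = [
    "django is a web framework".split(),
    "django django django tutorial for beginners who like long documents".split(),
    "postgres full text search".split(),
]
query = ["django"]
short_doc, long_doc = corpus[0], corpus[1]

# With b=0 document length is ignored, so the document repeating "django" scores higher.
no_length_norm = [bm25_score(query, d, corpus, b=0.0) for d in (short_doc, long_doc)]
# A small k1 saturates term frequency quickly, narrowing the gap between the two.
low_k1 = [bm25_score(query, d, corpus, k1=0.1, b=0.0) for d in (short_doc, long_doc)]
```

In this sketch, raising k1 lets repeated terms keep adding to the score for longer, and raising b penalizes long documents more strongly.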

Search Methods

# Basic search - returns BM25SearchQuerySet
Article.search("query")

# Chainable with Django QuerySet methods
# (assumes Article also has a `draft` field and an `author` ForeignKey
# to a model with a `name` field)
Article.search("query").filter(author__name="John")
Article.search("query").exclude(draft=True)
Article.search("query").select_related('author')
Article.search("query")[:10]  # Limit results

# Filter by score threshold
Article.search("query").filter(bm25_score__lt=-1.0)
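Since results come back best match first, a sliced search can feed a RAG prompt directly. The following is a minimal sketch under stated assumptions: build_rag_prompt and max_chars are hypothetical names that are not part of django-hawkeye, and the stand-in objects replace real model instances so the example runs without a database.

```python
from types import SimpleNamespace


def build_rag_prompt(question, hits, max_chars=2000):
    """Assemble an LLM prompt from retrieved documents, best match first.

    `hits` is any iterable of objects with `title` and `content` attributes,
    for example Article.search(question)[:5].
    """
    parts, used = [], 0
    for hit in hits:
        snippet = f"[{hit.title}]\n{hit.content}"
        if used + len(snippet) > max_chars:
            break  # stop before exceeding the context budget
        parts.append(snippet)
        used += len(snippet)
    context = "\n\n".join(parts)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Stand-ins for model instances so the sketch runs without a database.
hits = [
    SimpleNamespace(title="Intro", content="Django is a web framework."),
    SimpleNamespace(title="Search", content="BM25 ranks documents by relevance."),
]
prompt = build_rag_prompt("What is Django?", hits)
```

In a real project, hits would be something like Article.search(question)[:5], and the resulting prompt would be passed to whichever LLM client you use.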

Advanced Usage

Override search() method

class Article(BM25Searchable, models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

    @classmethod
    def search(cls, query, include_title=False):
        """Custom search with optional title filtering."""
        results = super().search(query)
        if include_title:
            results = results.filter(title__icontains=query)
        return results

Direct Expression API

Use BM25Score for full control:

from django_hawkeye import BM25Score

# Manual annotation
Article.objects.annotate(
    score=BM25Score('content', 'search query', index_name='article_bm25_idx')
).order_by('score')

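# Multi-field weighted search
# (assumes a BM25Index named title_idx on title and content_idx on content)
from django.db.models import F

query = "search query"

Article.objects.annotate(
    title_score=BM25Score('title', query, index_name='title_idx'),
    content_score=BM25Score('content', query, index_name='content_idx'),
).annotate(
    # Title matches count double; scores are negative, so lower is still better
    combined=F('title_score') * 2 + F('content_score')
).order_by('combined')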
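The weighted sum keeps the "lower is better" ordering because both inputs are negative. A small pure-Python sketch of the same arithmetic, using made-up scores (the combine helper and the row values are hypothetical, chosen only for illustration):

```python
def combine(title_score, content_score, title_weight=2.0):
    """Weighted sum of two negative BM25 scores; lower is still better."""
    return title_weight * title_score + content_score


# Hypothetical (title_score, content_score) pairs, not real query output.
rows = {
    "title match": (-3.0, -0.5),
    "content match": (-0.5, -3.0),
    "weak match": (-0.2, -0.3),
}

# Ascending sort mirrors .order_by('combined')
ranked = sorted(rows, key=lambda name: combine(*rows[name]))
```

With a title weight of 2, the row whose title matches strongly ends up ahead of the row whose content matches equally strongly.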

Without Mixin

from django.db import models
from django_hawkeye import BM25Index, BM25Score

class Article(models.Model):
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

    @classmethod
    def search(cls, query):
        return cls.objects.annotate(
            score=BM25Score('content', query, index_name='article_bm25_idx')
        ).filter(score__lt=0).order_by('score')

Score Semantics

pg_textsearch returns negative scores, so lower (more negative) values mean a better match. Sort in ascending order to get the best matches first.

# Correct - ascending order (best matches first)
Article.search("query")  # Already ordered correctly

# Manual ordering
Article.search("query").order_by('bm25_score')   # ✓ Correct - best matches first
Article.search("query").order_by('-bm25_score')  # ✗ Wrong - worst matches first
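The effect of the sign convention can be checked without a database. This sketch uses invented titles and scores to mimic ascending ordering, descending ordering, and a score threshold:

```python
# Hypothetical (title, bm25_score) rows; real values depend on your data.
rows = [
    ("Unrelated post", -0.1),
    ("Django tutorial", -4.2),
    ("Web frameworks compared", -1.7),
]

# Ascending sort, like order_by('bm25_score'): best match first
best_first = sorted(rows, key=lambda row: row[1])
# Descending sort, like order_by('-bm25_score'): worst match first
worst_first = sorted(rows, key=lambda row: row[1], reverse=True)

# A threshold such as bm25_score__lt=-1.0 keeps only the stronger matches.
strong = [title for title, score in best_first if score < -1.0]
```

Here the most negative score (-4.2) sorts first in ascending order, and the threshold drops the row scored -0.1.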

Why Hawkeye?

Feature        | Elasticsearch     | django-hawkeye
Infrastructure | Separate cluster  | Your PostgreSQL
Sync           | Manual index sync | Automatic (native)
Cost           | $$$               | Free
Setup          | Complex           | Add mixin + migrate
BM25 ranking   | Yes               | Yes

License

MIT

Publisher: publish.yml on FarhanAliRaza/django-hawkeye