django-hawkeye 🎯

Django BM25 full-text search using PostgreSQL pg_textsearch - a lightweight Elasticsearch alternative.

Features

  • Simple API - Just add a mixin and search with Article.search("query")
  • BM25 ranking - Industry-standard relevance scoring (the default ranking function in Elasticsearch)
  • No external services - Search runs inside PostgreSQL 17+ through the pg_textsearch extension
  • RAG-ready - Use as the retrieval layer for Retrieval Augmented Generation
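
As a rough illustration of the RAG use case, the sketch below packs the top search hits into a prompt context. build_context and max_chars are hypothetical names, not part of django-hawkeye; the helper only assumes each result exposes title and content attributes, as the Article model in this README does. Stand-in objects replace a real queryset so the snippet runs without a database.

```python
from types import SimpleNamespace


def build_context(results, max_chars=2000):
    """Concatenate hits, best first, until the character budget is used up.

    Hypothetical helper (not part of django-hawkeye). `results` is any
    iterable of objects with `title` and `content` attributes, for
    example Article.search("query")[:5].
    """
    chunks = []
    used = 0  # counts chunk text only, not the blank-line separators
    for hit in results:
        chunk = f"## {hit.title}\n{hit.content}"
        if used + len(chunk) > max_chars:
            break  # stop at the first hit that would exceed the budget
        chunks.append(chunk)
        used += len(chunk)
    return "\n\n".join(chunks)


# Stand-ins for model instances so the example runs without PostgreSQL.
hits = [
    SimpleNamespace(title="Intro", content="Django is a web framework."),
    SimpleNamespace(title="ORM", content="Querysets are lazy."),
]
context = build_context(hits, max_chars=40)  # only the first hit fits
```

Because Article.search() already returns the best matches first, truncating at the budget drops the weakest hits rather than the strongest.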

Requirements

  • PostgreSQL 17+
  • pg_textsearch extension
  • Django 4.2+
  • Python 3.10+

Installation

pip install django-hawkeye

PostgreSQL Extension Setup

This library requires the pg_textsearch extension to be installed on your PostgreSQL server. On Debian or Ubuntu, it can be built from source:

# Install build dependencies
apt-get install build-essential git postgresql-server-dev-17

# Clone and build
git clone https://github.com/timescale/pg_textsearch.git
cd pg_textsearch
make && make install

The extension is automatically enabled via Django migrations when you run python manage.py migrate.

See the pg_textsearch repository for detailed installation instructions.

Add django_hawkeye to INSTALLED_APPS:

INSTALLED_APPS = [
    ...
    'django_hawkeye',
]

Quick Start

1. Define your model

from django.db import models
from django_hawkeye import BM25Index, BM25Searchable

class Article(BM25Searchable, models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

2. Run migrations

python manage.py makemigrations
python manage.py migrate

3. Search

# Basic search
Article.search("django tutorial")

# With filters (assumes the model also has a `published` field)
Article.search("web framework").filter(published=True)[:10]

# With score threshold (lower = better match)
Article.search("django").filter(bm25_score__lt=-1.0)

API

BM25Searchable Mixin

Add it to any model to enable the .search() method:

class Article(BM25Searchable, models.Model):
    ...

BM25Index

BM25Index(
    fields=['content'],
    name='article_bm25_idx',
    text_config='english',  # PostgreSQL text search config
    k1=1.2,                 # Term frequency saturation (0.1-10.0)
    b=0.75,                 # Length normalization (0.0-1.0)
)
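
To give a feel for what k1 and b control, here is a small pure-Python sketch of the textbook Okapi BM25 score for a single query term. It is an illustration only: pg_textsearch computes scores inside PostgreSQL, and its exact IDF variant and sign convention may differ. The function name bm25_term_score and its arguments are made up for this example.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook Okapi BM25 contribution of one query term to one document."""
    # Inverse document frequency; the +1 inside the log keeps it positive.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b scales how strongly documents longer than average are penalised.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 caps the gain from repeated occurrences of the term.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Same term frequency, different document lengths: the shorter document wins.
short_doc = bm25_term_score(tf=2, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=2, doc_len=400, avg_doc_len=100, n_docs=1000, doc_freq=10)

# With b=0.0, length normalization is switched off and both score the same.
flat_short = bm25_term_score(2, 50, 100, 1000, 10, b=0.0)
flat_long = bm25_term_score(2, 400, 100, 1000, 10, b=0.0)
```

In this textbook form, raising k1 lets repeated terms keep adding to the score for longer, and raising b penalises long documents more heavily.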

Search Methods

# Basic search - returns BM25SearchQuerySet
Article.search("query")

# Chainable with Django QuerySet methods
Article.search("query").filter(author="John")
Article.search("query").exclude(draft=True)
Article.search("query").select_related('author')
Article.search("query")[:10]  # Limit results

# Filter by score threshold
Article.search("query").filter(bm25_score__lt=-1.0)

Advanced Usage

Override search() method

class Article(BM25Searchable, models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

    @classmethod
    def search(cls, query, include_title=False):
        """Search content; optionally keep only results whose title contains the query."""
        results = super().search(query)
        if include_title:
            results = results.filter(title__icontains=query)
        return results

Direct Expression API

Use BM25Score for full control:

from django.db.models import F
from django_hawkeye import BM25Score

# Manual annotation
Article.objects.annotate(
    score=BM25Score('content', 'search query', index_name='article_bm25_idx')
).order_by('score')

# Multi-field weighted search
# Assumes separate BM25 indexes named 'title_idx' and 'content_idx' exist.
query = 'search query'

Article.objects.annotate(
    title_score=BM25Score('title', query, index_name='title_idx'),
    content_score=BM25Score('content', query, index_name='content_idx'),
).annotate(
    # Scores are negative, so a weight of 2 pushes title matches further down (better)
    combined=F('title_score') * 2 + F('content_score')
).order_by('combined')
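
The effect of the title weight is easiest to see with plain numbers. The sketch below repeats the same arithmetic outside the ORM; the score values are made up, and it assumes a field that does not match scores 0.

```python
# Made-up negative scores: (name, title_score, content_score).
rows = [
    ("a", -0.5, -3.0),
    ("b", -2.0, -1.0),
    ("c", 0.0, -2.5),  # assumed: no title match scores 0
]


def combined(title_score, content_score, title_weight=2):
    """Weighted sum mirroring F('title_score') * 2 + F('content_score')."""
    return title_score * title_weight + content_score


# Ascending sort puts the most negative (best) combined score first.
weighted = [r[0] for r in sorted(rows, key=lambda r: combined(r[1], r[2]))]
unweighted = [r[0] for r in sorted(rows, key=lambda r: combined(r[1], r[2], title_weight=1))]
```

With the weight of 2, row "b" overtakes row "a" because its stronger title match now counts double.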

Without Mixin

from django.db import models
from django_hawkeye import BM25Index, BM25Score

class Article(models.Model):
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

    @classmethod
    def search(cls, query):
        return cls.objects.annotate(
            score=BM25Score('content', query, index_name='article_bm25_idx')
        ).filter(score__lt=0).order_by('score')

Score Semantics

pg_textsearch returns negative scores: the lower (more negative) the value, the better the match.

# Correct - ascending order (best matches first)
Article.search("query")  # Already ordered correctly

# Manual ordering
.order_by('bm25_score')  # ✓ Correct
.order_by('-bm25_score') # ✗ Wrong - worst matches first
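
The same rule can be checked with plain Python. The scores below are invented for illustration; only the ordering and threshold logic mirror what order_by('bm25_score') and filter(bm25_score__lt=-1.0) do.

```python
# Invented scores: more negative means a better match.
scores = {"weak match": -0.2, "exact match": -4.2, "partial match": -1.3}

# Ascending order, like .order_by('bm25_score'): best matches first.
best_first = sorted(scores, key=scores.get)

# Descending order, like .order_by('-bm25_score'): worst matches first.
worst_first = sorted(scores, key=scores.get, reverse=True)

# Threshold, like .filter(bm25_score__lt=-1.0): keep only strong matches.
strong = [doc for doc in best_first if scores[doc] < -1.0]
```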

Why Hawkeye?

Feature         Elasticsearch      django-hawkeye
Infrastructure  Separate cluster   Your PostgreSQL
Sync            Manual index sync  Automatic (native)
Cost            $$$                Free
Setup           Complex            Add mixin + migrate
BM25 ranking    Yes                Yes

License

MIT
