Django-hawkeye full-text search using PostgreSQL pg_textsearch - a lightweight Elasticsearch alternative
django-hawkeye 🎯
Django BM25 full-text search using PostgreSQL pg_textsearch - a lightweight Elasticsearch alternative.
Features
- Simple API - Just add a mixin and search with Article.search("query")
- BM25 ranking - Industry-standard relevance scoring (same as Elasticsearch)
- No external services - Search runs inside PostgreSQL 17+ through the pg_textsearch extension
- RAG-ready - Use as the retrieval layer for Retrieval Augmented Generation
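As a rough illustration of the RAG-ready point, the sketch below assembles retrieved documents into a prompt context. It is not part of django-hawkeye: `Doc` and `build_prompt` are hypothetical names, and `Doc` stands in for model instances that would, in a real project, come from something like `Article.search(question)[:5]`.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a model instance returned by a search."""
    title: str
    content: str


def build_prompt(question, docs, max_chars=2000):
    """Concatenate retrieved documents into a prompt, best match first."""
    parts = []
    used = 0
    for doc in docs:
        snippet = f"[{doc.title}]\n{doc.content}"
        if used + len(snippet) > max_chars:
            break  # stop once the context budget is exhausted
        parts.append(snippet)
        used += len(snippet)
    context = "\n\n".join(parts)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Documents are assumed to arrive already ordered by relevance.
docs = [
    Doc("Django intro", "Django is a Python web framework."),
    Doc("Migrations", "Migrations propagate model changes to the database."),
]
prompt = build_prompt("What is Django?", docs)
```

Because search results are expected to come back best match first, truncating at a character budget drops the weakest matches first.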
Requirements
- PostgreSQL 17+
- pg_textsearch extension
- Django 4.2+
- Python 3.10+
Installation
pip install django-hawkeye
PostgreSQL Extension Setup
This library requires the pg_textsearch extension installed on your PostgreSQL server:
# Install build dependencies
apt-get install build-essential git postgresql-server-dev-17
# Clone and build
git clone https://github.com/timescale/pg_textsearch.git
cd pg_textsearch
make && make install
The extension is automatically enabled via Django migrations when you run python manage.py migrate.
See the pg_textsearch repository for detailed installation instructions.
Add to INSTALLED_APPS:
INSTALLED_APPS = [
    ...
    'django_hawkeye',
]
Quick Start
1. Define your model
from django.db import models
from django_hawkeye import BM25Index, BM25Searchable

class Article(BM25Searchable, models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]
2. Run migrations
python manage.py makemigrations
python manage.py migrate
3. Search
# Basic search
Article.search("django tutorial")
# With filters
Article.search("web framework").filter(published=True)[:10]
# With score threshold (lower = better match)
Article.search("django").filter(bm25_score__lt=-1.0)
API
BM25Searchable Mixin
Add to any model to enable the .search() method:
class Article(BM25Searchable, models.Model):
    ...
BM25Index
BM25Index(
    fields=['content'],
    name='article_bm25_idx',
    text_config='english',  # PostgreSQL text search config
    k1=1.2,                 # Term frequency saturation (0.1-10.0)
    b=0.75,                 # Length normalization (0.0-1.0)
)
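To give a feel for what k1 and b control, here is a minimal sketch of the textbook BM25 contribution of a single query term. It is an assumption-laden illustration, not pg_textsearch's implementation: the function name `bm25_term_score`, the IDF variant, and the sample numbers are all chosen for this example, and the value is the positive textbook score rather than the negated score the extension returns.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 contribution of one query term to one document."""
    # One common IDF variant (always positive); other variants exist.
    idf = math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # b scales how strongly documents longer than average are penalised.
    length_norm = 1 - b + b * (doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences stop adding score.
    return idf * (tf * (k1 + 1)) / (tf + k1 * length_norm)


# Term frequency saturation: ten occurrences score higher than one,
# but far less than ten times higher.
once = bm25_term_score(tf=1, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)
ten_times = bm25_term_score(tf=10, doc_len=100, avg_doc_len=100, n_docs=1000, doc_freq=10)

# Length normalization: with b=0.75 a long document scores lower than a
# short one for the same term frequency; with b=0.0 length is ignored.
short_doc = bm25_term_score(tf=3, doc_len=50, avg_doc_len=100, n_docs=1000, doc_freq=10)
long_doc = bm25_term_score(tf=3, doc_len=500, avg_doc_len=100, n_docs=1000, doc_freq=10)
```

Raising k1 lets repeated terms keep adding score for longer; lowering b toward 0.0 reduces the advantage short documents get.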
Search Methods
# Basic search - returns BM25SearchQuerySet
Article.search("query")
# Chainable with Django QuerySet methods
Article.search("query").filter(author="John")
Article.search("query").exclude(draft=True)
Article.search("query").select_related('author')
Article.search("query")[:10] # Limit results
# Filter by score threshold
Article.search("query").filter(bm25_score__lt=-1.0)
Advanced Usage
Override search() method
class Article(BM25Searchable, models.Model):
    title = models.CharField(max_length=255)
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

    @classmethod
    def search(cls, query, include_title=False):
        """Custom search with optional title filtering."""
        results = super().search(query)
        if include_title:
            results = results.filter(title__icontains=query)
        return results
Direct Expression API
Use BM25Score for full control:
from django.db.models import F
from django_hawkeye import BM25Score

# Manual annotation
Article.objects.annotate(
    score=BM25Score('content', 'search query', index_name='article_bm25_idx')
).order_by('score')

# Multi-field weighted search
# (assumes separate BM25 indexes named 'title_idx' and 'content_idx')
query = 'search query'
Article.objects.annotate(
    title_score=BM25Score('title', query, index_name='title_idx'),
    content_score=BM25Score('content', query, index_name='content_idx'),
).annotate(
    combined=F('title_score') * 2 + F('content_score')
).order_by('combined')
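The weighted combination can be checked on plain numbers. The sketch below mirrors the `F('title_score') * 2 + F('content_score')` expression in ordinary Python; the helper `combined_score` and the sample scores are hypothetical, chosen only to show how doubling the title weight can change the ranking.

```python
def combined_score(title_score, content_score, title_weight=2.0):
    """Plain-float mirror of F('title_score') * 2 + F('content_score')."""
    return title_score * title_weight + content_score


# Hypothetical (title_score, content_score) pairs; more negative = better match.
rows = {
    "a": (-3.0, -1.0),  # strong title match
    "b": (-1.0, -4.0),  # strong content match
    "c": (-0.5, -0.5),  # weak on both
}

# Ascending sort puts the best (most negative) combined score first.
weighted = sorted(rows, key=lambda k: combined_score(*rows[k]))
unweighted = sorted(rows, key=lambda k: combined_score(*rows[k], title_weight=1.0))
```

With these numbers the title-heavy row "a" ranks first under the 2x weight, while the equal-weight ordering puts the content-heavy row "b" first. Because scores are negative, multiplying by a weight greater than 1 makes a good match even more negative, so ascending order still applies.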
Without Mixin
from django.db import models
from django_hawkeye import BM25Index, BM25Score

class Article(models.Model):
    content = models.TextField()

    class Meta:
        indexes = [
            BM25Index(fields=['content'], name='article_bm25_idx'),
        ]

    @classmethod
    def search(cls, query):
        # Keep only matching rows (negative score) and order best match first
        return cls.objects.annotate(
            score=BM25Score('content', query, index_name='article_bm25_idx')
        ).filter(score__lt=0).order_by('score')
Score Semantics
pg_textsearch returns negative scores: the lower (more negative) the value, the better the match.
# Correct - ascending order (best matches first)
Article.search("query") # Already ordered correctly
# Manual ordering
Article.search("query").order_by('bm25_score')   # ✓ Correct
Article.search("query").order_by('-bm25_score')  # ✗ Wrong - worst matches first
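The ordering and threshold rules can be tried on plain tuples. This is only an illustration with made-up scores: `hits` is a hypothetical list of (title, bm25_score) pairs, not output from the library.

```python
# Hypothetical (title, bm25_score) pairs as a search might return them.
hits = [("intro", -0.4), ("tutorial", -2.7), ("reference", -1.3)]

# Ascending order puts the most negative (best) score first,
# which is what order_by('bm25_score') does in the database.
best_first = sorted(hits, key=lambda hit: hit[1])

# Equivalent of filter(bm25_score__lt=-1.0): keep only strong matches.
strong = [title for title, score in best_first if score < -1.0]

# Negate for display if a "higher is better" number is easier to read.
relevance = {title: -score for title, score in hits}
```

Negating the score is a presentation choice only; sorting and filtering are simplest on the raw negative values.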
Why Hawkeye?
| Feature | Elasticsearch | django-hawkeye |
|---|---|---|
| Infrastructure | Separate cluster | Your PostgreSQL |
| Sync | Manual index sync | Automatic (native) |
| Cost | $$$ | Free |
| Setup | Complex | Add mixin + migrate |
| BM25 ranking | ✓ | ✓ |
License
MIT
Links
- pg_textsearch - The PostgreSQL extension
- BM25 Algorithm - How ranking works