A simple search engine using the native Django database backend.
django-native-search
django-native-search implements a basic full-text search engine for Django models.
The engine itself uses the Django ORM to manage its index, so no additional backend is needed for searching to work. Just create a model for the index, run makemigrations
and migrate
and you are ready to feed it with data and search.
Installation
Install the package from PyPI:
pip install django-native-search
The package will be installed with all its dependencies including django-expression-index
.
Setup
Setting up the search in basic configuration is quite simple.
1. Register the app
Add django_native_search
to INSTALLED_APPS
in your settings:
INSTALLED_APPS = [
    ...
    'django_native_search.apps.DjangoNativeSearch',
    ...
]
2. Define your Index Model
In a new or existing app, define an index model in your models.py. This example creates a simple index for the books.Book model:
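from django.db import models
from django_native_search.models import IndexEntry

class BookIndexEntry(IndexEntry):
    # Relation to the model being indexed.
    object = models.OneToOneField('books.Book', on_delete=models.CASCADE)
    # Template rendered to produce the text that gets indexed.
    search_template = 'search_index/book.txt'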
The object
field defines a relation to a model which is being indexed.
The engine uses search_template
to render the text with object
variable in template context.
By default the rendered text is tokenized with re.findall(r'[^\s"]+', text).
You can change this behavior by overriding tokenize
class method in your index model.
All extracted tokens are stored in the index of respective indexed model instance.
import re

class BookIndexEntry(IndexEntry):
    ...
    @classmethod
    def tokenize(cls, text):
        # Non-capturing group, so findall returns whole tokens, not the group.
        return re.findall(r"[^\W_]+(?:['_]?[^\W_]+)*", text)
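To see how the two tokenizers differ, here is a small standalone sketch that uses only the standard re module. The sample sentence and the helper names default_tokens and word_tokens are made up for illustration. The custom pattern is written with a non-capturing group, because re.findall returns the group contents instead of the whole match when a pattern contains a capturing group.

```python
import re

# Default pattern quoted in the text: runs of characters that are
# neither whitespace nor a double quote.
DEFAULT_PATTERN = r'[^\s"]+'

# Custom pattern from the override above, written with a non-capturing
# group (?:...) so that re.findall returns whole matches.
WORD_PATTERN = r"[^\W_]+(?:['_]?[^\W_]+)*"


def default_tokens(text):
    """Tokenize the way the default pattern would (illustrative helper)."""
    return re.findall(DEFAULT_PATTERN, text)


def word_tokens(text):
    """Tokenize with the custom word pattern (illustrative helper)."""
    return re.findall(WORD_PATTERN, text)


sample = 'Monty Python\'s "Flying Circus" (1969), season_one!'
print(default_tokens(sample))
print(word_tokens(sample))
```

The default pattern keeps punctuation attached to words, so "(1969)," stays a single token, while the custom pattern keeps only alphanumeric runs, optionally joined by an apostrophe or underscore.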
Index for multiple models
It is also possible to create index for multiple models by using model inheritance. Create a single concrete descendant model of IndexEntry
with multiple descendants for each indexed model.
You can add some common fields to this model to be used for filtering the entries, but do not add an object
field. Then create descendants of your IndexEntry model. Each of the derived classes should have object
field which points to a model to be indexed and a search_template
.
I would advise adding some extra fields to your root index model, so that you can filter entries of any kind or display the results without an additional query for the descendant models. You can fill these fields with data by overriding the save
method in your index model.
It should also be possible to use GenericForeignKey
to define the object
field, but I haven't tried it.
Multiple indexes
Each direct descendant of IndexEntry
is a separate index, so you can have multiple independent
indexes in your site.
3. Prepare the database
Run the well known commands:
manage.py makemigrations
manage.py migrate
The index was tested with SQLite and PostgreSQL.
Usage
Usually you use your index to do full-text search within your data. Just remember to fill it with data first.
Feeding the index with data
The only thing you need to do is to create your IndexEntry
descendant model instance and save it.
from book_index.models import BookIndexEntry
from books.models import Book

for book in Book.objects.all():
    BookIndexEntry(object=book).save()
There is a convenient shortcut for indexing querysets:
from book_index.models import BookIndexEntry
BookIndexEntry.objects.rebuild()
You can override get_index_queryset
method in your class to do select_related
or filter
or anything you need, before passing the queryset for indexing.
You can call the rebuild
method on your index model root class manager, to rebuild all descendant
index models.
You will probably want to create your own management command to run the indexing, although with the runtime updates described next you may rarely need it.
Runtime index updates
The indexing should be fast enough to be executed in runtime on every save of the indexed model.
Just connect a handler to post_save
signal:
from django.db.models.signals import post_save
from books.models import Book

class BookIndexEntry(IndexEntry):
    ...
    @classmethod
    def update_index(cls, instance, **kwargs):
        # Re-index only the instance that was just saved.
        cls.objects.refresh([instance])

post_save.connect(BookIndexEntry.update_index, sender=Book)
Now your index will always be up to date.
Searching
You can search the index by calling the manager's search
method. The query is tokenized using
the same tokenize
method as when indexing. All tokens must be found in a document to consider it
matched:
qs = BookIndexEntry.objects.search('Monty Python')
This will return a QuerySet of BookIndexEntry objects that contain both "Monty" and "Python", matched case-sensitively. If you want your search to be case-insensitive, provide the query in lowercase:
qs = BookIndexEntry.objects.search('circus')
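The matching rule described above can be sketched with a tiny in-memory index. This is not the package's implementation, which stores its index through the Django ORM. The function names, the sample documents, and the way case is handled (each token is indexed under both its original and its lowercase form, so a lowercase query matches any casing) are assumptions made to mirror the behaviour described in the text.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Same default pattern as quoted earlier in this document.
    return re.findall(r'[^\s"]+', text)


def build_index(documents):
    """Map each token to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
            # Assumption: the lowercase form is indexed as well, which is
            # one way a lowercase query could match regardless of case.
            index[token.lower()].add(doc_id)
    return index


def search_all(index, query):
    """Return ids of documents that contain every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    matches = [index.get(token, set()) for token in tokens]
    return set.intersection(*matches)


documents = {
    1: "Monty Python and the Holy Grail",
    2: "Learning Python",
    3: "The Full Monty",
}
index = build_index(documents)
print(search_all(index, "Monty Python"))  # only document 1 has both tokens
print(search_all(index, "python"))        # lowercase query, documents 1 and 2
```

Intersecting the per-token sets is what makes every keyword mandatory: a document missing any one token drops out of the result.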
You can filter the search results, just as any other QuerySet
:
qs = BookIndexEntry.objects.search('circus').filter(object__release_date__year__gt=1970)
By default, search returns matches only for whole words. If the query consists of a single keyword, the engine does a substring search instead, so the results may include documents with words that match the keyword or merely contain it.
For example searching for "yth" may return documents containing "python", "pythonic", "myth", "demythologization".
Substring search works fine in SQLite. In PostgreSQL there is a problem with using the database index, so searching might be too slow.
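As a rough illustration of the substring rule, the following sketch scans a list of indexed words for those containing the keyword. The function name, the sample vocabulary, and the fallback to an exact match for keywords shorter than the minimum length are assumptions; min_length mirrors the SEARCH_MIN_SUBSTR_LENGTH default of 2 listed under Settings.

```python
def substring_matches(vocabulary, keyword, min_length=2):
    """Return the indexed words that contain the keyword as a substring.

    Assumption: keywords shorter than min_length fall back to an
    exact whole-word match instead of a substring scan.
    """
    if len(keyword) < min_length:
        return sorted(word for word in vocabulary if word == keyword)
    return sorted(word for word in vocabulary if keyword in word)


vocabulary = ["python", "pythonic", "myth", "demythologization", "monty"]
print(substring_matches(vocabulary, "yth"))
print(substring_matches(vocabulary, "y"))
```

A scan like this cannot use an ordinary prefix-oriented index, which is consistent with the performance caveat above.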
Putting multiple words inside quotes forces a search for those words appearing next to each other, in the given order:
qs = BookIndexEntry.objects.search('"Monty Python\'s Flying Circus"')
This will return a QuerySet of BookIndexEntry objects that contain the word "Monty" followed by "Python's", followed by "Flying", followed by "Circus".
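The phrase rule can be pictured as looking for the query tokens as a contiguous run inside the document tokens. The sketch below does this on plain strings. The function name and the sample sentences are hypothetical, and the package itself performs the check against its database index rather than in Python.

```python
import re


def tokenize(text):
    # Same default pattern as quoted earlier in this document.
    return re.findall(r'[^\s"]+', text)


def contains_phrase(document, phrase):
    """True if the phrase tokens occur consecutively, in order, in the document."""
    doc_tokens = tokenize(document)
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    # Slide a window of n tokens over the document and compare.
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


phrase = "Monty Python's Flying Circus"
print(contains_phrase("I watched Monty Python's Flying Circus yesterday", phrase))
print(contains_phrase("Flying Circus by Monty Python's crew", phrase))
```

The second document contains all four words but not in sequence, so it would match an unquoted query and fail the quoted one.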
Search form
A SearchFormMixin is available to make it easy to create your search view:
from django.views.generic.base import TemplateView
from book_index.models import BookIndexEntry
from django_native_search.forms import SearchFormMixin, searchform_factory

class SearchView(SearchFormMixin, TemplateView):
    template_name = "books_index/search.html"
    form_class = searchform_factory(BookIndexEntry)
The searchform_factory
function will use all fields with db_index = True
in BookIndexEntry
to create MultipleChoiceField
in your form. The fields can be used to filter the results.
Each filtering field in your form will contain all possible values of the field in the database.
Search template
The template referred to by template_name is rendered with form, containing the form instance, and results, containing the queryset of search results if the form is valid.
{% block content %}
<h2>Search</h2>
<form method="get" action=".">
<table>
{{ form }}
<tr>
<td> </td>
<td>
<input type="submit" value="Search" class="btn"/>
</td>
</tr>
</table>
</form>
{% if form.is_valid %}
<br/>
<h3>Found {{ results.count }} results</h3>
<ul>
{% for result in results %}
<li class = "search-result">
<ul>
<li class="result-link">
<a href="{{result.object.get_absolute_url}}">{{ result.object.title }}</a>
</li>
<li class="result-excerpt">
{{result.excerpt}}
</li>
</ul>
</li>
{% endfor %}
</ul>
{% endif %}
{% endblock %}
The excerpt member of an index entry instance returns a fragment of the indexed document with occurrences of the search keywords highlighted with <em>.
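To give a feel for what such an excerpt might look like, here is a standalone sketch. It is only a guess at the behaviour: it assumes the fragment offsets count tokens relative to the matched keyword, with the end offset exclusive, and its defaults copy the values listed under Settings (-2, 5, and at most 5 fragments). The function name, the " ... " separator, and the sample text are invented for this example.

```python
import re
from html import escape


def excerpt(text, keywords, start_offset=-2, end_offset=5, max_fragments=5):
    """Build an excerpt with keywords wrapped in <em> (illustrative only)."""
    tokens = re.findall(r'[^\s"]+', text)
    fragments = []
    covered_until = 0  # skip keywords already shown in the previous fragment
    for i, token in enumerate(tokens):
        if token not in keywords or i < covered_until:
            continue
        lo = max(0, i + start_offset)
        hi = min(len(tokens), i + end_offset)
        words = [
            f"<em>{escape(t)}</em>" if t in keywords else escape(t)
            for t in tokens[lo:hi]
        ]
        fragments.append(" ".join(words))
        covered_until = hi
        if len(fragments) >= max_fragments:
            break
    return " ... ".join(fragments)


text = ("The troupe known as Monty Python created the Flying Circus "
        "television show in 1969")
print(excerpt(text, {"Monty"}))
print(excerpt(text, {"Monty", "Circus"}))
```

Escaping each token before wrapping it keeps the generated <em> tags the only markup in the output, which matters if the excerpt is rendered unescaped in a template.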
Settings
There are several settings to tweak the search engine.
SEARCH_MIN_SUBSTR_LENGTH
Default : 2
Minimum number of characters in keyword to run substring search.
SEARCH_MAX_SUBTSTR_COUNT_IN_QUERY
Default : 300
Maximum number of indexed words containing the substring to run substring search.
SEARCH_MAX_EXCERPT_FRAGMENTS
Default : 5
Maximum number of fragments containing keywords to be returned in excerpt.
SEARCH_EXCERPT_FRAGMENT_START_OFFSET
Default : -2
Offset of excerpt fragment start.
SEARCH_EXCERPT_FRAGMENT_END_OFFSET
Default : 5
Offset of excerpt fragment end.
SEARCH_MAX_RANKING_KEYWORDS_COUNT
Default : 3
Maximum number of keywords to be used for ranking the results. If the query contains more keywords, only the first ones will be used to calculate the ranking of results.
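As an example, the fragment below overrides some of these values. It assumes the settings are read from your project's Django settings module, which the text above does not state explicitly, and the chosen values are arbitrary illustrations rather than recommendations. The setting names are copied exactly as listed above.

```python
# settings.py (fragment): hypothetical overrides of the documented defaults.
SEARCH_MIN_SUBSTR_LENGTH = 3               # default: 2
SEARCH_MAX_SUBTSTR_COUNT_IN_QUERY = 100    # default: 300
SEARCH_MAX_EXCERPT_FRAGMENTS = 3           # default: 5
SEARCH_EXCERPT_FRAGMENT_START_OFFSET = -3  # default: -2
SEARCH_EXCERPT_FRAGMENT_END_OFFSET = 8     # default: 5
SEARCH_MAX_RANKING_KEYWORDS_COUNT = 3      # default: 3 (unchanged)
```

Raising the minimum substring length and lowering the substring count are the two changes most likely to help if substring searches turn out to be slow on your database.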
Search API
To be described...
Look into the code to check what you can do with it.
Performance
Despite the naive design, the index performs surprisingly well, even with quite large datasets. It can search through 100k documents containing 10M words in a fraction of a second.