A MeiliSearch backend for Wagtail
Wagtail MeiliSearch
This is a (beta) Wagtail search backend for the MeiliSearch search engine.
Installation
poetry add wagtail_meilisearch
or pip install wagtail_meilisearch
Upgrading
If you're upgrading MeiliSearch from 0.9.x to anything higher, you will need to destroy and re-create MeiliSearch's data.ms directory.
Configuration
See the MeiliSearch docs for info on the values you want to add here.
import os

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', '')
    },
}
Update strategies
Indexing a very large site with python manage.py update_index can be pretty taxing on the CPU, take quite a long time, and reduce the responsiveness of the MeiliSearch server. Wagtail-MeiliSearch offers two update strategies, soft and hard. The default, soft strategy will do an "add or update" call for each document sent to it, while the hard strategy will delete every document in the index and then replace them.
There are tradeoffs with either strategy - hard will guarantee that your search data matches your model data, but be hard work on the CPU for longer. soft will be faster and less CPU intensive, but if a field is removed from your model between indexings, that field data will remain in the search index.
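The tradeoff between the two strategies can be illustrated with a toy model. This is only a sketch: the "index" is a plain dict and soft_update and hard_update are hypothetical helpers, not functions from wagtail-meilisearch. It assumes that "add or update" merges incoming fields into the stored document, which is why a removed field lingers under the soft strategy.

```python
# Toy model of a search index: document id -> stored fields.
# Illustration only; this is not wagtail-meilisearch's real code.

def soft_update(index, documents):
    """Add-or-update: merge each incoming document into the index."""
    for doc in documents:
        stored = index.setdefault(doc["id"], {})
        stored.update(doc)  # fields missing from `doc` are left in place
    return index


def hard_update(index, documents):
    """Delete every document, then re-add the incoming ones."""
    index.clear()
    for doc in documents:
        index[doc["id"]] = dict(doc)
    return index


def copy_index(index):
    """Copy the toy index so each strategy starts from the same state."""
    return {doc_id: dict(fields) for doc_id, fields in index.items()}


old_index = {1: {"id": 1, "title": "Home", "subtitle": "Welcome"}}
# The model has since lost its `subtitle` field.
new_docs = [{"id": 1, "title": "Home page"}]

soft_result = soft_update(copy_index(old_index), new_docs)
hard_result = hard_update(copy_index(old_index), new_docs)

print(soft_result[1])  # still carries the stale `subtitle`
print(hard_result[1])  # matches the current model data
```

In this toy run the soft result keeps the stale subtitle value, while the hard result contains only the fields the model still has.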
One useful trick is to tell Wagtail that you have two search backends, with the default backend set to do soft updates that you can run nightly, and a second backend with hard updates that you can run less frequently.
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', '')
    },
    'hard': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'hard'
    }
}
If you use this technique, remember to pass the backend name into the update_index command, otherwise both will run.
python manage.py update_index --backend default
for a soft update
python manage.py update_index --backend hard
for a hard update
Delta strategy
The delta strategy is useful if you habitually add created_at and updated_at timestamps to your models. This strategy will check the fields:
- first_published_at
- last_published_at
- created_at
- updated_at
and only update the records for objects where one or more of these fields has a date more recent than the time delta specified in the settings.
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'delta',
        'UPDATE_DELTA': {
            'weeks': -1
        }
    }
}
If the delta is set to {'weeks': -1}, wagtail-meilisearch will only update indexes for documents where one of the timestamp fields has a date within the last week. Your time delta must be negative.
Under the hood we use Arrow, so you can use any keyword args supported by Arrow's shift().
If you set UPDATE_STRATEGY to delta but don't provide a value for UPDATE_DELTA, wagtail-meilisearch will default to {'weeks': -1}.
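The check described above can be sketched roughly as follows. This is not the package's implementation: needs_update is a hypothetical helper, and the sketch uses the standard library's timedelta instead of Arrow so that it runs without extra dependencies. As a consequence it only understands keys that timedelta accepts (such as weeks, days and hours), whereas Arrow's shift() also accepts months and years.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace

# Timestamp fields the delta strategy is described as checking.
TIMESTAMP_FIELDS = (
    "first_published_at",
    "last_published_at",
    "created_at",
    "updated_at",
)
DEFAULT_DELTA = {"weeks": -1}  # fallback when UPDATE_DELTA is not set


def needs_update(obj, update_delta=None, now=None):
    """Return True if any timestamp on `obj` is newer than now + delta."""
    delta = update_delta or DEFAULT_DELTA
    now = now or datetime.now(timezone.utc)
    cutoff = now + timedelta(**delta)  # the delta is negative, so this is in the past
    for name in TIMESTAMP_FIELDS:
        value = getattr(obj, name, None)  # models may lack some of these fields
        if value is not None and value > cutoff:
            return True
    return False


# Fixed "now" so the example is reproducible.
NOW = datetime(2024, 1, 15, tzinfo=timezone.utc)
recent = SimpleNamespace(updated_at=datetime(2024, 1, 12, tzinfo=timezone.utc))
stale = SimpleNamespace(updated_at=datetime(2023, 12, 1, tzinfo=timezone.utc))

print(needs_update(recent, now=NOW))
print(needs_update(stale, now=NOW))
print(needs_update(stale, {"weeks": -10}, now=NOW))
```

With the default one-week delta only the recent object qualifies; widening the delta to ten weeks brings the stale object back into scope.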
Skip models
Sometimes you might have a site where a certain page model is guaranteed not to change, for instance an archive section. After creating your initial search index, you can add a SKIP_MODELS key to the config to tell wagtail-meilisearch to ignore specific models when running update_index. Behind the scenes wagtail-meilisearch returns a dummy model index to the update_index management command for every model listed in your SKIP_MODELS - this ensures that this setting only affects update_index, so if you manually edit one of the models listed it should get re-indexed with the update signal.
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'delta',
        'SKIP_MODELS': [
            'core.ArchivePage',
        ]
    }
}
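The "dummy model index" idea can be pictured with a small sketch. All names here (DummyModelIndex, RecordingIndex, index_for) are hypothetical and chosen for illustration; the real backend's classes and method signatures may differ. Models are represented by their "app_label.ModelName" strings, matching the format used in SKIP_MODELS.

```python
class RecordingIndex:
    """Stand-in for a real index: remembers what it was asked to index."""

    def __init__(self):
        self.items = []

    def add_items(self, model_label, items):
        self.items.extend(items)


class DummyModelIndex:
    """Index whose write method silently does nothing (hypothetical)."""

    def add_items(self, model_label, items):
        return None


def index_for(model_label, skip_models, real_index):
    """Pick the index a bulk re-index run should write to for this model."""
    if model_label in skip_models:
        return DummyModelIndex()
    return real_index


skip_models = ["core.ArchivePage"]
real_index = RecordingIndex()

# Simulate a bulk run over two models.
for label, items in [
    ("core.ArchivePage", ["archive-1", "archive-2"]),
    ("core.BlogPage", ["blog-1"]),
]:
    index_for(label, skip_models, real_index).add_items(label, items)

print(real_index.items)  # only the blog page was indexed
```

Because only the bulk path goes through the dummy index in this sketch, a save triggered elsewhere would still reach the real index, which mirrors the behaviour described for edited pages.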
Stop Words
Stop words are words for which we don't want to place significance on their frequency. For instance, the search query "tom and jerry" would return far less relevant results if the word "and" was given the same importance as "tom" and "jerry". There's a fairly sane list of English language stop words supplied, but you can also supply your own. This is particularly useful if you have a lot of content in any other language.
MY_STOP_WORDS = ['a', 'list', 'of', 'words']
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'STOP_WORDS': MY_STOP_WORDS
    },
}
Or alternatively, you can extend the built in list.
from wagtail_meilisearch.settings import STOP_WORDS
MY_STOP_WORDS = STOP_WORDS + WELSH_STOP_WORDS + FRENCH_STOP_WORDS
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'STOP_WORDS': MY_STOP_WORDS
    },
}
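When combining several lists like this, the same word can appear more than once (for example "a" is a stop word in both English and Welsh). Duplicates are probably harmless, but a small helper keeps the setting tidy. The sketch below is an assumption-laden example: merge_stop_words is a hypothetical helper, BUILT_IN_STOP_WORDS is a short stand-in for the list imported from wagtail_meilisearch.settings, and the Welsh and French lists are tiny illustrative samples that you would define yourself.

```python
# Stand-in for `from wagtail_meilisearch.settings import STOP_WORDS`.
BUILT_IN_STOP_WORDS = ["a", "and", "the"]

# Tiny illustrative samples; real lists would be much longer.
WELSH_STOP_WORDS = ["a", "ac", "y", "yr"]
FRENCH_STOP_WORDS = ["le", "la", "et", "Y"]


def merge_stop_words(*word_lists):
    """Concatenate lists, lower-casing words and dropping duplicates.

    The first occurrence wins, so the original ordering is preserved.
    """
    seen = set()
    merged = []
    for words in word_lists:
        for word in words:
            normalised = word.lower()
            if normalised not in seen:
                seen.add(normalised)
                merged.append(normalised)
    return merged


MY_STOP_WORDS = merge_stop_words(
    BUILT_IN_STOP_WORDS, WELSH_STOP_WORDS, FRENCH_STOP_WORDS
)
print(MY_STOP_WORDS)
```

The resulting MY_STOP_WORDS list can then be passed as the STOP_WORDS value in the same way as in the configuration above.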
Query limits
If you have a lot of DB documents, the final query to the database can be quite a heavy load. Meilisearch's relevance means that it's usually pretty safe to restrict the number of documents Meilisearch returns, and therefore the number of documents your app needs to get from the database. The limit is per model, so if your project has 10 page types and you set a limit of 1000, there's a possible 10000 results.
WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'QUERY_LIMIT': 1000
    },
}
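To get a feel for what a given QUERY_LIMIT means for your database, the per-model arithmetic can be written out. This is a back-of-the-envelope sketch with hypothetical helper names, not code from the package; it assumes the limit is applied independently to each model, as described above.

```python
def worst_case_rows(model_count, query_limit):
    """Upper bound on rows fetched if every model hits the limit."""
    return model_count * query_limit


def capped_rows(hits_per_model, query_limit):
    """Rows fetched when each model's hit count is capped at the limit."""
    return sum(min(hits, query_limit) for hits in hits_per_model)


# 10 page types with a limit of 1000, as in the example above.
print(worst_case_rows(10, 1000))

# Three models matching 5000, 200 and 0 documents respectively:
# the first is capped at 1000, the others are unaffected.
print(capped_rows([5000, 200, 0], 1000))
```

In practice most queries match far fewer documents than the worst case, so the second function usually gives the more realistic picture.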
Contributing
If you want to help with the development I'd be more than happy. The vast majority of the heavy lifting is done by MeiliSearch itself, but there is a TODO list...
TODO
- Faceting
- Write tests
- Performance improvements
- Make use of the async in meilisearch-python
- Implement boosting in the sort algorithm
- Implement stop words
- Search results
- Add support for the autocomplete api
- Ensure we're getting results by relevance
Change Log
0.17.3
- Fixes a bug where the meilisearch indexes could end up with a wrong maxTotalHits
0.17.2
- Fixes a bug where the backend could report the wrong counts for results. This turned out to be down to the fact that _do_count can sometimes get called before _do_search, possibly due to Django's paginator. This finally explains why sometimes search queries ran twice.
0.17.1
- Fixes a bug where multi_search can fail when a model index doesn't exist. For models that have no documents, meilisearch doesn't create the empty index, so we need to check active indexes before calling multi_search, otherwise the entire call fails.
0.17.0
- A few small performance and reliability improvements, and a lot of refactoring of the code into multiple files to make future development a bit simpler.
0.16.0
- Thanks to @BertrandBordage, a massive speed improvement through using the /multi-search endpoint introduced in Meilisearch 1.1.0
0.14.0
- Adds Django 4 support and compatibility with the latest meilisearch server (0.30.2) and meilisearch python (0.23.0)
0.14.0
- Updates to work with the latest versions of Meilisearch (v0.28.1) and meilisearch-python (^0.19.1)
0.13.0
- Yanked, sorry
0.12.0
- Adds QUERY_LIMIT option to settings
0.11.0
- Compatibility changes to keep up with MeiliSearch and meilisearch-python
- We've also switched to more closely tracking the major and minor version numbers of meilisearch-python so that it's easier to see compatibility at a glance.
- Note: if you're upgrading from an old version of MeiliSearch you may need to destroy MeiliSearch's data directory and start with a clean index.
0.1.5
- Adds the delta update strategy
- Adds the SKIP_MODELS setting
- Adds support for using boost on your search fields
Thanks
Thank you to the devs of Wagtail-Whoosh. Reading the code over there was the only way I could work out how Wagtail Search backends are supposed to work.