
A MeiliSearch backend for Wagtail

Wagtail MeiliSearch

This is a (beta) Wagtail search backend for the MeiliSearch search engine.

Installation

poetry add wagtail_meilisearch or pip install wagtail_meilisearch

Upgrading

If you're upgrading MeiliSearch from 0.9.x to anything higher, you will need to destroy and re-create MeiliSearch's data.ms directory.

Configuration

See the MeiliSearch docs for info on the values you want to add here.

import os

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', '')
    },
}

Update strategies

Indexing a very large site with python manage.py update_index can be pretty taxing on the CPU, take quite a long time, and reduce the responsiveness of the MeiliSearch server. Wagtail-MeiliSearch offers two update strategies, soft and hard. The default, soft strategy will do an "add or update" call for each document sent to it, while the hard strategy will delete every document in the index and then replace them.

There are tradeoffs with either strategy - hard will guarantee that your search data matches your model data, but be hard work on the CPU for longer. soft will be faster and less CPU intensive, but if a field is removed from your model between indexings, that field data will remain in the search index.
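
To make the tradeoff concrete, here is a minimal sketch that uses a plain dict as a stand-in for a MeiliSearch index. The names soft_update and hard_update are hypothetical and not part of wagtail-meilisearch, and modelling "add or update" as a field-by-field merge is an assumption made for illustration.

```python
def soft_update(index, documents):
    """Add-or-update: merge each document's fields into whatever is stored."""
    for doc in documents:
        index.setdefault(doc["id"], {}).update(doc)


def hard_update(index, documents):
    """Delete every document in the index, then add them all again."""
    index.clear()
    for doc in documents:
        index[doc["id"]] = dict(doc)


# First indexing run: the model still has a `subtitle` field.
soft_index, hard_index = {}, {}
first_run = [{"id": 1, "title": "Hello", "subtitle": "Old"}]
soft_update(soft_index, first_run)
hard_update(hard_index, first_run)

# Second run: `subtitle` has since been removed from the model.
second_run = [{"id": 1, "title": "Hello again"}]
soft_update(soft_index, second_run)
hard_update(hard_index, second_run)

print(soft_index[1])  # the stale subtitle is still present
print(hard_index[1])  # only the current fields remain
```

The soft index keeps the orphaned subtitle, while the hard index matches the current model at the cost of rewriting everything.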

One useful trick is to tell Wagtail that you have two search backends, with the default backend set to do soft updates that you can run nightly, and a second backend with hard updates that you can run less frequently.

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', '')
    },
    'hard': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'hard'
    }
}

If you use this technique, remember to pass the backend name into the update_index command, otherwise both will run.

python manage.py update_index --backend default for a soft update

python manage.py update_index --backend hard for a hard update

Delta strategy

The delta strategy is useful if you habitually add created_at and updated_at timestamps to your models. This strategy will check the fields...

  • first_published_at
  • last_published_at
  • created_at
  • updated_at

And only update the records for objects where one or more of these fields has a date more recent than the time delta specified in the settings.

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'delta',
        'UPDATE_DELTA': {
            'weeks': -1
        }
    }
}

If the delta is set to {'weeks': -1}, wagtail-meilisearch will only update indexes for documents where one of the timestamp fields has a date within the last week. Your time delta must be negative.

Under the hood we use Arrow, so you can use any keyword args supported by Arrow's shift().

If you set UPDATE_STRATEGY to delta but don't provide a value for UPDATE_DELTA, wagtail-meilisearch will default to {'weeks': -1}.
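
As a rough illustration of the delta check, the sketch below uses the standard library's timedelta as a stand-in for Arrow's shift(). The function changed_within_delta and the dict-shaped records are hypothetical and not the backend's actual code. Note that timedelta only accepts units such as weeks, days and hours, whereas Arrow also accepts months and years.

```python
from datetime import datetime, timedelta, timezone

DELTA_FIELDS = ("first_published_at", "last_published_at", "created_at", "updated_at")


def changed_within_delta(record, delta, now):
    """Return True if any timestamp field on `record` is newer than now + delta."""
    cutoff = now + timedelta(**delta)  # delta is negative, so the cutoff is in the past
    return any(
        record.get(field) is not None and record[field] > cutoff
        for field in DELTA_FIELDS
    )


now = datetime(2024, 1, 15, tzinfo=timezone.utc)
recent = {"updated_at": datetime(2024, 1, 12, tzinfo=timezone.utc)}
stale = {"created_at": datetime(2023, 6, 1, tzinfo=timezone.utc), "updated_at": None}

print(changed_within_delta(recent, {"weeks": -1}, now))  # True
print(changed_within_delta(stale, {"weeks": -1}, now))   # False
```

Because the delta is added to the current time, a positive value would put the cutoff in the future and nothing would match, which is why the delta has to be negative.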

Skip models

Sometimes you might have a site where a certain page model is guaranteed not to change, for instance an archive section. After creating your initial search index, you can add a SKIP_MODELS key to the config to tell wagtail-meilisearch to ignore specific models when running update_index. Behind the scenes wagtail-meilisearch returns a dummy model index to the update_index management command for every model listed in your SKIP_MODELS - this ensures that this setting only affects update_index, so if you manually edit one of the models listed it should get re-indexed with the update signal.

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        'HOST': os.environ.get('MEILISEARCH_HOST', 'http://127.0.0.1'),
        'PORT': os.environ.get('MEILISEARCH_PORT', '7700'),
        'MASTER_KEY': os.environ.get('MEILI_MASTER_KEY', ''),
        'UPDATE_STRATEGY': 'delta',
        'SKIP_MODELS': [
            'core.ArchivePage',
        ]
    }
}
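
The dummy index idea can be sketched roughly as follows. RealIndex, DummyIndex and get_index_for_update are hypothetical names used only for illustration and are not the backend's real classes or API.

```python
class RealIndex:
    """Stand-in for an index that actually stores documents."""

    def __init__(self):
        self.documents = []

    def add_items(self, label, items):
        self.documents.extend(items)


class DummyIndex:
    """Accepts the same calls as RealIndex but silently does nothing."""

    def add_items(self, label, items):
        pass


def get_index_for_update(label, skip_models, indexes):
    """Return a no-op index for skipped models, and the real one otherwise."""
    if label in skip_models:
        return DummyIndex()
    return indexes.setdefault(label, RealIndex())


skip = ["core.ArchivePage"]
indexes = {}
get_index_for_update("core.ArchivePage", skip, indexes).add_items("core.ArchivePage", ["old page"])
get_index_for_update("core.BlogPage", skip, indexes).add_items("core.BlogPage", ["new post"])

print(sorted(indexes))  # ['core.BlogPage']
```

The caller never needs to know whether a model is skipped, since both index types accept the same call.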

Stop Words

Stop words are words for which we don't want to place significance on their frequency. For instance, the search query tom and jerry would return far less relevant results if the word and was given the same importance as tom and jerry. There's a fairly sane list of English language stop words supplied, but you can also supply your own. This is particularly useful if you have a lot of content in any other language.

MY_STOP_WORDS = ['a', 'list', 'of', 'words']

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'STOP_WORDS': MY_STOP_WORDS
    },
}

Or alternatively, you can extend the built in list.

from wagtail_meilisearch.settings import STOP_WORDS

MY_STOP_WORDS = STOP_WORDS + WELSH_STOP_WORDS + FRENCH_STOP_WORDS

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'STOP_WORDS': MY_STOP_WORDS
    },
}
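
When combining several lists, the same word can appear more than once. The sketch below removes duplicates while preserving order. The three lists are short stand-ins for illustration, and it assumes STOP_WORDS is a plain list; in a real project STOP_WORDS would come from wagtail_meilisearch.settings and the others would be your own.

```python
# Stand-in lists for illustration only.
STOP_WORDS = ["a", "and", "the"]
WELSH_STOP_WORDS = ["a", "ac", "y"]
FRENCH_STOP_WORDS = ["le", "la", "y"]

# dict.fromkeys keeps the first occurrence of each word and preserves order.
MY_STOP_WORDS = list(dict.fromkeys(STOP_WORDS + WELSH_STOP_WORDS + FRENCH_STOP_WORDS))

print(MY_STOP_WORDS)  # ['a', 'and', 'the', 'ac', 'y', 'le', 'la']
```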

Query limits

If you have a lot of DB documents, the final query to the database can be quite a heavy load. Meilisearch's relevance means that it's usually pretty safe to restrict the number of documents Meilisearch returns, and therefore the number of documents your app needs to get from the database. The limit is per model, so if your project has 10 page types and you set a limit of 1000, there's a possible 10000 results.

WAGTAILSEARCH_BACKENDS = {
    'default': {
        'BACKEND': 'wagtail_meilisearch.backend',
        [...]
        'QUERY_LIMIT': 1000
    },
}
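
The arithmetic behind that upper bound is simple, but worth spelling out when choosing a limit. The helper max_results below is hypothetical and only illustrates the calculation.

```python
def max_results(query_limit, model_count):
    """Upper bound on documents fetched when the limit applies per model."""
    return query_limit * model_count


print(max_results(1000, 10))  # 10000
```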

Contributing

If you want to help with the development I'd be more than happy. The vast majority of the heavy lifting is done by MeiliSearch itself, but there is a TODO list...

TODO

  • Faceting
  • Write tests
  • Performance improvements
  • Make use of the async in meilisearch-python
  • Implement boosting in the sort algorithm
  • Implement stop words
  • Search results
  • Add support for the autocomplete api
  • Ensure we're getting results by relevance

Change Log

0.17.3

  • Fixes a bug where the meilisearch indexes could end up with a wrong maxTotalHits

0.17.2

  • Fixes a bug where the backend could report the wrong counts for results. This turned out to be down to the fact that _do_count can sometimes get called before _do_search, possibly due to Django's paginator. This finally explains why sometimes search queries ran twice.

0.17.1

  • Fixes a bug where multi_search can fail when a model index doesn't exist. For models that have no documents, Meilisearch doesn't create an empty index, so we need to check the active indexes before calling multi_search, otherwise the entire call fails.

0.17.0

  • A few small performance and reliability improvements, and a lot of refactoring of the code into multiple files to make future development a bit simpler.

0.16.0

  • Thanks to @BertrandBordage, a massive speed improvement through using the /multi-search endpoint introduced in Meilisearch 1.1.0

0.14.0

  • Adds Django 4 support and compatibility with the latest meilisearch server (0.30.2) and meilisearch python (0.23.0)

0.14.0

  • Updates to work with the latest versions of Meilisearch (v0.28.1) and meilisearch-python (^0.19.1)

0.13.0

  • Yanked, sorry

0.12.0

  • Adds QUERY_LIMIT option to settings

0.11.0

  • Compatibility changes to keep up with MeiliSearch and meilisearch-python
  • We've also switched to more closely tracking the major and minor version numbers of meilisearch-python so that it's easier to see compatibility at a glance.
  • Note: if you're upgrading from an old version of MeiliSearch you may need to destroy MeiliSearch's data directory and start with a clean index.

0.1.5

  • Adds the delta update strategy
  • Adds the SKIP_MODELS setting
  • Adds support for using boost on your search fields

Thanks

Thank you to the devs of Wagtail-Whoosh. Reading the code over there was the only way I could work out how Wagtail Search backends are supposed to work.
