Skip to main content

No project description provided

Project description

https://travis-ci.com/cr0mbly/django-elasticsearch-model-builder.svg?token=WSHb2ssbuqzAyoqCvdCs&branch=master https://img.shields.io/pypi/v/django-elasticsearch-model-binder.svg https://img.shields.io/pypi/l/django-elasticsearch-model-binder.svg

Plugin for a Django/Elasticsearch paired environment that aligns CRUD operations within Django with the related indexes that are tied to the models that they build.

Setup

This intro assumes you have an installed version of both Django and Elasticsearch on your machine. The expected configuration for setup within django-elasticsearch-model-builder is within your settings for example:

setup.py

DJANGO_ES_MODEL_CONFIG = {
    'hosts': [{'host': 'localhost', 'port': 9200}]
}

This example is fairly basic but follows the expected format for configuration of the Elasticsearch client built into this plugin so anything you place in this settings dictionary will be fed straight in, this helps with more complicated Elasticsearch setups with encryption and multiple clusters running off different hosts.

IF your wanting the index to be automatically generated during first model generation you can add the following to your app to setup all of the indexes and alises so your model can start emitting cached fields to Elasticsearch

from django.apps import AppConfig

class TestAppConfig(AppConfig):
    name = 'tests.testapp'
    verbose_name = 'TestApp'

    def ready(self):
        from tests.test_app.models import Author
        from django_elasticsearch_model_binder.utils import (
            initialize_es_model_index
        )
        initialize_es_model_index(Author)

Tieing models to indexes.

Tieing a model to an elasticsearch index can be done on the fly by adding the base ES model

class Author(ESBoundModel):
    user = models.ForeignKey(
        User, on_delete=models.CASCADE
    )
    publishing_name = models.CharField(
        max_length=25, blank=True, null=True
    )

    # Fields to be picked up and cached in model
    es_cached_model_fields = ['publishing_name', 'user']

Casting db model fields into Elasticsearch format

This plugin works with a principle that fields should be ready to be serialized into Elasticsearch data structures like sets for instance will fail if you try to index them into Elasticsearch. By default this plugin casts the following base types to Elasticsearch compatible values.

  • models.Model -> integer containing the models pk

  • datetime.datetime -> string in the format ‘YYYY-MM-DD HH:MM:SS’

  • all other values -> str(value) (attempt to cast all other values)

If this mapping doesn’t suit you or you wish to extend it you can do so by overriding the convert_to_indexable_format method on the model.

class Author(ESBoundModel):

    def convert_to_indexable_format(self, value):
        if isinstance(value, float):
            # Round value for uniform integer value
            return round(value)

        # ... any further field rules

Setting non model fields on index

By default es_cached_model_fields will only support database fields for indexing this is for performance reasons where often you might want to index a complex piece of data that may take a while to generate over larger database tables. To get around this this plugin supports a different approach for any fields that aren’t stored directly on this model. To this end we make use of the ExtraModelFieldBase class to define a resolver for a custom field that will work over larger data-sets in way that can be made more efficient as your data-set grows and requirements change. For example:

from django_elasticsearch_model_binder import ExtraModelFieldBase


class UniqueIdentiferField(ExtraModelFieldBase):
    # Name of the custom field we want indexed for the model.
    field_name = 'total_number_of_duplicate_names'

    @classmethod
    def custom_model_field_map(cls, model_pks):
        """
        Generate map of number of duplicate first names per model.
        """
        values = (
            cls.objects
            .filter(pk__in=model_pks)
            .values_list('pk', 'first_name')
        )

        name_count_map = defaultdict(int)
        for _, name in values:
            name_count_map[name] += 1

        # Return map of model pk to value we want
        # indexed into Elasticsearch
        return {
            pk: name_count_map[name]
            for pk, name in values
        }

class User(ESBoundModel):
    first_name = model.CharField()
    es_cached_extra_fields = (UniqueIdentiferField,)

This will result in an index being created for the user model with a single custom field per model document set too:

{total_number_of_duplicate_names: <int>}

Setting index name

This example is fairly basic it will create an Elasticsearch index generated with an index name comprised of the model class name and its module path directory. this can be overridden by setting the index_name field in the model:

class Author(ESBoundModel):
    index_name ='my-custom-index-name'

or overriding the get_index_base_name method, by default the index will be generated with a name reflecting the modules path and model name e.g.

<module-path>-<model-name>-<unique-uuid>

Default Aliases

By default this plugin generates the index on first start of the app if it hasn’t been defined. It also generates a default read/write alias that allows indexes to be rebuilt on the fly with no downtime for your app.

Aliases utilise the same index name as their parent but are postfixed by default with a -read/-write to help differentiate from the main index. you can override this on the model by defining your own postfix, e.g.

class Author(ESBoundModel):
    index_name ='my-custom-index-name'

    es_index_alias_read_postfix = 'read-only-access'
    es_index_alias_write_postfix = 'write-only-access'

Will generate aliases in the format of:

  • my-custom-index-name-read-only-access

  • my-custom-index-name-write-only-access

Or define your own way by overriding the default get_read_alias_name/get_write_alias_name

Saving/Removing db model in Elasticsearch

Saving and removing a model in ElasticSearch happens automatically on .save/ .delete operations. This should be noted as any bulk_create/bulk_update will ignore this and you’ll need to manage these cases within your business logic of the app. See below for how to do these operations in bulk where this is a requirement of the business case.

Preforming bulk operations

This plugin also supports a handy set of calls that can be tied into a query manager to bulk create/update/delete these models in Elasticsearch.

To enable this you’ll need to add the plugins query manager mixin to your model, for example.

from django.db.models import QuerySet

from django_elasticsearch_model_binder.mixins import ESQuerySetMixin


class ESEnabledQuerySet(ESQuerySetMixin, QuerySet):
    pass

class Author(ESBoundModel):
    index_name ='my-custom-index-name'

    es_index_alias_read_postfix = 'read-only-access'
    es_index_alias_write_postfix = 'write-only-access'

    objects = ESEnabledQuerySet.as_manager()

You can then define a query via the manager targeting the models you want to update, delete from Elasticsearch e.g.

# Re-save models with selected fields into Elasticsearch
Author.objects.filter(pk__lt=100).reindex_into_es()

# Delete models with selected fields into Elasticsearch
Author.objects.filter(pk__lt=100).delete_from_es()

QuerySet filtering

As noted above theres a number of operations that can be made off of the Queryset mixin. As expected this supports filtering of Queryset results by some defined ElasticSearch query. Say we wanted to filter a table by the prefix of a Charfield indexed in ElasticSearch we can go:

query = {
    'match': {
        'publishing_name': 'Bobby*'
    }
}

queryset = Author.objects.filter_by_es_search(query=query)

>> queryset.values_list('publishing_name', flat=True)
>> ['Bobby Fakington', 'Bobby not-realington']

Supported by the sort_query kwarg you can also specify a queryset return ordering for the filter_by_es_search.

queryset = Author.objects.filter_by_es_search(
    query={'prefix': {'publishing_name.keyword': 'Bill'}},
    sort_query=[{
        'publishing_name.keyword': {
            'order': 'asc', 'missing': '_last'
        }
    }]
)

This is useful in cases where ES backed field sorting trumps any model defined order_by.

Retrieving ES fields

Pulling cached fields back from Elasticsearch can be preformed both on the model and related manager if the ESQuerySetMixin is used.

From the model:

>>> author = Author.objects.first()
>>> author.retrive_es_fields()

From the QuerySet:

>>> Author.objects.filter(pk__lt=100).retrieve_es_docs()

You can also retrieve the verbose output of the query by using the only_include_fields=False on both the above calls.

Rebuilding an entire table in Elasticsearch

At times you may want to throw away your current index and replace it with a new one. For larger data-sets this can be problematic as downtime while this rebuilds is unacceptable. This plugin exposes a simple method to preform a complete refresh of the index from either the entire models table or from a slice of the table defined by a queryset. This will automatically create a new index and point the write alias to it while allowing the old index to be used with the read alias for your app until the rebuild is finished, resulting in no index downtime.

This can be run from shell or any kind of automated task by running:

# Full table rebuild of the Author model.
>>> Author.rebuild_es_index()

# Full table rebuild of the Author model.
>>> sliced_queryset = Author.objects.filter(pk__lt=100)
>>> Author.rebuild_es_index(queryset=sliced_queryset)

Setting indexable format

Indexes are only rebuilt sharding accoring to configuration on a full index rebuild rebuild_es_index. To alter how the index is searched with Elasticsearch you’ll need to override the get_index_mapping. By default this is set to an empty implementation e.g.

@classmethod
def get_index_mapping(cls):
    return {'settings': {}, 'mappings': {}}

But you can extend this with any mapping you’d like for the fields being indexed.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

Built Distribution

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page