
Project description

Version: 0.5.0

Author: Ben Lopatin (http://benlopatin.com)

Configurable indexing and other extras for Haystack (with ElasticSearch biases).

Full documentation is on Read the Docs.

Requirements

  • Django: tested against Django 1.8 and 1.9

  • Haystack: tested against Haystack 2.4.0; it should work with any combination of Haystack and Django that work together

  • ElasticSearch: presumably any newish version will do; however, the only version tested against so far is 0.19.x

Features and goals

Some of these features are backend agnostic, but most have ElasticSearch in mind.

For more background see the blog post Stretching Haystack’s ElasticSearch Backend.

Global configurable index mapping

The search mapping provided by Haystack’s ElasticSearch backend includes brief but sensible defaults for nGram analysis. You can globally change these settings or add your own mappings by providing a mapping dictionary via ELASTICSEARCH_INDEX_SETTINGS in your settings file. This example takes the default mapping and adds a synonym analyzer:

ELASTICSEARCH_INDEX_SETTINGS = {
    'settings': {
        "analysis": {
            "analyzer": {
                "synonym_analyzer" : {
                    "type": "custom",
                    "tokenizer" : "standard",
                    "filter" : ["synonym"]
                },
                "ngram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_ngram", "synonym"]
                },
                "edgengram_analyzer": {
                    "type": "custom",
                    "tokenizer": "lowercase",
                    "filter": ["haystack_edgengram"]
                }
            },
            "tokenizer": {
                "haystack_ngram_tokenizer": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15,
                },
                "haystack_edgengram_tokenizer": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15,
                    "side": "front"
                }
            },
            "filter": {
                "haystack_ngram": {
                    "type": "nGram",
                    "min_gram": 3,
                    "max_gram": 15
                },
                "haystack_edgengram": {
                    "type": "edgeNGram",
                    "min_gram": 2,
                    "max_gram": 15
                },
                "synonym" : {
                    "type" : "synonym",
                    "ignore_case": "true",
                    "synonyms_path" : "synonyms.txt"
                }
            }
        }
    }
}
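The synonyms_path above refers to a file that Elasticsearch reads relative to its own configuration directory, so it lives on the search server rather than in your Django project. As a minimal sketch (assuming the Solr-style synonym format that Elasticsearch’s synonym filter accepts), synonyms.txt might look like:

# comma-separated terms are treated as equivalent
laptop, notebook
# explicit mappings rewrite the terms on the left to the term on the right
i-pod, i pod => ipod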

The synonym filter is now part of your index settings, but it will go unused until an analyzer that references it is actually applied to your fields.

Before your new analyzer can be used you will need to change your Haystack engine and rebuild/update your index. In your settings.py modify HAYSTACK_CONNECTIONS accordingly:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
        'URL': env_var('HAYSTACK_URL', 'http://127.0.0.1:9200/'),
        'INDEX_NAME': 'haystack',
    },
}
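With the engine in place, the new settings are applied when the index is (re)created, so rebuild or update your index with Haystack’s standard management commands; rebuilding is the surest way to pick up changed analysis settings:

python manage.py rebuild_index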

The default analyzer for non-nGram fields in Haystack’s ElasticSearch backend is the snowball analyzer. It is a perfectly good analyzer, but not necessarily what you need, and it is language specific (English by default).

Specify your analyzer with ELASTICSEARCH_DEFAULT_ANALYZER in your settings file:

ELASTICSEARCH_DEFAULT_ANALYZER = 'synonym_analyzer'

Now all your analyzed fields, except for nGram fields, will be analyzed using synonym_analyzer.

If you want to specify a custom search_analyzer for nGram/EdgeNgram fields, define it with the ELASTICSEARCH_DEFAULT_NGRAM_SEARCH_ANALYZER setting:

ELASTICSEARCH_DEFAULT_NGRAM_SEARCH_ANALYZER = 'standard'

Configurable index mapping per index

Alternatively, you can configure the index mapping per index. This is useful for multilanguage index setups. In this case each HAYSTACK_CONNECTIONS entry contains a SETTINGS_NAME key, which must match a key in ELASTICSEARCH_INDEX_SETTINGS:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'elasticstack.backends.ConfigurableElasticSearchEngine',
        'URL': env_var('HAYSTACK_URL', 'http://127.0.0.1:9200/'),
        'INDEX_NAME': 'haystack',
        'SETTINGS_NAME': 'cs',
        'DEFAULT_ANALYZER': 'czech_hunspell',
        'DEFAULT_NGRAM_SEARCH_ANALYZER': 'standard',
    },
}

ELASTICSEARCH_INDEX_SETTINGS = {
    'cs': {
        "settings": {
            "analysis": {
                "analyzer": {
                    "czech_hunspell": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["stopwords_CZ", "lowercase", "hunspell_CZ", "stopwords_CZ", "remove_duplicities"]
                    }
                },
                "filter": {
                    "stopwords_CZ": {
                        "type": "stop",
                        "stopwords": ["právě", "že", "test", "_czech_"],
                        "ignore_case": True
                    },
                    "hunspell_CZ": {
                        "type": "hunspell",
                        "locale": "cs_CZ",
                        "dedup": True,
                        "recursion_level": 0
                    },
                    "remove_duplicities": {
                        "type": "unique",
                        "only_on_same_position": True
                    },
                }
            }
        }
    },
}

Field based analysis

Even with a new default analyzer you may want to change the analyzer on a field-by-field basis to fit your needs. To do so, use the fields from elasticstack.fields and specify your analyzer with the analyzer keyword argument:

from haystack import indexes
from elasticstack.fields import CharField
from myapp.models import MyContent

class MyContentIndex(indexes.SearchIndex, indexes.Indexable):
    text = CharField(document=True, use_template=True,
            analyzer='synonym_analyzer')

    def get_model(self):
        return MyContent

Django CBV style views

Haystack’s class-based views predate the inclusion of CBVs in Django core, so the paradigms are different. This makes it difficult, if not impossible, to make use of view mixins.

The bundled SearchView and FacetedSearchView classes are based on django.views.generic.edit.FormView using the SearchMixin and FacetedSearchMixin, respectively. The SearchMixin provides the necessary search related attributes and overloads the form processing methods to execute the search.

The SearchMixin adds a few search-specific attributes (a usage sketch follows the list):

  • load_all - a Boolean specifying whether the full database objects are loaded for search results

  • queryset - a default SearchQuerySet. Defaults to EmptySearchQuerySet

  • search_field - the name of the form field used for the query. This is added to allow for views which may have more than one search form. Defaults to q.
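As a sketch of how these pieces fit together (assuming the bundled views are importable from elasticstack.views and that Haystack’s stock SearchForm is an acceptable form_class), a search view for the MyContent model used earlier might look like:

from haystack.forms import SearchForm
from haystack.query import SearchQuerySet

from elasticstack.views import SearchView
from myapp.models import MyContent


class MyContentSearchView(SearchView):
    template_name = 'search/results.html'          # any FormView-style template
    form_class = SearchForm                         # Haystack's standard search form
    queryset = SearchQuerySet().models(MyContent)   # limit results to one model
    load_all = True                                 # load the related database objects
    search_field = 'q'                              # name of the form field holding the query

Wire it into your URLconf with MyContentSearchView.as_view(), just as you would any other Django class-based view, and add whatever generic view mixins you need.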

Management commands

show_mapping

Made a change and wondering why your results don’t look as expected? The management command show_mapping will print the current mapping for your defined search index(es). At the very least it may show that you’ve simply forgotten to update your index with new mappings:

python manage.py show_mapping

By default this displays the existing mapping, which shows the index, document type, and document properties:

{
    "haystack": {
        "modelresult": {
            "properties": {
                "is_active": {
                    "type": "boolean"
                },
                "text": {
                    "type": "string"
                },
                "published": {
                    "type": "date",
                    "format": "dateOptionalTime"
                }
            }
        }
    }
}

If you provide the --detail flag, only the field mappings are returned, but with additional details such as boost levels and field-specific analyzers:

{
    "is_active": {
        "index": "not_analyzed",
        "boost": 1,
        "store": "yes",
        "type": "boolean"
    },
    "text": {
        "index": "analyzed",
        "term_vector": "with_positions_offsets",
        "type": "string",
        "analyzer": "custom_analyzer",
        "boost": 1,
        "store": "yes"
    },
    "pub_date": {
        "index": "analyzed",
        "boost": 1,
        "store": "yes",
        "type": "date"
    }
}

show_document

Given the name of an indexed model and a primary key, this command generates and prints the search document for that object:

python manage.py show_document myapp.MyModel 19181

The JSON document will be formatted with ‘pretty’ indenting.

Stability, docs, and tests

The form, view, and backend functionality in this project is considered stable. Test coverage is not substantial, but the tests are run against Django 1.8 through 1.10 on Python 2.7, 3.4, and 3.5.

Why not add this stuff to Haystack?

This project first aims to solve problems related specifically to working with ElasticSearch. Haystack is 1) backend agnostic (a good thing), 2) needs to support existing codebases, and 3) not my project. Most importantly, adding these features through a separate Django app means providing them without needing to fork Haystack. Hopefully some of the features here, once finalized and tested, will be suitable to add to Haystack.

History

0.5.0 (2017-03-17)

  • Replace deprecated option_list in commands with add_arguments method

  • Update Django versions in tox config and docs

0.4.1 (2016-05-05)

  • Fix encoding issue in installation. In at least one known environment/Python3 combination an encoding issue prevented installation of the package.

0.4.0 (2016-01-28)

  • Allow changing search settings on an index-by-index basis

0.3.0 (2015-12-31)

  • Set default analyzer for ngram fields

0.2.0 (2015-09-29)

  • Switch to py.test

  • Tests against Django 1.8, 1.9

  • Drop pyelasticsearch requirement for installation

0.1.1 (2015-01-13)

  • Bug fix in show_document management command

0.1.0 (2014-11-24)

  • Major structural changes

  • Bugfix for configurable search fields

0.0.6 (2013-10-04)

  • Require pyelasticsearch for installation

0.0.5 (2013-09-28)

  • Fixed reference to old method

0.0.4 (2013-09-28)

  • Search form can search using specified field name

  • Added management command to output mapping for an individual document

0.0.3 (2013-09-28)

  • Added default analyzer setting

0.0.2 (2013-09-28)

  • Packaging fix

0.0.1 (2013-09-28)

  • Initial release
