Skip to main content

Toolkit for Elasticsearch Synonym files.

Project description

![Elasticsearch Synonyms](docs/banner.png)

[![Build Status][travis-img-url]][travis-href]
[![PyPI Version][pypi-v-shield-url]][pypi-href]

This repository contains a curated dataset of synonyms in [Solr Format][1]. These
synonyms can be used for [Elasticsearch Synonym Token Filter][2] configuration.

Additional helper tools in this repository:

- `synlint`: Commandline tool to lint and validate the synonym files.
- `synonyms.sublime-syntax`: Syntax highlighting file for Sublime Text 3.

If you're using Elasticssearch with Django, you might find [`dj-elasticsearch-flex`][es_flex] useful.

## Why?

Trying to configure Synonyms in Elasticsearch, I found that docs for it are surprisingly scattered.
The docs that are available do not do much justice either and miss out many corner cases.

For instance, an incorrect Solr mapping: `hello, world,` would be happily added in index configuration.
However, as soon as you'd try to re-open the index, you'd get a `malform_input_exception` [(discussion thread)][4].

This repository solves such problems by with a linter tool that can be used to validate the synonym
files beforehand.

## Datasets

The synonym files in `data/` can be used directly in elasticsearch configuration.

Following datasets are currently available:
- `be-ae`: British English and American English Spellings. From [AVKO.org](https://to.noop.pw/2sNor7C).

## Installation

If you want to use the `synlint` tool, install the package from PIP using:
```shell
pip install elasticsearch-synonym-toolkit
```

The Python Package is installed as `es_synonyms`. This will also install a linter tool,
`es-synlint`. Use it with:

```shell
es-synlint [synonymfile]
```

## Usage

In most cases, you'd want to use this module as a helper for loading validated synonyms from a file or a url:
```python
from es_synonyms import load_synonyms

# Load synonym file at some URL:
be_ae_syns = load_synonyms('https://to.noop.pw/2sI9x4s')
# Or, from filesystem:
other_syns = load_synonyms('data/be-ae.synonyms')
```

Configuring [Synonym Tokenfilter][2] with [Elasticsearch DSL Py][3], is very easy, too:
```python
from elasticsearch_dsl import analyzer, token_filter

be_ae_syns = load_synonyms('https://to.noop.pw/2sI9x4s')

# Create a tokenfilter
brit_spelling_tokenfilter = token_filter(
'my_tokenfilter', # Any name for the filter
'synonym', # Synonym filter type
synonyms=be_ae_syns # Synonyms mapping will be inlined
)
# Create analyzer
brit_english_analyzer = analyzer(
'my_analyzer',
tokenizer='standard',
filter=[
'lowercase',
brit_spelling_tokenfilter
])
```

To use the underlying linter, you can import `SynLint` class.

## Development

- Clone this repository.
- Install package dependencies via `pip` with: `pip install -r requirements.txt`.
- To run tests:
```shell
./panda test:all
```

## License

The tools and codes are licensed under MIT.
The datasets are used under fair use and are derivative of the original sources.

[1]: https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-SynonymFilter
[2]: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html#analysis-synonym-tokenfilter
[3]: https://elasticsearch-dsl.readthedocs.io/en/latest/persistence.html#analysis
[4]: https://discuss.elastic.co/t/synonym-using-a-file-is-not-working-malformed-input-exception/60487
[es_flex]: https://github.com/prashnts/dj-elasticsearch-flex
[travis-img-url]: https://travis-ci.org/prashnts/elasticsearch-synonyms.svg?branch=master
[travis-href]: https://travis-ci.org/prashnts/elasticsearch-synonyms
[pypi-href]: https://pypi.python.org/pypi/elasticsearch-synonym-toolkit
[pypi-v-shield-url]: https://img.shields.io/pypi/v/elasticsearch-synonym-toolkit.svg

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

elasticsearch-synonym-toolkit-0.2.2.tar.gz (6.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

elasticsearch_synonym_toolkit-0.2.2-py3-none-any.whl (10.5 kB view details)

Uploaded Python 3

File details

Details for the file elasticsearch-synonym-toolkit-0.2.2.tar.gz.

File metadata

File hashes

Hashes for elasticsearch-synonym-toolkit-0.2.2.tar.gz
Algorithm Hash digest
SHA256 44cd6bb099849cafa2f527e8048e3387e61c06f5e053152646bf3c81467b3316
MD5 ca7590302c1b3b5f6ad5d3090976f4b5
BLAKE2b-256 0fcac5c319fbd5cd7b14754c46d9df9b551f1a705b46213e255884c1c4028e29

See more details on using hashes here.

File details

Details for the file elasticsearch_synonym_toolkit-0.2.2-py3-none-any.whl.

File metadata

File hashes

Hashes for elasticsearch_synonym_toolkit-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 0623c82d6979c9fb96359609474cf2edf1d60e8e76f57b5fa6600a39b0dfaf12
MD5 4f3f3e1c92c6ae02bf225663792dbcfb
BLAKE2b-256 c77f7fcb3ecd130f1722cfb0798f9c8e9ab22c201f7cb68e4f27ab05d0cd0fea

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page