
Index Migration for ElasticSearch


Extension for the official Elasticsearch Python client that provides an indices_manager to create and manage indices with read and write aliases, and to perform no-downtime migrations.

Installation

pip install slingshot

Usage

Instantiation

from weakref import proxy
from elasticsearch.client import Elasticsearch
from slingshot.indices_manager import IndicesManagerClient

es = Elasticsearch()
es.indices_manager = IndicesManagerClient(proxy(es))

Creation of a Managed Index

es.indices_manager.create('slingshot', body={"settings": {"number_of_shards": 1, "number_of_replicas": 1}})

This creates an index with read and write aliases:

  • Creates the index “slingshot.{creation_timestamp}”

  • Creates a read alias “slingshot”

  • Creates a write alias “slingshot.write”
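The point of the two aliases is that application code never has to name the timestamped index. As a rough sketch, writes can go through the write alias and searches through the read alias. The helpers `write_alias`, `index_document` and `search_documents` are hypothetical and not part of slingshot, and the `doc_type` argument assumes an elasticsearch-py version contemporary with slingshot (before 2.0.0).

```python
def write_alias(name):
    """Return the write alias paired with a managed index name."""
    return name + ".write"


def index_document(es, name, doc_type, doc_id, source):
    """Index one document through the write alias (hypothetical helper)."""
    return es.index(index=write_alias(name), doc_type=doc_type, id=doc_id, body=source)


def search_documents(es, name, query):
    """Search through the read alias, which is the bare index name."""
    return es.search(index=name, body=query)
```

Because both helpers only ever see alias names, a later migration can repoint the aliases without any change to calling code.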

Upgrading an Existing Index

Slingshot manages the read and write aliases for the indices it creates. An index that was not created with slingshot can also be brought under management: the manage method simply creates a write alias for it so that it can be migrated later.

es.indices_manager.manage('existing_index')

Migration of a Managed Index

es.indices_manager.migrate('slingshot', body={"settings": {"number_of_shards": 5, "number_of_replicas": 1}})

This lets you change the configuration of an index and migrate its documents to take advantage of new mappings:

  • creates a new index “slingshot.{modification_timestamp}” with a new configuration (e.g. 5 shards instead of 1)

  • swaps write alias to the new index

  • scans and bulk imports all documents (optionally ignoring types or performing transformations)

  • swaps read alias

  • deletes original index (can be skipped)
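The order of the steps above can be replayed on plain dictionaries to see what each alias points at along the way. This is only an illustration under stated assumptions, not slingshot's implementation: `simulate_migration` is a hypothetical name, indices are modelled as `{doc_id: source}` dicts and aliases as an `{alias: index}` dict.

```python
def simulate_migration(indices, aliases, name, new_index,
                       transform=None, keep_source=False):
    """Replay the migration steps on in-memory dicts (illustration only)."""
    write = name + ".write"
    old_index = aliases[write]

    # 1. Create the new index (a real migration applies the new body here).
    indices[new_index] = {}

    # 2. Swap the write alias so new writes land in the new index.
    aliases[write] = new_index

    # 3. Copy every document across, optionally transforming or dropping it.
    for doc_id, source in indices[old_index].items():
        doc = {"_id": doc_id, "_source": dict(source)}
        if transform is not None:
            doc = transform(doc)
        if doc is not None:
            indices[new_index][doc["_id"]] = doc["_source"]

    # 4. Swap the read alias, if there is one, so searches see the new index.
    if name in aliases:
        aliases[name] = new_index

    # 5. Delete the original index unless asked to keep it.
    if not keep_source:
        del indices[old_index]
    return new_index
```

Readers keep hitting the old index until step 4, so searches stay available for the whole copy.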

Note that the index must have been created with slingshot or upgraded first, either with the manage method or by creating a write alias yourself.

Transforming Documents

When migrating, it can be useful to transform documents to match a new mapping.

from datetime import datetime

def transform_my_docs(doc):
    # recompute some fields
    doc['_source']['discount'] = doc['_source']['price'] / doc['_source']['value'] * 100.0

    # drop some fields
    doc['_source'].pop('useless', None)  # no error if the field is already absent

    # drop documents based on some business rules (assumes the field is first cast to a datetime)
    if doc['_source']['expires_at'] < datetime.now():
        return None

    # Don't forget to return the modified document
    return doc

es.indices_manager.migrate('slingshot', body=config_dict_or_string, transform=transform_my_docs)
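Before running a migration it can help to try a transform on a few sample documents. The sketch below does that locally; `dry_run` and `add_discount` are hypothetical names, and the document shape (a dict with a `_source` key, where returning None drops the document) is assumed from the example above.

```python
import copy


def dry_run(transform, docs):
    """Apply transform to copies of docs and return the documents it keeps."""
    kept = []
    for doc in docs:
        result = transform(copy.deepcopy(doc))  # never mutate the samples
        if result is not None:  # None means "drop this document"
            kept.append(result)
    return kept


def add_discount(doc):
    """Hypothetical transform: compute a discount, drop docs without a value."""
    source = doc["_source"]
    if not source.get("value"):
        return None
    source["discount"] = source["price"] / source["value"] * 100.0
    return doc


samples = [
    {"_id": "1", "_source": {"price": 5.0, "value": 20.0}},
    {"_id": "2", "_source": {"price": 5.0, "value": 0}},
]
kept = dry_run(add_discount, samples)
```

Here the second sample is dropped because its value is zero, which also avoids a division by zero during the real migration.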

Ignoring Document Types

It can also be useful to ignore some document types altogether.

es.indices_manager.migrate('slingshot', body=config_dict_or_string, ignore_types=["my_type_1", "my_type_2"])

Keeping the Source Index

To keep the original index after the migration (e.g. to roll back if anything goes wrong), pass keep_source=True:

es.indices_manager.migrate('slingshot', body=config_dict_or_string, keep_source=True)
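Slingshot's own rollback support, if any, is not described here. One possible manual approach, assuming you have kept the source index and know both index names, is to move the two aliases back with a single request to the Elasticsearch aliases API. `rollback_actions` is a hypothetical helper that only builds the request body.

```python
def rollback_actions(name, current_index, previous_index):
    """Build an update_aliases body that moves both aliases back at once."""
    actions = []
    for alias in (name, name + ".write"):
        actions.append({"remove": {"index": current_index, "alias": alias}})
        actions.append({"add": {"index": previous_index, "alias": alias}})
    return {"actions": actions}
```

The result could then be passed to `es.indices.update_aliases(body=...)`. The timestamped index names are not predictable, so they would have to be looked up first, for example with `es.indices.get_alias`.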

Warning

Slingshot is unable to predict what needs to be done with the settings, mappings, aliases, etc. of the new index.

Therefore, when migrating, body must contain all the relevant configuration to create an index from scratch. This can include settings, mappings, aliases, warmers or anything supported by the elasticsearch index API.

However, slingshot manages the migration of the write alias and the read alias (if it exists).
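As an example of what a complete body might look like, the sketch below builds one with settings, mappings and an extra alias. The helper `build_body`, the `product` type, its fields and the `catalogue` alias are all hypothetical, and the typed mapping layout assumes an Elasticsearch version from the same era as slingshot.

```python
def build_body(shards, replicas):
    """Return a complete index definition (all names are hypothetical)."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "product": {  # hypothetical document type
                "properties": {
                    "price": {"type": "double"},
                    "expires_at": {"type": "date"},
                }
            }
        },
        # Extra aliases, beyond the read/write pair that slingshot manages.
        "aliases": {"catalogue": {}},
    }


body = build_body(shards=5, replicas=1)
# es.indices_manager.migrate('slingshot', body=body)
```

Keeping the whole definition in one function means the same body can be reused for create and for later migrations.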

Running Tests

Get a copy of the repository:

git clone git@github.com:OohlaLabs/slingshot.git .

Install tox:

pip install tox

Run the tests:

tox

Contributions

All contributions and comments are welcome. Simply create a pull request or report a bug.

Changelog

v0.0.5

  • Reindex percolators after migrating data

v0.0.4

  • Allow passing create and copy kwargs to migrate

v0.0.3

  • Fix compatibility issues with latest versions of elasticsearch-py (<2.0.0)

  • Add support for parallel_bulk when migrating/copying

  • Reindex percolators when migrating/copying

v0.0.2

  • Fix six requirement to minimum version instead of exact version

v0.0.1

  • Initial

