opensearch-reindexer

opensearch-reindexer is a Python library that streamlines reindexing data from one OpenSearch index to another, using either the native OpenSearch Reindex API or Python with the OpenSearch Scroll API and bulk inserts.

Features

  • Reindexing with the native OpenSearch Reindex API or with Python using the OpenSearch Scroll API
  • Migrate data from one index to another in the same cluster
  • Migrate data from one index to another in a different cluster
  • Migration history
  • Run multiple migrations one after another
  • Transform documents using the native OpenSearch Reindex API, or Python with the Scroll API and bulk inserts
  • Source indices and their data are never modified or removed

Getting started

1. Install opensearch-reindexer

pip install opensearch-reindexer

or

poetry add opensearch-reindexer

2. Initialize project

reindexer init

3. Configure your source_client in ./migrations/env.py

You only need to configure destination_client if you are migrating data from one cluster to another.
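
As a rough illustration, the client setup might look like the sketch below. It assumes the clients are opensearch-py OpenSearch objects and that connection details come from environment variables; the client_kwargs helper and the variable names (SOURCE_HOST and so on) are made up for this example, so check the generated env.py for what it actually expects.

```python
import os


def client_kwargs(prefix: str) -> dict:
    """Build keyword arguments for an OpenSearch client from environment variables.

    Hypothetical helper: the variable names <PREFIX>_HOST, <PREFIX>_PORT,
    <PREFIX>_USER and <PREFIX>_PASSWORD are assumptions for this sketch.
    """
    return {
        "hosts": [
            {
                "host": os.environ.get(f"{prefix}_HOST", "localhost"),
                "port": int(os.environ.get(f"{prefix}_PORT", "9200")),
            }
        ],
        "http_auth": (
            os.environ.get(f"{prefix}_USER", "admin"),
            os.environ.get(f"{prefix}_PASSWORD", "admin"),
        ),
        "use_ssl": True,
        "verify_certs": True,
    }


# In ./migrations/env.py (requires the opensearch-py package):
# from opensearchpy import OpenSearch
# source_client = OpenSearch(**client_kwargs("SOURCE"))
# destination_client = OpenSearch(**client_kwargs("DESTINATION"))  # cross-cluster only
```

Keeping credentials in environment variables rather than in env.py itself avoids committing them alongside the revision files.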

4. Create reindexer_version index

reindexer init-index

This uses your source_client to create a new index named 'reindexer_version' and inserts a document recording the revision version: {"versionNum": 0}. reindexer_version keeps track of which revisions have been run.

When reindexing from one cluster to another, run the migrations first (step 8), then initialize the destination cluster with: reindexer init-index

5. Create revision (repeat if you have multiple indices)

Two revision types are supported: painless, which uses the native OpenSearch Reindex API, and python, which uses the OpenSearch Scroll API and bulk inserts. painless revisions are recommended because they are more performant than python revisions. You don't have to pick one or the other; ./migrations/versions/ can contain a combination of painless and python revisions.

To create a painless revision run:

reindexer revision 'my revision name'

To create a python revision run:

reindexer revision 'my revision name' --language python

This will create a new revision file in ./migrations/versions.

Note:

  1. Revision files should not be removed, and their names should not be changed once created.
  2. ./migrations/migration_template_painless.py and ./migrations/migration_template_python.py are used as the templates for each new revision. You can modify them if you find yourself making the same changes to every revision file.
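
To give a feel for what a python revision does conceptually, here is a minimal sketch of the scroll-then-bulk pattern. It is not the library's implementation: the function names are invented, and `client` is assumed to be any object exposing opensearch-py style search and scroll methods.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List


def scroll_documents(
    client: Any, index: str, page_size: int = 500, keep_alive: str = "2m"
) -> Iterator[Dict]:
    """Yield every hit in `index`, page by page, using the Scroll API."""
    page = client.search(
        index=index,
        scroll=keep_alive,
        size=page_size,
        body={"query": {"match_all": {}}},
    )
    # An empty page of hits means the scroll is exhausted.
    while page["hits"]["hits"]:
        yield from page["hits"]["hits"]
        page = client.scroll(scroll_id=page["_scroll_id"], scroll=keep_alive)


def bulk_actions(
    hits: Iterable[Dict], dest_index: str, transform: Callable[[Dict], Dict]
) -> List[Dict]:
    """Build the alternating action/source entries of a Bulk API request body."""
    body: List[Dict] = []
    for hit in hits:
        body.append({"index": {"_index": dest_index, "_id": hit["_id"]}})
        body.append(transform(hit["_source"]))
    return body
```

With a real client, the resulting list could then be sent with something like client.bulk(body=...), one batch at a time. The sketch leaves out batching, error handling, and clearing the scroll context.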

6. Modify your revision file

Navigate to your revision file ./migrations/versions/1_my_revision_name.py

Painless

Modify source and dest in REINDEX_BODY. You can optionally set DESTINATION_MAPPINGS.

Note: If you only want to create the index, set the source index to None, e.g. "source": {"index": None}.
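
A revision that only creates an index might look roughly like the sketch below. The shape of DESTINATION_MAPPINGS is an assumption: it is written here as an OpenSearch create-index body, and the field names "a" and "c" are placeholders, so compare it with the generated template before relying on it.

```python
# Hypothetical index-only revision: no documents are copied.
REINDEX_BODY = {
    "source": {"index": None},  # None: skip reindexing, only create the destination
    "dest": {"index": "reindexer_revision_1"},
}

# Assumed shape: an OpenSearch create-index body with explicit field mappings.
DESTINATION_MAPPINGS = {
    "mappings": {
        "properties": {
            "a": {"type": "keyword"},
            "c": {"type": "text"},
        }
    }
}
```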

To transform data as it is reindexed, you can use painless scripts. For example, the following converts field "c" from an object to a JSON string before inserting it into the destination index.

REINDEX_BODY = {
    "source": {"index": "reindexer_revision_1"},
    "dest": {"index": "reindexer_revision_2"},
    "script": {
        "lang": "painless",
        "source": """
        def jsonString = '{';
        int counter = 1;
        int size = ctx._source.c.size();
        for (def entry : ctx._source.c.entrySet()) {
          jsonString += '"'+entry.getKey()+'":'+'"'+entry.getValue()+'"';
          if (counter != size) {
            jsonString += ',';
          }
          counter++;
        }
        jsonString += '}';
        ctx._source.c = jsonString;
        """
    }
}

For more information on REINDEX_BODY see https://opensearch.org/docs/latest/opensearch/reindex-data/
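
To make the script's output concrete, here is a rough Python emulation of the same string-building loop. It is only an illustration (the function name is made up), but it shows that, for string and integer values, the script wraps every value in double quotes and does no escaping, so its result can differ from what json.dumps produces.

```python
import json


def painless_style_json(obj: dict) -> str:
    """Mimic the Painless loop: quote every key and value, join with commas."""
    pairs = ['"' + str(key) + '":"' + str(value) + '"' for key, value in obj.items()]
    return "{" + ",".join(pairs) + "}"


field_c = {"x": 1, "y": "two"}
print(painless_style_json(field_c))  # {"x":"1","y":"two"}
print(json.dumps(field_c))           # {"x": 1, "y": "two"}
```

If values may contain double quotes or nested objects, the script would need extra handling to produce valid JSON.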

Python

Modify SOURCE_INDEX and DESTINATION_INDEX. You can optionally set DESTINATION_MAPPINGS.

Note: If you only want to create the index, set the source index to None, e.g. SOURCE_INDEX = None.

To modify documents as they are reindexed into the destination index, update the transform_document method. For example:

import json

class Migration(BaseMigration):
    def transform_document(self, doc: dict) -> dict:
        # Transform each document before it is inserted into the destination index.
        # Here, field "c" is serialized from an object to a JSON string.
        doc['c'] = json.dumps(doc['c'])
        return doc
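
As a quick check of what this transform produces, the sketch below runs it on a sample document. The BaseMigration class here is an empty, hypothetical stand-in so that the snippet runs on its own; in a real revision file the base class comes from the library.

```python
import json


class BaseMigration:
    """Hypothetical stand-in; the real base class is supplied by opensearch-reindexer."""


class Migration(BaseMigration):
    def transform_document(self, doc: dict) -> dict:
        # Serialize the nested object in field "c" to a JSON string.
        doc["c"] = json.dumps(doc["c"])
        return doc


sample = {"a": 1, "c": {"x": 1, "y": "two"}}
result = Migration().transform_document(sample)
print(result["c"])  # {"x": 1, "y": "two"}
```

Note that the method modifies the dictionary it receives and returns the same object; other fields, such as "a", pass through unchanged.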

7. See an ordered list of revisions that have not been executed

reindexer list

8. Run your migrations

reindexer run

Note: When reindexer run is executed, it compares the revision versions in ./migrations/versions/... to the version number in the reindexer_version index of the source cluster. All revisions that have not yet been run are then run one after another.
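
The comparison described in this note can be pictured with the sketch below. It is an assumption-laden illustration rather than the library's code: it supposes that the leading number in a revision file name (as in 1_my_revision_name.py) is the revision version, and the pending_revisions function is invented for this example.

```python
import re
from typing import List


def pending_revisions(filenames: List[str], version_num: int) -> List[str]:
    """Return revision files numbered above version_num, in ascending order."""
    numbered = []
    for name in filenames:
        # Assumed naming scheme: "<number>_<name>.py"
        match = re.match(r"(\d+)_.*\.py$", name)
        if match and int(match.group(1)) > version_num:
            numbered.append((int(match.group(1)), name))
    # Sort numerically so that revision 10 runs after revision 2.
    return [name for _, name in sorted(numbered)]


files = ["2_add_field.py", "1_my_revision_name.py", "10_rename.py", "__init__.py"]
print(pending_revisions(files, 1))  # ['2_add_field.py', '10_rename.py']
```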

FAQ 💬 🙋

How do I start using OpenSearch reindexer in a new project?

To start using OpenSearch reindexer, simply follow the steps outlined in the getting started guide.

What happens if multiple revisions need to be executed?

OpenSearch reindexer compares the remote version in the reindexer_version index on your OpenSearch cluster to your local version. Any versions that have not been executed will be executed one after another.

How do I handle multiple indices?

Create a revision for each index and follow the same steps as you would for a single index.

How do I migrate from another schema management tool to OpenSearch reindexer?

To migrate to OpenSearch reindexer, follow steps 1-6 in the getting started guide, repeating steps 5 and 6 for each index. Set the source index to None during step 6 to create the destination index if it doesn't exist, or if it already exists, proceed to the next revision.

Downloading a project that uses OpenSearch reindexer

If the reindexer_version index on the OpenSearch cluster is up-to-date, running reindexer run won't do anything. However, if the OpenSearch cluster hasn't been initialized, run reindexer init-index followed by reindexer run to create and initialize the reindexer_version index and run all migrations.

Reindexing data from one OpenSearch cluster to another

Follow the same steps as for reindexing within a single cluster, but also configure destination_client in ./migrations/env.py.

Once you have reindexed all indices from one cluster to another, update the source and destination clients.
