opensearch-reindexer
`opensearch-reindexer` is a Python library that streamlines reindexing data from one OpenSearch index to another, using either the native OpenSearch Reindex API or Python with the OpenSearch Scroll API and bulk inserts.
Features
- Native OpenSearch Reindex API and Python-based reindexing using the OpenSearch Scroll API
- Migrate data from one index to another in the same cluster
- Migrate data from one index to another in a different cluster
- Revision history
- Run multiple migrations one after another
- Transform documents using the native OpenSearch Reindex API, or Python with the Scroll API and bulk inserts
- Source indices and data are never modified or removed
Getting started
1. Install opensearch-reindexer
pip install opensearch-reindexer
or
poetry add opensearch-reindexer
2. Initialize project
reindexer init
3. Configure your `source_client` in `./migrations/env.py`
You only need to configure `destination_client` if you are migrating data from one cluster to another.
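A minimal sketch of what the client configuration in `./migrations/env.py` might look like, assuming the clients are `opensearch-py` `OpenSearch` instances. The hosts, ports, and credentials are placeholders, and the generated `env.py` may define these differently, so adapt rather than copy.

```python
from opensearchpy import OpenSearch

# Placeholder connection details; replace with your own cluster settings.
source_client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,  # acceptable for local testing only
)

# Only needed when migrating data to a different cluster.
destination_client = OpenSearch(
    hosts=[{"host": "destination.example.com", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=True,
)
```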
4. Create the `reindexer_version` index
reindexer init-index
This uses your `source_client` to create a new index named `reindexer_version` and insert a document specifying the revision version, `{"versionNum": 0}`. `reindexer_version` is used to keep track of which revisions have been run.
When reindexing from one cluster to another, run your migrations first (step 8) before initializing the destination cluster with:
reindexer init-index
5. Create revision
Two revision types are supported: `painless`, which uses the native OpenSearch Reindex API, and `python`, which uses the OpenSearch Scroll API and bulk inserts. `painless` revisions are recommended because they are more performant than `python` revisions. You don't have to use one or the other; `./migrations/versions/` can contain a combination of both `painless` and `python` revisions.
To create a `painless` revision, run:
reindexer revision 'my revision name'
To create a `python` revision, run:
reindexer revision 'my revision name' --language python
This will create a new revision file in `./migrations/versions`.
Note:
- Revision files should not be removed, and their names should not be changed once created.
- `./migration/migration_template_painless.py` and `./migration/migration_template_python.py` are referenced for each revision. You can modify them if you find yourself making the same changes to revision files.
6. Modify your revision file
Navigate to your revision file `./migrations/versions/1_my_revision_name.py`.
Painless
Modify `source` and `destination` in `REINDEX_BODY`. You can optionally set `DESTINATION_MAPPINGS`.
To transform data as it is reindexed, you can use Painless scripts. For example, the following converts the data in field "c" from an object to a JSON string before inserting it into the `destination` index.
REINDEX_BODY = {
    "source": {"index": "reindexer_revision_1"},
    "dest": {"index": "reindexer_revision_2"},
    "script": {
        "lang": "painless",
        "source": """
            def jsonString = '{';
            int counter = 1;
            int size = ctx._source.c.size();
            for (def entry : ctx._source.c.entrySet()) {
                jsonString += '"'+entry.getKey()+'":'+'"'+entry.getValue()+'"';
                if (counter != size) {
                    jsonString += ',';
                }
                counter++;
            }
            jsonString += '}';
            ctx._source.c = jsonString;
        """
    }
}
For more information on `REINDEX_BODY`, see https://opensearch.org/docs/latest/opensearch/reindex-data/
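To see what the Painless script above produces, the following Python sketch mirrors its logic on an in-memory document. It is an illustration only and not part of opensearch-reindexer; the function name `painless_style_json` is hypothetical. Like the script, it wraps every value in double quotes and does no escaping, so a numeric value such as 1 ends up as the string "1".

```python
def painless_style_json(obj: dict) -> str:
    """Mirror the Painless script: join "key":"value" pairs, quoting every value."""
    parts = []
    for key, value in obj.items():
        # Same concatenation as the script: '"' + key + '":' + '"' + value + '"'
        parts.append('"' + str(key) + '":"' + str(value) + '"')
    return "{" + ",".join(parts) + "}"


doc = {"c": {"a": 1, "b": "x"}}
doc["c"] = painless_style_json(doc["c"])
print(doc["c"])  # {"a":"1","b":"x"}
```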
Python
Modify `SOURCE_INDEX` and `DESTINATION_INDEX`. You can optionally set `DESTINATION_MAPPINGS`.
To modify documents as they are being reindexed to the destination index, update `transform_document`. For example:
class Migration(BaseMigration):
    def transform_document(self, doc: dict) -> dict:
        # Modify this method to transform each document before being inserted into destination index.
        import json
        doc['c'] = json.dumps(doc['c'])
        return doc
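As a quick check of the transform above, the sketch below applies it to one sample document. `BaseMigration` here is an empty stand-in defined only so the snippet runs on its own; in a real revision file it comes from the generated template, and the sample document is made up.

```python
import json


class BaseMigration:
    """Hypothetical stand-in for the base class provided by the revision template."""


class Migration(BaseMigration):
    def transform_document(self, doc: dict) -> dict:
        # Serialize the object in field "c" to a JSON string.
        doc["c"] = json.dumps(doc["c"])
        return doc


sample = {"a": 1, "c": {"x": 1, "y": "two"}}
result = Migration().transform_document(sample)
print(result)  # {'a': 1, 'c': '{"x": 1, "y": "two"}'}
```

Unlike the Painless example, `json.dumps` keeps numeric values unquoted and escapes special characters, so the two approaches do not produce identical strings.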
7. See an ordered list of revisions that have not been executed
reindexer list
8. Run your migrations
reindexer run
Note: When `reindexer run` is executed, it compares the revision versions in `./migrations/versions/...` to the version number in the `reindexer_version` index of the source cluster. All revisions that have not been run are then run one after another.
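The comparison described in the note can be pictured with the sketch below. It is not the library's actual code: it assumes the leading number in a revision filename (as in `1_my_revision_name.py`) is its version, and the helper name `pending_revisions` and the sample filenames are hypothetical.

```python
import re
from typing import List


def pending_revisions(filenames: List[str], current_version: int) -> List[str]:
    """Return revision files whose numeric prefix is above current_version, in order."""
    numbered = []
    for name in filenames:
        match = re.match(r"(\d+)_.*\.py$", name)
        if match:  # skip files that do not look like revisions
            numbered.append((int(match.group(1)), name))
    return [name for number, name in sorted(numbered) if number > current_version]


files = ["2_add_field.py", "1_my_revision_name.py", "10_cleanup.py", "__init__.py"]
print(pending_revisions(files, 1))  # ['2_add_field.py', '10_cleanup.py']
```

Sorting on the parsed integer rather than the filename keeps revision 10 after revision 2, which a plain string sort would not.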