Python solr/elasticsearch migration script
solr2es
Migration script from Solr to Elasticsearch via a queue, which can be either Redis or PostgreSQL.
CLI
Here are the command-line options:
-m | --migrate: to migrate from a solr index to an elasticsearch index
-r | --resume: to resume from a given queue to an elasticsearch index. By default, the queue is redis. If the parameter "postgresqldsn" is set, the queue is postgresql.
-d | --dump: to dump from a solr index into a queue. By default, the queue is redis. If the parameter "postgresqldsn" is set, the queue is postgresql.
-t | --test: to test the solr and elasticsearch connections
-a | --async: to use python 3 asyncio
--solrhost: to set solr host (by default: 'solr')
--solrfq: to set solr filter query (by default: '*')
--solrid: to set solr id field name (by default: 'id')
--core: to set solr core name (by default: 'solr2es')
--index: to set index name for solr and elasticsearch (by default: solr core name, see --core parameter)
--redishost: to set redis host (by default: 'redis')
--postgresqldsn: to set postgresql Data Source Name (by default: None, for example: 'dbname=solr2es user=test password=test host=postgresql')
--eshost: to set elasticsearch host (by default: 'elasticsearch')
--translationmap: dict string or file path (starting with @) to translate fields from queue into elasticsearch (by default: None, for example: '{"postgresql_field": {"name": "es_field"}}')
--esmapping: dict string or file path (starting with @) to set elasticsearch mapping (by default: None)
--essetting: dict string or file path (starting with @) to set elasticsearch setting (by default: None)
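The "dict string or file path (starting with @)" convention used by --translationmap, --esmapping and --essetting can be sketched as follows. This is only an illustration, not the solr2es implementation: the helper name load_dict_option is hypothetical, and it assumes the dict string and the file content are both JSON.

```python
import json


def load_dict_option(value):
    """Hypothetical helper: parse a 'dict string or @file path' option value."""
    if value is None:
        # the option was not given on the command line
        return None
    if value.startswith('@'):
        # '@path' means: read the JSON document stored in that file
        with open(value[1:], encoding='utf-8') as handle:
            return json.load(handle)
    # otherwise the value itself is assumed to be a JSON object
    return json.loads(value)


print(load_dict_option('{"postgresql_field": {"name": "es_field"}}'))
```

With this reading, passing @examples/translation-map.json and passing the content of that file inline would give the same dictionary.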
Use
translation_map
The purpose of a translation_map is to map the fields coming from the queue (either Redis or Postgresql) to the fields inserted into Elasticsearch.
If a field from the queue doesn’t exist in the translation_map, it is inserted as is into Elasticsearch.
Use the property name to rename a field in Elasticsearch:
{"queue_name": {"name": "elasticsearch_name"}}
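As a rough illustration of the renaming rule, the sketch below applies the name property to a queue record held as a Python dict. The function rename_fields is hypothetical and is not taken from solr2es; it only mirrors the behaviour described above, including the pass-through of fields that have no entry in the map.

```python
def rename_fields(record, translation_map):
    """Return a copy of record with keys renamed per the 'name' property."""
    translated = {}
    for key, value in record.items():
        rule = translation_map.get(key, {})
        # fields without a rule, or without 'name', keep their original key
        translated[rule.get('name', key)] = value
    return translated


record = {'queue_name': 'value', 'other': 1}
translation_map = {'queue_name': {'name': 'elasticsearch_name'}}
print(rename_fields(record, translation_map))
# prints {'elasticsearch_name': 'value', 'other': 1}
```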
Use the property default to set a default value for a field in Elasticsearch.
If the field exists in the queue and has a value, the translation_map leaves it unchanged. Otherwise a field queue_name will be added to Elasticsearch with the value john doe.
{"queue_name": {"default": "john doe"}}
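The default rule could be sketched like this. The function apply_defaults is hypothetical, and it assumes that "has a value" means the field is present and not None; solr2es may define this differently.

```python
def apply_defaults(record, translation_map):
    """Fill in 'default' values for fields that are missing or None."""
    result = dict(record)
    for field, rule in translation_map.items():
        # assumption: a field "has a value" when it is present and not None
        if 'default' in rule and result.get(field) is None:
            result[field] = rule['default']
    return result


translation_map = {'queue_name': {'default': 'john doe'}}
print(apply_defaults({'id': '1'}, translation_map))
print(apply_defaults({'id': '2', 'queue_name': 'jane'}, translation_map))
```

The first record gains queue_name with the default value, while the second keeps its existing value.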
Use dots (.) in the property name to create a nested field in Elasticsearch.
If the queue record has a field nested_a_b, the Elasticsearch record will get a field nested, containing a field a, which in turn contains a field b holding the content of nested_a_b.
{"nested_a_b": {"name": "nested.a.b"}}
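One way to picture the nesting is to split the target name on dots and build the intermediate objects on the way down. The helper set_nested below is a hypothetical sketch, not the solr2es code.

```python
def set_nested(document, dotted_name, value):
    """Store value in document under the path given by a dotted name."""
    keys = dotted_name.split('.')
    node = document
    for key in keys[:-1]:
        # create the intermediate object if it does not exist yet
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return document


print(set_nested({}, 'nested.a.b', 'content of nested_a_b'))
# prints {'nested': {'a': {'b': 'content of nested_a_b'}}}
```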
Use the property name with regex capture groups to rename many queue fields at once in Elasticsearch, by adding [regexp] at the beginning of the field key. This renames every field prefixed with queue_ so that it is prefixed with elasticsearch_ instead.
{"[regexp]queue_(.*)": {"name": "elasticsearch_\\1"}}
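The [regexp] rule can be illustrated with the standard re module. The function translate_key is hypothetical, and matching the whole field name with re.fullmatch is an assumption; solr2es may anchor its patterns differently. Note that the JSON string "elasticsearch_\\1" decodes to the replacement template elasticsearch_\1.

```python
import re

REGEXP_PREFIX = '[regexp]'


def translate_key(key, translation_map):
    """Rename key using the first matching '[regexp]' rule, if any."""
    for pattern, rule in translation_map.items():
        if pattern.startswith(REGEXP_PREFIX) and 'name' in rule:
            regex = pattern[len(REGEXP_PREFIX):]
            # assumption: the pattern must match the whole field name
            match = re.fullmatch(regex, key)
            if match:
                # expand backreferences such as \1 in the target name
                return match.expand(rule['name'])
    return key


translation_map = {'[regexp]queue_(.*)': {'name': 'elasticsearch_\\1'}}
print(translate_key('queue_title', translation_map))
print(translate_key('other_field', translation_map))
```

Here queue_title becomes elasticsearch_title, and other_field is returned unchanged because no pattern matches it.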
Set the property ignore to true to keep a queue field out of Elasticsearch.
{"ignored_field": {"ignore": true}}
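A minimal sketch of the ignore rule, assuming the record is a plain dict. The function drop_ignored is a hypothetical name used only for illustration.

```python
def drop_ignored(record, translation_map):
    """Return a copy of record without the fields marked 'ignore': true."""
    return {
        key: value
        for key, value in record.items()
        if not translation_map.get(key, {}).get('ignore', False)
    }


translation_map = {'ignored_field': {'ignore': True}}
print(drop_ignored({'id': '1', 'ignored_field': 'secret'}, translation_map))
# prints {'id': '1'}
```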
Set the property routing_field to true to use a field for routing in Elasticsearch. An exception will be raised if more than one field sets it to true.
{"my_root_doc": {"routing_field": true}}
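The uniqueness check on routing_field might look like the sketch below. The function find_routing_field is hypothetical, and raising ValueError is an assumption: the source only states that an exception is raised, not which one.

```python
def find_routing_field(translation_map):
    """Return the single field flagged as routing field, or None."""
    routing = [
        field for field, rule in translation_map.items()
        if rule.get('routing_field')
    ]
    if len(routing) > 1:
        # assumption: the real exception type used by solr2es may differ
        raise ValueError('several routing fields declared: %s' % routing)
    return routing[0] if routing else None


print(find_routing_field({'my_root_doc': {'routing_field': True}}))
# prints my_root_doc
```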
Set the property multivalued to false to ignore a multivalued (array) field and keep only its first value. By default the array is copied.
{"my_array": {"multivalued": false}}
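The multivalued rule can be sketched as follows. The function apply_multivalued is hypothetical, and mapping an empty array to None is an assumption that the source does not cover.

```python
def apply_multivalued(record, translation_map):
    """Keep only the first value of arrays marked 'multivalued': false."""
    result = {}
    for key, value in record.items():
        keep_array = translation_map.get(key, {}).get('multivalued', True)
        if isinstance(value, list) and not keep_array:
            # assumption: an empty array becomes None
            value = value[0] if value else None
        result[key] = value
    return result


translation_map = {'my_array': {'multivalued': False}}
record = {'my_array': ['first', 'second'], 'other_array': [1, 2]}
print(apply_multivalued(record, translation_map))
# prints {'my_array': 'first', 'other_array': [1, 2]}
```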
execution
Execute a dump from Solr into Postgresql specifying the Solr host, the Solr core, the Solr id and the Postgresql DSN
solr2es --postgresqldsn 'dbname=solr2es user=test password=test host=localhost' --solrhost 127.0.0.1 --core test_core --solrid solr_id -d -a
Execute a resume from Postgresql into Elasticsearch specifying the Postgresql DSN, the Elasticsearch index, the Elasticsearch mapping, the Elasticsearch settings and the translation map
solr2es --postgresqldsn 'dbname=solr2es user=test password=test host=localhost' --index es-index --translationmap @examples/translation-map.json --esmapping @examples/datashare_index_mappings.json --essetting @examples/datashare_index_settings.json -r -a
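When the dump and the resume are chained from a Python script, the two command lines above can be assembled as argument lists. The sketch below only builds and prints the commands, using the flags shown in this README; the function names are hypothetical, and actually running them (for instance with subprocess.run(command, check=True)) assumes solr2es is installed and the services are reachable.

```python
import shlex


def build_dump_command(dsn, solr_host, core, solr_id):
    """Build the argv list for an asynchronous dump from Solr into Postgresql."""
    return ['solr2es', '--postgresqldsn', dsn, '--solrhost', solr_host,
            '--core', core, '--solrid', solr_id, '-d', '-a']


def build_resume_command(dsn, index, translation_map, es_mapping, es_setting):
    """Build the argv list for an asynchronous resume into Elasticsearch."""
    return ['solr2es', '--postgresqldsn', dsn, '--index', index,
            '--translationmap', translation_map, '--esmapping', es_mapping,
            '--essetting', es_setting, '-r', '-a']


DSN = 'dbname=solr2es user=test password=test host=localhost'
dump = build_dump_command(DSN, '127.0.0.1', 'test_core', 'solr_id')
resume = build_resume_command(
    DSN, 'es-index', '@examples/translation-map.json',
    '@examples/datashare_index_mappings.json',
    '@examples/datashare_index_settings.json')

for command in (dump, resume):
    # quote each argument so the printed line can be pasted into a shell
    print(' '.join(shlex.quote(argument) for argument in command))
```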
Develop
To build and run the tests:
virtualenv --python=python3.6 venv
source venv/bin/activate
python setup.py develop
python setup.py test
To release:
python setup.py sdist bdist_egg upload
Misc
Some features are not implemented yet:
Resume from the redis queue to elasticsearch in asynchronous mode (function aioresume_from_redis)
Resume from the redis queue to elasticsearch in synchronous mode (function resume_from_redis)
Resume from the postgresql queue to elasticsearch in synchronous mode (function resume_from_postgresql)
Changes
v. 0.7
multivalued field : flatten the array if it has one value
multivalued field : ignore multivalued field in translation map
multivalued field : copy the array into elasticsearch
v. 0.6
error handling : logs ids that have failed when resuming from postgresql
adds the possibility to specify a routing field in the translation map
v. 0.5
adds postgresql resume
elasticsearch : adds mappings and settings support
better logs and progress marks
doc : README
translation map : support for empty default list
adds postgresql blocking queue
translation map : ignore field
translation map : default value
v. 0.4
[solr2es] wildcard support in translation_map
[solr2es] nested fields support in translation_map
[solr2es] adds solrid parameter to change sort field
[solr2es] adds solrfq parameter to parallelize solr reading
v. 0.3
[solr2es] adds translation map for fields
[solr2es] adds elasticsearch mapping for index creation
[test] compatible with 6.6.0
v. 0.2
[log] adds logger and progression feedbacks
[cli] exit if no args
v. 0.1
[solr2es] initial version