Python solr/elasticsearch migration script
solr2es
Migration script from Solr to Elasticsearch via a queue, which can be either Redis or PostgreSQL.
CLI
The following command-line options are available:
-m | --migrate: migrate from a Solr index to an Elasticsearch index
-r | --resume: resume from a given queue to an Elasticsearch index. By default, the queue is Redis. If the parameter "postgresqldsn" is set, the queue is PostgreSQL.
-d | --dump: dump from a Solr index into a queue. By default, the queue is Redis. If the parameter "postgresqldsn" is set, the queue is PostgreSQL.
-t | --test: test the Solr and Elasticsearch connections
-a | --async: use Python 3 asyncio
--solrhost: set the Solr host (default: 'solr')
--solrfq: set the Solr filter query (default: '*')
--solrid: set the Solr id field name (default: 'id')
--core: set the Solr core name (default: 'solr2es')
--index: set the index name for Solr and Elasticsearch (default: the Solr core name, see the --core parameter)
--redishost: set the Redis host (default: 'redis')
--postgresqldsn: set the PostgreSQL Data Source Name (default: None; for example: 'dbname=solr2es user=test password=test host=postgresql')
--eshost: set the Elasticsearch host (default: 'elasticsearch')
--translationmap: dict string or file path (starting with @) used to translate fields from the queue into Elasticsearch (default: None; for example: '{"postgresql_field": {"name": "es_field"}}')
--esmapping: dict string or file path (starting with @) used to set the Elasticsearch mapping (default: None)
--essetting: dict string or file path (starting with @) used to set the Elasticsearch setting (default: None)
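The "dict string or file path (starting with @)" convention used by the last three options can be illustrated with a minimal sketch. The helper name load_dict_option is hypothetical, and the sketch assumes the dict is written as JSON; the script itself may parse the value differently.

```python
import json
import os
import tempfile


def load_dict_option(value):
    """Parse a dict option given inline or as '@path' (hypothetical helper)."""
    if value is None:
        return None
    if value.startswith("@"):
        # '@' prefix: the rest of the string is a path to a file.
        # Assumption: the file contains JSON.
        with open(value[1:], encoding="utf-8") as handle:
            return json.load(handle)
    # No prefix: the string itself holds the dict (assumed to be JSON).
    return json.loads(value)


# Usage: the same map passed inline and through a temporary file.
inline = load_dict_option('{"postgresql_field": {"name": "es_field"}}')

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump({"postgresql_field": {"name": "es_field"}}, tmp)
from_file = load_dict_option("@" + tmp.name)
os.remove(tmp.name)

print(inline == from_file)
```

Both forms should yield the same dict, so a long mapping can be kept in a file instead of being quoted on the command line.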
Use
translation_map
The purpose of a translation_map is to map the fields coming from Solr to the fields inserted into Elasticsearch.
If a field from Solr doesn’t exist in the translation_map, it is inserted as is into Elasticsearch.
Use the property name to rename a field in Elasticsearch:
{"solr_name": {"name": "es_name"}}
Use the property default to set a default value for a field in Elasticsearch.
If the field exists in Solr and has a value, the translation_map does not change it. Otherwise, with the following map, a field solr_name is added to Elasticsearch with the value john doe:
{"solr_name": {"default": "john doe"}}
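The name and default properties described above can be sketched as follows. This is an illustration of the documented behaviour, not the script's actual code: the function translate is hypothetical, and treating a missing or None field as "no value" is an assumption.

```python
def translate(record, translation_map):
    """Apply 'name' and 'default' rules to a Solr record (hypothetical sketch)."""
    result = {}
    for field, value in record.items():
        # Fields absent from the map are copied unchanged.
        rule = translation_map.get(field, {})
        result[rule.get("name", field)] = value
    for field, rule in translation_map.items():
        if "default" not in rule:
            continue
        target = rule.get("name", field)
        # Assumption: "no value" means the field is missing or None.
        if result.get(target) is None:
            result[target] = rule["default"]
    return result


renamed = translate({"solr_name": "jane", "other": 1},
                    {"solr_name": {"name": "es_name"}})
defaulted = translate({"other": 1},
                      {"solr_name": {"default": "john doe"}})
kept = translate({"solr_name": "jane"},
                 {"solr_name": {"default": "john doe"}})
print(renamed, defaulted, kept)
```

In this sketch an existing value always wins over the default, which matches the rule that the translation_map does not change a field that already has a value.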
Use the property name with dots (.) in it to create a nested field in Elasticsearch.
If the Solr record has a field nested_a_b, the Elasticsearch record gets a field nested, containing a nested field a, containing a nested field b that holds the content of nested_a_b.
{"nested_a_b": {"name": "nested.a.b"}}
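A minimal sketch of how a dotted name can be expanded into nested objects, assuming each dot introduces one level of nesting. The helper set_nested is hypothetical and only illustrates the resulting document shape.

```python
def set_nested(document, dotted_name, value):
    """Store value under a dotted path, creating intermediate dicts as needed."""
    parts = dotted_name.split(".")
    node = document
    for part in parts[:-1]:
        # Reuse an existing level or create an empty one.
        node = node.setdefault(part, {})
    node[parts[-1]] = value
    return document


solr_record = {"nested_a_b": "content"}
translation_map = {"nested_a_b": {"name": "nested.a.b"}}

es_record = {}
for field, value in solr_record.items():
    target = translation_map.get(field, {}).get("name", field)
    set_nested(es_record, target, value)

print(es_record)
```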
Use the property name with regex capture groups to rename a batch of Solr fields in Elasticsearch. The following map renames all the fields prefixed by solr_ so that they are prefixed by elasticsearch_ instead.
{"solr_(.*)": {"name": "elasticsearch_\\1"}}
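The capture-group renaming can be sketched with the standard re module. The function rename_with_pattern is hypothetical, and matching the whole field name against each key is an assumption about how the map keys are interpreted.

```python
import re


def rename_with_pattern(field, translation_map):
    """Return the Elasticsearch name for a Solr field (hypothetical sketch)."""
    for pattern, rule in translation_map.items():
        # Assumption: a key must match the entire field name.
        match = re.fullmatch(pattern, field)
        if match and "name" in rule:
            # expand() substitutes \1, \2, ... with the captured groups.
            return match.expand(rule["name"])
    return field


# In Python source, "\\1" is the two characters \1, as in the decoded JSON map.
translation_map = {"solr_(.*)": {"name": "elasticsearch_\\1"}}

print(rename_with_pattern("solr_title", translation_map))
print(rename_with_pattern("author", translation_map))
```

Fields that match no pattern keep their name, consistent with the rule that fields absent from the translation_map are inserted as is.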
Develop
To build and run the tests:
virtualenv --python=python3.6 venv
source venv/bin/activate
python setup.py develop
python setup.py test
To release:
python setup.py sdist bdist_egg upload
Misc
Some features are not implemented yet:
- Resume from the redis queue to elasticsearch in asynchronous mode (function aioresume_from_redis)
- Resume from the redis queue to elasticsearch in synchronous mode (function resume_from_redis)
- Resume from the postgresql queue to elasticsearch in synchronous mode (function resume_from_postgresql)
Changes
v. 0.5
adds postgresql resume
elasticsearch : adds mappings and settings support
better logs and progress marks
doc : README
translation map : support for empty default list
adds postgresql blocking queue
translation map : ignore field
translation map : default value
v. 0.4
[solr2es] wildcard support in translation_map
[solr2es] nested fields support in translation_map
[solr2es] adds solrid parameter to change sort field
[solr2es] adds solrfq parameter to parallelize solr reading
v. 0.3
[solr2es] adds translation map for fields
[solr2es] adds elasticsearch mapping for index creation
[test] compatible with 6.6.0
v. 0.2
[log] adds logger and progression feedbacks
[cli] exit if no args
v. 0.1
[solr2es] initial version