Python solr/elasticsearch migration script

Project description

solr2es

Migration script from Solr to Elasticsearch via a queue, which can be either Redis or PostgreSQL.

CLI

The command-line options are:

  • -m | --migrate: to migrate from a solr index to an elasticsearch index
  • -r | --resume: to resume from a given queue to an elasticsearch index. By default, the queue is redis. If the parameter "postgresqldsn" is set, the queue is postgresql.
  • -d | --dump: to dump from a solr index into a queue. By default, the queue is redis. If the parameter "postgresqldsn" is set, the queue is postgresql.
  • -t | --test: to test the solr and elasticsearch connections
  • -a | --async: to use python 3 asyncio
  • --solrhost: to set the solr host (default: 'solr')
  • --solrfq: to set the solr filter query (default: '*')
  • --solrid: to set the solr id field name (default: 'id')
  • --core: to set the solr core name (default: 'solr2es')
  • --index: to set the index name for solr and elasticsearch (default: the solr core name, see the --core parameter)
  • --redishost: to set the redis host (default: 'redis')
  • --postgresqldsn: to set the postgresql Data Source Name (default: None; for example: 'dbname=solr2es user=test password=test host=postgresql')
  • --eshost: to set the elasticsearch host (default: 'elasticsearch')
  • --translationmap: dict string or file path (starting with @) to translate fields from the queue into elasticsearch (default: None; for example: '{"postgresql_field": {"name": "es_field"}}')
  • --esmapping: dict string or file path (starting with @) to set the elasticsearch mapping (default: None)
  • --essetting: dict string or file path (starting with @) to set the elasticsearch setting (default: None)
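
The "dict string or file path (starting with @)" convention and the queue selection rule described above can be sketched as follows. This is only an illustration: load_dict_option and choose_queue are hypothetical helpers, not functions of solr2es, and the sketch assumes the dict string and the file content are JSON.

```python
import json
import tempfile


def load_dict_option(value):
    """Parse an option given as a dict string or as '@' followed by a file path.

    Hypothetical helper (not part of solr2es); assumes JSON content.
    """
    if value is None:
        return None
    if value.startswith('@'):
        # '@/path/to/file.json' means: read the dict from that file
        with open(value[1:], encoding='utf-8') as handle:
            return json.load(handle)
    return json.loads(value)


def choose_queue(postgresqldsn=None):
    """Mirror the documented rule: redis unless a postgresql DSN is given."""
    return 'postgresql' if postgresqldsn else 'redis'


if __name__ == '__main__':
    # Inline dict string, as in the --translationmap example
    print(load_dict_option('{"postgresql_field": {"name": "es_field"}}'))

    # Same kind of option read from a file, referenced with a leading '@'
    with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as tmp:
        json.dump({"solr_name": {"default": "john doe"}}, tmp)
    print(load_dict_option('@' + tmp.name))

    print(choose_queue())
    print(choose_queue('dbname=solr2es user=test password=test host=postgresql'))
```

Passing large mappings through a file avoids shell quoting problems with nested quotes in the dict string.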

Use

translation_map

A translation_map maps the fields coming from Solr to the fields inserted into Elasticsearch.

  1. If a field from Solr doesn’t exist in the translation_map, it is inserted as is into Elasticsearch.
  2. Use the property name to rename a field in Elasticsearch:
{"solr_name": {"name": "es_name"}}
  3. Use the property default to set a default value for a field in Elasticsearch.

If the field exists in Solr and has a value, the translation_map does not change it. Otherwise, a field solr_name is added to Elasticsearch with the value john doe.

{"solr_name": {"default": "john doe"}}
  4. Use the property name with one or more . in it to create a nested field in Elasticsearch.

If the Solr record has a field nested_a_b, the Elasticsearch record gets a field nested, which contains a field a, which contains a field b that holds the content of nested_a_b.

{"nested_a_b": {"name": "nested.a.b"}}

  5. Use the property name with regex capture groups to rename many Solr fields at once in Elasticsearch. The following renames every field prefixed with solr_ so that it is prefixed with elasticsearch_ instead.

{"solr_(.*)": {"name": "elasticsearch_\\1"}}
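
To make the five rules above concrete, here is a minimal Python sketch that applies a translation_map to one flat Solr record. It is an assumption about how the rules could be implemented, not the code solr2es actually runs: the functions translate and set_nested are hypothetical, keys are matched as full regular expressions in dict order, and a field counts as "having a value" when it is present and not None.

```python
import re


def set_nested(document, dotted_name, value):
    """Store value under a dotted path, creating intermediate dicts."""
    *parents, leaf = dotted_name.split('.')
    node = document
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def translate(record, translation_map):
    """Apply a translation_map to one Solr record (illustrative only)."""
    result = {}
    for field, value in record.items():
        for pattern, rule in translation_map.items():
            match = re.fullmatch(pattern, field)
            if match and 'name' in rule:
                # Rules 2, 4 and 5: rename, expanding regex groups such
                # as \1 and nesting on dots in the new name
                set_nested(result, match.expand(rule['name']), value)
                break
        else:
            # Rule 1: fields without a rename rule are copied as they are
            result[field] = value
    # Rule 3: add defaults for fields that are absent or have no value.
    # Here the map key is treated as a literal field name.
    for field, rule in translation_map.items():
        if 'default' in rule and record.get(field) is None:
            set_nested(result, rule.get('name', field), rule['default'])
    return result


if __name__ == '__main__':
    translation_map = {
        "solr_name": {"name": "es_name"},
        "author": {"default": "john doe"},
        "nested_a_b": {"name": "nested.a.b"},
        "solr_(.*)": {"name": "elasticsearch_\\1"},
    }
    record = {"id": "1", "solr_name": "x", "nested_a_b": 42, "solr_title": "t"}
    print(translate(record, translation_map))
```

In this sketch the first matching key wins, so the exact key solr_name is listed before the broader solr_(.*) pattern.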

Develop

To build and run the tests:

virtualenv --python=python3.6 venv
source venv/bin/activate
python setup.py develop
python setup.py test

To release:

python setup.py sdist bdist_egg upload

Misc

Some features are not implemented yet:

  • Resume from the redis queue to elasticsearch in asynchronous mode (function aioresume_from_redis)
  • Resume from the redis queue to elasticsearch in synchronous mode (function resume_from_redis)
  • Resume from the postgresql queue to elasticsearch in synchronous mode (function resume_from_postgresql)

Changes

v. 0.5

  • adds postgresql resume
  • elasticsearch : adds mappings and settings support
  • better logs and progress marks
  • doc : README
  • translation map : support for empty default list
  • adds postgresql blocking queue
  • translation map : ignore field
  • translation map : default value

v. 0.4

  • [solr2es] wildcard support in translation_map
  • [solr2es] nested fields support in translation_map
  • [solr2es] adds solrid parameter to change sort field
  • [solr2es] adds solrfq parameter to parallelize solr reading

v. 0.3

  • [solr2es] adds translation map for fields
  • [solr2es] adds elasticsearch mapping for index creation
  • [test] compatible with 6.6.0

v. 0.2

  • [log] adds logger and progression feedbacks
  • [cli] exit if no args

v. 0.1

  • [solr2es] initial version
