A pythonic tool for batch loading data files (JSON, parquet, CSV, TSV) into Elasticsearch
elasticsearch\_loader |Build Status| |Can I Use Python 3?| |PyPI version|
=========================================================================
Main features:
~~~~~~~~~~~~~~
- Batch upload CSV (actually any \*SV) files to Elasticsearch
- Batch upload JSON files / JSON lines to Elasticsearch
- Batch upload parquet files to Elasticsearch
- Pre-define custom mappings
- Delete the index before upload
- Index documents with \_id taken from the document itself
- Load data directly from a URL
- SSL and basic auth support
Test matrix
~~~~~~~~~~~
+---------------+---------+---------+---------+
| python / es | 2.4.6 | 5.6.5 | 6.1.1 |
+===============+=========+=========+=========+
| 2.7 | ✅ | ✅ | ✅ |
+---------------+---------+---------+---------+
| 3.6 | ✅ | ✅ | ✅ |
+---------------+---------+---------+---------+
Installation
~~~~~~~~~~~~
| ``pip install elasticsearch-loader``
| *To add parquet support, run ``pip install elasticsearch-loader[parquet]``*
Usage
~~~~~
::

    (venv)/tmp $ elasticsearch_loader --help
    Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

    Options:
      -c, --config-file TEXT          Load default configuration file from esl.yml
      --bulk-size INTEGER             How many docs to collect before writing to
                                      Elasticsearch (default 500)
      --es-host TEXT                  Elasticsearch cluster entry point (default
                                      http://localhost:9200)
      --verify-certs                  Make sure we verify SSL certificates
                                      (default false)
      --use-ssl                       Turn on SSL (default false)
      --ca-certs TEXT                 Provide a path to CA certs on disk
      --http-auth TEXT                Provide username and password for basic auth
                                      in the format of username:password
      --index TEXT                    Destination index name [required]
      --delete                        Delete the index before import (default false)
      --progress                      Enable progress bar - NOTICE: in order to
                                      show progress the entire input must be
                                      collected up front, which can consume more
                                      memory than running without the progress bar
      --type TEXT                     Docs type [required]
      --id-field TEXT                 Specify the field name that will be used as
                                      the document id
      --as-child                      Insert _parent and _routing fields, with the
                                      same value as _id. Note: --id-field must be
                                      specified explicitly
      --with-retry                    Retry if ES bulk insertion failed
      --index-settings-file FILENAME  Specify a path to a json file containing
                                      index mapping and settings; creates the
                                      index if missing
      -h, --help                      Show this message and exit.

    Commands:
      csv
      json     FILES with the format of [{"a": "1"}, {"b": "2"}]
      parquet
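Options can also be read from a YAML configuration file via ``-c``/``--config-file``. A minimal sketch of what such an ``esl.yml`` might contain — the key names here simply mirror the CLI option names and are illustrative, not taken from the project:

::

    # Hypothetical esl.yml: keys mirror the CLI option names
    es-host: http://localhost:9200
    index: incidents
    type: incident
    bulk-size: 1000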
Examples
~~~~~~~~
Load two CSV files into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv``
Load JSON files into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``elasticsearch_loader --index incidents --type incident json *.json``
Load all git commits into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -``
Load parquet files into Elasticsearch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``elasticsearch_loader --index incidents --type incident parquet file1.parquet``
Load JSON from a GitHub repo (any http/https URL works)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json``
Load data from stdin
^^^^^^^^^^^^^^^^^^^^
``generate_data | elasticsearch_loader --index data --type incident csv -``
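Here ``generate_data`` stands for any program that writes CSV to stdout, and the trailing ``-`` tells the loader to read from stdin. A minimal stand-in — the function name and the columns are made up for illustration:

.. code:: shell

    #!/bin/sh
    # Hypothetical generate_data: emits CSV rows on stdout.
    generate_data() {
      printf 'incident_id,severity\n'
      printf '1,low\n'
      printf '2,high\n'
    }

    # Piped into the loader, each CSV row becomes one document:
    #   generate_data | elasticsearch_loader --index data --type incident csv -
    generate_data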
Read \_id from the incident\_id field
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv``
Load custom mappings
^^^^^^^^^^^^^^^^^^^^
``elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv``
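A file passed via ``--index-settings-file`` is a standard Elasticsearch index-creation body (settings plus mappings). An illustrative sketch of what such a file might look like — the field names are invented here and ES 5.x-style mapping syntax is assumed; see ``samples/mappings.json`` in the repository for the real sample:

::

    {
      "settings": {"number_of_shards": 1},
      "mappings": {
        "incident": {
          "properties": {
            "incident_id": {"type": "keyword"},
            "severity": {"type": "keyword"}
          }
        }
      }
    }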
Tests and sample data
~~~~~~~~~~~~~~~~~~~~~
End-to-end and regression tests are located under the ``test`` directory and can
be run with ``./test.py``. Sample input files can be found under the ``samples`` directory.
.. |Build Status| image:: https://travis-ci.org/moshe/elasticsearch_loader.svg?branch=master
:target: https://travis-ci.org/moshe/elasticsearch_loader
.. |Can I Use Python 3?| image:: https://caniusepython3.com/project/elasticsearch-loader.svg
:target: https://caniusepython3.com/project/elasticsearch-loader
.. |PyPI version| image:: https://badge.fury.io/py/elasticsearch_loader.svg
:target: https://pypi.python.org/pypi/elasticsearch-loader