A Pythonic tool for batch loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch
Project description
Main features:
Batch upload CSV (actually any *SV) files to Elasticsearch
Batch upload JSON files / JSON lines to Elasticsearch
Batch upload parquet files to Elasticsearch
Predefine custom mappings
Delete index before upload
Index documents with _id from the document itself
Load data directly from url
Supports ES 1.X, 2.X and 5.X
SSL and basic auth
Installation
pip install elasticsearch-loader
Usage
(venv)/tmp $ elasticsearch_loader --help
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT          Load default configuration file from esl.yml
  --bulk-size INTEGER             How many docs to collect before writing to
                                  ElasticSearch (default 500)
  --es-host TEXT                  Elasticsearch cluster entry point. (default
                                  http://localhost:9200)
  --verify-certs                  Make sure we verify SSL certificates
                                  (default false)
  --use-ssl                       Turn on SSL (default false)
  --ca-certs TEXT                 Provide a path to CA certs on disk
  --http-auth TEXT                Provide username and password for basic auth
                                  in the format of username:password
  --index TEXT                    Destination index name [required]
  --delete                        Delete index before import? (default false)
  --progress                      Enable progress bar - NOTICE: in order to
                                  show progress the entire input should be
                                  collected and can consume more memory than
                                  without progress bar
  --type TEXT                     Docs type [required]
  --id-field TEXT                 Specify field name that be used as document
                                  id
  --index-settings-file FILENAME  Specify path to json file containing index
                                  mapping and settings, creates index if
                                  missing
  -h, --help                      Show this message and exit.

Commands:
  csv
  json     FILES with the format of [{"a": "1"}, {"b": "2"}]
  parquet
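The --bulk-size option describes collecting a fixed number of documents before each write. A minimal Python sketch of that batching idea follows. The function name batched is hypothetical, and this is not the loader's actual code, only an illustration of the behaviour the help text describes.

```python
from itertools import islice


def batched(docs, bulk_size=500):
    """Yield lists of at most `bulk_size` documents from any iterable.

    Hypothetical helper: it only illustrates the "collect N docs, then
    write" behaviour described by --bulk-size.
    """
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, bulk_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # 1200 documents with the default bulk size give batches of 500, 500, 200.
    docs = ({"n": i} for i in range(1200))
    print([len(batch) for batch in batched(docs)])
```

Because the input is consumed lazily, only one batch is held in memory at a time, which is consistent with the note that --progress needs the entire input collected and can use more memory.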
Examples
Load 2 CSV to elasticsearch
elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv
Load JSONs to elasticsearch
elasticsearch_loader --index incidents --type incident json *.json
Load all git commits into elasticsearch
git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -
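The git example relies on --json-lines because git log emits one JSON object per line, not the single JSON array that the json command expects by default. The sketch below contrasts the two input shapes in Python. The function names are hypothetical and the loader's own parsing may differ.

```python
import io
import json


def read_json_lines(stream):
    """Yield one document per non-empty line (the --json-lines shape)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def read_json_array(stream):
    """Return the documents of a stream holding a single JSON array."""
    return json.load(stream)


if __name__ == "__main__":
    lines = io.StringIO('{"sha": "a1"}\n{"sha": "b2"}\n')
    array = io.StringIO('[{"a": "1"}, {"b": "2"}]')
    print(list(read_json_lines(lines)))
    print(read_json_array(array))
```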
Load parquet to elasticsearch
elasticsearch_loader --index incidents --type incident parquet file1.parquet
Load JSON from a GitHub repo (any http/https URL works)
elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json
Load data from stdin
generate_data | elasticsearch_loader --index data --type incident csv -
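generate_data in the command above stands for any program that writes CSV to stdout. As an illustration only, a stand-in could look like the following Python script. The column names and sample rows are invented for the example.

```python
import csv
import io
import sys

# Hypothetical columns and rows, made up for this illustration.
FIELDS = ["incident_id", "category", "severity"]
SAMPLE_ROWS = [
    {"incident_id": "1001", "category": "outage", "severity": "high"},
    {"incident_id": "1002", "category": "latency", "severity": "low"},
]


def render_csv(rows, fields=FIELDS):
    """Return `rows` (a list of dicts) as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Writing to stdout lets the output be piped into the loader.
    sys.stdout.write(render_csv(SAMPLE_ROWS))
```

Saved under a hypothetical name such as generate_data.py, it would be piped the same way: python generate_data.py | elasticsearch_loader --index data --type incident csv -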
Read _id from incident_id field
elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv
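Conceptually, --id-field copies one field of each document into the Elasticsearch _id, so reloading the same file overwrites documents instead of duplicating them. The sketch below builds action dictionaries in the shape accepted by the bulk helper of the official Python client (elasticsearch.helpers.bulk). The function name is hypothetical, and whether the loader builds exactly these dictionaries, or keeps the id field in the document body, is an assumption.

```python
def to_bulk_actions(docs, index, doc_type, id_field=None):
    """Yield one bulk-index action per document.

    Hypothetical helper: when `id_field` is given, its value becomes the
    document _id; otherwise Elasticsearch generates an id itself.
    """
    for doc in docs:
        action = {"_index": index, "_type": doc_type, "_source": doc}
        if id_field is not None:
            # Assumption: the field also stays in the document body.
            action["_id"] = doc[id_field]
        yield action


if __name__ == "__main__":
    rows = [{"incident_id": "42", "category": "theft"}]
    for action in to_bulk_actions(rows, "incidents", "incident", "incident_id"):
        print(action)
```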
Load custom mappings
elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv
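The contents of samples/mappings.json are not shown here. As a rough sketch, a file passed to --index-settings-file could carry an index-creation body with settings and mappings keys, as in the Python snippet below. The field names, shard settings and the choice of Elasticsearch 5.X field types (keyword, text) are assumptions. On 1.X and 2.X the string type would be used instead.

```python
import json

# Hypothetical index body for an "incident" document type (ES 5.X types).
INDEX_SETTINGS = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "incident": {
            "properties": {
                "incident_id": {"type": "keyword"},
                "category": {"type": "keyword"},
                "reported_at": {"type": "date"},
                "description": {"type": "text"},
            }
        }
    },
}


def render_index_settings(body=INDEX_SETTINGS):
    """Return the index body as pretty-printed JSON text."""
    return json.dumps(body, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Redirect stdout to a file to get something usable with
    # --index-settings-file.
    print(render_index_settings())
```

The mapping key (incident here) would need to match the value passed to --type.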
Tests and sample data
Tests are located under test and can be run with tox. Sample input formats can be found under samples.
Source Distribution
Hashes for elasticsearch-loader-0.2.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 406362e11b057170dc22252813f80b627fc1f1a39c41336f56441c5b640f8a82
MD5 | 60046b406a33c187653f469d6b4a076e
BLAKE2b-256 | a1669ca44777a9d74a6850d05b3ff37ffd33f0298270a76080e948d50533c63c