
A pythonic tool for batch loading data files (json, parquet, csv, tsv) into Elasticsearch


Main features

  • Batch upload CSV (actually any *SV) files to Elasticsearch

  • Batch upload JSON files / JSON lines to Elasticsearch

  • Batch upload parquet files to Elasticsearch

  • Pre-define custom mappings

  • Delete index before upload

  • Index documents with _id from the document itself

  • Load data directly from a URL

  • SSL and basic auth

  • Unicode Support ✌️

Plugins

To install a plugin, simply run pip install plugin-name:

  • esl-redis - Read continuously from one or more Redis lists and index into Elasticsearch

  • esl-s3 - List and index files from S3
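For example, adding the Redis source:

pip install esl-redis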

Test matrix

python / es    5.6.16    6.8.0    7.1.1
2.7            V         V        V
3.7            V         V        V

Installation

pip install elasticsearch-loader
To add parquet support, run pip install elasticsearch-loader[parquet]

Usage

(venv)/tmp $ elasticsearch_loader --help
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT          Load default configuration file from esl.yml
  --bulk-size INTEGER             How many docs to collect before writing to
                                  Elasticsearch (default 500)
  --es-host TEXT                  Elasticsearch cluster entry point. (default
                                  http://localhost:9200)
  --verify-certs                  Make sure we verify SSL certificates
                                  (default false)
  --use-ssl                       Turn on SSL (default false)
  --ca-certs TEXT                 Provide a path to CA certs on disk
  --http-auth TEXT                Provide username and password for basic auth
                                  in the format of username:password
  --index TEXT                    Destination index name  [required]
  --delete                        Delete index before import? (default false)
  --update                        Merge and update existing doc instead of
                                  overwrite
  --progress                      Enable progress bar - NOTICE: in order to
                                  show progress the entire input should be
                                  collected and can consume more memory than
                                  without progress bar
  --type TEXT                     Docs type. TYPES WILL BE DEPRECATED IN APIS
                                  IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
                                  IN 8.  [required]
  --id-field TEXT                 Specify field name that will be used as
                                  document id
  --as-child                      Insert _parent, _routing field, the value is
                                  same as _id. Note: must specify --id-field
                                  explicitly
  --with-retry                    Retry if ES bulk insertion failed
  --index-settings-file FILENAME  Specify path to json file containing index
                                  mapping and settings, creates index if
                                  missing
  --timeout FLOAT                 Specify request timeout in seconds for
                                  Elasticsearch client
  --encoding TEXT                 Specify content encoding for input files
  --keys TEXT                     Comma separated keys to pick from each
                                  document
  -h, --help                      Show this message and exit.

Commands:
  csv
  json     FILES with the format of [{"a": "1"}, {"b": "2"}]
  parquet
  redis
  s3
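
Frequently used options can be kept in a configuration file and loaded with -c/--config-file. A minimal esl.yml sketch, assuming keys mirror the long option names with underscores (verify against your installed version's behaviour):

es_host: http://localhost:9200
index: incidents
type: incident
bulk_size: 1000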

Examples

Load two CSV files into Elasticsearch

elasticsearch_loader --index incidents --type incident csv file1.csv file2.csv
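The header row supplies the field names and each following row becomes one document. For example, given a hypothetical file1.csv:

id,severity,description
1,high,Server down
2,low,Slow response

each row would be indexed as a document like {"id": "1", "severity": "high", "description": "Server down"}.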

Load JSON files into Elasticsearch

elasticsearch_loader --index incidents --type incident json *.json

Load all git commits into Elasticsearch

git log --pretty=format:'{"sha":"%H","author_name":"%aN", "author_email": "%aE","date":"%ad","message":"%f"}' | elasticsearch_loader --type git --index git json --json-lines -
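Here git log emits one JSON object per line, which is why the json command is given --json-lines; the trailing - tells the loader to read from stdin instead of from files.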

Load a parquet file into Elasticsearch

elasticsearch_loader --index incidents --type incident parquet file1.parquet

Load JSON from a GitHub repo (actually any http/https URL is ok)

elasticsearch_loader --index data --type avg_height --id-field country json https://raw.githubusercontent.com/samayo/country-data/master/src/country-avg-male-height.json

Load data from stdin

generate_data | elasticsearch_loader --index data --type incident csv -

Read the document id from the incident_id field

elasticsearch_loader --id-field incident_id --index incidents --type incident csv file1.csv file2.csv
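With --id-field, the Elasticsearch _id of each document is taken from its incident_id value instead of being auto-generated, so reloading the same files overwrites the same documents rather than duplicating them.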

Load custom mappings

elasticsearch_loader --index-settings-file samples/mappings.json --index incidents --type incident csv file1.csv file2.csv
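The file passed to --index-settings-file is used to create the index when it does not already exist. A minimal sketch of what a file like samples/mappings.json could contain, assuming the usual Elasticsearch create-index body with type-level mappings (ES 5.x/6.x style):

{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "incident": {
      "properties": {
        "incident_id": {"type": "keyword"},
        "description": {"type": "text"}
      }
    }
  }
}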

Tests and sample data

End-to-end and regression tests are located under the test directory and can be run with ./test.py. Sample input files for each format can be found under samples.

Stargazers over time

[Stargazers over time chart]
