A CLI client for exporting Elasticsearch data to CSV

Project description

This project provides a simple CLI command for exporting data from Elasticsearch to CSV. It uses multiple CPU cores together with Elasticsearch's Sliced Scroll search to fetch large datasets in parallel, and is intended for the extraction step of data workflows.

Performance tends to be best when the number of workers equals the number of shards of the index.
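To illustrate how the Sliced Scroll search splits work across workers: each worker issues its own scroll request carrying a `slice` clause with its own `id` and a shared `max`, and Elasticsearch routes a disjoint subset of documents to each slice. The sketch below builds one such request body per worker; the function name is illustrative and not part of this package's API.

```python
def sliced_scroll_bodies(no_of_workers, query=None):
    """Build one Sliced Scroll request body per worker.

    Each worker sends its own body; the ``slice`` clause tells
    Elasticsearch which disjoint subset of documents to return.
    """
    base_query = query or {"match_all": {}}
    bodies = []
    for slice_id in range(no_of_workers):
        bodies.append({
            "slice": {"id": slice_id, "max": no_of_workers},
            "query": base_query,
        })
    return bodies
```

Each body would then be sent by its own process via the scroll API, which is why worker count aligning with shard count helps: each slice can be served from a single shard.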

Note

This is still early in development and a bit rough around the edges. Any bug reports, feature suggestions, etc. are greatly appreciated. :)

Installation and usage

Installation

Since this is a Python package available on PyPI, you can install it like any other Python package.

# on modern systems with Python you can install with pip
$ pip install parallel-es2csv
# on older systems you can install using easy_install
$ easy_install parallel-es2csv

Usage

The command is mostly self-documenting; the full option list is available via --help.

$ parallel-es2csv
usage: parallel-es2csv -u <elasticsearch_url> -i <[list_of_index]> [-n <no_of_workers>] [-o <output_folder>]

arguments:
  -h, --help            show this help message and exit
  -i INDICES [INDICES ...], --indices INDICES [INDICES ...]
                        ES indices to export.
  -u URL, --url URL     Elasticsearch host URL. Default is
                        http://localhost:9200.
  -a AUTH, --auth AUTH  Elasticsearch basic authentication in the form of
                        username:pwd.
  -D DOC_TYPE [DOC_TYPE ...], --doc_types DOC_TYPE [DOC_TYPE ...]
                        Document type(s).
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        Output folder path.
  -f FIELDS [FIELDS ...], --fields FIELDS [FIELDS ...]
                        List of selected fields in output. Default is
                        ['_all'].
  -m INTEGER, --max INTEGER
                        Maximum number of results to return. Default is 0.
  -s INTEGER, --scroll_size INTEGER
                        Scroll size for each batch of results. Default is 100.
  -t INTEGER, --timeout INTEGER
                        Timeout in seconds. Default is 60.
  -e, --meta_fields     Add meta-fields in output.
  -n NO_OF_WORKERS, --no_of_workers NO_OF_WORKERS
                        Number of parallel scroll workers fetching
                        from Elasticsearch, using multiprocessing.
  -v, --version         Show version and exit.
  --debug               Debug mode on.
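Since each worker scrolls its own slice, a natural output layout is one CSV file per worker per index inside the output folder. The sketch below shows that pattern with the standard-library csv module; the helper name and file-naming scheme are illustrative assumptions, not this package's actual behavior.

```python
import csv
import os

def write_worker_csv(output_folder, index, slice_id, fieldnames, rows):
    """Write one worker's slice of documents to its own CSV file.

    Each worker owns a distinct file, so no locking is needed
    between processes.
    """
    path = os.path.join(output_folder, "%s_slice%d.csv" % (index, slice_id))
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return path
```

The per-worker files can afterwards be concatenated (minus the repeated header lines) into a single CSV if needed.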

Download files


Source Distribution

parallel-es2csv-0.1.8.tar.gz (6.2 kB)

Uploaded Source

Built Distribution

parallel_es2csv-0.1.8-py2-none-any.whl (9.9 kB)

Uploaded Python 2

File details

Details for the file parallel-es2csv-0.1.8.tar.gz.

File metadata

File hashes

Hashes for parallel-es2csv-0.1.8.tar.gz
Algorithm Hash digest
SHA256 f865e3bdedf164ac49476772c6588b257b1ef7b5f0e38ea037149ae7ab3fda47
MD5 e43e1738cd7c83689ad446c679ded8c4
BLAKE2b-256 0529797ee7917c5ebbadf8e00e2dea6920044161e1ac716885b89ef5d7579f41


File details

Details for the file parallel_es2csv-0.1.8-py2-none-any.whl.

File metadata

File hashes

Hashes for parallel_es2csv-0.1.8-py2-none-any.whl
Algorithm Hash digest
SHA256 2b444b99e53b4e185d9a88ff56b29b20be6ea8ea9b36b8b68732e9660c1aed54
MD5 5d85729547c47e87fc2e00ae0c9b4f6a
BLAKE2b-256 a45de31684e41f655e47022914628a8d419cd142a59c6262203f3c3b94b16a11

