A CLI client for exporting Elasticsearch data to CSV

Project description

This project provides a simple CLI command for exporting data from Elasticsearch (ES) to CSV. It spreads the work across CPU cores using Elasticsearch's Sliced Scroll search, which makes it suited to fetching large datasets, and is intended for the extraction step of data workflows.

Performance seems best when no_of_workers equals the number of shards for the index.
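
As a sketch, with a hypothetical index named my_index that has 5 primary shards, you could check the shard count via Elasticsearch's _cat API and match the worker count to it (the URL and index name are placeholders):

# check the number of primary shards (pri) for the index
$ curl -s 'http://localhost:9200/_cat/indices/my_index?h=pri'
5
# run one worker per shard
$ parallel-es2csv -u http://localhost:9200 -i my_index -n 5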

Note

This is still early in development and a bit rough around the edges. Any bug reports, feature suggestions, etc. are greatly appreciated. :)

Installation and usage

Installation

Since this is a Python package available on PyPI, you can install it like any other Python package.

# on modern systems with Python you can install with pip
$ pip install parallel-es2csv
# on older systems you can install using easy_install
$ easy_install parallel-es2csv
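
After installing, you can confirm the command is on your PATH using the version flag documented below.

$ parallel-es2csv -v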

Usage

The command is mostly self-documenting; its options are described in the built-in help shown below.

$ parallel-es2csv
usage: parallel-es2csv -u <elasticsearch_url> -i <[list_of_index]> [-n <no_of_workers>] [-o <output_folder>]

arguments:
  -h, --help            show this help message and exit
  -i INDICES [INDICES ...], --indices INDICES [INDICES ...]
                        ES indices to export.
  -u URL, --url URL     Elasticsearch host URL. Default is
                        http://localhost:9200.
  -a AUTH, --auth AUTH  Elasticsearch basic authentication in the form of
                        username:pwd.
  -D DOC_TYPE [DOC_TYPE ...], --doc_types DOC_TYPE [DOC_TYPE ...]
                        Document type(s).
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        Output folder path.
  -f FIELDS [FIELDS ...], --fields FIELDS [FIELDS ...]
                        List of selected fields in output. Default is
                        ['_all'].
  -m INTEGER, --max INTEGER
                        Maximum number of results to return. Default is 0.
  -s INTEGER, --scroll_size INTEGER
                        Scroll size for each batch of results. Default is 100.
  -t INTEGER, --timeout INTEGER
                        Timeout in seconds. Default is 60.
  -e, --meta_fields     Add meta-fields in output.
  -n NO_OF_WORKERS, --no_of_workers NO_OF_WORKERS
                        No. of parallel scrolls from Elasticsearch, using
                        multiprocessing.
  -v, --version         Show version and exit.
  --debug               Debug mode on.
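
As a concrete sketch, the invocation below exports two indices with basic auth, a restricted field list, a larger scroll batch, and four workers; the URL, credentials, index names, and field names are all placeholders:

$ parallel-es2csv \
    -u http://localhost:9200 \
    -a user:secret \
    -i logs-2023 metrics-2023 \
    -f timestamp level message \
    -s 500 \
    -n 4 \
    -o ./exports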

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

parallel-es2csv-0.1.10.tar.gz (6.2 kB)

Built Distribution

parallel_es2csv-0.1.10-py3-none-any.whl (7.6 kB)

File details

Details for the file parallel-es2csv-0.1.10.tar.gz.

File metadata

File hashes

Hashes for parallel-es2csv-0.1.10.tar.gz
Algorithm Hash digest
SHA256 ea90f366350db20887fc01ec0a478e1e7a4f86bd5573ff607f7f92996c1a83a1
MD5 ffac21f223a4d7119028f6b43ffc9986
BLAKE2b-256 5ac5d0cfb337d030f5cd8fb26d278f23f931dd68cdf0e7353a5ae467e38a3d2b

See the pip documentation for more details on using hashes.
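
As a sketch, a downloaded file can be checked against the SHA256 digest above with a standard tool such as sha256sum:

# the printed digest should match the SHA256 value listed above
$ sha256sum parallel-es2csv-0.1.10.tar.gz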

File details

Details for the file parallel_es2csv-0.1.10-py3-none-any.whl.

File metadata

File hashes

Hashes for parallel_es2csv-0.1.10-py3-none-any.whl
Algorithm Hash digest
SHA256 cda74d7851455b2ed193b3d2ad4372d92ef7b977dbf92e0dbf8b711b5006ef96
MD5 f9c46d4318c2c05e962fee6cda8a532e
BLAKE2b-256 c3bb25d9447de1330da5ffdff6d8826f8e6fbdbc38cf9d758ff77377d1dad996

See the pip documentation for more details on using hashes.
