
A CLI client for exporting Elasticsearch data to CSV

Project description


This project provides a simple CLI command for exporting data from Elasticsearch. It fans work out across CPU cores with multiprocessing and uses Elasticsearch's Sliced Scroll search to fetch large datasets. It's intended to be used in data workflows for extracting data out.

Performance tends to be best when no_of_workers equals the number of shards for the index.
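
For reference, the technique looks roughly like the sketch below: one worker process per slice, each running its own scroll and writing its own CSV file. This is a minimal illustration, not the package's actual implementation; the elasticsearch-py client calls, the index name "my_index", and the per-slice CSV layout are assumptions.

# Minimal sketch of Sliced Scroll + multiprocessing (illustrative only).
# Assumes the official elasticsearch-py client and a cluster at localhost:9200.
import csv
from multiprocessing import Pool

from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

NO_OF_WORKERS = 4  # ideally equal to the index's shard count


def export_slice(slice_id):
    es = Elasticsearch("http://localhost:9200")  # one client per process
    query = {
        "slice": {"id": slice_id, "max": NO_OF_WORKERS},
        "query": {"match_all": {}},
    }
    # scan() wraps the scroll API; with the "slice" clause each worker
    # sees a disjoint subset of the index's documents.
    docs = scan(es, index="my_index", query=query, size=100)
    first = next(docs, None)
    if first is None:
        return  # this slice happened to be empty
    with open("slice_%d.csv" % slice_id, "w", newline="") as f:
        # Derive columns from the first document (hypothetical layout);
        # fields absent in later documents are written as empty cells.
        writer = csv.DictWriter(f, fieldnames=sorted(first["_source"]),
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerow(first["_source"])
        for hit in docs:
            writer.writerow(hit["_source"])


if __name__ == "__main__":
    with Pool(NO_OF_WORKERS) as pool:
        pool.map(export_slice, range(NO_OF_WORKERS))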

Note

This is still early in development and a bit rough around the edges. Any bug reports, feature suggestions, etc. are greatly appreciated. :)

Installation and usage

Installation

Since this is a Python package available on PyPI, you can install it like any other Python package.

# on modern systems with Python you can install with pip
$ pip install parallel-es2csv
# on older systems you can install using easy_install
$ easy_install parallel-es2csv

Usage

The command is largely self-documenting; all options are described in its built-in help output.

$ parallel-es2csv
usage: parallel-es2csv -u <elasticsearch_url> -i <[list_of_index]> [-n <no_of_workers>] [-o <output_folder>]

arguments:
  -h, --help            show this help message and exit
  -i INDICES [INDICES ...], --indices INDICES [INDICES ...]
                        ES indices to export.
  -u URL, --url URL     Elasticsearch host URL. Default is
                        http://localhost:9200.
  -a AUTH, --auth AUTH  Elasticsearch basic authentication in the form of
                        username:pwd.
  -D DOC_TYPE [DOC_TYPE ...], --doc_types DOC_TYPE [DOC_TYPE ...]
                        Document type(s).
  -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                        Output folder path.
  -f FIELDS [FIELDS ...], --fields FIELDS [FIELDS ...]
                        List of selected fields in output. Default is
                        ['_all'].
  -m INTEGER, --max INTEGER
                        Maximum number of results to return. Default is 0.
  -s INTEGER, --scroll_size INTEGER
                        Scroll size for each batch of results. Default is 100.
  -t INTEGER, --timeout INTEGER
                        Timeout in seconds. Default is 60.
  -e, --meta_fields     Add meta-fields in output.
  -n NO_OF_WORKERS, --no_of_workers NO_OF_WORKERS
                        No. of parallel scrolls from Elasticsearch, using
                        multiprocessing.
  -v, --version         Show version and exit.
  --debug               Debug mode on.
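
For example, exporting two indices with four workers into a local folder might look like this (the index names, URL, and path are illustrative):

$ parallel-es2csv -u http://localhost:9200 -i index_a index_b -n 4 -o ./exports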

Download files

Download the file for your platform.

Source Distribution

parallel-es2csv-0.1.9.tar.gz (6.2 kB)


Built Distribution

parallel_es2csv-0.1.9-py2-none-any.whl (9.9 kB)


File details

Details for the file parallel-es2csv-0.1.9.tar.gz.

File metadata

File hashes

Hashes for parallel-es2csv-0.1.9.tar.gz:

SHA256: d0cc48ad81d8971f2d2c173c76c105dd1156e6a12154094b5b3e6fc08529b369
MD5: c0c782738107a8212e9cdee692766ee6
BLAKE2b-256: a877ae1de0f48448c95b3447bc7590c613f67e18937f242e8b90b0ba1ba4f6f3

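To check a downloaded file against the published SHA256 digest, you can use a standard tool such as sha256sum (or shasum -a 256 on macOS):

$ sha256sum parallel-es2csv-0.1.9.tar.gz
d0cc48ad81d8971f2d2c173c76c105dd1156e6a12154094b5b3e6fc08529b369  parallel-es2csv-0.1.9.tar.gz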

File details

Details for the file parallel_es2csv-0.1.9-py2-none-any.whl.

File metadata

File hashes

Hashes for parallel_es2csv-0.1.9-py2-none-any.whl:

SHA256: 37f6dd7f6e59fa1d8f99ae73e09a2d3af33a3ef5d93b7649c0eb6ea42070fd7b
MD5: 941f0b88632139d0839bf12001275aad
BLAKE2b-256: 6c3b60b8664e35c8a26f3dbe8f1406e60f629b1ad0a34edacb5e2cd5c690f89a

