A CLI client for exporting Elasticsearch data to CSV
This project provides a simple CLI command for exporting data from Elasticsearch. It uses multiple CPUs together with Elasticsearch's Sliced Scroll Search to fetch large datasets in parallel, and is intended for use in data workflows that need to extract data.
Performance seems to be better when the number of workers equals the number of shards in the index (no_of_workers == no_of_shards_for_the_index).
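To make the relationship between workers and slices concrete, here is a minimal sketch of how one Sliced Scroll request body per worker could be built. It is an illustration only, not this project's actual implementation: the function name `build_slice_bodies` and its parameters are hypothetical, and the default query is assumed to be `match_all`. The `slice` object with `id` and `max` is the request shape Elasticsearch documents for Sliced Scroll.

```python
from typing import Any, Dict, List, Optional


def build_slice_bodies(
    no_of_workers: int,
    query: Optional[Dict[str, Any]] = None,
    scroll_size: int = 100,
) -> List[Dict[str, Any]]:
    """Return one search body per worker (hypothetical helper).

    Each body asks Elasticsearch for a disjoint slice of the result set,
    so the workers can scroll independently of each other.
    """
    if no_of_workers < 1:
        raise ValueError("no_of_workers must be at least 1")

    # Assumption: export everything unless a query is supplied.
    query = query or {"match_all": {}}

    bodies: List[Dict[str, Any]] = []
    for slice_id in range(no_of_workers):
        body: Dict[str, Any] = {"size": scroll_size, "query": query}
        # Elasticsearch requires "max" to be greater than 1, so a single
        # worker falls back to a plain, unsliced scroll.
        if no_of_workers > 1:
            body["slice"] = {"id": slice_id, "max": no_of_workers}
        bodies.append(body)
    return bodies
```

With `no_of_workers` set to the index's shard count, each worker would send its own body as the initial scroll search and then keep scrolling until its slice is exhausted.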
This is still early in development and a bit rough around the edges. Any bug reports, feature suggestions, etc. are greatly appreciated. :)
Installation and usage
Installation
Since this is a Python package available on PyPI, you can install it like any other Python package.

    # on modern systems with Python you can install with pip
    $ pip install parallel-es2csv
    # on older systems you can install using easy_install
    $ easy_install parallel-es2csv
Usage
The commands should be mostly self-documenting; their definitions are shown by the help command.
    $ parallel-es2csv
    usage: parallel-es2csv -u <elasticsearch_url> -i <[list_of_index]> [-n <no_of_workers>] [-o <output_folder>]

    arguments:
      -h, --help            show this help message and exit
      -i INDICES [INDICES ...], --indices INDICES [INDICES ...]
                            ES indices to export.
      -u URL, --url URL     Elasticsearch host URL. Default is http://localhost:9200.
      -a AUTH, --auth AUTH  Elasticsearch basic authentication in the form of username:pwd.
      -D DOC_TYPE [DOC_TYPE ...], --doc_types DOC_TYPE [DOC_TYPE ...]
                            Document type(s).
      -o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
                            Output folder path.
      -f FIELDS [FIELDS ...], --fields FIELDS [FIELDS ...]
                            List of selected fields in output. Default is ['_all'].
      -m INTEGER, --max INTEGER
                            Maximum number of results to return. Default is 0.
      -s INTEGER, --scroll_size INTEGER
                            Scroll size for each batch of results. Default is 100.
      -t INTEGER, --timeout INTEGER
                            Timeout in seconds. Default is 60.
      -e, --meta_fields     Add meta-fields in output.
      -n NO_OF_WORKERS, --no_of_workers NO_OF_WORKERS
                            No. or parallel scroll from Elasticsearch, using Multiprocess
      -v, --version         Show version and exit.
      --debug               Debug mode on.
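For use inside a data workflow, the command line can also be assembled from Python. The sketch below uses only flags listed in the help output above (`-u`, `-i`, `-n`, `-o`, `-f`); the helper name `build_export_command` and the example index and folder names are hypothetical, chosen purely for illustration.

```python
import shlex
from typing import List, Optional, Sequence


def build_export_command(
    indices: Sequence[str],
    url: str = "http://localhost:9200",
    no_of_workers: Optional[int] = None,
    output_folder: Optional[str] = None,
    fields: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argv list for parallel-es2csv (hypothetical helper)."""
    if not indices:
        raise ValueError("at least one index is required")

    # -u and -i are the arguments shown as required in the usage line.
    cmd: List[str] = ["parallel-es2csv", "-u", url, "-i", *indices]

    # Optional flags are appended only when a value is given.
    if no_of_workers is not None:
        cmd += ["-n", str(no_of_workers)]
    if output_folder:
        cmd += ["-o", output_folder]
    if fields:
        cmd += ["-f", *fields]
    return cmd


# Example with made-up index and folder names; prints a shell-ready string.
example = build_export_command(
    ["my-index"], no_of_workers=5, output_folder="./export"
)
print(shlex.join(example))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the package is installed.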