A CLI client for exporting Elasticsearch data to CSV
Project description
This project provides a simple CLI command that exports data from Elasticsearch (ES) in parallel, using multiple CPU cores and Elasticsearch's Sliced Scroll Search to fetch large datasets. It is intended for data workflows that need to extract data from ES.
Performance seems to be best when the number of workers equals the number of shards of the index (no_of_workers == no_of_shards_for_the_index).
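To illustrate how a sliced scroll divides the work, here is a minimal sketch that builds one search body per worker. It is not taken from this package's source; the function name `build_slice_bodies` and its parameters are hypothetical. The `slice` clause with `id` and `max` is the standard Elasticsearch request shape, and each worker would send its own body to the scroll search endpoint.

```python
import json


def build_slice_bodies(no_of_workers, query=None, scroll_size=100):
    """Return one search body per worker for a sliced scroll (hypothetical helper)."""
    if no_of_workers < 1:
        raise ValueError("no_of_workers must be at least 1")
    query = query or {"match_all": {}}
    bodies = []
    for slice_id in range(no_of_workers):
        body = {"size": scroll_size, "query": query}
        # Elasticsearch only accepts a slice clause when max > 1, so a
        # single worker falls back to a plain, unsliced scroll.
        if no_of_workers > 1:
            body["slice"] = {"id": slice_id, "max": no_of_workers}
        bodies.append(body)
    return bodies


if __name__ == "__main__":
    # Example: an index with 3 shards, exported by 3 workers.
    for body in build_slice_bodies(3):
        print(json.dumps(body))
```

Each slice returns a disjoint subset of the documents, so the workers can scroll independently and write their results without coordinating with each other.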
Note
This project is still early in development and a bit rough around the edges. Any bug reports, feature suggestions, etc. are greatly appreciated. :)
Installation and usage
Installation
Since this is a Python package available on PyPI, you can install it like any other Python package.
# on modern systems with Python you can install with pip
$ pip install parallel-es2csv
# on older systems you can install using easy_install
$ easy_install parallel-es2csv
Usage
The commands should be mostly self-documenting; their definitions are available through the help command.
$ parallel-es2csv
usage: parallel-es2csv -u <elasticsearch_url> -i <[list_of_index]> [-n <no_of_workers>] [-o <output_folder>]
arguments:
-h, --help show this help message and exit
-i INDICES [INDICES ...], --indices INDICES [INDICES ...]
ES indices to export.
-u URL, --url URL Elasticsearch host URL. Default is
http://localhost:9200.
-a AUTH, --auth AUTH Elasticsearch basic authentication in the form of
username:pwd.
-D DOC_TYPE [DOC_TYPE ...], --doc_types DOC_TYPE [DOC_TYPE ...]
Document type(s).
-o OUTPUT_FOLDER, --output_folder OUTPUT_FOLDER
Output folder path.
-f FIELDS [FIELDS ...], --fields FIELDS [FIELDS ...]
List of selected fields in output. Default is
['_all'].
-m INTEGER, --max INTEGER
Maximum number of results to return. Default is 0.
-s INTEGER, --scroll_size INTEGER
Scroll size for each batch of results. Default is 100.
-t INTEGER, --timeout INTEGER
Timeout in seconds. Default is 60.
-e, --meta_fields Add meta-fields in output.
-n NO_OF_WORKERS, --no_of_workers NO_OF_WORKERS
No. of parallel scrolls from Elasticsearch, using
Multiprocess
-v, --version Show version and exit.
--debug Debug mode on.
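Since the tool is meant to be called from data workflows, the following sketch shows one way a Python script might assemble the command line. It uses only the `-u`, `-i`, `-n` and `-o` flags listed in the help output above; the helper names `build_command` and `run_export`, the index name and the output path are hypothetical.

```python
import shlex
import subprocess


def build_command(url, indices, no_of_workers=None, output_folder=None):
    """Assemble the argument list for parallel-es2csv (hypothetical helper)."""
    cmd = ["parallel-es2csv", "-u", url, "-i", *indices]
    if no_of_workers is not None:
        cmd += ["-n", str(no_of_workers)]
    if output_folder is not None:
        cmd += ["-o", output_folder]
    return cmd


def run_export(cmd):
    """Run the export and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; replace them with your own cluster and index.
    cmd = build_command(
        "http://localhost:9200",
        ["my-index"],
        no_of_workers=5,
        output_folder="./exports",
    )
    # Print the command only; call run_export(cmd) to actually execute it.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems when index names or paths contain spaces.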
Download files
Source Distribution
Built Distribution
File details
Details for the file parallel-es2csv-0.1.10.tar.gz.
File metadata
- Download URL: parallel-es2csv-0.1.10.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | ea90f366350db20887fc01ec0a478e1e7a4f86bd5573ff607f7f92996c1a83a1
MD5 | ffac21f223a4d7119028f6b43ffc9986
BLAKE2b-256 | 5ac5d0cfb337d030f5cd8fb26d278f23f931dd68cdf0e7353a5ae467e38a3d2b
File details
Details for the file parallel_es2csv-0.1.10-py3-none-any.whl.
File metadata
- Download URL: parallel_es2csv-0.1.10-py3-none-any.whl
- Upload date:
- Size: 7.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | cda74d7851455b2ed193b3d2ad4372d92ef7b977dbf92e0dbf8b711b5006ef96
MD5 | f9c46d4318c2c05e962fee6cda8a532e
BLAKE2b-256 | c3bb25d9447de1330da5ffdff6d8826f8e6fbdbc38cf9d758ff77377d1dad996