Easy install parquet-tools
parquet-tools
This is a pip-installable parquet-tools: a CLI tool built on Apache Arrow. It can show the content and schema of Parquet files on local disk or on Amazon S3. It is incompatible with the original parquet-tools.
Features
- Read Parquet data (local file or file on S3)
- Read Parquet metadata/schema (local file or file on S3)
Installation
$ pip install parquet-tools
Usage
$ parquet-tools --help
usage: parquet-tools [-h] {show,csv,inspect} ...

parquet CLI tools

positional arguments:
  {show,csv,inspect}
    show              Show human readble format. see `show -h`
    csv               Cat csv style. see `csv -h`
    inspect           Inspect parquet file. see `inspect -h`

optional arguments:
  -h, --help          show this help message and exit
Usage Examples
Show local parquet file
$ parquet-tools show test.parquet
+-------+-------+---------+
| one | two | three |
|-------+-------+---------|
| -1 | foo | True |
| nan | bar | False |
| 2.5 | baz | True |
+-------+-------+---------+
Show parquet file on S3
$ parquet-tools show s3://bucket-name/prefix/*
+-------+-------+---------+
| one | two | three |
|-------+-------+---------|
| -1 | foo | True |
| nan | bar | False |
| 2.5 | baz | True |
+-------+-------+---------+
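The S3 example above passes a URL that ends in a wildcard. How parquet-tools resolves that wildcard internally is not described here, so the following is only a minimal sketch of one way to split such a URL and match object keys against the pattern. `split_s3_url` and `match_keys` are hypothetical helpers, and the key list is made up for illustration.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_url(url: str) -> Tuple[str, str]:
    """Split 's3://bucket/key-pattern' into (bucket, key_pattern)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # The URL path starts with '/', but S3 object keys do not.
    return parsed.netloc, parsed.path.lstrip("/")


def match_keys(keys: List[str], pattern: str) -> List[str]:
    """Keep the keys that match a shell-style wildcard pattern."""
    return [key for key in keys if fnmatchcase(key, pattern)]


bucket, pattern = split_s3_url("s3://bucket-name/prefix/*")
# Hypothetical listing of object keys in the bucket.
keys = ["prefix/a.parquet", "prefix/b.parquet", "other/c.parquet"]
print(bucket, match_keys(keys, pattern))
```

Note that `*` in `fnmatchcase` also matches `/`, so this sketch treats everything under `prefix/` as a match, including nested keys.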
Inspect parquet file schema
$ parquet-tools inspect /path/to/parquet
Inspect output
############ file meta data ############
created_by: parquet-cpp version 1.5.1-SNAPSHOT
num_columns: 3
num_rows: 3
num_row_groups: 1
format_version: 1.0
serialized_size: 2226
############ Columns ############
one
two
three
############ Column(one) ############
name: one
path: one
max_definition_level: 1
max_repetition_level: 0
physical_type: DOUBLE
logical_type: None
converted_type (legacy): NONE
############ Column(two) ############
name: two
path: two
max_definition_level: 1
max_repetition_level: 0
physical_type: BYTE_ARRAY
logical_type: String
converted_type (legacy): UTF8
############ Column(three) ############
name: three
path: three
max_definition_level: 1
max_repetition_level: 0
physical_type: BOOLEAN
logical_type: None
converted_type (legacy): NONE
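For scripting, the inspect output above can be turned into a dictionary. The following is a rough sketch that assumes the layout shown above (a `#`-delimited header followed by `key: value` lines); `parse_inspect` is a hypothetical helper and not part of parquet-tools.

```python
import re
from typing import Dict, Optional

# Matches header lines such as '############ Column(one) ############'.
SECTION = re.compile(r"^#+ (.+?) #+$")


def parse_inspect(text: str) -> Dict[str, Dict[str, str]]:
    """Group 'key: value' lines under the most recent section header."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        header = SECTION.match(line)
        if header:
            current = sections.setdefault(header.group(1), {})
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
        # Lines without ': ' (the bare names under 'Columns') are skipped.
    return sections


sample = """\
############ file meta data ############
num_columns: 3
num_rows: 3
############ Column(one) ############
name: one
physical_type: DOUBLE
converted_type (legacy): NONE
"""
parsed = parse_inspect(sample)
print(parsed["Column(one)"]["physical_type"])
```

All values stay strings; convert fields such as `num_rows` with `int()` where needed.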
Output Parquet as CSV and transform it with csvq
$ parquet-tools csv s3://bucket-name/test.parquet |csvq "select one, three where three"
+-------+-------+
| one | three |
+-------+-------+
| -1.0 | True |
| 2.5 | True |
+-------+-------+
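If csvq is not available, the same filter can be approximated with Python's standard library. This is a minimal sketch: the CSV text below is an assumed stand-in for what `parquet-tools csv` prints (in particular, how the missing value is rendered is a guess), and `select_one_three_where_three` is a hypothetical helper.

```python
import csv
import io
from typing import List, Tuple

# Assumed CSV form of the sample table; the real output may differ in detail.
csv_text = """\
one,two,three
-1.0,foo,True
,bar,False
2.5,baz,True
"""


def select_one_three_where_three(text: str) -> List[Tuple[str, str]]:
    """Return (one, three) for rows whose 'three' column is the string 'True'."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["one"], row["three"]) for row in reader if row["three"] == "True"]


rows = select_one_three_where_three(csv_text)
print(rows)
```

In a real pipeline the text would come from `sys.stdin.read()` instead of the embedded string.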
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
parquet_tools-0.2.13.tar.gz (28.0 kB)
Built Distribution
parquet_tools-0.2.13-py3-none-any.whl (31.8 kB)
File details
Details for the file parquet_tools-0.2.13.tar.gz.
File metadata
- Download URL: parquet_tools-0.2.13.tar.gz
- Upload date:
- Size: 28.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 22451d52dda400ec063d2145a652fbd6bbe28a8b18f23f7dbda77a401f0e6f25
MD5 | 70781b230e881eaffc978deebd10c14e
BLAKE2b-256 | 98b869e0b7adb2bc9e8c807bce6e1eb5294e24f85986780c4cfb0b36b4492b51
File details
Details for the file parquet_tools-0.2.13-py3-none-any.whl.
File metadata
- Download URL: parquet_tools-0.2.13-py3-none-any.whl
- Upload date:
- Size: 31.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.8.16
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4324ad05d7ef26c4778a23955e11a6a30d0949cad47ee5f608ed27a32e707809
MD5 | db8f47993b13c67b9bfd90c793ce4f51
BLAKE2b-256 | 13f959b10cdbae288c0ba46e87f4d04a1abc0278f444aa48998c6779af7541a1
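The SHA256 digests listed above can be checked against a downloaded file. The following is a minimal sketch using only the standard library; `sha256_of` and `verify` are hypothetical helpers, and the archive is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 of the source distribution, copied from the table above.
EXPECTED = "22451d52dda400ec063d2145a652fbd6bbe28a8b18f23f7dbda77a401f0e6f25"
archive = Path("parquet_tools-0.2.13.tar.gz")  # assumed download location
if archive.exists():
    print("OK" if verify(archive, EXPECTED) else "MISMATCH")
```

pip can perform a similar check itself when a requirements file pins hashes with `--hash=sha256:...`.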