Parquet-Inspector
A command line tool for inspecting parquet files with PyArrow.
Installation
pip install parquet-inspector
Usage
parquet-inspector: cli tool for inspecting parquet files.

positional arguments:
  {metadata,schema,head,tail,count,validate,to-jsonl,to-parquet}
    metadata       print file metadata
    schema         print data schema
    head           print first n rows (default is 10)
    tail           print last n rows (default is 10)
    count          print number of rows
    validate       validate file
    to-jsonl       convert parquet file to jsonl
    to-parquet     convert jsonl file to parquet

optional arguments:
  -h, --help       show this help message and exit
  -v, --version    show program's version number and exit
  --threads, -t    use threads for reading
  --mmap, -m       use memory mapping for reading
Examples
# Print the metadata of a parquet file
$ pqi metadata my_file.parquet
created_by: parquet-cpp-arrow version 6.0.1
num_columns: 3
num_rows: 2
num_row_groups: 1
format_version: 1.0
serialized_size: 818
# Print the schema of a parquet file
$ pqi schema my_file.parquet
a: list<item: int64>
child 0, item: int64
b: struct<c: bool, d: timestamp[ms]>
child 0, c: bool
child 1, d: timestamp[ms]
# Print the first 5 rows of a parquet file (default is 10)
$ pqi head -n 5 my_file.parquet
{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}
{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
# Print the last 5 rows of a parquet file
$ pqi tail -n 5 my_file.parquet
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 7, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
# Print the first 5 rows of a parquet file, only reading the column a
$ pqi head -n 5 -c a my_file.parquet
{'a': 1}
{'a': 2}
{'a': 3}
{'a': 4}
{'a': 5}
# Print the first 3 rows that satisfy the condition a > 3
# (filters are defined in disjunctive normal form)
$ pqi head -n 3 -f "[('a', '>', 3)]" my_file.parquet
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
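The filter syntax above matches the convention used by PyArrow's Parquet reader: a flat list of `(column, op, value)` tuples is a conjunction (AND), and a list of such lists is a disjunction (OR) of conjunctions. The sketch below evaluates that form against plain Python dicts, purely to illustrate the semantics. `OPS` and `matches` are hypothetical names, and this is not how pqi or PyArrow apply filters internally.

```python
import operator

# Comparison operators accepted in filter tuples (a subset, for illustration).
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(row, filters):
    """Return True if `row` (a dict) satisfies `filters` given in DNF."""
    if not filters:
        return True  # no filter means every row passes
    # A flat list of tuples is shorthand for a single AND group.
    if isinstance(filters[0], tuple):
        filters = [filters]
    # OR across groups, AND within each group.
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in filters
    )


rows = [{"a": n} for n in range(1, 8)]

# Single conjunction: a > 3, keeping the first three matches.
first_three = [r for r in rows if matches(r, [("a", ">", 3)])][:3]
print(first_three)

# Two groups: (a < 2) OR (a >= 6 AND a != 7).
dnf = [[("a", "<", 2)], [("a", ">=", 6), ("a", "!=", 7)]]
either = [r for r in rows if matches(r, dnf)]
print(either)
```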
# Print the number of rows in a parquet file
$ pqi count my_file.parquet
7
# Validate a parquet file
$ pqi validate my_file.parquet
OK
# Convert a parquet file to jsonl
$ pqi to-jsonl my_file.parquet
$ cat my_file.jsonl
{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}
{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 7, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
# Convert a jsonl file to parquet
$ pqi to-parquet my_file.jsonl
$ pqi head my_file.parquet
{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}
{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 7, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
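The JSONL files in the two conversion examples hold one JSON object per line. The sketch below reproduces that layout with the standard library only, so the format is clear without a parquet file at hand. `to_jsonl` and `from_jsonl` are hypothetical helpers, and rendering timestamps with `str()` is an assumption chosen because it yields the same `YYYY-MM-DD HH:MM:SS` text as the output shown above.

```python
import io
import json
from datetime import datetime

rows = [
    {"a": 1, "b": {"c": True, "d": datetime(1991, 2, 3)}},
    {"a": 2, "b": {"c": False, "d": datetime(2019, 4, 1)}},
]


def to_jsonl(rows, out):
    """Write each row as one JSON object per line."""
    for row in rows:
        # default=str converts values json cannot encode (here: datetime).
        out.write(json.dumps(row, default=str) + "\n")


def from_jsonl(src):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in src if line.strip()]


buf = io.StringIO()
to_jsonl(rows, buf)
text = buf.getvalue()
print(text, end="")

parsed = from_jsonl(io.StringIO(text))
```

Timestamps come back from `from_jsonl` as plain strings, so a real JSONL-to-parquet conversion has to infer or be told the column type to restore them as timestamps.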
License