Parquet-Inspector

A command-line tool for inspecting Parquet files with PyArrow.

Installation

pip install parquet-inspector

Usage

parquet-inspector: cli tool for inspecting parquet files.

positional arguments:
  {metadata,schema,head,tail,count,validate,to-jsonl,to-parquet}
    metadata            print file metadata
    schema              print data schema
    head                print first n rows (default is 10)
    tail                print last n rows (default is 10)
    count               print number of rows
    validate            validate file
    to-jsonl            convert parquet file to jsonl
    to-parquet          convert jsonl file to parquet

optional arguments:
  -h, --help            show this help message and exit
  -v, --version         show program's version number and exit
  --threads, -t         use threads for reading
  --mmap, -m            use memory mapping for reading

Examples

# Print the metadata of a parquet file
$ pqi metadata my_file.parquet
created_by: parquet-cpp-arrow version 6.0.1
num_columns: 3
num_rows: 2
num_row_groups: 1
format_version: 1.0
serialized_size: 818
# Print the schema of a parquet file
$ pqi schema my_file.parquet
a: list<item: int64>
  child 0, item: int64
b: struct<c: bool, d: timestamp[ms]>
  child 0, c: bool
  child 1, d: timestamp[ms]
# Print the first 5 rows of a parquet file (default is 10)
$ pqi head -n 5 my_file.parquet
{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}
{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
# Print the last 5 rows of a parquet file
$ pqi tail -n 5 my_file.parquet
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 7, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
# Print the first 5 rows of a parquet file, only reading the column a
$ pqi head -n 5 -c a my_file.parquet
{"a": 1}
{"a": 2}
{"a": 3}
{"a": 4}
{"a": 5}
# Print the first 3 rows that satisfy the condition a > 3
# (filters are defined in disjunctive normal form)
$ pqi head -n 3 -f "[('a', '>', 3)]" my_file.parquet
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
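
Disjunctive normal form means the filter is an OR of AND groups: a flat list of `(column, op, value)` tuples is one AND group, and a list of such lists ORs the groups together. This is the convention documented for PyArrow's `filters` argument; that pqi hands the `-f` value to PyArrow unchanged is an assumption. The pure-Python sketch below only illustrates the semantics on hypothetical rows, and `matches` is an illustrative helper, not part of any library.

```python
import operator

# Comparison operators of the kind used in (column, op, value) predicates.
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda value, options: value in options,
    "not in": lambda value, options: value not in options,
}


def matches(row, filters):
    """Evaluate a DNF filter against one row given as a dict."""
    if not filters:
        return True  # no filter: every row passes
    # A flat list of tuples is a single AND group; normalise it to the
    # nested form, a list (OR) of lists (AND) of predicates.
    if isinstance(filters[0], tuple):
        filters = [filters]
    return any(
        all(OPS[op](row[column], value) for column, op, value in group)
        for group in filters
    )


# Hypothetical rows with a = 1..7 and a derived boolean column.
rows = [{"a": n, "even": n % 2 == 0} for n in range(1, 8)]

# a > 3
print([r["a"] for r in rows if matches(r, [("a", ">", 3)])])  # [4, 5, 6, 7]

# (a > 3 AND even) OR (a == 1)
dnf = [[("a", ">", 3), ("even", "==", True)], [("a", "==", 1)]]
print([r["a"] for r in rows if matches(r, dnf)])  # [1, 4, 6]
```
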
# Print the number of rows in a parquet file
$ pqi count my_file.parquet
7
# Validate a parquet file
$ pqi validate my_file.parquet
OK
# Convert a parquet file to jsonl
$ pqi to-jsonl my_file.parquet
$ cat my_file.jsonl
{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}
{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 7, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
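
Each line of a JSONL file is a standalone JSON document, so the output can be consumed with the standard library alone. A minimal sketch, using two lines copied from the output above; `read_jsonl` is a hypothetical helper name.

```python
import io
import json

# Two lines copied from the example output above.
jsonl_text = (
    '{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}\n'
    '{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}\n'
)


def read_jsonl(stream):
    """Yield one dict per non-empty line of a JSON Lines stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


records = list(read_jsonl(io.StringIO(jsonl_text)))
print(len(records))          # 2
print(records[0]["b"]["d"])  # 1991-02-03 00:00:00
```

Note that the timestamp column arrives as a plain string; JSON has no timestamp type, so a consumer has to parse it if it needs a datetime.
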
# Convert a jsonl file to parquet
$ pqi to-parquet my_file.jsonl
$ pqi head my_file.parquet
{"a": 1, "b": {"c": true, "d": "1991-02-03 00:00:00"}}
{"a": 2, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 3, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 4, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 5, "b": {"c": true, "d": "2019-04-01 00:00:00"}}
{"a": 6, "b": {"c": false, "d": "2019-04-01 00:00:00"}}
{"a": 7, "b": {"c": true, "d": "2019-04-01 00:00:00"}}

License

MIT
