A powerful command-line tool for inspecting tabular files like Parquet, CSV, and XLSX

parq-cli

A command-line tool for inspecting, transforming, and comparing tabular files.

Overview

parq focuses on the workflows that come up most often when working with .parquet, .csv, .tsv, and .xlsx files:

inspect metadata and schema
preview the first or last rows
count rows
split large files
compute lightweight column stats (with cardinality and top-values for string columns)
convert between supported formats
diff two datasets by key
merge compatible files

The CLI keeps startup light with lazy imports, preserves plain and json output modes for automation, and avoids unnecessary full-table materialization for large CSV/XLSX workflows where possible.
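
The lazy-import idea can be illustrated with a minimal Python sketch. This is not parq's actual code: `lazy_import` and its `install_hint` parameter are hypothetical names. The point is only that an optional backend is imported when a command first needs it, not at startup.

```python
import importlib


def lazy_import(module_name, install_hint):
    """Import a module on first use and fail with an actionable message.

    Hypothetical helper for illustration; parq's internals may differ.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise RuntimeError(
            f"{module_name} is required for this command: {install_hint}"
        ) from exc


# The standard-library json module stands in for an optional backend here.
json_backend = lazy_import("json", "bundled with Python")
print(json_backend.dumps({"loaded": True}))
```

Under this pattern, a command handler would request openpyxl only when it is about to open an .xlsx file, so Parquet-only invocations never pay for that import.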

Installation

pip install parq-cli

Enable .xlsx support with the optional dependency:

pip install "parq-cli[xlsx]"

Quick Start

# Inspect metadata
parq meta data.parquet
parq meta --fast data.csv

# Show schema
parq schema data.xlsx

# Preview rows
parq head data.parquet
parq head -n 10 --columns id,name data.csv
parq tail -n 20 data.csv

# Count rows
parq count data.parquet

# Split files
parq split data.csv --record-count 100000 -n "chunks/part-%03d.csv"
parq split data.parquet --file-count 4 -n "chunks/part-%02d.parquet"
parq split data.csv --record-count 100000 -n "out/part-%03d.csv" --force   # overwrite existing

# Column statistics (string columns include cardinality and top values)
parq stats sales.parquet --columns amount,category --limit 10
parq stats sales.parquet --columns category --top-n 10    # show top 10 most frequent values

# Format conversion (with live progress bar)
parq convert raw.xlsx cleaned.parquet
parq convert source.parquet export.csv --columns id,name,status
parq convert source.parquet export.csv --force             # overwrite if exists

# Read TSV files or use a custom delimiter
parq head data.tsv
parq head --delimiter ";" data.csv

# Read a specific XLSX sheet
parq head --sheet Sheet2 report.xlsx
parq head --sheet 1 report.xlsx                            # 0-based index: 1 selects the second sheet

# Dataset diff
parq diff old.parquet new.parquet --key id --columns status,amount
parq diff left.csv right.csv --key id --summary-only

# Merge compatible inputs (with live progress bar)
parq merge part-001.parquet part-002.parquet merged.parquet
parq merge chunks/*.parquet merged.parquet --force         # overwrite if exists

Supported Formats

Command	Parquet	CSV	TSV	XLSX
`meta`	yes	yes	yes	yes
`schema`	yes	yes	yes	yes
`head` / `tail`	yes	yes	yes	yes
`count`	yes	yes	yes	yes
`split`	yes	yes	yes	yes
`stats`	yes	yes	yes	yes
`convert`	yes	yes	yes	yes
`diff`	yes	yes	yes	no, convert first
`merge`	yes	yes	yes	yes

XLSX support requires openpyxl, which the parq-cli[xlsx] extra installs. TSV files are auto-detected by the .tsv extension; a custom delimiter can be supplied with --delimiter.
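
The delimiter rule described above can be sketched in a few lines of Python. `resolve_delimiter` is a hypothetical name, and matching the extension case-insensitively is an assumption, not documented parq behavior.

```python
from pathlib import Path


def resolve_delimiter(path, override=None):
    """Pick the field delimiter for a delimited text file.

    An explicit override (the value --delimiter would carry) wins.
    Otherwise a .tsv extension selects a tab and anything else a comma.
    """
    if override is not None:
        return override
    # Assumption: the extension check ignores case.
    return "\t" if Path(path).suffix.lower() == ".tsv" else ","


print(repr(resolve_delimiter("data.tsv")))
print(repr(resolve_delimiter("data.csv", override=";")))
```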

Command Reference

`meta`

parq meta FILE
parq meta --fast FILE

Shows file-level metadata such as path, format, file size, and column count. Row count is included when available, and Parquet files additionally report Parquet-specific metadata such as row-group count.

Use --fast when you want a cheap metadata pass on CSV/XLSX files. In fast mode, expensive fields such as full row counts are skipped.
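
To see why a fast pass is cheap for CSV, consider this sketch: the header alone yields the column count, while a row count forces a scan of the whole file. `csv_meta` and its result keys are hypothetical and do not reflect parq's real output fields.

```python
import csv
import io


def csv_meta(stream, fast=False):
    """Collect basic metadata from an open CSV text stream."""
    reader = csv.reader(stream)
    header = next(reader, [])  # only the first line is needed for columns
    meta = {"format": "csv", "num_columns": len(header), "columns": header}
    if not fast:
        # The expensive part: every remaining row has to be read.
        meta["num_rows"] = sum(1 for _ in reader)
    return meta


sample = "id,name\n1,alice\n2,bob\n"
print(csv_meta(io.StringIO(sample), fast=True))
print(csv_meta(io.StringIO(sample)))
```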

`schema`

parq schema FILE

Shows column names, types, and nullable information.
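
As a rough illustration of sampled schema inference (the Large File Notes mention a 1000-row sample for XLSX), the sketch below derives a type name and a nullable flag per column from the first rows only. `infer_schema`, the use of Python type names, and the "mixed" label are all assumptions for this example; parq's own type system is not shown in this document.

```python
from itertools import islice


def infer_schema(rows, sample_size=1000):
    """Infer {column: (type_name, nullable)} from the first sample_size rows.

    rows is an iterable of dicts; None marks a missing value.
    """
    schema = {}
    for row in islice(rows, sample_size):
        for name, value in row.items():
            type_name, nullable = schema.get(name, (None, False))
            if value is None:
                nullable = True
            else:
                observed = type(value).__name__
                if type_name is None:
                    type_name = observed
                elif type_name != observed:
                    type_name = "mixed"  # conflicting types within the sample
            schema[name] = (type_name, nullable)
    return schema


rows = [{"id": 1, "name": "alice"}, {"id": 2, "name": None}]
print(infer_schema(rows))
```

The trade-off of sampling is that a value type appearing only after the sample window goes unnoticed.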

`head` and `tail`

parq head FILE
parq head -n 20 FILE
parq head -n 20 --columns id,name FILE

parq tail FILE
parq tail -n 20 FILE
parq tail -n 20 --columns id,name FILE

Notes:

default preview size is 5
--columns accepts a comma-separated list
missing files return a friendly error with exit code 1
empty header-only CSV/XLSX files return an empty preview with detected columns
a completely empty CSV (no header row) fails with a friendly "Empty CSV file" error
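
The two empty-file cases in the notes above can be sketched as follows. `head_csv` is a hypothetical function and `ValueError` is a stand-in exception type; only the default of 5 rows and the "Empty CSV file" message come from this README.

```python
import csv
import io
from itertools import islice


def head_csv(stream, n=5):
    """Return (header, first n rows) from an open CSV text stream."""
    reader = csv.reader(stream)
    try:
        header = next(reader)
    except StopIteration:
        # No header line at all: nothing to preview.
        raise ValueError("Empty CSV file") from None
    # islice stops after n rows, so the rest of the file is never read.
    return header, list(islice(reader, n))


print(head_csv(io.StringIO("id,name\n")))            # header only
print(head_csv(io.StringIO("id,name\n1,a\n2,b\n"), n=1))
```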

`count`

parq count FILE

Returns the total row count.

`split`

parq split FILE --file-count N
parq split FILE --record-count N
parq split FILE --record-count 100000 -n "chunks/part-%03d.parquet"
parq split FILE --record-count 100000 -n "chunks/part-%03d.csv" --force

Splits one input file into multiple output files.

Rules:

specify exactly one of --file-count or --record-count
output format is inferred from the suffix of the name format (-n / --name-format)
by default, existing target files raise an error; use --force / -F to overwrite
in --record-count mode, CSV/XLSX inputs are streamed in a single pass rather than pre-counted
a live progress bar is shown during the split
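
A single-pass split by record count needs no up-front row count, as this sketch shows. `split_by_record_count` is a hypothetical name, the printf-style pattern mirrors the examples above, and starting the numbering at 0 is an assumption about parq's behavior.

```python
from itertools import islice


def split_by_record_count(rows, record_count, name_format="part-%03d.csv"):
    """Yield (file_name, chunk) pairs while reading rows exactly once."""
    if record_count < 1:
        raise ValueError("record_count must be at least 1")
    iterator = iter(rows)
    index = 0  # assumption: output numbering starts at 0
    while True:
        chunk = list(islice(iterator, record_count))
        if not chunk:
            break
        yield name_format % index, chunk
        index += 1


for name, chunk in split_by_record_count(range(5), 2):
    print(name, chunk)
```

Only one chunk is held in memory at a time, which is what makes the approach suitable for inputs larger than RAM.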

`stats`

parq stats FILE
parq stats FILE --columns amount,category
parq stats FILE --limit 20
parq stats FILE --columns category --top-n 10

Computes simple per-column statistics.

numeric columns include count, null_count, min, max, mean
string, boolean, and date columns additionally include cardinality and top_values (top N most frequent values with their occurrence counts)
default --top-n is 5; set to 0 to suppress top-values output entirely
default --limit is 50 to avoid flooding the terminal on very wide tables
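
The statistics listed above can be approximated for a single in-memory column as shown below. `column_stats` is a hypothetical helper. Treating `count` as the number of non-null values is an assumption, since this README does not define it.

```python
from collections import Counter


def column_stats(values, top_n=5):
    """Compute simple statistics for one column given as a list."""
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(non_null),  # assumption: count excludes nulls
        "null_count": len(values) - len(non_null),
    }
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    if is_numeric:
        stats.update(
            min=min(non_null),
            max=max(non_null),
            mean=sum(non_null) / len(non_null),
        )
    else:
        counts = Counter(non_null)
        stats["cardinality"] = len(counts)
        if top_n > 0:  # 0 suppresses the top-values output
            stats["top_values"] = counts.most_common(top_n)
    return stats


print(column_stats([10, 20, None, 30]))
print(column_stats(["a", "b", "a", None], top_n=1))
```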

`convert`

parq convert SOURCE OUTPUT
parq convert SOURCE OUTPUT --columns id,name,status
parq convert SOURCE OUTPUT --force

Converts a supported input file to another supported output format. The output format is determined by the OUTPUT suffix.

Notes:

current targets are .parquet, .csv, .tsv, and .xlsx
conversion is streaming-based where possible
a live progress bar is shown during the conversion
by default, existing output files raise an error; use --force / -F to overwrite
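
The suffix-based format selection and the overwrite guard can be pictured with this sketch. `resolve_output`, the `SUPPORTED_TARGETS` mapping, and the exception types are hypothetical; only the list of target suffixes and the --force semantics are taken from the notes above.

```python
import tempfile
from pathlib import Path

# Target suffixes listed in the notes above, mapped to a format label.
SUPPORTED_TARGETS = {
    ".parquet": "parquet",
    ".csv": "csv",
    ".tsv": "tsv",
    ".xlsx": "xlsx",
}


def resolve_output(output, force=False):
    """Return the output format, refusing to clobber files unless forced."""
    path = Path(output)
    fmt = SUPPORTED_TARGETS.get(path.suffix.lower())
    if fmt is None:
        raise ValueError(f"Unsupported output format: {path.suffix!r}")
    if path.exists() and not force:
        raise FileExistsError(f"{path} already exists; use --force to overwrite")
    return fmt


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "export.csv"
    print(resolve_output(target))              # target does not exist yet
    target.touch()
    print(resolve_output(target, force=True))  # exists, overwrite allowed
```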

`diff`

parq diff LEFT RIGHT --key id
parq diff LEFT RIGHT --key id1,id2 --columns status,amount
parq diff LEFT RIGHT --key id --summary-only

Compares two datasets by key and reports:

row count delta
rows only present on the left
rows only present on the right
changed rows for the selected columns
schema-only columns and same-name type mismatches

Notes:

--key is required
diff currently supports Parquet, CSV, and TSV inputs
XLSX files must be converted first (for example with parq convert)
duplicate keys on either side are treated as an error
--summary-only keeps the counts and omits sample payloads
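
A keyed diff boils down to indexing both sides by the key and comparing the resulting sets. The sketch below works on lists of dicts. `index_by_key`, `diff_by_key`, and the result keys are hypothetical, and computing the row count delta as right minus left is an assumption.

```python
def index_by_key(rows, key):
    """Map key tuples to rows, rejecting duplicate keys."""
    indexed = {}
    for row in rows:
        k = tuple(row[column] for column in key)
        if k in indexed:
            raise ValueError(f"Duplicate key: {k}")
        indexed[k] = row
    return indexed


def diff_by_key(left, right, key, columns):
    """Summarize differences between two row sets that share a key."""
    l, r = index_by_key(left, key), index_by_key(right, key)
    shared = l.keys() & r.keys()
    return {
        "row_count_delta": len(r) - len(l),  # assumption: right minus left
        "left_only": sorted(l.keys() - r.keys()),
        "right_only": sorted(r.keys() - l.keys()),
        "changed": sorted(
            k for k in shared if any(l[k][c] != r[k][c] for c in columns)
        ),
    }


old = [{"id": 1, "status": "open"}, {"id": 2, "status": "open"}]
new = [{"id": 2, "status": "closed"}, {"id": 3, "status": "open"}]
print(diff_by_key(old, new, key=["id"], columns=["status"]))
```

Rejecting duplicate keys up front matters because a repeated key would make "changed" ambiguous: there would be no single row on each side to compare.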

`merge`

parq merge INPUT1 INPUT2 OUTPUT
parq merge chunks/*.parquet merged.parquet
parq merge chunks/*.parquet merged.parquet --force

Merges multiple compatible input files into a single output file. The last positional argument is the output path.

Notes:

schemas must be identical or safely unifiable by Arrow
by default, existing output files raise an error; use --force / -F to overwrite
output format is inferred from the output suffix
a live progress bar is shown during the merge
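
The compatibility check can be illustrated with a deliberately strict sketch that accepts only identical headers. This is narrower than the Arrow-based schema unification the notes describe, and `merge_tables` is a hypothetical name.

```python
def merge_tables(tables):
    """Concatenate (header, rows) pairs, yielding the header once.

    Stricter than Arrow unification: headers must match exactly.
    """
    merged_header = None
    for header, rows in tables:
        if merged_header is None:
            merged_header = list(header)
            yield merged_header
        elif list(header) != merged_header:
            raise ValueError(
                f"Incompatible schema: {list(header)} != {merged_header}"
            )
        yield from rows


part_1 = (["id", "name"], [[1, "alice"]])
part_2 = (["id", "name"], [[2, "bob"]])
for line in merge_tables([part_1, part_2]):
    print(line)
```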

Output Modes

Global options:

--version, -v: show version information
--output, -o: select output format (rich | plain | json)
--delimiter, -d: field delimiter for CSV/TSV input (default: ,); .tsv files default to \t automatically
--sheet: XLSX sheet name or 0-based index to read (default: active sheet)
--help: show command help

Available output modes:

rich: human-friendly terminal rendering
plain: low-overhead tabular output for shell pipelines
json: machine-readable structured output

Examples:

parq meta data.parquet --output json
parq --output plain stats data.csv
parq --delimiter ";" head semicolon_data.csv
parq --sheet "Sales" head report.xlsx
parq diff left.parquet right.parquet --key id --summary-only --output json

On Windows terminals that cannot safely render emoji or extended characters, Rich headings automatically fall back to a safe plain style instead of crashing.
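
The difference between the plain and json modes can be pictured with a small renderer. `render` is hypothetical, and the tab-separated layout used for plain is an assumption; parq's real plain and json layouts are not specified in this README.

```python
import json


def render(rows, mode="plain"):
    """Render a list of dicts as plain text or JSON."""
    if mode == "json":
        return json.dumps(rows)
    if mode == "plain":
        if not rows:
            return ""
        header = list(rows[0])
        lines = ["\t".join(header)]  # assumption: tab-separated columns
        lines += ["\t".join(str(row[c]) for c in header) for row in rows]
        return "\n".join(lines)
    raise ValueError(f"Unknown output mode: {mode}")


rows = [{"id": 1, "name": "alice"}]
print(render(rows, "plain"))
print(render(rows, "json"))
```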

Large File Notes

Parquet metadata, row counts, and previews use Arrow metadata and row-group shortcuts where available.
CSV tail uses a fixed-size column window instead of materializing every row as Python dicts.
CSV/XLSX split --record-count streams in one pass.
meta --fast is the best option when you need quick metadata from large CSV/XLSX inputs.
XLSX schema inference samples the first 1000 rows instead of scanning the entire sheet up front.

For repeated heavy workflows, converting large CSV/XLSX files to Parquet is still the best path for throughput.
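
A bounded-memory tail can be sketched with a fixed-size window. Note that this example swaps in a row-oriented `collections.deque` instead of the column window mentioned above, and `tail_csv` is a hypothetical name. It shows the general idea, not parq's implementation.

```python
import csv
import io
from collections import deque


def tail_csv(stream, n=5):
    """Return (header, last n rows) while holding at most n rows in memory."""
    reader = csv.reader(stream)
    header = next(reader, None)
    if header is None:
        raise ValueError("Empty CSV file")
    # A deque with maxlen discards the oldest row whenever a new one arrives.
    return header, list(deque(reader, maxlen=n))


print(tail_csv(io.StringIO("id\n1\n2\n3\n"), n=2))
```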

Development

Install development dependencies:

uv sync --extra dev

or:

pip install -e ".[dev]"

Useful commands:

python -m parq --help
pytest -m "not performance"
pytest tests/test_performance.py -m performance -q -s
ruff check parq tests
ruff check --fix parq tests
pytest --cov=parq --cov-report=html

Status

Implemented:

metadata and schema inspection
head and tail preview
row counting
file splitting (with progress bar, --force overwrite)
column statistics (numeric + string cardinality/top-values, --top-n)
format conversion (with progress bar, --force overwrite)
keyed dataset diff
compatible file merge (with progress bar, --force overwrite)
TSV auto-detection and custom delimiter support (--delimiter)
XLSX multi-sheet selection (--sheet)

Planned work focuses on deeper performance tuning, richer diff workflows, and broader reporting capabilities; the core commands are already in place.

License

MIT

Release history

0.2.0 (this version): Apr 25, 2026
0.1.9: Apr 9, 2026
0.1.8: Apr 7, 2026
0.1.7: Mar 27, 2026
0.1.6: Feb 22, 2026
0.1.5: Feb 21, 2026
0.1.3: Nov 17, 2025
0.1.2: Nov 17, 2025
0.1.0: Oct 15, 2025
0.0.3: Oct 14, 2025
0.0.2: Oct 14, 2025
0.0.1: Oct 14, 2025

