Quick data profiling CLI for parquet and CSV files
data-glance
Fast data profiling CLI for parquet and CSV files. Powered by ydata-profiling and Polars.
Installation
Install from PyPI:
# Run with uvx (cached)
uvx data-glance profile data.parquet
# Install globally
uv tool install data-glance
# Install with pip
pip install data-glance
Or run directly from GitHub:
# Run from GitHub (always latest)
uvx --from git+https://github.com/bswrundquist/data-glance data-glance profile data.parquet
Quick Start
# Profile a file
data-glance profile data.parquet
# Quick preview
data-glance head data.csv
# Check data quality
data-glance diagnose data.parquet
# View schema
data-glance schema data.parquet
Commands
| Command | Description |
|---|---|
| profile | Generate HTML profile report |
| diagnose | Check data quality issues |
| head | Preview first N rows |
| tail | Preview last N rows |
| schema | Display column types |
| stats | Quick statistics |
| count | Count rows (fast) |
| columns | List column names |
| unique | Show unique values |
| filter | Filter data by expression |
| sample | Extract random sample |
| convert | Convert between formats |
| compare | Compare two files |
| validate | Validate data rules |
| info | File metadata |
| generate | Create test data |
Profile Command
Basic Usage
data-glance profile data.parquet
data-glance profile data.csv --preset quick
data-glance profile data.parquet --preset full
data-glance profile huge.parquet --sample 10000
Column Filtering
data-glance profile data.csv --include "user_*,order_*"
data-glance profile data.csv --exclude "*_id,*_hash"
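The values passed to --include and --exclude look like comma-separated shell-style globs. As a rough illustration of how such patterns pick columns, here is a minimal Python sketch built on the standard library's fnmatch module. The helper name select_columns, the case-sensitive matching, and the order of operations (include first, then exclude) are assumptions made for this example, not a description of data-glance internals.

```python
from fnmatch import fnmatchcase


def select_columns(columns, include=None, exclude=None):
    """Hypothetical helper: filter column names with comma-separated globs."""

    def matches(name, patterns):
        # True when the name matches at least one glob in the list.
        return any(fnmatchcase(name, p.strip()) for p in patterns.split(","))

    selected = list(columns)
    if include:
        selected = [c for c in selected if matches(c, include)]
    if exclude:
        selected = [c for c in selected if not matches(c, exclude)]
    return selected


columns = ["user_id", "user_name", "order_total", "session_hash", "created_at"]
print(select_columns(columns, include="user_*,order_*"))
print(select_columns(columns, exclude="*_id,*_hash"))
```

Quoting the pattern on the command line, as the examples above do, keeps the shell from expanding the asterisks against file names before the tool sees them.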
Null Handling
data-glance profile data.csv --nulls drop-cols
data-glance profile data.csv --drop-null-threshold 0.5
data-glance profile data.csv --drop-constant
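To make the null-handling flags more concrete, the sketch below shows one plausible reading of them in plain Python: a column is dropped when its share of nulls exceeds the threshold, or when it holds at most one distinct non-null value. The function names, the strict "greater than" comparison, and the treatment of None as null are all assumptions for illustration; the tool's real behavior may differ.

```python
def null_fraction(rows, column):
    """Fraction of rows where `column` is None."""
    if not rows:
        return 0.0
    return sum(1 for r in rows if r.get(column) is None) / len(rows)


def surviving_columns(rows, null_threshold=0.5, drop_constant=True):
    """Hypothetical helper: names of columns that pass both filters."""
    columns = list(rows[0].keys()) if rows else []
    kept = []
    for col in columns:
        if null_fraction(rows, col) > null_threshold:
            continue  # too sparse (assumed strict '>' comparison)
        distinct = {r.get(col) for r in rows if r.get(col) is not None}
        if drop_constant and len(distinct) <= 1:
            continue  # at most one distinct non-null value
        kept.append(col)
    return kept


rows = [
    {"id": 1, "country": "US", "notes": None, "score": 0.4},
    {"id": 2, "country": "US", "notes": None, "score": None},
    {"id": 3, "country": "US", "notes": "late", "score": 0.9},
    {"id": 4, "country": "US", "notes": None, "score": 0.7},
]
print(surviving_columns(rows))
```

In this toy data, notes is 75% null and country never varies, so only id and score remain.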
Output Options
data-glance profile data.csv -o report.html
data-glance profile data.csv --json report.json
data-glance profile data.csv --no-browser
data-glance profile data.csv --dry-run
CSV Options
data-glance profile data.tsv --delimiter tab
data-glance profile data.csv --encoding latin-1
data-glance profile messy.csv --ignore-errors
Data Inspection
head / tail - Preview Data
data-glance head data.parquet --rows 20
data-glance tail data.csv --rows 10
schema - View Structure
data-glance schema data.parquet
stats - Quick Statistics
data-glance stats data.parquet
count - Row Count
data-glance count data.parquet # Single file
data-glance count *.csv # Multiple files
data-glance count *.parquet --total # Just the number
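Because --total prints just the number, the result is easy to consume from a script. The following Python sketch wraps the command with subprocess; it assumes the output is a single bare integer on stdout, and the names parse_total and count_rows are hypothetical. A stub runner stands in for the real CLI so the example runs without data-glance installed.

```python
import subprocess


def parse_total(stdout):
    """Parse `count --total` output, assumed to be one bare integer."""
    return int(stdout.strip())


def count_rows(paths, run=subprocess.run):
    """Hypothetical wrapper: total row count reported for `paths`."""
    result = run(
        ["data-glance", "count", *paths, "--total"],
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits with a non-zero status
    )
    return parse_total(result.stdout)


def fake_run(cmd, **kwargs):
    # Stand-in for subprocess.run that pretends the CLI printed "5000".
    return subprocess.CompletedProcess(cmd, 0, stdout="5000\n", stderr="")


print(count_rows(["test.parquet"], run=fake_run))
```

Dropping the run=fake_run argument makes the wrapper call the installed data-glance executable instead.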
columns - List Columns
data-glance columns data.parquet
data-glance columns data.csv --one # One per line (for piping)
data-glance columns data.csv --types # With data types
data-glance columns data.csv --one | grep user # Filter columns
unique - Value Distribution
data-glance unique data.csv status
data-glance unique data.parquet category --counts --sort
data-glance unique data.csv user_id --limit 50
info - File Metadata
data-glance info data.parquet
Shows file size and modification time. For parquet files, it also shows row count, columns, and row groups.
Data Operations
filter - Query Data
# Filter by condition
data-glance filter data.csv "col('status') == 'active'"
data-glance filter data.parquet "col('age') > 30" -o filtered.parquet
data-glance filter data.csv "col('name').str.contains('test')" --limit 100
# Expression syntax (Polars)
col('column') == 'value'
col('column') > 100
col('column').is_in(['a', 'b'])
col('column').is_null()
col('column').str.contains('pattern')
sample - Extract Sample
data-glance sample data.parquet sample.parquet -n 1000
data-glance sample big.csv small.csv --fraction 0.1
data-glance sample data.parquet sample.csv # Convert while sampling
convert - Format Conversion
data-glance convert data.csv data.parquet
data-glance convert data.parquet data.csv
data-glance convert data.csv data.parquet --compression zstd
Data Quality
diagnose - Quality Check
data-glance diagnose data.csv
Shows: schema, null percentages, quality issues, suggested fixes.
compare - Diff Files
data-glance compare data_v1.parquet data_v2.parquet
Shows: row/column differences, schema changes, null changes.
validate - Check Rules
# Check for nulls
data-glance validate data.csv --no-nulls "id,email"
# Check uniqueness
data-glance validate data.parquet --unique "id"
# Check row count
data-glance validate data.csv --min-rows 1000
# Check null percentage
data-glance validate data.csv --max-null-pct 0.1
# Check required columns
data-glance validate data.csv --required-cols "id,name,email"
# Combine rules
data-glance validate data.parquet \
--unique "id" \
--no-nulls "id,email" \
--min-rows 1000
Returns exit code 1 if validation fails (useful in CI/CD).
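The rules above map naturally onto simple checks over rows. The sketch below is a plain-Python approximation of what --required-cols, --min-rows, --no-nulls, and --unique might verify, and of how failures translate into an exit status. The function names, the failure messages, and the assumption that success means exit code 0 are illustrative only and are not taken from the tool's source.

```python
def validate(rows, columns, unique=(), no_nulls=(), min_rows=0, required_cols=()):
    """Hypothetical rule checker: return a list of failure messages."""
    failures = []
    for col in required_cols:
        if col not in columns:
            failures.append(f"missing required column: {col}")
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, found {len(rows)}")
    for col in no_nulls:
        if any(r.get(col) is None for r in rows):
            failures.append(f"nulls found in column: {col}")
    for col in unique:
        values = [r.get(col) for r in rows]
        if len(values) != len(set(values)):
            failures.append(f"duplicate values in column: {col}")
    return failures


def exit_code(failures):
    """Map failures to a process exit status: 0 when clean, 1 otherwise."""
    return 1 if failures else 0


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
]
failures = validate(
    rows,
    columns=["id", "email"],
    unique=["id"],
    no_nulls=["id", "email"],
    min_rows=2,
    required_cols=["id", "name", "email"],
)
for message in failures:
    print(message)
print("exit code:", exit_code(failures))
```

A CI job only needs the exit status: a non-zero value stops the pipeline, while the printed messages explain which rule failed.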
Global Options
data-glance -q profile data.csv # Quiet mode
data-glance -v profile data.csv # Verbose mode
Test Data
data-glance generate test.parquet --rows 5000
data-glance generate test.csv --edge-cases --nulls 0.1
Presets
| Preset | Speed | Detail | Use Case |
|---|---|---|---|
| quick | Fast | Minimal | Large files, quick checks |
| default | Medium | Standard | Most use cases |
| full | Slow | Detailed | Deep analysis |
Tips
- Use --preset quick or --sample for large files
- Use diagnose before profile to understand data quality
- Use --dry-run to preview what will be profiled
- Use validate in CI/CD pipelines
- Use count --total for scripting
- Use columns --one to pipe to other tools
- Use filter to extract subsets before profiling
Development
# Clone and install
git clone https://github.com/bswrundquist/data-glance
cd data-glance
make install-dev
# Run tests
make test
# Lint and format
make lint
make format
# Build
make build
# Release
make release-patch # 0.1.0 -> 0.1.1
make release-minor # 0.1.0 -> 0.2.0
make release-major # 0.1.0 -> 1.0.0