Quick data profiling CLI for Parquet and CSV files

data-glance

Fast data profiling CLI for Parquet and CSV files. Powered by ydata-profiling and Polars.

Installation

Install from PyPI:

# Run with uvx (cached)
uvx data-glance profile data.parquet

# Install globally
uv tool install data-glance

# Install with pip
pip install data-glance

Or run directly from GitHub:

# Run from GitHub (always latest)
uvx --from git+https://github.com/bswrundquist/data-glance data-glance profile data.parquet

Quick Start

# Profile a file
data-glance profile data.parquet

# Quick preview
data-glance head data.csv

# Check data quality
data-glance diagnose data.parquet

# View schema
data-glance schema data.parquet

Commands

Command   Description
profile   Generate HTML profile report
diagnose  Check data quality issues
head      Preview first N rows
tail      Preview last N rows
schema    Display column types
stats     Quick statistics
count     Count rows (fast)
columns   List column names
unique    Show unique values
filter    Filter data by expression
sample    Extract random sample
convert   Convert between formats
compare   Compare two files
validate  Validate data rules
info      File metadata
generate  Create test data

Profile Command

Basic Usage

data-glance profile data.parquet
data-glance profile data.csv --preset quick
data-glance profile data.parquet --preset full
data-glance profile huge.parquet --sample 10000

Column Filtering

data-glance profile data.csv --include "user_*,order_*"
data-glance profile data.csv --exclude "*_id,*_hash"
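
The --include and --exclude values above read like comma-separated shell-style glob patterns. The sketch below shows how such patterns could select columns. It is an illustration only: the helper name select_columns and the matching rules (case-sensitive globs, include applied before exclude) are assumptions, not data-glance's actual implementation.

```python
from fnmatch import fnmatchcase


def select_columns(columns, include=None, exclude=None):
    """Hypothetical helper: keep columns matching any include glob,
    then drop columns matching any exclude glob."""
    include_patterns = [p.strip() for p in include.split(",")] if include else ["*"]
    exclude_patterns = [p.strip() for p in exclude.split(",")] if exclude else []
    kept = []
    for name in columns:
        if not any(fnmatchcase(name, p) for p in include_patterns):
            continue  # not selected by any include pattern
        if any(fnmatchcase(name, p) for p in exclude_patterns):
            continue  # explicitly excluded
        kept.append(name)
    return kept


columns = ["user_id", "user_name", "order_total", "session_hash", "created_at"]

print(select_columns(columns, include="user_*,order_*"))
# ['user_id', 'user_name', 'order_total']
print(select_columns(columns, exclude="*_id,*_hash"))
# ['user_name', 'order_total', 'created_at']
```

Quoting the pattern list on the command line, as the examples above do, keeps the shell from expanding the * characters before data-glance sees them.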

Null Handling

data-glance profile data.csv --nulls drop-cols
data-glance profile data.csv --drop-null-threshold 0.5
data-glance profile data.csv --drop-constant
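
To make the null-handling options more concrete, here is a rough sketch of what a null threshold and a constant-column filter could do to a small table. It assumes that --drop-null-threshold 0.5 drops columns whose null fraction is strictly greater than 0.5 and that --drop-constant drops columns with at most one distinct non-null value. Both rules, and the helper names, are assumptions made for illustration; check the tool's own help output for the exact behavior.

```python
def null_fraction(values):
    """Fraction of entries that are None (0.0 for an empty column)."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def drop_sparse_and_constant(table, null_threshold=0.5):
    """Hypothetical helper: drop mostly-null and constant columns.

    `table` maps column names to lists of values.
    """
    kept = {}
    for name, values in table.items():
        if null_fraction(values) > null_threshold:
            continue  # assumed rule: too many nulls
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            continue  # assumed rule: constant column
        kept[name] = values
    return kept


table = {
    "id": [1, 2, 3, 4],
    "notes": [None, None, None, "x"],      # 75% null
    "country": ["US", "US", "US", "US"],   # constant
    "score": [0.5, None, 0.7, 0.9],        # 25% null
}

cleaned = drop_sparse_and_constant(table, null_threshold=0.5)
print(list(cleaned))
# ['id', 'score']
```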

Output Options

data-glance profile data.csv -o report.html
data-glance profile data.csv --json report.json
data-glance profile data.csv --no-browser
data-glance profile data.csv --dry-run

CSV Options

data-glance profile data.tsv --delimiter tab
data-glance profile data.csv --encoding latin-1
data-glance profile messy.csv --ignore-errors

Data Inspection

head / tail - Preview Data

data-glance head data.parquet --rows 20
data-glance tail data.csv --rows 10

schema - View Structure

data-glance schema data.parquet

stats - Quick Statistics

data-glance stats data.parquet

count - Row Count

data-glance count data.parquet          # Single file
data-glance count *.csv                 # Multiple files
data-glance count *.parquet --total     # Just the number

columns - List Columns

data-glance columns data.parquet
data-glance columns data.csv --one       # One per line (for piping)
data-glance columns data.csv --types     # With data types
data-glance columns data.csv --one | grep user  # Filter columns

unique - Value Distribution

data-glance unique data.csv status
data-glance unique data.parquet category --counts --sort
data-glance unique data.csv user_id --limit 50

info - File Metadata

data-glance info data.parquet

Shows file size and modification time. For Parquet files, it also shows row count, columns, and row groups.

Data Operations

filter - Query Data

# Filter by condition
data-glance filter data.csv "col('status') == 'active'"
data-glance filter data.parquet "col('age') > 30" -o filtered.parquet
data-glance filter data.csv "col('name').str.contains('test')" --limit 100

# Expression syntax (Polars)
col('column') == 'value'
col('column') > 100
col('column').is_in(['a', 'b'])
col('column').is_null()
col('column').str.contains('pattern')

sample - Extract Sample

data-glance sample data.parquet sample.parquet -n 1000
data-glance sample big.csv small.csv --fraction 0.1
data-glance sample data.parquet sample.csv  # Convert while sampling

convert - Format Conversion

data-glance convert data.csv data.parquet
data-glance convert data.parquet data.csv
data-glance convert data.csv data.parquet --compression zstd

Data Quality

diagnose - Quality Check

data-glance diagnose data.csv

Shows the schema, null percentages, quality issues, and suggested fixes.

compare - Diff Files

data-glance compare data_v1.parquet data_v2.parquet

Shows row and column differences, schema changes, and null changes.
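
As a rough picture of what a schema comparison involves, the sketch below diffs two schemas represented as plain dictionaries of column name to type name. The function diff_schemas, the dictionary representation, and the sample schemas are all hypothetical; this is not the output format or logic of the compare command itself.

```python
def diff_schemas(old, new):
    """Hypothetical helper: report added, removed, and retyped columns."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = {
        name: (old[name], new[name])
        for name in sorted(set(old) & set(new))
        if old[name] != new[name]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Made-up schemas standing in for data_v1 and data_v2.
v1 = {"id": "Int64", "email": "String", "age": "Int32"}
v2 = {"id": "Int64", "email": "String", "age": "Float64", "signup_date": "Date"}

report = diff_schemas(v1, v2)
print(report["added"])    # ['signup_date']
print(report["removed"])  # []
print(report["changed"])  # {'age': ('Int32', 'Float64')}
```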

validate - Check Rules

# Check for nulls
data-glance validate data.csv --no-nulls "id,email"

# Check uniqueness
data-glance validate data.parquet --unique "id"

# Check row count
data-glance validate data.csv --min-rows 1000

# Check null percentage
data-glance validate data.csv --max-null-pct 0.1

# Check required columns
data-glance validate data.csv --required-cols "id,name,email"

# Combine rules
data-glance validate data.parquet \
    --unique "id" \
    --no-nulls "id,email" \
    --min-rows 1000

Returns exit code 1 if validation fails (useful in CI/CD).
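
The rules above amount to a handful of independent checks whose combined result decides the exit code. The following sketch expresses that idea over a list of row dictionaries. The function validate_rows, its parameter names, and the failure messages are invented for illustration and do not reflect how data-glance implements validation.

```python
def validate_rows(rows, required_cols=(), no_nulls=(), unique=(), min_rows=0):
    """Hypothetical rule checker; returns a list of failure messages."""
    failures = []

    # Collect every column name that appears in any row.
    present = set()
    for row in rows:
        present.update(row)

    for col in required_cols:
        if col not in present:
            failures.append(f"missing required column: {col}")
    for col in no_nulls:
        if any(row.get(col) is None for row in rows):
            failures.append(f"nulls found in: {col}")
    for col in unique:
        values = [row.get(col) for row in rows]
        if len(values) != len(set(values)):
            failures.append(f"duplicate values in: {col}")
    if len(rows) < min_rows:
        failures.append(f"expected at least {min_rows} rows, found {len(rows)}")
    return failures


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 2, "email": "c@example.com"},
]

failures = validate_rows(rows, unique=["id"], no_nulls=["id", "email"], min_rows=1000)
for failure in failures:
    print(failure)

# Mirror the documented convention: non-zero when any rule fails.
exit_code = 1 if failures else 0
print("exit code:", exit_code)
```

Collecting all failures before deciding the exit code, rather than stopping at the first one, gives a single CI run enough information to fix every problem at once.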

Global Options

data-glance -q profile data.csv    # Quiet mode
data-glance -v profile data.csv    # Verbose mode

Test Data

data-glance generate test.parquet --rows 5000
data-glance generate test.csv --edge-cases --nulls 0.1

Presets

Preset   Speed   Detail    Use Case
quick    Fast    Minimal   Large files, quick checks
default  Medium  Standard  Most use cases
full     Slow    Detailed  Deep analysis

Tips

  • Use --preset quick or --sample for large files
  • Use diagnose before profile to understand data quality
  • Use --dry-run to preview what will be profiled
  • Use validate in CI/CD pipelines
  • Use count --total for scripting
  • Use columns --one to pipe to other tools
  • Use filter to extract subsets before profiling
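
For the CI/CD and scripting tips, a pipeline step written in Python can lean on the exit code of validate and the plain output of count --total. The sketch below is a minimal illustration: the helper names passes and row_total are hypothetical, and it assumes count --total prints a bare integer on stdout. Stand-in commands are used so the sketch runs without data-glance installed.

```python
import subprocess
import sys


def passes(argv):
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(argv).returncode == 0


def row_total(argv):
    """Run a command and parse its stdout as an integer row count.

    Assumes the command prints only the number, as `count --total` is
    described as doing.
    """
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


# Intended use in a pipeline step (assumes data-glance is on PATH):
#   if not passes(["data-glance", "validate", "data.parquet", "--unique", "id"]):
#       sys.exit(1)
#   total = row_total(["data-glance", "count", "data.parquet", "--total"])

# Stand-in commands that mimic a passing check, a failing check, and a count.
ok_cmd = [sys.executable, "-c", "raise SystemExit(0)"]
fail_cmd = [sys.executable, "-c", "raise SystemExit(1)"]
count_cmd = [sys.executable, "-c", "print(5000)"]

print(passes(ok_cmd))        # True
print(passes(fail_cmd))      # False
print(row_total(count_cmd))  # 5000
```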

Development

# Clone and install
git clone https://github.com/bswrundquist/data-glance
cd data-glance
make install-dev

# Run tests
make test

# Lint and format
make lint
make format

# Build
make build

# Release
make release-patch  # 0.1.0 -> 0.1.1
make release-minor  # 0.1.0 -> 0.2.0
make release-major  # 0.1.0 -> 1.0.0
