Quick data profiling CLI for parquet and CSV files

data-glance

Fast data profiling CLI for parquet and CSV files. Powered by ydata-profiling and Polars.

Installation

Install from PyPI:

# Run with uvx (cached)
uvx data-glance profile data.parquet

# Install globally
uv tool install data-glance

# Install with pip
pip install data-glance

Or run directly from GitHub:

# Run from GitHub (always latest)
uvx --from git+https://github.com/bswrundquist/data-glance data-glance profile data.parquet

Quick Start

# Profile a file
data-glance profile data.parquet

# Quick preview
data-glance head data.csv

# Check data quality
data-glance diagnose data.parquet

# View schema
data-glance schema data.parquet

Commands

Command    Description
profile    Generate HTML profile report
diagnose   Check data quality issues
head       Preview first N rows
tail       Preview last N rows
schema     Display column types
stats      Quick statistics
count      Count rows (fast)
columns    List column names
unique     Show unique values
filter     Filter data by expression
sample     Extract random sample
convert    Convert between formats
compare    Compare two files
validate   Validate data rules
info       File metadata
generate   Create test data

Profile Command

Basic Usage

data-glance profile data.parquet
data-glance profile data.csv --preset quick
data-glance profile data.parquet --preset full
data-glance profile huge.parquet --sample 10000

Column Filtering

data-glance profile data.csv --include "user_*,order_*"
data-glance profile data.csv --exclude "*_id,*_hash"

Null Handling

data-glance profile data.csv --nulls drop-cols
data-glance profile data.csv --drop-null-threshold 0.5
data-glance profile data.csv --drop-constant

Output Options

data-glance profile data.csv -o report.html
data-glance profile data.csv --json report.json
data-glance profile data.csv --no-browser
data-glance profile data.csv --dry-run
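
The output flags above can be combined in a small batch loop. The following is a minimal sketch, not part of data-glance itself: profile_dir and the DATA_GLANCE variable are hypothetical names, and it assumes -o and --no-browser may be passed together in one call.

```shell
# Command name is overridable so the loop can be rehearsed with DATA_GLANCE=echo.
DATA_GLANCE="${DATA_GLANCE:-data-glance}"

# profile_dir DIR OUTDIR: write one HTML report per Parquet file in DIR.
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
profile_dir() (
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for file in "$src_dir"/*.parquet; do
        # Skip the unexpanded pattern when DIR holds no Parquet files.
        [ -e "$file" ] || continue
        name=$(basename "$file" .parquet)
        # Assumption: -o and --no-browser can be used in the same invocation.
        "$DATA_GLANCE" profile "$file" -o "$out_dir/$name.html" --no-browser
    done
)
```

Calling profile_dir data reports would then leave reports/NAME.html for each data/NAME.parquet. Setting DATA_GLANCE=echo first prints the commands instead of running them.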

CSV Options

data-glance profile data.tsv --delimiter tab
data-glance profile data.csv --encoding latin-1
data-glance profile messy.csv --ignore-errors

Data Inspection

head / tail - Preview Data

data-glance head data.parquet --rows 20
data-glance tail data.csv --rows 10

schema - View Structure

data-glance schema data.parquet

stats - Quick Statistics

data-glance stats data.parquet

count - Row Count

data-glance count data.parquet          # Single file
data-glance count *.csv                 # Multiple files
data-glance count *.parquet --total     # Just the number
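
Because --total prints just the number, the result can be captured in a shell variable. A minimal sketch of a row-count guard follows; require_rows and DATA_GLANCE are hypothetical names, and it assumes --total also works for a single file and prints a bare integer on stdout.

```shell
# Command name is overridable so the function can be exercised with a stub.
DATA_GLANCE="${DATA_GLANCE:-data-glance}"

# require_rows FILE MIN: succeed only if FILE has at least MIN rows.
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
require_rows() (
    file=$1
    min_rows=$2
    # Assumption: --total prints a bare integer and nothing else.
    total=$("$DATA_GLANCE" count "$file" --total) || exit 2
    if [ "$total" -ge "$min_rows" ]; then
        echo "ok: $file has $total rows"
    else
        echo "too few rows: $file has $total, expected at least $min_rows" >&2
        exit 1
    fi
)
```

For example, require_rows data.parquet 1000 returns a non-zero status when the file is too small. The validate command's --min-rows rule covers the same check natively, so the wrapper is mainly useful when the count feeds other logic.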

columns - List Columns

data-glance columns data.parquet
data-glance columns data.csv --one       # One per line (for piping)
data-glance columns data.csv --types     # With data types
data-glance columns data.csv --one | grep user  # Filter columns
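
The grep example above matches substrings, so "user" also matches "user_id". Where exact names matter, a stricter check can be sketched as below. has_columns is a hypothetical helper, and it assumes --one prints bare column names, one per line.

```shell
# has_columns NAME...: read one column name per line on stdin and
# report every requested NAME that is not present.
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
has_columns() (
    listing=$(cat)
    missing=0
    for wanted in "$@"; do
        # -F: literal string, -x: match the whole line, -q: no output.
        if ! printf '%s\n' "$listing" | grep -Fxq -- "$wanted"; then
            echo "missing column: $wanted" >&2
            missing=1
        fi
    done
    exit "$missing"
)
```

It would be used as data-glance columns data.csv --one | has_columns id email, and exits non-zero if any name is absent.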

unique - Value Distribution

data-glance unique data.csv status
data-glance unique data.parquet category --counts --sort
data-glance unique data.csv user_id --limit 50

info - File Metadata

data-glance info data.parquet

Shows file size and modification time. For Parquet files, it also shows row count, columns, and row groups.

Data Operations

filter - Query Data

# Filter by condition
data-glance filter data.csv "col('status') == 'active'"
data-glance filter data.parquet "col('age') > 30" -o filtered.parquet
data-glance filter data.csv "col('name').str.contains('test')" --limit 100

# Expression syntax (Polars)
col('column') == 'value'
col('column') > 100
col('column').is_in(['a', 'b'])
col('column').is_null()
col('column').str.contains('pattern')
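
Filter expressions mix single quotes inside a double-quoted shell argument, which gets awkward when the column or value comes from a variable. The sketch below builds an equality expression in the form shown above. eq_expr is a hypothetical helper that does no escaping, so it refuses input containing a single quote.

```shell
# eq_expr COLUMN VALUE: print an expression comparing COLUMN to the string VALUE.
# (Hypothetical helper, not part of data-glance.)
eq_expr() {
    # No escaping is attempted, so reject arguments that contain a single quote.
    case "$1$2" in
        *"'"*)
            echo "eq_expr: single quotes are not supported" >&2
            return 1
            ;;
    esac
    printf "col('%s') == '%s'\n" "$1" "$2"
}
```

The result can then be passed along as data-glance filter data.csv "$(eq_expr status active)".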

sample - Extract Sample

data-glance sample data.parquet sample.parquet -n 1000
data-glance sample big.csv small.csv --fraction 0.1
data-glance sample data.parquet sample.csv  # Convert while sampling

convert - Format Conversion

data-glance convert data.csv data.parquet
data-glance convert data.parquet data.csv
data-glance convert data.csv data.parquet --compression zstd

Data Quality

diagnose - Quality Check

data-glance diagnose data.csv

Reports the schema, null percentages, detected quality issues, and suggested fixes.

compare - Diff Files

data-glance compare data_v1.parquet data_v2.parquet

Reports row and column differences, schema changes, and changes in null counts.

validate - Check Rules

# Check for nulls
data-glance validate data.csv --no-nulls "id,email"

# Check uniqueness
data-glance validate data.parquet --unique "id"

# Check row count
data-glance validate data.csv --min-rows 1000

# Check null percentage
data-glance validate data.csv --max-null-pct 0.1

# Check required columns
data-glance validate data.csv --required-cols "id,name,email"

# Combine rules
data-glance validate data.parquet \
    --unique "id" \
    --no-nulls "id,email" \
    --min-rows 1000

Returns exit code 1 if validation fails (useful in CI/CD).
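
In a CI/CD job, that exit code can stop the pipeline before later steps run. A minimal sketch of such a gate follows; gate and DATA_GLANCE are hypothetical names, and the rules are the ones from the combined example above.

```shell
# Command name is overridable so the gate can be tested without real data.
DATA_GLANCE="${DATA_GLANCE:-data-glance}"

# gate FILE: run the validation rules on FILE and propagate a failure.
# (Hypothetical helper; the rules mirror the combined example in this README.)
gate() {
    if "$DATA_GLANCE" validate "$1" --unique "id" --no-nulls "id,email" --min-rows 1000; then
        echo "validation passed: $1"
    else
        code=$?
        echo "validation failed: $1 (exit code $code)" >&2
        return "$code"
    fi
}
```

A pipeline step such as gate data.parquet && ./deploy.sh would then run the second command only when validation succeeds (deploy.sh is a placeholder name).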

Global Options

data-glance -q profile data.csv    # Quiet mode
data-glance -v profile data.csv    # Verbose mode

Test Data

data-glance generate test.parquet --rows 5000
data-glance generate test.csv --edge-cases --nulls 0.1

Presets

Preset    Speed    Detail     Use Case
quick     Fast     Minimal    Large files, quick checks
default   Medium   Standard   Most use cases
full      Slow     Detailed   Deep analysis

Tips

  • Use --preset quick or --sample for large files
  • Use diagnose before profile to understand data quality
  • Use --dry-run to preview what will be profiled
  • Use validate in CI/CD pipelines
  • Use count --total for scripting
  • Use columns --one to pipe to other tools
  • Use filter to extract subsets before profiling
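
Several of these tips chain naturally into one workflow: diagnose, filter, then profile the subset. The sketch below is one way to wire them together. profile_subset and DATA_GLANCE are hypothetical names, and it assumes --preset and -o can be combined and that diagnose is informational only.

```shell
# Command name is overridable so the workflow can be rehearsed with DATA_GLANCE=echo.
DATA_GLANCE="${DATA_GLANCE:-data-glance}"

# profile_subset INPUT CONDITION SUBSET REPORT
# (Hypothetical helper; the body runs in a subshell so variables stay local.)
profile_subset() (
    input=$1
    condition=$2
    subset=$3
    report=$4
    # Print the quality summary first; its exit status is not treated as a gate.
    "$DATA_GLANCE" diagnose "$input"
    # Extract the rows of interest, stopping if the filter step fails.
    "$DATA_GLANCE" filter "$input" "$condition" -o "$subset" || exit
    # Assumption: --preset and -o can be used in the same invocation.
    "$DATA_GLANCE" profile "$subset" --preset quick -o "$report"
)
```

For example: profile_subset data.parquet "col('age') > 30" adults.parquet adults.html.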

Development

# Clone and install
git clone https://github.com/bswrundquist/data-glance
cd data-glance
make install-dev

# Run tests
make test

# Lint and format
make lint
make format

# Build
make build

# Release
make release-patch  # 0.1.0 -> 0.1.1
make release-minor  # 0.1.0 -> 0.2.0
make release-major  # 0.1.0 -> 1.0.0
