Human-friendly schema diff and contract drift detection for CSV, JSON, JSONL, Parquet, OpenAPI, Avro, and protobuf.
Project description
SchemaGlow
Human-friendly schema diff for CSV, JSON, JSONL, Parquet, OpenAPI, Avro, and protobuf.
SchemaGlow compares data files, schema artifacts, directory trees, and saved contract snapshots. It tells you what changed, whether it is safe, and what might break. It is built for pull request review, CI checks, repository-wide drift scans, and baseline contract validation when raw git diffs are not enough.
Why
Most nearby tools validate data contracts, inspect file structure, or diff technical schemas in a format-specific way. SchemaGlow focuses on a narrower workflow:
- compare two file versions quickly
- explain changes in plain language
- classify impact as
SAFE,WARNING, orBREAKING - export machine-readable and review-friendly reports
Features
- Compare
CSV,JSON,JSONL,Parquet,OpenAPI,Avro, andprotobufsources with one CLI. - Infer normalized schema snapshots from both raw data files and schema-definition files.
- Classify compatibility changes as
SAFE,WARNING, orBREAKING. - Export diff output as terminal text, JSON, Markdown, or HTML.
- Save schema snapshots and compare them later without re-reading source files.
- Scan two directory trees recursively and aggregate drift into one report.
- Capture baseline contract files and check candidate trees against committed baselines.
- Detect optional nested expansions, removals, type changes, nullability changes, sample-shape ambiguity, and column-order-only changes.
- Support ignore rules, strict numeric widening, and rename heuristics with sample overlap.
Installation
pip install schemaglow
For local development:
python3 -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"
CLI
schemaglow diff
Compare two files directly.
schemaglow diff old.parquet new.parquet
schemaglow diff baseline.jsonl candidate.jsonl --format json
schemaglow diff old.openapi.yaml new.openapi.yaml
schemaglow diff old.avsc new.avsc
schemaglow diff old.proto new.proto --report html --report-path proto-report.html
schemaglow diff old.csv new.csv --report markdown --report-path schema-report.md
schemaglow diff old.csv new.csv --ignore-fields '(^_loaded_at$|^metadata\.)'
schemaglow diff old.csv new.csv --strict --rename-heuristics
Example text output:
SchemaGlow Report
BREAKING
old: old.csv
new: new.csv
counts: SAFE=1 WARNING=0 BREAKING=1
BREAKING
- removed field: order_total
SAFE
+ column order changed only
schemaglow inspect
Infer a snapshot from one file and print its normalized field model.
schemaglow inspect data.json
schemaglow inspect data.parquet --format json
schemaglow inspect openapi.yaml --format json
schemaglow inspect schema.proto
schemaglow snapshot
Persist an inferred snapshot to JSON for later comparison.
schemaglow snapshot data.jsonl -o snapshots/baseline.schema.json
schemaglow snapshot schema.avsc -o snapshots/avro.schema.json
schemaglow compare
Compare two saved schema snapshots.
schemaglow compare old.schema.json new.schema.json
schemaglow compare old.schema.json new.schema.json --format json
schemaglow scan
Compare two directory trees recursively and aggregate the results.
schemaglow scan datasets/baseline datasets/candidate
schemaglow scan specs/old specs/new --format json
schemaglow scan repo-old repo-new --pattern '*.proto' --report markdown --report-path scan.md
schemaglow baseline capture
Capture a repository-local contract baseline made of saved snapshots.
schemaglow baseline capture data/ -o .schemaglow-baseline
schemaglow baseline capture specs/ -o contracts/api --pattern '*.yaml'
schemaglow baseline check
Compare a candidate tree against committed baseline contract files.
schemaglow baseline check .schemaglow-baseline data/
schemaglow baseline check contracts/api specs/ --format json
Supported Inputs
| Format | Typical suffixes | Notes |
|---|---|---|
| CSV | .csv |
Header-driven field discovery with scalar inference |
| JSON | .json |
Raw object or array data; OpenAPI JSON is auto-detected |
| JSONL | .jsonl |
One JSON object per line |
| Parquet | .parquet |
Schema extracted with PyArrow |
| OpenAPI | .yaml, .yml, .json |
Local refs, component schemas, request/response schemas |
| Avro | .avsc |
Records, arrays, maps, enums, unions |
| Protobuf | .proto |
Messages, enums, repeated fields, and maps |
Compatibility Rules
SAFE
- new nullable or optional top-level field
- column order changed only
- numeric widening from
integertonumberunless--strictis enabled - no schema change
WARNING
- new required top-level field
- nested object shape expanded
- required to nullable change
- ambiguous or mixed-type widening
- sample shape changed while remaining string-typed
- likely rename detected with
--rename-heuristics
BREAKING
- field removed
- nullable to required change
- incompatible type change such as
string -> integer
Architecture
The package uses a small pipeline that mirrors the product brief.
src/schemaglow/
├── cli.py # Typer command surface
├── service.py # File and snapshot orchestration
├── infer.py # Format detection and schema inference
├── schema_sources.py # OpenAPI, Avro, and protobuf parsers
├── diffing.py # Compatibility rules and event generation
├── renderers.py # Text, JSON, Markdown, and HTML output
└── models.py # Pydantic models for snapshots and reports
Processing flow:
- Detect the input format from suffix and schema-document heuristics.
- Infer a normalized field map with type, nullability, order, and sample hints.
- Compare old and new field sets against compatibility rules.
- Aggregate file-level results for scans and baseline checks when needed.
- Render the result for humans or CI consumers.
Tools Used
| Tool | Purpose |
|---|---|
Python 3.11+ |
Runtime and packaging baseline |
Typer |
CLI commands and help output |
Rich |
Terminal rendering |
Pydantic |
Snapshot and report models |
PyArrow |
Parquet schema reading and test fixture creation |
PyYAML |
OpenAPI YAML parsing |
Jinja2 |
HTML report templating |
pytest + pytest-cov |
Unit and integration tests with coverage |
mypy |
Strict type checking |
ruff |
Linting and formatting |
pip-audit |
Dependency vulnerability checks |
Testing and Verification
Local verification commands:
ruff format --check .
ruff check .
mypy src
pytest
pip-audit
Manual end-to-end commands using committed sample files are documented in TESTING.md.
The automated test suite covers:
- CSV inference and numeric widening behavior
- JSON and JSONL nested shape, nullability, and sample-shape changes
- OpenAPI, Avro, and protobuf schema parsing
- nested diff collapsing and rename heuristics
- snapshot and baseline round-trips
- CLI integration for
inspect,snapshot,compare,diff,scan, andbaseline - Parquet and directory report generation
Repository Standards
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file schemaglow-1.0.0.tar.gz.
File metadata
- Download URL: schemaglow-1.0.0.tar.gz
- Upload date:
- Size: 64.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fdf10ed766d89ca463e8194a4e72479136f2569471b86a412da8f4aca070ebdd
|
|
| MD5 |
40f70d9dde9047e569232c255bc71121
|
|
| BLAKE2b-256 |
bd498010239a2efa6f8dd3632238fc32e0faf0b6c6c46da3a66d44809e5f590e
|
File details
Details for the file schemaglow-1.0.0-py3-none-any.whl.
File metadata
- Download URL: schemaglow-1.0.0-py3-none-any.whl
- Upload date:
- Size: 24.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
61be6348f3b699a05a561f57817c95a91d70710fa6ff2895afa6d39b9fbd89cc
|
|
| MD5 |
7095d6d2c8f4c4a5971c8c6c4a576e5f
|
|
| BLAKE2b-256 |
42ae4391354e8cc4e66bdea2c107b0cfa178671334ee34fdb06fcfd3d145b360
|