imgshape v5.0.0 (Covenant) — Vision dataset governance, contract enforcement, and regression testing.
🖼️ imgshape
Vision Dataset Governance & Contract Enforcement Engine
"Define declarative contracts, run pytest-style assertions on your image datasets, and prevent training regressions in CI/CD."
⚡ What is imgshape v5.0.0 "Covenant"?
imgshape is a CI/CD-native developer tool and Python library for vision ML dataset governance. Version 5.0.0 moves from heuristic checks to a strict, contract-driven validation framework.
It runs deterministic audits across spatial, signal, distribution, quality, and semantic dimensions, so that your training and validation pipelines run on datasets that have been verified against a contract and checked for regressions.
🔑 Core Features
- 📄 Declarative Dataset Contracts: Write YAML schemas enforcing channel configurations, format restrictions, resolution bounds, entropy ranges, and maximum allowed corruption or duplicate rates.
- 🧪 pytest-style Dataset Testing (`imgshape test`): Run automated assertions on image folders with clean, tabular console summaries and markdown/JSON output renderers.
- 📐 Git-style Dataset Diffing (`imgshape diff`): Compare a baseline and candidate dataset to detect statistical shifts, class imbalances, and semantic drift using DINOv2 embeddings.
- 🔒 Cryptographic Audit Trails: Generates content-hashed `provenance_id` metadata and writes cross-platform `.fingerprint_lock` lockfiles to track and seal dataset state.
- 🚀 Ultra-lean & Portable: 100% Python library with a CLI. No Node.js, React UI, Streamlit, or local web servers. Fits perfectly into GitHub Actions, GitLab CI, or local terminals.
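To make the idea of a content-hashed identifier concrete, here is a minimal sketch of a deterministic digest over a dataset folder. It is not imgshape's actual `provenance_id` algorithm, which this README does not specify. The helper name `dataset_digest`, the use of SHA-256, and the choice to hash sorted relative paths together with file contents are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def dataset_digest(root: Path) -> str:
    """Return a SHA-256 hex digest covering every file under root.

    Hypothetical helper, not part of imgshape. Files are processed in
    sorted relative-path order so the result does not depend on the
    order in which the filesystem lists them.
    """
    entries = sorted(
        (path.relative_to(root).as_posix(), path)
        for path in root.rglob("*")
        if path.is_file()
    )
    digest = hashlib.sha256()
    for relative_name, path in entries:
        # Mix in the file name and the hash of its bytes, so that both
        # renames and content changes alter the final digest.
        digest.update(relative_name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()
```

Calling `dataset_digest(Path("./my_dataset"))` twice on an unchanged folder returns the same string, while editing, adding, or renaming any file changes it. That is the property a lockfile would rely on.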
⚡ Quick Start
1. Install imgshape
```shell
# Install core package (minimal dependencies)
pip install imgshape

# Install with PyTorch support for semantic drift & GPU acceleration
pip install "imgshape[full]"
```
2. Define a Dataset Contract (contract.yaml)
Create a contract file to define the expected boundaries of your dataset:
```yaml
schema_version: "5.0"
dataset:
  expected_channels: 3
  allowed_formats: [png, jpg]
  resolution_min: [224, 224]
  resolution_max: [1024, 1024]
quality:
  blur_threshold: 1.5
  corruption_max: 0.01
  duplicate_max: {value: 0.05, severity: warning}
distribution:
  entropy_min: 3.5
  imbalance_ratio_max: 2.0
```
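The contract above uses two clause shapes: a bare number (`corruption_max: 0.01`) and a mapping with an explicit severity (`duplicate_max`). The sketch below shows one way such `*_max` clauses could be evaluated against measured values. It is an illustration only, not imgshape's validator: the helpers `normalize_clause` and `check_max`, the default severity of `"error"` for bare numbers, and the measured values are all assumptions.

```python
def normalize_clause(raw):
    """Turn a clause into a (limit, severity) pair.

    Accepts a bare number or a {"value": ..., "severity": ...} mapping.
    Defaulting to "error" when no severity is given is an assumption.
    """
    if isinstance(raw, dict):
        return float(raw["value"]), raw.get("severity", "error")
    return float(raw), "error"


def check_max(clause, raw, measured):
    """Return a violation dict if measured exceeds the limit, else None."""
    limit, severity = normalize_clause(raw)
    if measured > limit:
        return {
            "clause": clause,
            "severity": severity,
            "message": f"{measured} exceeds maximum {limit}",
        }
    return None


# The quality section of the contract, written as a Python dict.
quality = {
    "corruption_max": 0.01,
    "duplicate_max": {"value": 0.05, "severity": "warning"},
}

# Made-up measurements for illustration.
measured = {"corruption_max": 0.0, "duplicate_max": 0.08}

violations = [
    violation
    for name, raw in quality.items()
    if (violation := check_max(name, raw, measured[name])) is not None
]
```

With these made-up numbers only `duplicate_max` is violated, and it keeps the `warning` severity declared in the contract.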
3. Run Validation via CLI
```shell
# Validate dataset against contract (exits with non-zero code on error)
imgshape validate ./my_dataset_directory contract.yaml --lock
```
The `--lock` flag writes a `.fingerprint_lock` metadata file alongside the contract, sealing the dataset state that was validated.
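Because `imgshape validate` exits with a non-zero code on error, a Python orchestration script can gate later steps on it. The following is a minimal sketch using only the standard library. The helper name `run_gate` is hypothetical, and the sketch treats any non-zero exit code as a failure without assuming what individual codes mean.

```python
import subprocess


def run_gate(command: list[str]) -> bool:
    """Run a command and return True only if it exited with code 0.

    Hypothetical helper: any non-zero exit code is treated as a failed gate.
    """
    completed = subprocess.run(command, check=False)
    return completed.returncode == 0


# Example usage (requires imgshape to be installed and on PATH):
# if not run_gate(["imgshape", "validate", "./my_dataset_directory", "contract.yaml"]):
#     raise SystemExit("Dataset contract validation failed; aborting training.")
```

In most CI systems the shell step fails on a non-zero exit code by itself, so a wrapper like this is mainly useful when validation is one stage of a larger Python script.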
4. Perform Git-style Dataset Comparison
Compare a candidate dataset against a baseline fingerprint to detect drift:
```shell
# Compare candidate folder against baseline fingerprint
imgshape diff baseline_fingerprint.json ./new_candidate_dataset/ --save diff_report.md
```
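As an intuition for what embedding-based drift detection measures, the sketch below compares the centroids of two sets of embedding vectors with cosine distance. This is a generic technique shown with tiny hand-written vectors, not a description of how `imgshape diff` computes semantic drift. The function names and sample data are assumptions, and real embeddings would have hundreds of dimensions.

```python
import math


def centroid(vectors):
    """Return the element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy 2-D "embeddings"; made-up values for illustration.
baseline = [[1.0, 0.0], [0.9, 0.1]]
candidate = [[0.0, 1.0], [0.1, 0.9]]

drift = cosine_distance(centroid(baseline), centroid(candidate))
```

A distance near 0 means the two datasets point in the same average direction in embedding space, and larger values indicate a shift. Centroid distance is a coarse summary and can miss changes that leave the mean unchanged.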
🐍 Python API Usage
You can embed contract governance directly into your training scripts or data preparation notebooks:
```python
from pathlib import Path

from imgshape.atlas import Atlas
from imgshape.contract import ContractLoader, ContractValidator

# 1. Profile the dataset
atlas = Atlas()
fingerprint = atlas.extract(Path("./my_dataset"))

# 2. Load the contract
contract = ContractLoader.load_yaml(Path("contract.yaml"))

# 3. Validate
validator = ContractValidator(contract)
report = validator.validate(fingerprint)

if report.passed:
    print(f"✅ Dataset validated successfully! Provenance ID: {report.provenance_id}")
else:
    print("❌ Dataset contract validation failed:")
    for violation in report.violations:
        print(f" - [{violation.severity.upper()}] {violation.clause}: {violation.message}")
```
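In a training script you may want warnings to be logged while errors stop the run. The sketch below shows that pattern with stand-in objects that mimic the `severity`, `clause`, and `message` attributes used in the example above. The stand-ins, the helper names, and the assumption that blocking violations carry the severity string `"error"` are illustrative and not confirmed by the README.

```python
from types import SimpleNamespace


def blocking_violations(violations, blocking=("error",)):
    """Return only the violations whose severity should stop the pipeline."""
    return [v for v in violations if v.severity.lower() in blocking]


def to_markdown(violations):
    """Render violations as a Markdown table string."""
    lines = ["| Severity | Clause | Message |", "|---|---|---|"]
    for v in violations:
        lines.append(f"| {v.severity.upper()} | {v.clause} | {v.message} |")
    return "\n".join(lines)


# Stand-ins for report.violations; values are made up for illustration.
sample = [
    SimpleNamespace(severity="warning", clause="duplicate_max", message="0.08 exceeds 0.05"),
    SimpleNamespace(severity="error", clause="expected_channels", message="found 1, expected 3"),
]

blockers = blocking_violations(sample)
table = to_markdown(sample)
```

With a real report, `blocking_violations(report.violations)` could replace `sample`, and a non-empty result could raise an exception before training starts.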
🏗️ Architecture & Flow
imgshape operates as a strict quality gate between your data storage layers and model training environments.
```mermaid
graph TD
    subgraph "Data & Spec"
        A[Raw Image Dataset]
        B[YAML Dataset Contract]
    end
    subgraph "imgshape Core (Atlas Engine)"
        C[Atlas Profilers] -->|Extract Metrics| D[Dataset Fingerprint]
        B -->|Parse Schema| E[Contract Validator]
        D --> E
    end
    subgraph "Outputs & Actions"
        E -->|Exit 0 / 1 / 2| F[CI/CD Build Verdict]
        E -->|Lockfile| G[.fingerprint_lock]
        E -->|Renderers| H[Tabular / JSON / MD Reports]
    end
    A --> C
```
📦 Dependency Groups
| Group | Command | Use Case |
|---|---|---|
| Core | `pip install imgshape` | Lightweight CI/CD verification & basic profiling (~10MB) |
| Torch | `pip install "imgshape[torch]"` | Adds PyTorch-based GPU acceleration & semantic feature extraction |
| Full | `pip install "imgshape[full]"` | Standard installation containing PDF reports, Plotly viz, and Torch extras |
🤝 Community & Support
- Issues: Encountered a bug? Open an issue.
- Discussions: Want to share ideas or workflows? Join the discussion.
Built by Stifler for the ML and AI Engineering community.