
imgshape v5.0.0 (Covenant) — Vision dataset governance, contract enforcement, and regression testing.


🖼️ imgshape

Vision Dataset Governance & Contract Enforcement Engine

Version 5.0.0 | Python 3.8+


"Define declarative contracts, run pytest-style assertions on your image datasets, and prevent training regressions in CI/CD."


📖 Documentation | 💬 Report Bug / Discuss


⚡ What is imgshape v5.0.0 "Covenant"?

imgshape is a CI/CD-native developer tool and Python library for vision ML dataset governance. This release moves from heuristic checks to a strict, contract-driven validation framework.

By running deterministic audits across spatial, signal, distribution, quality, and semantic dimensions, imgshape ensures that your training and validation pipelines run on verified, regression-free, and high-fidelity datasets.

🔑 Core Features

  • 📄 Declarative Dataset Contracts: Write YAML schemas enforcing channel configurations, format restrictions, resolution bounds, entropy ranges, and maximum allowed corruption or duplicate rates.
  • 🧪 pytest-style Dataset Testing (imgshape test): Run automated assertions on image folders with clean, tabular console summaries and markdown/JSON output renderers.
  • 📐 Git-style Dataset Diffing (imgshape diff): Compare a baseline and candidate dataset to detect statistical shifts, class imbalances, and semantic drift using DINOv2 embeddings.
  • 🔒 Cryptographic Audit Trails: Generates content-hashed provenance_id metadata and writes cross-platform .fingerprint_lock lockfiles to track and seal dataset state.
  • 🚀 Ultra-lean & Portable: 100% Python library with a CLI. No Node.js, React UI, Streamlit, or local web servers. Fits perfectly into GitHub Actions, GitLab CI, or local terminals.
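To make the idea of a content-hashed identifier concrete, here is a minimal sketch using only the standard library. It is not imgshape's actual provenance_id algorithm, which this page does not describe; the helper name dataset_content_hash and the hashing layout are assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def dataset_content_hash(root: Path) -> str:
    """Return a SHA-256 hex digest over every file under `root`.

    Hypothetical helper for illustration; NOT imgshape's provenance_id algorithm.
    """
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so the order is identical on every OS.
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel_bytes = path.relative_to(root).as_posix().encode("utf-8")
        # Length-prefix the path so the boundary between path and content is unambiguous.
        digest.update(len(rel_bytes).to_bytes(8, "big"))
        digest.update(rel_bytes)
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def demo():
    """Hash a tiny throwaway dataset twice, then again after changing one file."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cat").mkdir()
        (root / "cat" / "a.png").write_bytes(b"fake-image-1")
        (root / "b.png").write_bytes(b"fake-image-2")
        first = dataset_content_hash(root)
        second = dataset_content_hash(root)
        (root / "b.png").write_bytes(b"fake-image-3")
        changed = dataset_content_hash(root)
    return first, second, changed
```

Because the digest depends only on relative paths and file bytes, the same dataset yields the same value on any machine, and any changed, added, renamed, or removed file yields a different one.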

⚡ Quick Start

1. Install imgshape

# Install core package (minimal dependencies)
pip install imgshape

# Install all extras, including PyTorch support for semantic drift & GPU acceleration
pip install "imgshape[full]"

2. Define a Dataset Contract (contract.yaml)

Create a contract file to define the expected boundaries of your dataset:

schema_version: "5.0"
dataset:
  expected_channels: 3
  allowed_formats: [png, jpg]
  resolution_min: [224, 224]
  resolution_max: [1024, 1024]
quality:
  blur_threshold: 1.5
  corruption_max: 0.01
  duplicate_max: {value: 0.05, severity: warning}
distribution:
  entropy_min: 3.5
  imbalance_ratio_max: 2.0
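The contract above writes limits in two shapes: a bare number (corruption_max) and a mapping with value and severity (duplicate_max). The sketch below shows one way such clauses could be normalized and checked against observed rates. It is an illustration, not imgshape's validator: the function names, the default severity of "error", and keying the observed values by clause name are all assumptions. blur_threshold is left out because the page does not say which direction counts as a violation.

```python
from typing import Any, Dict, List, Tuple

# The `quality` limits from the contract above, written as a plain dict.
QUALITY_CLAUSES: Dict[str, Any] = {
    "corruption_max": 0.01,
    "duplicate_max": {"value": 0.05, "severity": "warning"},
}


def normalize_clause(raw: Any, default_severity: str = "error") -> Tuple[float, str]:
    """Turn either clause shape into a (limit, severity) pair.

    The "error" default is an assumption, not documented imgshape behavior.
    """
    if isinstance(raw, dict):
        return float(raw["value"]), str(raw.get("severity", default_severity))
    return float(raw), default_severity


def check_max_clauses(
    clauses: Dict[str, Any], observed: Dict[str, float]
) -> List[Tuple[str, str, float, float]]:
    """Return (clause, severity, observed, limit) for every exceeded maximum."""
    findings = []
    for name, raw in clauses.items():
        limit, severity = normalize_clause(raw)
        value = observed[name]  # hypothetical: observed rates keyed by clause name
        if value > limit:
            findings.append((name, severity, value, limit))
    return findings
```

For example, an observed duplicate rate of 0.08 with no corrupt files would produce a single warning-level finding for duplicate_max and nothing for corruption_max.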

3. Run Validation via CLI

# Validate dataset against contract (exits with non-zero code on error)
imgshape validate ./my_dataset_directory contract.yaml --lock

The --lock flag automatically writes a .fingerprint_lock metadata file alongside the contract to secure your dataset version signature.
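If a pipeline step is written in Python rather than shell, the same command can be driven through subprocess. The sketch below only assembles and runs the invocation shown above; the helper names are hypothetical, and because this page does not explain what the individual non-zero exit codes mean, the code treats any non-zero status simply as a failure.

```python
import subprocess
from typing import List


def build_validate_command(dataset_dir: str, contract_path: str, lock: bool = False) -> List[str]:
    """Assemble the `imgshape validate` invocation shown above as an argument list."""
    cmd = ["imgshape", "validate", dataset_dir, contract_path]
    if lock:
        cmd.append("--lock")
    return cmd


def run_validation(dataset_dir: str, contract_path: str, lock: bool = False) -> int:
    """Run the CLI and return its exit status (requires imgshape on PATH)."""
    # check=False: hand the status back to the caller instead of raising.
    completed = subprocess.run(
        build_validate_command(dataset_dir, contract_path, lock), check=False
    )
    return completed.returncode
```

A CI step could then call sys.exit(run_validation("./my_dataset_directory", "contract.yaml", lock=True)) so that the job fails whenever the CLI does.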

4. Perform Git-style Dataset Comparison

Compare a candidate dataset against a baseline fingerprint to detect drift:

# Compare candidate folder against baseline fingerprint
imgshape diff baseline_fingerprint.json ./new_candidate_dataset/ --save diff_report.md
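Conceptually, a diff of this kind compares summary statistics of two dataset states and flags the ones that moved too far. The following sketch illustrates that idea on two flat dictionaries of numbers. It does not read imgshape's fingerprint format, which this page does not document: the metric names, the 10% tolerance, and the relative-change rule are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def diff_metrics(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.10,
) -> List[Tuple[str, float, float, bool]]:
    """Return (metric, old, new, drifted) for every metric present in both inputs."""
    rows = []
    for name in sorted(set(baseline) & set(candidate)):
        old, new = baseline[name], candidate[name]
        # Relative change; fall back to absolute change when the baseline is zero.
        scale = abs(old) if old != 0 else 1.0
        drifted = abs(new - old) / scale > tolerance
        rows.append((name, old, new, drifted))
    return rows


# Hypothetical metric names and values, for illustration only.
BASELINE = {"mean_entropy": 5.0, "duplicate_rate": 0.02, "image_count": 1000}
CANDIDATE = {"mean_entropy": 5.2, "duplicate_rate": 0.05, "image_count": 1000}

DRIFTED = [name for name, _, _, drifted in diff_metrics(BASELINE, CANDIDATE) if drifted]
```

Here only duplicate_rate is flagged: it rose by 150% relative to the baseline, while mean_entropy moved by 4% and image_count did not change.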

🐍 Python API Usage

You can embed contract governance directly into your training scripts or data preparation notebooks:

from pathlib import Path
from imgshape.atlas import Atlas
from imgshape.contract import ContractLoader, ContractValidator

# 1. Profile the dataset
atlas = Atlas()
fingerprint = atlas.extract(Path("./my_dataset"))

# 2. Load the contract
contract = ContractLoader.load_yaml(Path("contract.yaml"))

# 3. Validate
validator = ContractValidator(contract)
report = validator.validate(fingerprint)

if report.passed:
    print(f"✅ Dataset validated successfully! Provenance ID: {report.provenance_id}")
else:
    print("❌ Dataset contract validation failed:")
    for violation in report.violations:
        print(f" - [{violation.severity.upper()}] {violation.clause}: {violation.message}")
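In a training script you may prefer to stop the run rather than print. The sketch below builds a small gate on the report fields used above (violations, severity, clause, message). The enforce helper and DatasetContractError are hypothetical names, and treating "error" as the blocking severity is an assumption; only "warning" appears as a severity value on this page.

```python
class DatasetContractError(RuntimeError):
    """Raised when a dataset has blocking contract violations (hypothetical helper)."""


def enforce(report, fail_on=("error",)):
    """Raise on blocking violations; return the non-blocking ones for logging.

    `report` is any object exposing `.violations`, where each violation has
    `.severity`, `.clause`, and `.message`, as in the example above.
    """
    blocking = [v for v in report.violations if v.severity.lower() in fail_on]
    if blocking:
        details = "; ".join(f"{v.clause}: {v.message}" for v in blocking)
        raise DatasetContractError(f"{len(blocking)} blocking violation(s): {details}")
    # Nothing blocking: hand back the remaining violations (e.g. warnings).
    return [v for v in report.violations if v.severity.lower() not in fail_on]
```

Calling enforce(report) right after validator.validate(fingerprint) lets warnings pass through for logging while aborting before any training compute is spent on a dataset with blocking violations.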

🏗️ Architecture & Flow

imgshape operates as a strict quality gate between your data storage layers and model training environments.

graph TD
    subgraph "Data & Spec"
    A[Raw Image Dataset]
    B[YAML Dataset Contract]
    end

    subgraph "imgshape Core (Atlas Engine)"
    C[Atlas Profilers] -->|Extract Metrics| D[Dataset Fingerprint]
    B -->|Parse Schema| E[Contract Validator]
    D --> E
    end

    subgraph "Outputs & Actions"
    E -->|Exit 0 / 1 / 2| F[CI/CD Build Verdict]
    E -->|Lockfile| G[.fingerprint_lock]
    E -->|Renderers| H[Tabular / JSON / MD Reports]
    end

    A --> C

📦 Dependency Groups

| Group | Command | Use Case |
|-------|---------|----------|
| Core | pip install imgshape | Lightweight CI/CD verification & basic profiling (~10MB) |
| Torch | pip install "imgshape[torch]" | Adds PyTorch-based GPU acceleration & semantic feature extraction |
| Full | pip install "imgshape[full]" | Standard installation containing PDF reports, Plotly viz, and Torch extras |

🤝 Community & Support

Built by Stifler for the ML and AI Engineering community.

