imgshape v4.2.0 (Bento Intelligence) — Dataset intelligence layer: deterministic fingerprinting, semantic drift, and bento-box UI.
🖼️ imgshape
The Data-Centric AI Toolkit for Vision Engineers
"Automatically analyze any image dataset and get model-ready preprocessing recommendations in one command."
✨ What's New in v4.2 "Bento Intelligence"
- 🍱 Bento Grid UI: A complete UX overhaul using a modular 12-column grid for high-density dataset insights.
- 🌊 Semantic Drift 2.0: Detect dataset shifts using DINOv2 vision transformer embeddings.
- 🚀 Atlas Bento Engine: 40% faster fingerprinting via vectorized I/O and multi-stage caching.
- 🧩 Domain Profiles: One-click configurations for Medical, Satellite, and OCR datasets.
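Semantic Drift 2.0 compares dataset versions in DINOv2 embedding space, but the exact drift statistic is not described here. As an illustration only, the sketch below assumes the embeddings have already been extracted (one vector per image) and scores drift as one minus the cosine similarity between the two dataset centroids. `centroid_drift` and the 0.1 threshold are hypothetical and not part of the imgshape API.

```python
import math
from typing import Sequence

def _centroid(vectors: Sequence[Sequence[float]]) -> list[float]:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def centroid_drift(reference, candidate) -> float:
    """Hypothetical drift score: 1 - cosine similarity of the two centroids.

    0.0 means the centroids point the same way; larger values mean more drift.
    """
    a, b = _centroid(reference), _centroid(candidate)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("centroid has zero norm; cosine is undefined")
    return 1.0 - dot / norm

# Toy 3-d "embeddings"; real DINOv2 vectors have hundreds of dimensions.
v1 = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
v2 = [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]]
score = centroid_drift(v1, v2)
# 0.1 is an arbitrary example threshold, not an imgshape default.
print(f"drift={score:.3f}", "DRIFT" if score > 0.1 else "OK")
```

A centroid comparison is cheap but only notices shifts in the average embedding; distribution-level tests such as maximum mean discrepancy (MMD) are more sensitive to changes in spread or to new clusters.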
⚡ 30-Second Start
Don't guess your dataset's health. Audit it immediately with the Atlas engine.
```bash
pip install imgshape
```

```python
from imgshape import Atlas

# 1. Initialize the Atlas Orchestrator
atlas = Atlas()

# 2. Extract deterministic fingerprint
result = atlas.extract_fingerprint("./my_dataset")

# 3. View the verdict
print(result.summary())
```
System Output:

```json
{
  "fingerprint_id": "fp_8a7d9f2",
  "total_images": 4502,
  "corrupt_files": 12,
  "metrics": {
    "avg_resolution": "1024x768",
    "diversity_score": 0.89,
    "channel_consistency": "FAIL"
  },
  "issues": ["Found 14 grayscale images in RGB dataset"]
}
```
🔍 The Visual Dashboard (Atlas UI)
Experience imgshape's capabilities visually. The dashboard provides a real-time interface for dataset fingerprinting, augmentation previews, and pipeline configuration using the new Bento Grid layout.
*Screenshot: Dashboard v4.2.0 showing the Bento Grid layout and semantic drift detection.*
🚀 Why imgshape?
Many vision models fail or quietly underperform because of bad data: corrupt files, mixed channel layouts (RGBA vs. RGB), or unusual aspect ratios. imgshape catches these problems before you train, using a deterministic rule engine.
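To make the "mixed channels" failure concrete, here is a stdlib-only sketch of one such check. It is not imgshape's implementation: it assumes PNG input, reads the color type from each file's IHDR header, and flags files whose channel count differs from the majority. The helper names (`png_channels`, `channel_outliers`, `fake_png_header`) are hypothetical.

```python
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color type -> channels per pixel (palette images store one index per pixel)
COLOR_TYPE_CHANNELS = {0: 1, 2: 3, 3: 1, 4: 2, 6: 4}

def png_channels(header: bytes) -> int:
    """Return the channel count encoded in the first 26 bytes of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height,
    # bit depth (byte 24), color type (byte 25).
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a valid PNG header")
    try:
        return COLOR_TYPE_CHANNELS[header[25]]
    except KeyError:
        raise ValueError(f"unknown PNG color type {header[25]}") from None

def channel_outliers(headers: dict[str, bytes]) -> tuple[int, list[str]]:
    """Return (majority channel count, sorted names of files that disagree)."""
    counts = {name: png_channels(h) for name, h in headers.items()}
    majority, _ = Counter(counts.values()).most_common(1)[0]
    return majority, sorted(n for n, c in counts.items() if c != majority)

def fake_png_header(width: int, height: int, color_type: int) -> bytes:
    """Build just enough of a PNG header for the demo (IHDR CRC omitted)."""
    ihdr = struct.pack(">IIBBBBB", width, height, 8, color_type, 0, 0, 0)
    return PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + ihdr

headers = {
    "a.png": fake_png_header(64, 64, 2),  # RGB
    "b.png": fake_png_header(64, 64, 2),  # RGB
    "c.png": fake_png_header(64, 64, 0),  # grayscale
}
print(channel_outliers(headers))  # (3, ['c.png'])
```

For files on disk, pass `open(path, "rb").read(26)` instead of a fake header; other formats such as JPEG or TIFF would each need their own header parser.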
| Module | Technical Function |
|---|---|
| 🔍 Instant Audit | Multi-threaded + GPU-accelerated scan for entropy, blur, and variance using PyTorch. |
| 🧠 Decision Engine | Heuristic-based suggestion engine with Provenance IDs and Reproducibility Hashes. |
| 📊 Semantic Drift | NEW: DINOv2-powered drift analysis between dataset versions. |
| 🍱 Bento Grid UI | NEW: High-density Modular Dashboard for interactive exploration. |
| 🛠️ Pipeline Export | Generates serialization-safe code for PyTorch, TensorFlow, and Albumentations. |
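The Instant Audit row mentions entropy, blur, and variance. imgshape computes these with PyTorch; the pure-Python sketch below only shows common textbook definitions, which are assumptions about what such metrics typically mean rather than imgshape's exact formulas: Shannon entropy of the intensity histogram, and the variance of the Laplacian as a sharpness score (low values suggest blur).

```python
import math
from collections import Counter

Image = list[list[int]]  # grayscale pixels in 0-255, row-major

def shannon_entropy(img: Image) -> float:
    """Entropy in bits of the pixel-intensity histogram (at most 8 for 8-bit data)."""
    counts = Counter(p for row in img for p in row)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())

def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Sharp edges give large Laplacian responses; blurry images give low variance.
    """
    h, w = len(img), len(img[0])
    if h < 3 or w < 3:
        raise ValueError("image must be at least 3x3")
    responses = [
        img[y - 1][x] + img[y + 1][x] + img[y][x - 1] + img[y][x + 1] - 4 * img[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

flat = [[128] * 5 for _ in range(5)]  # uniform: no information, no edges
checker = [[255 if (x + y) % 2 else 0 for x in range(5)] for y in range(5)]  # all edges
print(shannon_entropy(flat), laplacian_variance(flat))
print(shannon_entropy(checker), laplacian_variance(checker))
```

Both metrics are cheap per-image scalars, which is what makes them practical to aggregate across thousands of files during an audit.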
📦 Installation Matrix
Choose your deployment flavor.
| Command | Use Case | Approx. size |
|---|---|---|
| `pip install imgshape` | Core / CI/CD | ~12 MB |
| `pip install "imgshape[full]"` | Research / Power User | ~45 MB |
| `pip install "imgshape[ui]"` | Interactive / Dashboard | ~30 MB |
💡 Practical Use Cases
1. The "Sanity Check" (CI/CD Integration)
Block bad data from entering your training bucket. Ideal for GitHub Actions or Jenkins.
```bash
# Returns exit code 1 if corrupt files or schema violations are found
imgshape --check ./new_batch_v2 --strict-schema
```
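The internals of `--check` are not documented here. As a rough illustration of how a CI gate turns a scan into an exit code, the stdlib-only sketch below (all names hypothetical) flags PNG/JPEG files whose leading bytes don't match their format signature, or whose trailer is missing (the PNG `IEND` chunk or the JPEG `FF D9` end-of-image marker), which is a common sign of a truncated upload. It is a heuristic, not imgshape's method: valid JPEGs with trailing data after the end marker would be flagged, and a full decode catches far more.

```python
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
PNG_IEND = b"IEND\xae\x42\x60\x82"  # IEND chunk type followed by its fixed CRC
JPEG_SOI = b"\xff\xd8"              # JPEG start-of-image marker
JPEG_EOI = b"\xff\xd9"              # JPEG end-of-image marker

def looks_corrupt(data: bytes) -> bool:
    """Cheap signature/trailer test on a file's raw bytes."""
    if data.startswith(PNG_MAGIC):
        return not data.endswith(PNG_IEND)  # truncated PNGs lose the IEND chunk
    if data.startswith(JPEG_SOI):
        return not data.endswith(JPEG_EOI)  # truncated JPEGs lose the EOI marker
    return True                             # empty file or unrecognized format

def scan(folder: str) -> list[str]:
    """Sorted relative paths of image files under `folder` that fail the check."""
    root = Path(folder)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
        and p.suffix.lower() in IMAGE_SUFFIXES
        and looks_corrupt(p.read_bytes())
    )

def main(argv: list[str]) -> int:
    """CI entry point: report offenders on stderr, return 1 if any were found."""
    bad = scan(argv[1])
    for name in bad:
        print(f"corrupt: {name}", file=sys.stderr)
    return 1 if bad else 0
```

Saved as a script that ends with `sys.exit(main(sys.argv))`, the non-zero status is what fails a GitHub Actions or Jenkins step.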
2. The "Pipeline Builder"
Don't guess augmentation parameters. Let the entropy statistics decide.
```bash
# analyze -> recommend -> export PyTorch snippet
imgshape --path ./train_data --analyze --recommend --out transforms.py
```
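imgshape's recommendation rules are not spelled out in this description. The sketch below is a hypothetical, deliberately simple version of the analyze, recommend, export idea: it maps a few dataset statistics to torchvision transform calls and writes them out as source text. The `stats` keys, thresholds, and rules are invented for illustration; the generated code names real torchvision transforms (`Resize`, `ColorJitter`, `ToTensor`, `Normalize`), but the block itself only builds a string and does not import torchvision.

```python
def recommend(stats: dict) -> list[str]:
    """Map dataset statistics to torchvision transform expressions (hypothetical rules)."""
    steps = []
    # Example rule: round the median short side down to a multiple of 32, clamp to [32, 512].
    side = min(512, max(32, (stats["median_short_side"] // 32) * 32))
    steps.append(f"T.Resize({side})")
    # Arbitrary example rule: milder photometric jitter for low-entropy datasets.
    jitter = 0.1 if stats["mean_entropy"] < 5.0 else 0.3
    steps.append(f"T.ColorJitter(brightness={jitter}, contrast={jitter})")
    steps.append("T.ToTensor()")
    # Normalize only when per-channel statistics were measured.
    if "channel_mean" in stats and "channel_std" in stats:
        steps.append(f"T.Normalize(mean={stats['channel_mean']}, std={stats['channel_std']})")
    return steps

def export(steps: list[str]) -> str:
    """Render the steps as a small, importable Python module."""
    body = ",\n".join(f"    {s}" for s in steps)
    return f"import torchvision.transforms as T\n\ntransform = T.Compose([\n{body},\n])\n"

# Placeholder statistics (the mean/std are the familiar ImageNet values).
stats = {
    "median_short_side": 768,
    "mean_entropy": 6.2,
    "channel_mean": [0.485, 0.456, 0.406],
    "channel_std": [0.229, 0.224, 0.225],
}
print(export(recommend(stats)))
```

Emitting source text rather than live objects keeps the output reviewable and diffable, which fits the "serialization-safe code" export described in the module table.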
3. The "Visual Explorer"
Verify RandomCrop or ColorJitter intensity manually before training.
```bash
# Launches local studio with auto-reload
imgshape --web --reload
```
🏗️ Architecture & Internal Mechanics
imgshape (Aurora Engine) operates on a Fingerprint-Analyze-Decide loop, acting as middleware between raw storage and compute.
```mermaid
graph TD
    subgraph "Data Layer"
        A[Raw Images]
    end
    subgraph "imgshape Core (Atlas Bento)"
        B[Fingerprint Extractor] -->|Hash & Meta| C{Decision Engine}
        C -->|Rules v4.2| D[Recommendation]
    end
    subgraph "Integration Layer"
        D --> E[PyTorch/TF Code]
        D --> F[JSON Artifacts]
        D --> G[HTML/PDF Reports]
    end
    A --> B
```
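The `Hash & Meta` step is where determinism matters: the same files must always produce the same fingerprint ID. One minimal way to get that property, sketched below with `hashlib`, is to hash every file, sort by relative path so directory-listing order doesn't matter, and hash the combined list. This is an assumption about the general approach, not imgshape's actual algorithm; `dataset_fingerprint` is hypothetical, and the `fp_` prefix merely imitates the ID format in the sample output.

```python
import hashlib
from pathlib import Path

def dataset_fingerprint(folder: str, length: int = 7) -> str:
    """Order-independent content fingerprint of every file under `folder`."""
    root = Path(folder)
    outer = hashlib.sha256()
    # Sorting by relative POSIX path makes the result independent of listing order and OS.
    for path in sorted(root.rglob("*"), key=lambda p: p.relative_to(root).as_posix()):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        file_digest = hashlib.sha256(path.read_bytes()).hexdigest()
        # NUL separates path from digest; newline separates entries.
        outer.update(f"{rel}\0{file_digest}\n".encode())
    return "fp_" + outer.hexdigest()[:length]
```

Because relative paths are part of the hash, renaming a file changes the fingerprint; hashing only the sorted file digests would make it purely content-based. A short prefix such as 7 hex characters is convenient for display, while the full digest is safer as a cache key.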
Core Components
- Atlas Bento Orchestrator: The central intent-driven API that manages the lifecycle of an analysis session.
- Fingerprint Extractor: A stateless module that computes immutable signatures for datasets (distributions, channel counts, hashes).
- Decision Engine: A rule-based system that maps dataset signatures + User Intent (e.g., "Speed" vs "Accuracy") to concrete preprocessing steps.
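A rule-based mapping from signature plus intent to preprocessing steps can be as small as an ordered list of (condition, step) pairs. The sketch below is hypothetical: the `Signature` fields, the `"speed"`/`"accuracy"` intent strings, and the rules themselves are invented for illustration and are not imgshape's rule set. Each step carries the ID of the rule that produced it, a simple form of provenance.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Signature:
    channels: frozenset[int]  # distinct channel counts seen in the dataset
    median_short_side: int    # pixels
    corrupt_files: int

@dataclass(frozen=True)
class Rule:
    rule_id: str
    applies: Callable[[Signature, str], bool]
    step: str

# Hypothetical rules, evaluated in order.
RULES = [
    Rule("R1-drop-corrupt", lambda s, i: s.corrupt_files > 0, "drop corrupt files"),
    Rule("R2-force-rgb", lambda s, i: len(s.channels) > 1, "convert all images to RGB"),
    Rule("R3-resize-speed", lambda s, i: i == "speed", "resize short side to 224"),
    Rule("R4-resize-accuracy",
         lambda s, i: i == "accuracy" and s.median_short_side >= 384,
         "resize short side to 384"),
]

def decide(sig: Signature, intent: str) -> list[tuple[str, str]]:
    """Return (rule_id, step) pairs in rule order; the rule ID records provenance."""
    return [(r.rule_id, r.step) for r in RULES if r.applies(sig, intent)]

sig = Signature(channels=frozenset({1, 3}), median_short_side=768, corrupt_files=12)
for rule_id, step in decide(sig, "accuracy"):
    print(rule_id, "->", step)
```

Keeping rules as plain data makes the engine deterministic and auditable: the same signature and intent always yield the same steps, and every step can be traced back to one rule.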
🤝 Community & Support
- Issues: Found a bug? Open an issue.
- Discussions: Feature requests? Join the discussion.
Built by Stifler for the AI Engineering community.
Star on GitHub ⭐ — it helps more people find clean data.
Download files
File details
Details for the file imgshape-4.2.0.tar.gz.
File metadata
- Download URL: imgshape-4.2.0.tar.gz
- Upload date:
- Size: 77.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `cf4b88b40c7a622d38de7503ef2001f4a203eb5cda71fda8c769f9c20f8a0345` |
| MD5 | `7a97ed5ccbcf2ea22f6dee57439a5e0c` |
| BLAKE2b-256 | `83ef68fee62729a0e95d38b913db2451481809bfa89815adf75f64a4f3527d71` |
File details
Details for the file imgshape-4.2.0-py3-none-any.whl.
File metadata
- Download URL: imgshape-4.2.0-py3-none-any.whl
- Upload date:
- Size: 77.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `43042aa445ae57a360d045da3a1b6949427cae543cea04b3a0a0d90f145da707` |
| MD5 | `8c171539f7605a861ff9c234f715e927` |
| BLAKE2b-256 | `4c4d3c98e59d8f5d00d52d114dc343523723ff820235600301b32458e19dc393` |