
imgshape v4.1.0 (Atlas) — Dataset intelligence layer: deterministic fingerprinting and decision-making for ML pipelines.


🖼️ imgshape

The Data-Centric AI Toolkit for Vision Engineers



"Automatically analyze any image dataset and get model-ready preprocessing recommendations in one command."




⚡ 30-Second Start

Don't guess your dataset's health. Audit it immediately with the Atlas engine.

pip install imgshape

from imgshape import Atlas

# 1. Initialize the Atlas Orchestrator
atlas = Atlas()

# 2. Extract deterministic fingerprint
result = atlas.extract_fingerprint("./my_dataset")

# 3. View the verdict
print(result.summary())

System Output:

{
  "fingerprint_id": "fp_8a7d9f2",
  "total_images": 4502,
  "corrupt_files": 12,
  "metrics": {
    "avg_resolution": "1024x768",
    "diversity_score": 0.89,
    "channel_consistency": "FAIL"
  },
  "issues": ["Found 14 grayscale images in RGB dataset"]
}

🔍 The Visual Dashboard (Atlas UI)

Experience imgshape's capabilities visually. The dashboard provides a real-time interface for dataset fingerprinting, augmentation previews, and pipeline configuration.

Figure: imgshape Dashboard v4.1.0, showing GPU acceleration status and drift detection.


🚀 Why imgshape?

Many vision training runs fail because of bad data: corrupt files, mixed channel layouts (RGBA vs RGB), or unusual aspect ratios. imgshape catches these problems before you train, using a deterministic rule engine.
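To make the "mixed channels" and "corrupt files" checks concrete, here is a minimal standard-library sketch that reads only the PNG header. It is an illustration of the idea, not imgshape's implementation: the names `png_info`, `audit`, and `make_header` are hypothetical, and only PNG files are handled.

```python
import struct
from collections import Counter

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# PNG color-type byte -> channel layout, as defined by the PNG specification
COLOR_TYPES = {0: "L", 2: "RGB", 3: "P", 4: "LA", 6: "RGBA"}


def png_info(header: bytes):
    """Return (width, height, mode) from the first 26 bytes of a PNG, or None if invalid."""
    if len(header) < 26 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    width, height, _bit_depth, color_type = struct.unpack(">IIBB", header[16:26])
    mode = COLOR_TYPES.get(color_type)
    if mode is None or width == 0 or height == 0:
        return None
    return width, height, mode


def audit(headers: dict) -> dict:
    """Summarize unreadable entries and the channel-mode mix for {name: header_bytes}."""
    corrupt, modes = [], Counter()
    for name in sorted(headers):
        info = png_info(headers[name])
        if info is None:
            corrupt.append(name)
        else:
            modes[info[2]] += 1
    return {"corrupt": corrupt, "modes": dict(modes), "mixed_channels": len(modes) > 1}


def make_header(width: int, height: int, color_type: int) -> bytes:
    """Build a synthetic PNG signature + IHDR prefix (demo helper, 8-bit depth)."""
    return (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
            + struct.pack(">IIBB", width, height, 8, color_type))


# Demo: one RGB file, one RGBA file, and one file that is not a PNG at all
report = audit({
    "a.png": make_header(10, 20, 2),
    "b.png": make_header(10, 20, 6),
    "c.png": b"junk",
})
```

In a real scan you would read the first 26 bytes of each file from disk; a full audit would also need to decode the pixel data, which this sketch deliberately skips.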

| Module | Technical Function |
| --- | --- |
| 🔍 Instant Audit | Multi-threaded + GPU-accelerated scan for entropy, blur, and variance using PyTorch. |
| 🧠 Decision Engine | Heuristic-based suggestion engine with Provenance IDs and Reproducibility Hashes. |
| 📊 Comparison Layer | NEW: Drift Analysis and Similarity Indexing between dataset versions. |
| 🛠️ Pipeline Export | Generates serialization-safe code for PyTorch, TensorFlow, and Albumentations. |
| 🎨 Visual Studio | Local Web Dashboard for interactive augmentation testing and hypothesis verification. |

📦 Installation Matrix

Choose your deployment flavor.

| Command | Use Case | Size |
| --- | --- | --- |
| pip install imgshape | Core / CI/CD | ~12MB |
| pip install "imgshape[full]" | Research / Power User | ~45MB |
| pip install "imgshape[ui]" | Interactive / Dashboard | ~30MB |

💡 Practical Use Cases

1. The "Sanity Check" (CI/CD Integration)

Block bad data from entering your training bucket. Ideal for GitHub Actions or Jenkins.

# Returns exit code 1 if corrupt files or schema violations are found
imgshape --check ./new_batch_v2 --strict-schema
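If you need a custom threshold on top of the built-in check, a small gate over the summary fields shown in the System Output above could look like the sketch below. The `gate` function and its `max_corrupt_ratio` parameter are hypothetical, and the only `channel_consistency` value assumed is the `"FAIL"` that appears in that sample.

```python
def gate(summary: dict, max_corrupt_ratio: float = 0.0) -> int:
    """Return a process exit code: 0 if the summary passes, 1 otherwise."""
    total = summary.get("total_images", 0)
    corrupt = summary.get("corrupt_files", 0)
    # An empty dataset is treated as a failure rather than a silent pass
    ratio = corrupt / total if total else 1.0
    channel_ok = summary.get("metrics", {}).get("channel_consistency") != "FAIL"
    return 0 if (ratio <= max_corrupt_ratio and channel_ok) else 1


# Values copied from the sample output earlier in this README
sample = {
    "total_images": 4502,
    "corrupt_files": 12,
    "metrics": {"channel_consistency": "FAIL"},
}
exit_code = gate(sample)
```

A CI job would pass `exit_code` to `sys.exit()` so the pipeline step fails when the batch does not meet the threshold.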

2. The "Pipeline Builder"

Don't guess augmentation parameters. Let the entropy statistics decide.

# analyze -> recommend -> export PyTorch snippet
imgshape --path ./train_data --analyze --recommend --out transforms.py
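For intuition about what an entropy statistic measures, here is a minimal sketch of Shannon entropy over 8-bit pixel intensities, followed by a toy rule that maps it to a jitter strength. The thresholds and the `suggest_color_jitter` helper are assumptions made up for illustration; the rules imgshape actually applies are not documented here.

```python
import math
from collections import Counter


def shannon_entropy(pixels) -> float:
    """Shannon entropy, in bits, of an iterable of 8-bit intensity values."""
    counts = Counter(pixels)
    if len(counts) <= 1:
        return 0.0  # empty input or a single flat value carries no information
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def suggest_color_jitter(entropy_bits: float) -> str:
    """Map entropy to a jitter strength (hypothetical thresholds, illustration only)."""
    if entropy_bits < 3.0:
        return "strong"
    if entropy_bits < 6.0:
        return "moderate"
    return "light"


flat = shannon_entropy([128] * 64)     # a uniform gray patch
binary = shannon_entropy([0, 255])     # two equally likely values
spread = shannon_entropy(range(256))   # every 8-bit value exactly once
```

Entropy ranges from 0 bits for a flat image to 8 bits when all 256 intensity values are equally likely, which is why it works as a rough proxy for how much tonal variety a dataset already has.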

3. The "Visual Explorer"

Verify RandomCrop or ColorJitter intensity manually before training.

# Launches local studio with auto-reload
imgshape --web --reload

🏗️ Architecture & Internal Mechanics

imgshape (Atlas engine) runs a Fingerprint-Analyze-Decide loop and acts as middleware between raw storage and compute.

graph TD
    subgraph "Data Layer"
    A[Raw Images]
    end

    subgraph "imgshape Core (Atlas)"
    B[Fingerprint Extractor] -->|Hash & Meta| C{Decision Engine}
    C -->|Rules v4.0| D[Recommendation]
    end

    subgraph "Integration Layer"
    D --> E[PyTorch/TF Code]
    D --> F[JSON Artifacts]
    D --> G[HTML/PDF Reports]
    end

    A --> B

Core Components

  • Atlas Orchestrator: The central intent-driven API that manages the lifecycle of an analysis session.
  • Fingerprint Extractor: A stateless module that computes immutable signatures for datasets (distributions, channel counts, hashes).
  • Decision Engine: A rule-based system that maps dataset signatures + User Intent (e.g., "Speed" vs "Accuracy") to concrete preprocessing steps.
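The components above can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the library's code: the `Fingerprint` dataclass, the `extract_fingerprint` and `decide` functions, and the step names are all hypothetical, and real fingerprints would also carry distribution statistics.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a fingerprint cannot be mutated after creation
class Fingerprint:
    fingerprint_id: str
    total_images: int
    modes: tuple  # sorted (channel_mode, count) pairs


def extract_fingerprint(records: dict) -> Fingerprint:
    """Stateless extractor: records maps file name -> (content_bytes, channel_mode)."""
    digest = hashlib.sha256()
    mode_counts = {}
    for name in sorted(records):  # sorted order makes the hash independent of listing order
        content, mode = records[name]
        digest.update(name.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(content).digest())
        mode_counts[mode] = mode_counts.get(mode, 0) + 1
    return Fingerprint(
        fingerprint_id="fp_" + digest.hexdigest()[:7],
        total_images=len(records),
        modes=tuple(sorted(mode_counts.items())),
    )


def decide(fp: Fingerprint, intent: str) -> list:
    """Rule-based mapping from signature + intent to steps (hypothetical rules)."""
    steps = []
    if len(fp.modes) > 1:
        steps.append("convert_to_rgb")  # mixed channel layouts get normalized first
    steps.append("resize_224" if intent.lower() == "speed" else "resize_512")
    return steps


fp = extract_fingerprint({
    "a.png": (b"first image bytes", "RGB"),
    "b.png": (b"second image bytes", "L"),
})
plan = decide(fp, "Speed")
```

Because the extractor holds no state and hashes files in sorted order, the same dataset always yields the same `fingerprint_id`, which is what makes the downstream decisions reproducible.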

🤝 Community & Support

Built by Stifler for the AI Engineering community.

Star on GitHub — it helps more people find clean data.
