Modular vision model inference toolkit for rapid prototyping

SnapVision

Internal vision toolkit — Solomon 3D

Overview

SnapVision is a modular, pip-installable Python toolkit for rapid prototyping with vision models. It standardizes the download → load → predict → save cycle so engineers can go from "I have images" to "results" in minutes, not hours.

Every model follows the same interface: call snap_vision.load(), call .predict(), call .save(). The right weights are fetched automatically from HuggingFace on first use and cached locally. For gated models like SAM3, you can supply local weights instead.

The package is named snap-cv on pip and imported as snap_vision. The CLI command is snap-cv.


Features

  • Modular model support via pip extras — install only the backends you need; unused models add zero overhead
  • Auto-download with local caching — weights are fetched from HuggingFace or direct URLs on first use and stored in ~/.snap_vision/weights/
  • Local weights support for gated models — point weights_dir at your local copy for models behind a HuggingFace access gate (e.g. SAM3)
  • Typed results with built-in visualization and save — DetectionResult, SegmentationResult, and DepthResult each carry .save(), .to_json(), and .visualize() methods
  • Three interfaces — Python API, CLI, and Gradio web UI; all share the same underlying adapter layer
  • Cross-platform device auto-detection — prefers CUDA, falls back to MPS (Apple Silicon), then CPU
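The fallback order in the last bullet (CUDA, then MPS, then CPU) can be sketched in a few lines. This is an illustrative version only; the function name autodetect_device is ours and the real core/device.py may differ in detail:

```python
def autodetect_device() -> str:
    """Pick the best available device: CUDA, then MPS, then CPU.

    Illustrative sketch of the detection order described above.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # core package installs without torch; fall back to CPU

    if torch.cuda.is_available():
        return "cuda"
    # MPS is the Metal backend on Apple Silicon
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```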

Quick Start

pip install snap-cv[yolo]
import cv2
import snap_vision as sv

model = sv.load("yolov8")
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
result = model.predict(image, confidence=0.4)
result.save("./output", image=image)

Installation

The core package installs only lightweight dependencies (Click, PyYAML, NumPy, OpenCV, Pillow, HuggingFace Hub). Install model-specific extras only when you need them.

# Core only (no model backends)
pip install snap-cv

# Object detection with YOLOv8
pip install snap-cv[yolo]

# Segment Anything 2
pip install snap-cv[sam2]

# Segment Anything 3 (gated — see Local Weights section)
pip install snap-cv[sam3]

# Open-set detection with text prompts
pip install snap-cv[grounding-dino]

# Monocular depth estimation
pip install snap-cv[depth-anything]

# Gradio web UI
pip install snap-cv[ui]

# Everything
pip install snap-cv[all]

# Development dependencies
pip install snap-cv[dev]

Python requirement: 3.9 or later.


Supported Models

Model Task Extra Description Gated
yolov8 detection [yolo] YOLOv8x by Ultralytics — high-accuracy detection No
yolov8n detection [yolo] YOLOv8 Nano — fast, lightweight detection No
sam2 prompted_segmentation [sam2] Segment Anything 2 by Meta No
sam3 prompted_segmentation [sam3] Segment Anything 3 with native text prompts Yes
grounding_dino grounded_detection [grounding-dino] Open-set detection driven by text queries No
depth_anything depth_estimation [depth-anything] Depth Anything V2 — monocular depth estimation No

Weights are downloaded automatically for non-gated models. See Using Local Weights for gated models.


Usage

Python API

Load and predict — detection

import cv2
import snap_vision as sv

# Load a model (downloads weights on first call, cached afterward)
model = sv.load("yolov8")

image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
result = model.predict(image, confidence=0.35)

print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f"  {box.label}: {box.score:.2f}  [{box.x1:.0f}, {box.y1:.0f}, {box.x2:.0f}, {box.y2:.0f}]")

# Save result.json + visualization.png under ./output/scene/
result.save("./output/scene", image=image)

Segmentation with SAM2

model = sv.load("sam2")

# Provide point prompts (foreground clicks)
result = model.predict(image, points=[(320, 240), (400, 200)])

# Or provide a bounding box prompt
result = model.predict(image, boxes=[[100, 80, 500, 400]])

print(f"Segmented {len(result.masks)} mask(s), scores: {result.scores}")
result.save("./output/seg", image=image)
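Because the returned masks are plain NumPy arrays (each (H, W) uint8 with values 0 or 1, as described under Results), post-processing needs no extra tooling. As a sketch, combining all masks into one binary mask — the helper name union_mask is ours, not part of the API:

```python
import numpy as np

def union_mask(masks):
    """Combine per-object masks (each (H, W) uint8, values 0/1) into a
    single binary mask covering every segmented region."""
    combined = np.zeros_like(masks[0])
    for m in masks:
        combined |= m.astype(np.uint8)  # bitwise OR accumulates coverage
    return combined
```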

Text-prompted detection with GroundingDINO

model = sv.load("grounding_dino")
result = model.predict(image, text_prompt="robotic arm . screw . bracket")
result.save("./output/grounded", image=image)

Depth estimation

model = sv.load("depth_anything")
result = model.predict(image)

import numpy as np
print(f"Depth range: {result.depth_map.min():.2f} – {result.depth_map.max():.2f}")
result.save("./output/depth")  # saves depth_map.npy + colorized visualization.png
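Since result.depth_map is a plain float32 array, exporting it yourself (e.g. as an 8-bit grayscale image) takes a few lines of NumPy. depth_to_uint8 below is an illustrative helper, not part of the package:

```python
import numpy as np

def depth_to_uint8(depth):
    """Normalize a relative depth map (float32, arbitrary range) to 0-255
    so it can be written out as a grayscale image."""
    d_min, d_max = float(depth.min()), float(depth.max())
    if d_max - d_min < 1e-12:  # flat map: avoid divide-by-zero
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - d_min) / (d_max - d_min)
    return (scaled * 255).astype(np.uint8)
```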

Force a specific device

model = sv.load("sam2", device="cpu")       # force CPU
model = sv.load("yolov8", device="cuda")    # force CUDA GPU 0

Visualize without saving

vis = model.visualize(image, result)        # returns np.ndarray (H, W, 3)
cv2.imshow("result", vis)
cv2.waitKey(0)

CLI

List all available models

snap-cv list
Model                Task                      Downloaded   Description
---------------------------------------------------------------------------------
yolov8               detection                 yes          YOLOv8 object detection by Ultralytics
yolov8n              detection                 no           YOLOv8 nano — fast, lightweight detection
sam2                 prompted_segmentation     no           Segment Anything 2 by Meta — prompted segmentation
...

Weight cache: /Users/you/.snap_vision/weights

Filter by task:

snap-cv list --task detection

Pre-download weights

snap-cv download yolov8
snap-cv download sam2
snap-cv download --all          # download every registered model
snap-cv download yolov8 --force # re-download even if cached

Run inference

# Single image, print JSON to stdout
snap-cv predict yolov8 --input photo.jpg

# Single image, save structured output
snap-cv predict yolov8 --input photo.jpg --save-dir ./results

# Entire directory of images
snap-cv predict yolov8 --input ./frames/ --save-dir ./results

# With confidence override and display window
snap-cv predict yolov8 --input photo.jpg --confidence 0.5 --show

# GroundingDINO with a text prompt
snap-cv predict grounding_dino --input photo.jpg \
    --prompt "robotic arm . screw" --save-dir ./results

# SAM3 with local weights (gated model)
snap-cv predict sam3 --input photo.jpg \
    --weights-dir /data/weights/sam3/ --save-dir ./results

# Force a device
snap-cv predict depth_anything --input photo.jpg --device cpu
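For scripting batches from Python, the predict invocations above can be assembled programmatically and passed to subprocess.run. The helper name build_predict_cmd is hypothetical; the flags it emits are the ones documented above:

```python
import subprocess  # used when actually executing the command
from typing import List, Optional

def build_predict_cmd(
    model: str,
    input_path: str,
    save_dir: Optional[str] = None,
    confidence: Optional[float] = None,
    prompt: Optional[str] = None,
    device: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for `snap-cv predict` using the flags shown above."""
    cmd = ["snap-cv", "predict", model, "--input", input_path]
    if save_dir is not None:
        cmd += ["--save-dir", save_dir]
    if confidence is not None:
        cmd += ["--confidence", str(confidence)]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if device is not None:
        cmd += ["--device", device]
    return cmd

# e.g. subprocess.run(build_predict_cmd("yolov8", "photo.jpg", save_dir="./results"), check=True)
```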

Remove cached weights

snap-cv remove sam2

Show configuration

snap-cv config --show
Weight cache directory: /Users/you/.snap_vision/weights
Set SNAP_VISION_CACHE_DIR env var to override.

Launch the web UI

snap-cv ui
snap-cv ui --port 8080
snap-cv ui --model yolov8      # pre-load a specific model

Using Local Weights (Gated Models)

SAM3 is hosted on a gated HuggingFace repository (facebook/sam3). You must request access through HuggingFace before weights can be downloaded automatically. Two workflows are supported.

Option A — HuggingFace token (automatic download)

Once your access request is approved, set your token and let SnapVision handle the download:

export HF_TOKEN=hf_your_token_here
snap-cv download sam3

Subsequent calls require no token because weights are cached locally.

Option B — local weights directory

Download the weights yourself through the HuggingFace web interface or huggingface-cli, then point SnapVision at that directory:

# CLI
snap-cv predict sam3 \
    --input photo.jpg \
    --weights-dir /data/weights/sam3/ \
    --save-dir ./results
# Python API
model = sv.load("sam3", weights_dir="/data/weights/sam3/")
result = model.predict(image, text_prompt="the circuit board")

SnapVision expects to find a .safetensors, .pt, or .pth file inside the weights_dir. The directory layout from a standard HuggingFace snapshot download works without modification.

The --weights-dir flag works for any model, not only gated ones. Use it to pin a specific model version or to work entirely offline.


Gradio Web UI

The web UI provides a browser-based interface for interactive prototyping. It requires the [ui] extra.

pip install snap-cv[ui]
snap-cv ui

Open http://localhost:7860 in your browser. The UI exposes:

  • An image upload panel
  • A model selector showing all registered models
  • A confidence slider (0.0 – 1.0)
  • A text prompt field (used by GroundingDINO and SAM3)
  • A "Run Inference" button
  • A side-by-side view of the annotated result image and the raw JSON output

Models are loaded on first use and kept in memory for subsequent requests, so switching between images with the same model is fast.

# Start on a custom port with a pre-loaded model
snap-cv ui --model yolov8 --port 8080

Results

Every model returns a typed result object. All result types share three methods:

Method Description
.save(path, image=None) Write results to a directory; creates it if it does not exist
.to_json() Return a JSON string summary of the result
.visualize(image) Return an annotated np.ndarray (H, W, 3)

DetectionResult

Returned by yolov8, yolov8n, and grounding_dino.

result.boxes          # List[Box]
result.boxes[0].x1    # float — left edge in pixels
result.boxes[0].y1    # float — top edge in pixels
result.boxes[0].x2    # float — right edge in pixels
result.boxes[0].y2    # float — bottom edge in pixels
result.boxes[0].label # str — class name or phrase
result.boxes[0].score # float — confidence score
result.boxes[0].width  # property: x2 - x1
result.boxes[0].height # property: y2 - y1

Save directory layout:

output/
  result.json          # serialized boxes with labels and scores
  visualization.png    # image with bounding boxes drawn
  metadata.json        # model name, confidence threshold, etc.

SegmentationResult

Returned by sam2 and sam3.

result.masks      # List[np.ndarray] — each mask is (H, W) uint8, values 0 or 1
result.scores     # List[float] — confidence per mask
result.labels     # List[str] — label per mask (may be empty)

Save directory layout:

output/
  result.json          # mask count, scores, shapes
  mask_0.png           # first mask as grayscale PNG (0 or 255)
  mask_1.png           # second mask, and so on
  visualization.png    # masks overlaid on image with semi-transparent colors
  metadata.json        # model name, prompts used

DepthResult

Returned by depth_anything.

result.depth_map   # np.ndarray (H, W) float32 — relative depth values

Save directory layout:

output/
  result.json          # shape, min_depth, max_depth
  depth_map.npy        # raw depth array in NumPy format
  visualization.png    # depth map colorized with the INFERNO colormap
  metadata.json        # model name

Adding New Models

Adding a model requires three steps: writing an adapter class, registering the model in model_registry.yaml, and declaring any new pip dependencies in pyproject.toml.

Step 1 — Write an adapter class

Create a file under src/snap_vision/models/. Extend the appropriate base class:

Task Base class
Detection DetectionModel
Prompted segmentation PromptedSegmentationModel
Text-prompted detection GroundedDetectionModel
Depth estimation DepthEstimationModel

Implement three methods: load, predict, and (if the base does not provide one) visualize.

# src/snap_vision/models/my_model.py

from pathlib import Path
from typing import Any
import numpy as np

from snap_vision.core.base import DetectionModel
from snap_vision.core.results import Box, DetectionResult


class MyModelAdapter(DetectionModel):
    """Adapter for MyModel."""

    DEPS = ["my_model_package"]   # pip package(s) that must be importable

    def load(self, weights_dir: Path, device: str) -> None:
        from my_model_package import MyModel

        weight_files = sorted(weights_dir.glob("*.pt"))
        if not weight_files:
            raise FileNotFoundError(f"No .pt file found in {weights_dir}")

        self.model = MyModel(str(weight_files[0])).to(device)
        self.device = device

    def predict(
        self,
        image: np.ndarray,
        confidence: float = 0.25,
        **kwargs: Any,
    ) -> DetectionResult:
        raw = self.model.infer(image, conf=confidence)

        boxes = [
            Box(x1=d["x1"], y1=d["y1"], x2=d["x2"], y2=d["y2"],
                label=d["label"], score=d["score"])
            for d in raw["detections"]
        ]
        return DetectionResult(
            boxes=boxes,
            metadata={"model": "my_model", "confidence_threshold": confidence},
        )

The check_deps() method on SnapModel automatically checks DEPS before loading and raises a helpful ImportError pointing to the correct pip extra.
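A dependency check like this is a few lines with importlib. The sketch below is illustrative only; the real check_deps() lives on SnapModel and may differ in message and detail:

```python
import importlib.util

def check_deps(deps, extra_name):
    """Raise a helpful ImportError if any required package is missing.

    Sketch of the DEPS check described above.
    """
    missing = [d for d in deps if importlib.util.find_spec(d) is None]
    if missing:
        raise ImportError(
            f"Missing packages {missing} for this model. "
            f"Install them with: pip install snap-cv[{extra_name}]"
        )
```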

Step 2 — Register the model

Add an entry to src/snap_vision/model_registry.yaml:

models:
  my_model:
    task: detection
    adapter: snap_vision.models.my_model.MyModelAdapter
    deps: [my_model_package]
    description: "MyModel — short description for snap-cv list"
    weights:
      source: huggingface          # or "url"
      repo_id: org/my-model-repo
      files: [model.pt]
      requires_token: false
    input_schema:
      image: required
      confidence: optional
    default_params:
      confidence: 0.25

For direct URL downloads, use source: url, url: https://..., and filename: model.pt instead of repo_id and files.

Step 3 — Add the pip extra

In pyproject.toml, add an entry under [project.optional-dependencies] and add it to all:

[project.optional-dependencies]
my-model = ["my_model_package>=1.0"]
all = [
    "snap-cv[sam2]",
    "snap-cv[yolo]",
    # ...
    "snap-cv[my-model]",
]

After these three steps, snap_vision.load("my_model") and snap-cv predict my_model ... work without any further changes.


Architecture

src/snap_vision/
  __init__.py              # exposes sv.load, DetectionResult, SegmentationResult, DepthResult
  cli.py                   # Click command group: list, download, predict, remove, config, ui
  model_registry.yaml      # single source of truth for all model metadata
  core/
    base.py                # SnapModel ABC and task-specific base classes
    device.py              # CUDA / MPS / CPU auto-detection
    hub.py                 # ModelHub.load() — the main user-facing factory
    registry.py            # ModelRegistry — reads model_registry.yaml into ModelInfo dataclasses
    results.py             # DetectionResult, SegmentationResult, DepthResult, Box
    weights.py             # download_weights, is_downloaded, remove_weights, get_cache_dir
  models/
    yolo.py                # YOLOAdapter (Ultralytics)
    sam2.py                # SAM2Adapter (Meta)
    sam3.py                # SAM3Adapter (Meta, gated)
    grounding_dino.py      # GroundingDINOAdapter (IDEA Research)
    depth_anything.py      # DepthAnythingAdapter (via HuggingFace transformers pipeline)
  ui/
    app.py                 # Gradio Blocks interface, launch_ui()
  mcp/
    __init__.py            # (reserved)

Key design decisions:

  • Registry-driven — all model metadata lives in model_registry.yaml. Adding a model never requires changing Python source outside of the adapter file itself.
  • Lazy imports — model backends (ultralytics, sam2, transformers, etc.) are imported inside load() and predict(), so the core package imports cleanly even when extras are absent. A missing dependency produces a clear error pointing to the correct pip install command.
  • Adapter pattern — each model file is self-contained. Swapping or upgrading a backend only touches one file.
  • Typed results — rather than returning raw dicts or framework tensors, every model returns a structured result object. This makes downstream code framework-agnostic and provides consistent save/visualize behavior.
  • Cache-first weights — download_weights() is a no-op when weights already exist locally. The --force flag bypasses this for explicit re-downloads.

Configuration

Cache directory

By default, weights are stored at ~/.snap_vision/weights/<model_name>/. Override this with an environment variable:

export SNAP_VISION_CACHE_DIR=/mnt/ssd/snap_vision_weights

Check the active cache directory:

snap-cv config --show
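The resolution order (environment override first, then the home-directory default) can be sketched as follows. get_cache_dir is the name listed under core/weights.py in the Architecture section; the env parameter here is our addition for testability, so the real signature may differ:

```python
import os
from pathlib import Path

def get_cache_dir(env=None) -> Path:
    """Resolve the weight cache root: SNAP_VISION_CACHE_DIR if set,
    otherwise ~/.snap_vision/weights. Sketch of the behavior described above."""
    env = os.environ if env is None else env
    override = env.get("SNAP_VISION_CACHE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".snap_vision" / "weights"
```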

HuggingFace token

Required for gated models (currently SAM3). Set the standard HuggingFace environment variable:

export HF_TOKEN=hf_your_token_here

SnapVision reads HF_TOKEN automatically when downloading a model whose registry entry has requires_token: true. The token is never written to disk.

Summary

Variable Purpose Default
SNAP_VISION_CACHE_DIR Root directory for cached weights ~/.snap_vision/weights/
HF_TOKEN HuggingFace access token for gated models (not set)

Development

git clone <repo-url>
cd common_tools

# Editable install with dev dependencies
pip install -e ".[dev]"

# Run all tests
pytest

# Run tests with coverage report
pytest --cov=snap_vision --cov-report=term-missing

Tests live in tests/ and cover the core layer (registry, weights, device detection, base classes, and all result types) without requiring any model backend to be installed.

To add tests for a new model adapter, follow the patterns in tests/test_base.py: use a minimal mock subclass that sets DEPS = [] so tests run without the real backend package.


License

MIT. See LICENSE for the full text.


SnapVision is an internal tool maintained by the Solomon 3D vision and robotics team.

Download files

Download the file for your platform.

Source Distribution

snap_cv-0.1.1.tar.gz (25.4 kB)

Uploaded Source

Built Distribution


snap_cv-0.1.1-py3-none-any.whl (27.2 kB)

Uploaded Python 3

File details

Details for the file snap_cv-0.1.1.tar.gz.

File metadata

  • Download URL: snap_cv-0.1.1.tar.gz
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for snap_cv-0.1.1.tar.gz
Algorithm Hash digest
SHA256 8e16841d6b7f9a6f928f7e5eb3c7490808794f2fbe044c94d9b661b06292a5f6
MD5 a23f4176a59c76fa1a56ce11a5191f14
BLAKE2b-256 2744f5cba3ea115d552dc173cf6b5898ee0c9c44611630ce4c92ddf3abf2db47


File details

Details for the file snap_cv-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: snap_cv-0.1.1-py3-none-any.whl
  • Size: 27.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for snap_cv-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 db84e358747acd9a052e4216d7acc821c3e54ea90fb72209d906840b728c8e71
MD5 964a4f3677ce23591bb80567b8cf3954
BLAKE2b-256 2d3aa18f6e55fcace6d5bb7d095c22cad7997916f13796c0140caa55cd976653

