SnapVision
Internal vision toolkit — Solomon 3D
Table of Contents
- Overview
- Features
- Quick Start
- Installation
- Supported Models
- Usage
- Results
- Adding New Models
- Architecture
- Configuration
- Development
- License
Overview
SnapVision is a modular, pip-installable Python toolkit for rapid prototyping with vision models. It standardizes the download → load → predict → save cycle so engineers can go from "I have images" to "results" in minutes, not hours.
Every model follows the same interface: snap_vision.load() returns a model, model.predict() returns a typed result, and result.save() writes it to disk. The right weights are fetched automatically from HuggingFace on first use and cached locally. For gated models like SAM3, you can supply local weights instead.
The package is named snap-cv on pip and imported as snap_vision. The CLI command is snap-cv.
Features
- Modular model support via pip extras — install only the backends you need; unused models add zero overhead
- Auto-download with local caching — weights are fetched from HuggingFace or direct URLs on first use and stored in `~/.snap_vision/weights/`
- Local weights support for gated models — point `weights_dir` at your local copy for models behind a HuggingFace access gate (e.g. SAM3)
- Typed results with built-in visualization and save — `DetectionResult`, `SegmentationResult`, and `DepthResult` each carry `.save()`, `.to_json()`, and `.visualize()` methods
- Three interfaces — Python API, CLI, and Gradio web UI; all share the same underlying adapter layer
- Cross-platform device auto-detection — prefers CUDA, falls back to MPS (Apple Silicon), then CPU
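The device preference described above can be sketched as a small helper (the name `pick_device` is illustrative; the real logic lives in `core/device.py`):

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU (illustrative sketch)."""
    try:
        import torch  # only present once a model extra is installed
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```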
Quick Start
pip install snap-cv[yolo]
import cv2
import snap_vision as sv
model = sv.load("yolov8")
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
result = model.predict(image, confidence=0.4)
result.save("./output", image=image)
Installation
The core package installs only lightweight dependencies (Click, PyYAML, NumPy, OpenCV, Pillow, HuggingFace Hub). Install model-specific extras only when you need them.
# Core only (no model backends)
pip install snap-cv
# Object detection with YOLOv8
pip install snap-cv[yolo]
# Segment Anything 2
pip install snap-cv[sam2]
# Segment Anything 3 (gated — see Local Weights section)
pip install snap-cv[sam3]
# Open-set detection with text prompts
pip install snap-cv[grounding-dino]
# Monocular depth estimation
pip install snap-cv[depth-anything]
# Gradio web UI
pip install snap-cv[ui]
# Everything
pip install snap-cv[all]
# Development dependencies
pip install snap-cv[dev]
Python requirement: 3.9 or later.
Supported Models
| Model | Task | Extra | Description | Gated |
|---|---|---|---|---|
| `yolov8` | detection | `[yolo]` | YOLOv8x by Ultralytics — high-accuracy detection | No |
| `yolov8n` | detection | `[yolo]` | YOLOv8 Nano — fast, lightweight detection | No |
| `sam2` | prompted_segmentation | `[sam2]` | Segment Anything 2 by Meta | No |
| `sam3` | prompted_segmentation | `[sam3]` | Segment Anything 3 with native text prompts | Yes |
| `grounding_dino` | grounded_detection | `[grounding-dino]` | Open-set detection driven by text queries | No |
| `depth_anything` | depth_estimation | `[depth-anything]` | Depth Anything V2 — monocular depth estimation | No |
Weights are downloaded automatically for non-gated models. See Using Local Weights for gated models.
Usage
Python API
Load and predict — detection
import cv2
import snap_vision as sv
# Load a model (downloads weights on first call, cached afterward)
model = sv.load("yolov8")
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
result = model.predict(image, confidence=0.35)
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f" {box.label}: {box.score:.2f} [{box.x1:.0f}, {box.y1:.0f}, {box.x2:.0f}, {box.y2:.0f}]")
# Save result.json + visualization.png under ./output/scene/
result.save("./output/scene", image=image)
Segmentation with SAM2
model = sv.load("sam2")
# Provide point prompts (foreground clicks)
result = model.predict(image, points=[(320, 240), (400, 200)])
# Or provide a bounding box prompt
result = model.predict(image, boxes=[[100, 80, 500, 400]])
print(f"Segmented {len(result.masks)} mask(s), scores: {result.scores}")
result.save("./output/seg", image=image)
Text-prompted detection with GroundingDINO
model = sv.load("grounding_dino")
result = model.predict(image, text_prompt="robotic arm . screw . bracket")
result.save("./output/grounded", image=image)
Depth estimation
model = sv.load("depth_anything")
result = model.predict(image)
print(f"Depth range: {result.depth_map.min():.2f} – {result.depth_map.max():.2f}")
result.save("./output/depth") # saves depth_map.npy + colorized visualization.png
Force a specific device
model = sv.load("sam2", device="cpu") # force CPU
model = sv.load("yolov8", device="cuda") # force CUDA GPU 0
Visualize without saving
vis = model.visualize(image, result)  # returns np.ndarray (H, W, 3), RGB like the input
cv2.imshow("result", cv2.cvtColor(vis, cv2.COLOR_RGB2BGR))  # imshow expects BGR
cv2.waitKey(0)
CLI
List all available models
snap-cv list
Model    Task                   Downloaded  Description
---------------------------------------------------------------------------------
yolov8   detection              yes         YOLOv8 object detection by Ultralytics
yolov8n  detection              no          YOLOv8 nano — fast, lightweight detection
sam2     prompted_segmentation  no          Segment Anything 2 by Meta — prompted segmentation
...

Weight cache: /Users/you/.snap_vision/weights
Filter by task:
snap-cv list --task detection
Pre-download weights
snap-cv download yolov8
snap-cv download sam2
snap-cv download --all # download every registered model
snap-cv download yolov8 --force # re-download even if cached
Run inference
# Single image, print JSON to stdout
snap-cv predict yolov8 --input photo.jpg
# Single image, save structured output
snap-cv predict yolov8 --input photo.jpg --save-dir ./results
# Entire directory of images
snap-cv predict yolov8 --input ./frames/ --save-dir ./results
# With confidence override and display window
snap-cv predict yolov8 --input photo.jpg --confidence 0.5 --show
# GroundingDINO with a text prompt
snap-cv predict grounding_dino --input photo.jpg \
    --prompt "robotic arm . screw" --save-dir ./results
# SAM3 with local weights (gated model)
snap-cv predict sam3 --input photo.jpg \
    --weights-dir /data/weights/sam3/ --save-dir ./results
# Force a device
snap-cv predict depth_anything --input photo.jpg --device cpu
Remove cached weights
snap-cv remove sam2
Show configuration
snap-cv config --show
Weight cache directory: /Users/you/.snap_vision/weights
Set SNAP_VISION_CACHE_DIR env var to override.
Launch the web UI
snap-cv ui
snap-cv ui --port 8080
snap-cv ui --model yolov8 # pre-load a specific model
Using Local Weights (Gated Models)
SAM3 is hosted on a gated HuggingFace repository (facebook/sam3). You must request access through HuggingFace before weights can be downloaded automatically. Two workflows are supported.
Option A — HuggingFace token (automatic download)
Once your access request is approved, set your token and let SnapVision handle the download:
export HF_TOKEN=hf_your_token_here
snap-cv download sam3
Subsequent calls require no token because weights are cached locally.
Option B — local weights directory
Download the weights yourself through the HuggingFace web interface or huggingface-cli, then point SnapVision at that directory:
# CLI
snap-cv predict sam3 \
    --input photo.jpg \
    --weights-dir /data/weights/sam3/ \
    --save-dir ./results
# Python API
model = sv.load("sam3", weights_dir="/data/weights/sam3/")
result = model.predict(image, text_prompt="the circuit board")
SnapVision expects to find a .safetensors, .pt, or .pth file inside the weights_dir. The directory layout from a standard HuggingFace snapshot download works without modification.
The --weights-dir flag works for any model, not only gated ones. Use it to pin a specific model version or to work entirely offline.
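A sketch of the weight-file lookup described above (the helper name and the exact search order are illustrative, not the library's actual API):

```python
from pathlib import Path

def find_weight_file(weights_dir: str) -> Path:
    # Probe the accepted formats in turn; order here is illustrative
    for pattern in ("*.safetensors", "*.pt", "*.pth"):
        matches = sorted(Path(weights_dir).glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no .safetensors/.pt/.pth file in {weights_dir}")
```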
Gradio Web UI
The web UI provides a browser-based interface for interactive prototyping. It requires the [ui] extra.
pip install snap-cv[ui]
snap-cv ui
Open http://localhost:7860 in your browser. The UI exposes:
- An image upload panel
- A model selector showing all registered models
- A confidence slider (0.0 – 1.0)
- A text prompt field (used by GroundingDINO and SAM3)
- A "Run Inference" button
- A side-by-side view of the annotated result image and the raw JSON output
Models are loaded on first use and kept in memory for subsequent requests, so switching between images with the same model is fast.
# Start on a custom port with a pre-loaded model
snap-cv ui --model yolov8 --port 8080
Results
Every model returns a typed result object. All result types share three methods:
| Method | Description |
|---|---|
| `.save(path, image=None)` | Write results to a directory; creates it if it does not exist |
| `.to_json()` | Return a JSON string summary of the result |
| `.visualize(image)` | Return an annotated np.ndarray (H, W, 3) |
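As a minimal stand-in illustrating the shared contract (field names simplified; only `to_json` shown, and this is not the library's actual class):

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class MinimalResult:
    # Simplified stand-in for a typed result object
    labels: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize all dataclass fields to a JSON string
        return json.dumps(asdict(self), indent=2)
```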
DetectionResult
Returned by yolov8, yolov8n, and grounding_dino.
result.boxes # List[Box]
result.boxes[0].x1 # float — left edge in pixels
result.boxes[0].y1 # float — top edge in pixels
result.boxes[0].x2 # float — right edge in pixels
result.boxes[0].y2 # float — bottom edge in pixels
result.boxes[0].label # str — class name or phrase
result.boxes[0].score # float — confidence score
result.boxes[0].width # property: x2 - x1
result.boxes[0].height # property: y2 - y1
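The derived `width` and `height` follow directly from the corner coordinates; a minimal stand-in for `Box`, assuming plain float fields as listed above:

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1
```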
Save directory layout:
output/
    result.json          # serialized boxes with labels and scores
    visualization.png    # image with bounding boxes drawn
    metadata.json        # model name, confidence threshold, etc.
SegmentationResult
Returned by sam2 and sam3.
result.masks # List[np.ndarray] — each mask is (H, W) uint8, values 0 or 1
result.scores # List[float] — confidence per mask
result.labels # List[str] — label per mask (may be empty)
Save directory layout:
output/
    result.json          # mask count, scores, shapes
    mask_0.png           # first mask as grayscale PNG (0 or 255)
    mask_1.png           # second mask, and so on
    visualization.png    # masks overlaid on image with semi-transparent colors
    metadata.json        # model name, prompts used
DepthResult
Returned by depth_anything.
result.depth_map # np.ndarray (H, W) float32 — relative depth values
Save directory layout:
output/
    result.json          # shape, min_depth, max_depth
    depth_map.npy        # raw depth array in NumPy format
    visualization.png    # depth map colorized with the INFERNO colormap
    metadata.json        # model name
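Producing the colorized PNG requires normalizing the relative depth values to 8-bit before applying a colormap. A NumPy-only sketch of that normalization step (the colormap application itself is omitted, and this helper name is illustrative):

```python
import numpy as np

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    # Map relative depth values to uint8 [0, 255] for colormapping
    d = depth.astype(np.float32)
    span = float(d.max() - d.min())
    if span < 1e-12:
        return np.zeros(d.shape, dtype=np.uint8)
    return ((d - d.min()) / span * 255.0).astype(np.uint8)
```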
Adding New Models
Adding a model requires three steps: writing an adapter class, registering the model in model_registry.yaml, and declaring any new pip dependencies in pyproject.toml.
Step 1 — Write an adapter class
Create a file under src/snap_vision/models/. Extend the appropriate base class:
| Task | Base class |
|---|---|
| Detection | DetectionModel |
| Prompted segmentation | PromptedSegmentationModel |
| Text-prompted detection | GroundedDetectionModel |
| Depth estimation | DepthEstimationModel |
Implement load and predict, plus visualize if the base class does not already provide a suitable default.
# src/snap_vision/models/my_model.py
from pathlib import Path
from typing import Any

import numpy as np

from snap_vision.core.base import DetectionModel
from snap_vision.core.results import Box, DetectionResult


class MyModelAdapter(DetectionModel):
    """Adapter for MyModel."""

    DEPS = ["my_model_package"]  # pip package(s) that must be importable

    def load(self, weights_dir: Path, device: str) -> None:
        from my_model_package import MyModel

        weight_files = sorted(weights_dir.glob("*.pt"))
        if not weight_files:
            raise FileNotFoundError(f"No .pt file found in {weights_dir}")
        self.model = MyModel(str(weight_files[0])).to(device)
        self.device = device

    def predict(
        self,
        image: np.ndarray,
        confidence: float = 0.25,
        **kwargs: Any,
    ) -> DetectionResult:
        raw = self.model.infer(image, conf=confidence)
        boxes = [
            Box(x1=d["x1"], y1=d["y1"], x2=d["x2"], y2=d["y2"],
                label=d["label"], score=d["score"])
            for d in raw["detections"]
        ]
        return DetectionResult(
            boxes=boxes,
            metadata={"model": "my_model", "confidence_threshold": confidence},
        )
The check_deps() method on SnapModel automatically checks DEPS before loading and raises a helpful ImportError pointing to the correct pip extra.
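Such a dependency check amounts to probing importability. A sketch of the idea (the real `check_deps()` signature may differ; the extra name here is illustrative):

```python
import importlib.util

def check_deps(deps: list, extra: str) -> None:
    # Raise a helpful error when a backend package is not importable
    missing = [d for d in deps if importlib.util.find_spec(d) is None]
    if missing:
        raise ImportError(
            f"missing {missing}; install with: pip install snap-cv[{extra}]"
        )
```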
Step 2 — Register the model
Add an entry to src/snap_vision/model_registry.yaml:
models:
  my_model:
    task: detection
    adapter: snap_vision.models.my_model.MyModelAdapter
    deps: [my_model_package]
    description: "MyModel — short description for snap-cv list"
    weights:
      source: huggingface  # or "url"
      repo_id: org/my-model-repo
      files: [model.pt]
      requires_token: false
    input_schema:
      image: required
      confidence: optional
    default_params:
      confidence: 0.25
For direct URL downloads, use source: url, url: https://..., and filename: model.pt instead of repo_id and files.
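After parsing (e.g. with `yaml.safe_load`), an entry like the one above is a plain dict. A sketch of turning it into a typed record, with a deliberately simplified field set (the real `ModelInfo` dataclass may carry more fields):

```python
from dataclasses import dataclass, field

@dataclass
class ModelInfo:
    name: str
    task: str
    adapter: str
    deps: list = field(default_factory=list)
    description: str = ""

def build_registry(parsed: dict) -> dict:
    # `parsed` is the dict form of model_registry.yaml
    return {
        name: ModelInfo(
            name=name,
            task=entry["task"],
            adapter=entry["adapter"],
            deps=entry.get("deps", []),
            description=entry.get("description", ""),
        )
        for name, entry in parsed["models"].items()
    }
```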
Step 3 — Add the pip extra
In pyproject.toml, add an entry under [project.optional-dependencies] and add it to all:
[project.optional-dependencies]
my-model = ["my_model_package>=1.0"]
all = [
    "snap-cv[sam2]",
    "snap-cv[yolo]",
    # ...
    "snap-cv[my-model]",
]
After these three steps, snap_vision.load("my_model") and snap-cv predict my_model ... work without any further changes.
Architecture
src/snap_vision/
    __init__.py           # exposes sv.load, DetectionResult, SegmentationResult, DepthResult
    cli.py                # Click command group: list, download, predict, remove, config, ui
    model_registry.yaml   # single source of truth for all model metadata
    core/
        base.py           # SnapModel ABC and task-specific base classes
        device.py         # CUDA / MPS / CPU auto-detection
        hub.py            # ModelHub.load() — the main user-facing factory
        registry.py       # ModelRegistry — reads model_registry.yaml into ModelInfo dataclasses
        results.py        # DetectionResult, SegmentationResult, DepthResult, Box
        weights.py        # download_weights, is_downloaded, remove_weights, get_cache_dir
    models/
        yolo.py           # YOLOAdapter (Ultralytics)
        sam2.py           # SAM2Adapter (Meta)
        sam3.py           # SAM3Adapter (Meta, gated)
        grounding_dino.py # GroundingDINOAdapter (IDEA Research)
        depth_anything.py # DepthAnythingAdapter (via HuggingFace transformers pipeline)
    ui/
        app.py            # Gradio Blocks interface, launch_ui()
    mcp/
        __init__.py       # (reserved)
Key design decisions:
- Registry-driven — all model metadata lives in `model_registry.yaml`. Adding a model never requires changing Python source outside of the adapter file itself.
- Lazy imports — model backends (`ultralytics`, `sam2`, `transformers`, etc.) are imported inside `load()` and `predict()`, so the core package imports cleanly even when extras are absent. A missing dependency produces a clear error pointing to the correct `pip install` command.
- Adapter pattern — each model file is self-contained. Swapping or upgrading a backend only touches one file.
- Typed results — rather than returning raw dicts or framework tensors, every model returns a structured result object. This makes downstream code framework-agnostic and provides consistent save/visualize behavior.
- Cache-first weights — `download_weights()` is a no-op when weights already exist locally. The `--force` flag bypasses this for explicit re-downloads.
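The cache-first behavior amounts to a short guard before the download. A sketch with an illustrative signature (the real `download_weights` takes different arguments; `fetch` here stands in for the actual download call):

```python
from pathlib import Path
from typing import Callable

def download_weights(
    model_name: str,
    cache_dir: str,
    fetch: Callable[[Path], None],
    force: bool = False,
) -> Path:
    # fetch(target_dir) performs the actual download; skipped on a cache hit
    target = Path(cache_dir) / model_name
    if target.exists() and not force:
        return target  # cache hit: no-op
    target.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return target
```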
Configuration
Cache directory
By default, weights are stored at ~/.snap_vision/weights/<model_name>/. Override this with an environment variable:
export SNAP_VISION_CACHE_DIR=/mnt/ssd/snap_vision_weights
Check the active cache directory:
snap-cv config --show
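The resolution order can be sketched as follows (function name illustrative; it mirrors the description above):

```python
import os
from pathlib import Path

def get_cache_dir() -> Path:
    # Environment override wins; otherwise use the default under HOME
    override = os.environ.get("SNAP_VISION_CACHE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".snap_vision" / "weights"
```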
HuggingFace token
Required for gated models (currently SAM3). Set the standard HuggingFace environment variable:
export HF_TOKEN=hf_your_token_here
SnapVision reads HF_TOKEN automatically when downloading a model whose registry entry has requires_token: true. The token is never written to disk.
Summary
| Variable | Purpose | Default |
|---|---|---|
| `SNAP_VISION_CACHE_DIR` | Root directory for cached weights | `~/.snap_vision/weights/` |
| `HF_TOKEN` | HuggingFace access token for gated models | (not set) |
Development
git clone <repo-url>
cd common_tools
# Editable install with dev dependencies
pip install -e ".[dev]"
# Run all tests
pytest
# Run tests with coverage report
pytest --cov=snap_vision --cov-report=term-missing
Tests live in tests/ and cover the core layer (registry, weights, device detection, base classes, and all result types) without requiring any model backend to be installed.
To add tests for a new model adapter, follow the patterns in tests/test_base.py: use a minimal mock subclass that sets DEPS = [] so tests run without the real backend package.
License
MIT. See LICENSE for the full text.
SnapVision is an internal tool maintained by the Solomon 3D vision and robotics team.