SnapVision
Internal vision toolkit — Solomon 3D
Table of Contents
- Overview
- Features
- Quick Start
- Installation
- Supported Models
- Usage
- Results
- Adding New Models
- Architecture
- Configuration
- Development
- License
Overview
SnapVision is a modular, pip-installable Python toolkit for rapid prototyping with vision models. It standardizes the download → load → predict → save cycle so engineers can go from "I have images" to "results" in minutes, not hours.
Every model follows the same interface: snap_vision.load() returns a model, model.predict() returns a typed result, and result.save() writes it to disk. The right weights are fetched automatically from HuggingFace on first use and cached locally. For gated models like SAM3, you can supply local weights instead.
The package is named snap-cv on pip and imported as snap_vision. The CLI command is snap-cv.
Features
- Modular model support via pip extras — install only the backends you need; unused models add zero overhead
- Auto-download with local caching — weights are fetched from HuggingFace or direct URLs on first use and stored in `~/.snap_vision/weights/`
- Local weights support for gated models — point `weights_dir` at your local copy for models behind a HuggingFace access gate (e.g. SAM3)
- Typed results with built-in visualization and save — `DetectionResult`, `SegmentationResult`, and `DepthResult` each carry `.save()`, `.to_json()`, and `.visualize()` methods
- Three interfaces — Python API, CLI, and Gradio web UI; all share the same underlying adapter layer
- Cross-platform device auto-detection — prefers CUDA, falls back to MPS (Apple Silicon), then CPU
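The device preference described above can be sketched as a small helper (the name `pick_device` is illustrative; the real logic lives in `core/device.py`):

```python
def pick_device() -> str:
    """Prefer CUDA, then Apple MPS, then CPU (illustrative sketch)."""
    try:
        import torch  # only present once a model extra is installed
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```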
Quick Start
pip install snap-cv[yolo]
import cv2
import snap_vision as sv
model = sv.load("yolov8")
image = cv2.cvtColor(cv2.imread("photo.jpg"), cv2.COLOR_BGR2RGB)
result = model.predict(image, confidence=0.4)
result.save("./output", image=image)
Installation
The core package installs only lightweight dependencies (Click, PyYAML, NumPy, OpenCV, Pillow, HuggingFace Hub). Install model-specific extras only when you need them.
# Core only (no model backends)
pip install snap-cv
# Object detection with YOLOv8
pip install snap-cv[yolo]
# Segment Anything 2
pip install snap-cv[sam2]
# Segment Anything 3 (gated — see Local Weights section)
pip install snap-cv[sam3]
# Open-set detection with text prompts
pip install snap-cv[grounding-dino]
# Monocular depth estimation
pip install snap-cv[depth-anything]
# Gradio web UI
pip install snap-cv[ui]
# Everything
pip install snap-cv[all]
# Development dependencies
pip install snap-cv[dev]
Python requirement: 3.9 or later.
Supported Models
| Model | Task | Extra | Description | Gated |
|---|---|---|---|---|
| `yolov8` | detection | `[yolo]` | YOLOv8x by Ultralytics — high-accuracy detection | No |
| `yolov8n` | detection | `[yolo]` | YOLOv8 Nano — fast, lightweight detection | No |
| `sam2` | prompted_segmentation | `[sam2]` | Segment Anything 2 by Meta | No |
| `sam3` | prompted_segmentation | `[sam3]` | Segment Anything 3 with native text prompts | Yes |
| `grounding_dino` | grounded_detection | `[grounding-dino]` | Open-set detection driven by text queries | No |
| `depth_anything` | depth_estimation | `[depth-anything]` | Depth Anything V2 — monocular depth estimation | No |
Weights are downloaded automatically for non-gated models. See Using Local Weights for gated models.
Usage
Python API
Load and predict — detection
import cv2
import snap_vision as sv
# Load a model (downloads weights on first call, cached afterward)
model = sv.load("yolov8")
image = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)
result = model.predict(image, confidence=0.35)
print(f"Found {len(result.boxes)} objects")
for box in result.boxes:
    print(f" {box.label}: {box.score:.2f} [{box.x1:.0f}, {box.y1:.0f}, {box.x2:.0f}, {box.y2:.0f}]")
# Save result.json + visualization.png under ./output/scene/
result.save("./output/scene", image=image)
Segmentation with SAM2
model = sv.load("sam2")
# Provide point prompts (foreground clicks)
result = model.predict(image, points=[(320, 240), (400, 200)])
# Or provide a bounding box prompt
result = model.predict(image, boxes=[[100, 80, 500, 400]])
print(f"Segmented {len(result.masks)} mask(s), scores: {result.scores}")
result.save("./output/seg", image=image)
Text-prompted detection with GroundingDINO
model = sv.load("grounding_dino")
result = model.predict(image, text_prompt="robotic arm . screw . bracket")
result.save("./output/grounded", image=image)
Depth estimation
model = sv.load("depth_anything")
result = model.predict(image)
print(f"Depth range: {result.depth_map.min():.2f} – {result.depth_map.max():.2f}")
result.save("./output/depth") # saves depth_map.npy + colorized visualization.png
Force a specific device
model = sv.load("sam2", device="cpu") # force CPU
model = sv.load("yolov8", device="cuda") # force CUDA GPU 0
Visualize without saving
vis = model.visualize(image, result)  # returns np.ndarray (H, W, 3), RGB like the input
cv2.imshow("result", cv2.cvtColor(vis, cv2.COLOR_RGB2BGR))  # imshow expects BGR
cv2.waitKey(0)
CLI
List all available models
snap-cv list
Model    Task                   Downloaded  Description
---------------------------------------------------------------------------------
yolov8   detection              yes         YOLOv8 object detection by Ultralytics
yolov8n  detection              no          YOLOv8 nano — fast, lightweight detection
sam2     prompted_segmentation  no          Segment Anything 2 by Meta — prompted segmentation
...

Weight cache: /Users/you/.snap_vision/weights
Filter by task:
snap-cv list --task detection
Pre-download weights
snap-cv download yolov8
snap-cv download sam2
snap-cv download --all # download every registered model
snap-cv download yolov8 --force # re-download even if cached
Run inference
# Single image, print JSON to stdout
snap-cv predict yolov8 --input photo.jpg
# Single image, save structured output
snap-cv predict yolov8 --input photo.jpg --save-dir ./results
# Entire directory of images
snap-cv predict yolov8 --input ./frames/ --save-dir ./results
# With confidence override and display window
snap-cv predict yolov8 --input photo.jpg --confidence 0.5 --show
# GroundingDINO with a text prompt
snap-cv predict grounding_dino --input photo.jpg \
    --prompt "robotic arm . screw" --save-dir ./results
# SAM3 with local weights (gated model)
snap-cv predict sam3 --input photo.jpg \
    --weights-dir /data/weights/sam3/ --save-dir ./results
# Force a device
snap-cv predict depth_anything --input photo.jpg --device cpu
Remove cached weights
snap-cv remove sam2
Show configuration
snap-cv config --show
Weight cache directory: /Users/you/.snap_vision/weights
Set SNAP_VISION_CACHE_DIR env var to override.
Launch the web UI
snap-cv ui
snap-cv ui --port 8080
snap-cv ui --model yolov8 # pre-load a specific model
Using Local Weights (Gated Models)
SAM3 is hosted on a gated HuggingFace repository (facebook/sam3). You must request access through HuggingFace before weights can be downloaded automatically. Two workflows are supported.
Option A — HuggingFace token (automatic download)
Once your access request is approved, set your token and let SnapVision handle the download:
export HF_TOKEN=hf_your_token_here
snap-cv download sam3
Subsequent calls require no token because weights are cached locally.
Option B — local weights directory
Download the weights yourself through the HuggingFace web interface or huggingface-cli, then point SnapVision at that directory:
# CLI
snap-cv predict sam3 \
    --input photo.jpg \
    --weights-dir /data/weights/sam3/ \
    --save-dir ./results
# Python API
model = sv.load("sam3", weights_dir="/data/weights/sam3/")
result = model.predict(image, text_prompt="the circuit board")
SnapVision expects to find a .safetensors, .pt, or .pth file inside the weights_dir. The directory layout from a standard HuggingFace snapshot download works without modification.
The --weights-dir flag works for any model, not only gated ones. Use it to pin a specific model version or to work entirely offline.
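A sketch of the weight-file lookup described above (the helper name and the exact search order are illustrative, not the library's actual API):

```python
from pathlib import Path

def find_weight_file(weights_dir: str) -> Path:
    # Probe the accepted formats in turn; order here is illustrative
    for pattern in ("*.safetensors", "*.pt", "*.pth"):
        matches = sorted(Path(weights_dir).glob(pattern))
        if matches:
            return matches[0]
    raise FileNotFoundError(f"no .safetensors/.pt/.pth file in {weights_dir}")
```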
Gradio Web UI
The web UI provides a browser-based interface for interactive prototyping. It requires the [ui] extra.
pip install snap-cv[ui]
snap-cv ui
Open http://localhost:7860 in your browser. The UI exposes:
- An image upload panel
- A model selector showing all registered models
- A confidence slider (0.0 – 1.0)
- A text prompt field (used by GroundingDINO and SAM3)
- A "Run Inference" button
- A side-by-side view of the annotated result image and the raw JSON output
Models are loaded on first use and kept in memory for subsequent requests, so switching between images with the same model is fast.
# Start on a custom port with a pre-loaded model
snap-cv ui --model yolov8 --port 8080
Results
Every model returns a typed result object. All result types share three methods:
| Method | Description |
|---|---|
| `.save(path, image=None)` | Write results to a directory; creates it if it does not exist |
| `.to_json()` | Return a JSON string summary of the result |
| `.visualize(image)` | Return an annotated np.ndarray (H, W, 3) |
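As a minimal stand-in illustrating the shared contract (field names simplified; only `to_json` shown, and this is not the library's actual class):

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class MinimalResult:
    # Simplified stand-in for a typed result object
    labels: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize all dataclass fields to a JSON string
        return json.dumps(asdict(self), indent=2)
```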
DetectionResult
Returned by yolov8, yolov8n, and grounding_dino.
result.boxes # List[Box]
result.boxes[0].x1 # float — left edge in pixels
result.boxes[0].y1 # float — top edge in pixels
result.boxes[0].x2 # float — right edge in pixels
result.boxes[0].y2 # float — bottom edge in pixels
result.boxes[0].label # str — class name or phrase
result.boxes[0].score # float — confidence score
result.boxes[0].width # property: x2 - x1
result.boxes[0].height # property: y2 - y1
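The derived `width` and `height` follow directly from the corner coordinates; a minimal stand-in for `Box`, assuming plain float fields as listed above:

```python
from dataclasses import dataclass

@dataclass
class Box:
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float

    @property
    def width(self) -> float:
        return self.x2 - self.x1

    @property
    def height(self) -> float:
        return self.y2 - self.y1
```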
Save directory layout:
output/
    result.json          # serialized boxes with labels and scores
    visualization.png    # image with bounding boxes drawn
    metadata.json        # model name, confidence threshold, etc.
SegmentationResult
Returned by sam2 and sam3.
result.masks # List[np.ndarray] — each mask is (H, W) uint8, values 0 or 1
result.scores # List[float] — confidence per mask
result.labels # List[str] — label per mask (may be empty)
Save directory layout:
output/
    result.json          # mask count, scores, shapes
    mask_0.png           # first mask as grayscale PNG (0 or 255)
    mask_1.png           # second mask, and so on
    visualization.png    # masks overlaid on image with semi-transparent colors
    metadata.json        # model name, prompts used
DepthResult
Returned by depth_anything.
result.depth_map # np.ndarray (H, W) float32 — relative depth values
Save directory layout:
output/
    result.json          # shape, min_depth, max_depth
    depth_map.npy        # raw depth array in NumPy format
    visualization.png    # depth map colorized with the INFERNO colormap
    metadata.json        # model name
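Producing the colorized PNG requires normalizing the relative depth values to 8-bit before applying a colormap. A NumPy-only sketch of that normalization step (the colormap application itself is omitted, and this helper name is illustrative):

```python
import numpy as np

def normalize_depth(depth: np.ndarray) -> np.ndarray:
    # Map relative depth values to uint8 [0, 255] for colormapping
    d = depth.astype(np.float32)
    span = float(d.max() - d.min())
    if span < 1e-12:
        return np.zeros(d.shape, dtype=np.uint8)
    return ((d - d.min()) / span * 255.0).astype(np.uint8)
```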
Adding New Models
Adding a model requires three steps: writing an adapter class, registering the model in model_registry.yaml, and declaring any new pip dependencies in pyproject.toml.
Step 1 — Write an adapter class
Create a file under src/snap_vision/models/. Extend the appropriate base class:
| Task | Base class |
|---|---|
| Detection | DetectionModel |
| Prompted segmentation | PromptedSegmentationModel |
| Text-prompted detection | GroundedDetectionModel |
| Depth estimation | DepthEstimationModel |
Implement load and predict, plus visualize if the base class does not already provide a suitable default.
# src/snap_vision/models/my_model.py
from pathlib import Path
from typing import Any

import numpy as np

from snap_vision.core.base import DetectionModel
from snap_vision.core.results import Box, DetectionResult


class MyModelAdapter(DetectionModel):
    """Adapter for MyModel."""

    DEPS = ["my_model_package"]  # pip package(s) that must be importable

    def load(self, weights_dir: Path, device: str) -> None:
        from my_model_package import MyModel

        weight_files = sorted(weights_dir.glob("*.pt"))
        if not weight_files:
            raise FileNotFoundError(f"No .pt file found in {weights_dir}")
        self.model = MyModel(str(weight_files[0])).to(device)
        self.device = device

    def predict(
        self,
        image: np.ndarray,
        confidence: float = 0.25,
        **kwargs: Any,
    ) -> DetectionResult:
        raw = self.model.infer(image, conf=confidence)
        boxes = [
            Box(x1=d["x1"], y1=d["y1"], x2=d["x2"], y2=d["y2"],
                label=d["label"], score=d["score"])
            for d in raw["detections"]
        ]
        return DetectionResult(
            boxes=boxes,
            metadata={"model": "my_model", "confidence_threshold": confidence},
        )
The check_deps() method on SnapModel automatically checks DEPS before loading and raises a helpful ImportError pointing to the correct pip extra.
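Such a dependency check amounts to probing importability. A sketch of the idea (the real `check_deps()` signature may differ; the extra name here is illustrative):

```python
import importlib.util

def check_deps(deps: list, extra: str) -> None:
    # Raise a helpful error when a backend package is not importable
    missing = [d for d in deps if importlib.util.find_spec(d) is None]
    if missing:
        raise ImportError(
            f"missing {missing}; install with: pip install snap-cv[{extra}]"
        )
```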
Step 2 — Register the model
Add an entry to src/snap_vision/model_registry.yaml:
models:
  my_model:
    task: detection
    adapter: snap_vision.models.my_model.MyModelAdapter
    deps: [my_model_package]
    description: "MyModel — short description for snap-cv list"
    weights:
      source: huggingface  # or "url"
      repo_id: org/my-model-repo
      files: [model.pt]
      requires_token: false
    input_schema:
      image: required
      confidence: optional
    default_params:
      confidence: 0.25
For direct URL downloads, use source: url, url: https://..., and filename: model.pt instead of repo_id and files.
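After parsing (e.g. with `yaml.safe_load`), an entry like the one above is a plain dict. A sketch of turning it into a typed record, with a deliberately simplified field set (the real `ModelInfo` dataclass may carry more fields):

```python
from dataclasses import dataclass, field

@dataclass
class ModelInfo:
    name: str
    task: str
    adapter: str
    deps: list = field(default_factory=list)
    description: str = ""

def build_registry(parsed: dict) -> dict:
    # `parsed` is the dict form of model_registry.yaml
    return {
        name: ModelInfo(
            name=name,
            task=entry["task"],
            adapter=entry["adapter"],
            deps=entry.get("deps", []),
            description=entry.get("description", ""),
        )
        for name, entry in parsed["models"].items()
    }
```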
Step 3 — Add the pip extra
In pyproject.toml, add an entry under [project.optional-dependencies] and add it to all:
[project.optional-dependencies]
my-model = ["my_model_package>=1.0"]
all = [
    "snap-cv[sam2]",
    "snap-cv[yolo]",
    # ...
    "snap-cv[my-model]",
]
After these three steps, snap_vision.load("my_model") and snap-cv predict my_model ... work without any further changes.
Architecture
src/snap_vision/
    __init__.py           # exposes sv.load, DetectionResult, SegmentationResult, DepthResult
    cli.py                # Click command group: list, download, predict, remove, config, ui
    model_registry.yaml   # single source of truth for all model metadata
    core/
        base.py           # SnapModel ABC and task-specific base classes
        device.py         # CUDA / MPS / CPU auto-detection
        hub.py            # ModelHub.load() — the main user-facing factory
        registry.py       # ModelRegistry — reads model_registry.yaml into ModelInfo dataclasses
        results.py        # DetectionResult, SegmentationResult, DepthResult, Box
        weights.py        # download_weights, is_downloaded, remove_weights, get_cache_dir
    models/
        yolo.py           # YOLOAdapter (Ultralytics)
        sam2.py           # SAM2Adapter (Meta)
        sam3.py           # SAM3Adapter (Meta, gated)
        grounding_dino.py # GroundingDINOAdapter (IDEA Research)
        depth_anything.py # DepthAnythingAdapter (via HuggingFace transformers pipeline)
    ui/
        app.py            # Gradio Blocks interface, launch_ui()
    mcp/
        __init__.py       # (reserved)
Key design decisions:
- Registry-driven — all model metadata lives in `model_registry.yaml`. Adding a model never requires changing Python source outside of the adapter file itself.
- Lazy imports — model backends (`ultralytics`, `sam2`, `transformers`, etc.) are imported inside `load()` and `predict()`, so the core package imports cleanly even when extras are absent. A missing dependency produces a clear error pointing to the correct `pip install` command.
- Adapter pattern — each model file is self-contained. Swapping or upgrading a backend only touches one file.
- Typed results — rather than returning raw dicts or framework tensors, every model returns a structured result object. This makes downstream code framework-agnostic and provides consistent save/visualize behavior.
- Cache-first weights — `download_weights()` is a no-op when weights already exist locally. The `--force` flag bypasses this for explicit re-downloads.
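The cache-first behavior amounts to a short guard before the download. A sketch with an illustrative signature (the real `download_weights` takes different arguments; `fetch` here stands in for the actual download call):

```python
from pathlib import Path
from typing import Callable

def download_weights(
    model_name: str,
    cache_dir: str,
    fetch: Callable[[Path], None],
    force: bool = False,
) -> Path:
    # fetch(target_dir) performs the actual download; skipped on a cache hit
    target = Path(cache_dir) / model_name
    if target.exists() and not force:
        return target  # cache hit: no-op
    target.mkdir(parents=True, exist_ok=True)
    fetch(target)
    return target
```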
Configuration
Cache directory
By default, weights are stored at ~/.snap_vision/weights/<model_name>/. Override this with an environment variable:
export SNAP_VISION_CACHE_DIR=/mnt/ssd/snap_vision_weights
Check the active cache directory:
snap-cv config --show
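The resolution order can be sketched as follows (function name illustrative; it mirrors the description above):

```python
import os
from pathlib import Path

def get_cache_dir() -> Path:
    # Environment override wins; otherwise use the default under HOME
    override = os.environ.get("SNAP_VISION_CACHE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".snap_vision" / "weights"
```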
HuggingFace token
Required for gated models (currently SAM3). Set the standard HuggingFace environment variable:
export HF_TOKEN=hf_your_token_here
SnapVision reads HF_TOKEN automatically when downloading a model whose registry entry has requires_token: true. The token is never written to disk.
Summary
| Variable | Purpose | Default |
|---|---|---|
| `SNAP_VISION_CACHE_DIR` | Root directory for cached weights | `~/.snap_vision/weights/` |
| `HF_TOKEN` | HuggingFace access token for gated models | (not set) |
Development
git clone <repo-url>
cd common_tools
# Editable install with dev dependencies
pip install -e ".[dev]"
# Run all tests
pytest
# Run tests with coverage report
pytest --cov=snap_vision --cov-report=term-missing
Tests live in tests/ and cover the core layer (registry, weights, device detection, base classes, and all result types) without requiring any model backend to be installed.
To add tests for a new model adapter, follow the patterns in tests/test_base.py: use a minimal mock subclass that sets DEPS = [] so tests run without the real backend package.
License
MIT. See LICENSE for the full text.
SnapVision is an internal tool maintained by the Solomon 3D vision and robotics team.