sam-track
A uv-native CLI for tracking objects in videos using SAM3 (Segment Anything Model 3).
Features
- Three prompt modes: Track by text description, ROI polygons, or SLEAP pose keypoints
- Three output formats: Bounding boxes (JSON), segmentation masks (HDF5), tracked poses (SLP)
- Memory-efficient: Streaming mode processes videos frame-by-frame
- SLEAP integration: Link untracked pose predictions to consistent identities
Installation
Requires Python 3.12+ and uv.
As a uv tool (recommended)
# Linux/Windows with NVIDIA GPU (CUDA 13.0)
uv tool install sam-track --prerelease=allow --with "tokenizers==0.22.1" --index https://download.pytorch.org/whl/cu130 --index-strategy unsafe-best-match
# macOS with Apple Silicon (MPS)
uv tool install sam-track --prerelease=allow --with "tokenizers==0.22.1"
After installation, sam-track is available globally.
Ad-hoc with uvx
# Linux/Windows with NVIDIA GPU
uvx --prerelease=allow --with "tokenizers==0.22.1" --index https://download.pytorch.org/whl/cu130 --index-strategy unsafe-best-match sam-track --help
# macOS with Apple Silicon
uvx --prerelease=allow --with "tokenizers==0.22.1" sam-track --help
From source
git clone https://github.com/talmolab/sam-track && cd sam-track
uv sync
uv run sam-track --help
GPU Requirements
| Platform | Requirement |
|---|---|
| Linux | NVIDIA driver 580.65.06+ (CUDA 13.0) |
| Windows | NVIDIA driver 580.65+ (CUDA 13.0) |
| macOS | Apple Silicon (MPS, no driver needed) |
Check your setup:
sam-track system
First-time Setup
SAM3 is a gated model requiring HuggingFace authentication.
1. Check status:
sam-track auth
2. If not authenticated, create a token:
- Go to https://huggingface.co/settings/tokens
- Click Create new token
- Name it sam-track, select Read permission
- Login:
sam-track auth --token hf_...
3. If no model access, request it:
- Go to https://huggingface.co/facebook/sam3
- Fill out the access request form
- Run sam-track auth again to verify
Quick Start
# Track a mouse by text description, output bounding boxes
sam-track track video.mp4 --text "mouse" --bbox
# Track from ROI annotations, output masks
sam-track track video.mp4 --roi annotations.yml --seg
# Track from SLEAP poses, output tracked SLP
sam-track track video.mp4 --pose labels.slp --slp
Prompting Modes
sam-track supports three ways to specify what to track:
Text Prompts (--text)
Track objects by natural language description. SAM3 detects matching objects in the first frame and tracks them through the video.
# Track a single object type
sam-track track video.mp4 --text "mouse" --bbox
# Track with description
sam-track track video.mp4 --text "black mouse" --bbox --seg
# Output to custom paths
sam-track track video.mp4 --text "fly" \
--bbox-output fly_tracks.json \
--seg-output fly_masks.h5
ROI Prompts (--roi)
Track from polygon regions defined in a labelroi YAML file. Polygons are converted to binary masks for SAM3.
# Track from ROI annotations
sam-track track video.mp4 --roi rois.yml --bbox
# Output both formats
sam-track track video.mp4 --roi rois.yml --bbox --seg
ROI YAML format:
video: video.mp4
frame_idx: 0
resolution: [1920, 1080]
rois:
  - id: 0
    name: mouse1
    polygon: [[100, 200], [150, 200], [150, 250], [100, 250]]
  - id: 1
    name: mouse2
    polygon: [[300, 400], [350, 400], [350, 450], [300, 450]]
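To make the polygon-to-mask step concrete, here is a minimal sketch of rasterizing one ROI polygon into a binary mask. It is not sam-track's actual implementation: the function name polygon_to_mask, the pixel-center even-odd rule, and the small 200x300 canvas are all assumptions chosen for illustration.

```python
def polygon_to_mask(polygon, width, height):
    """Rasterize a polygon into a binary mask (list of rows of 0/1).

    A pixel is set when its center (x + 0.5, y + 0.5) lies inside the
    polygon according to the even-odd rule.
    """
    mask = [[0] * width for _ in range(height)]
    n = len(polygon)
    for y in range(height):
        cy = y + 0.5
        # x-coordinates where polygon edges cross this scanline
        crossings = []
        for i in range(n):
            (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
            if (y0 <= cy) != (y1 <= cy):  # edge straddles the scanline
                crossings.append(x0 + (cy - y0) * (x1 - x0) / (y1 - y0))
        crossings.sort()
        # fill pixels whose centers fall between successive pairs of crossings
        for left, right in zip(crossings[0::2], crossings[1::2]):
            for x in range(width):
                if left <= x + 0.5 < right:
                    mask[y][x] = 1
    return mask


# The mouse1 polygon from the ROI format above, on a hypothetical 200x300 canvas
mask = polygon_to_mask([[100, 200], [150, 200], [150, 250], [100, 250]], 200, 300)
area = sum(map(sum, mask))  # number of pixels inside the ROI
```

In practice the mask would be rasterized at the video resolution given in the YAML file; a library rasterizer would be faster than this pure-Python loop.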
Pose Prompts (--pose)
Track from SLEAP pose annotations. Keypoints from labeled frames are used as point prompts for SAM3.
# Track from poses, output tracked SLP
sam-track track video.mp4 --pose labels.slp --slp
# Output all formats
sam-track track video.mp4 --pose labels.slp --bbox --seg --slp
# Exclude body parts from matching
sam-track track video.mp4 --pose labels.slp --slp \
--exclude-nodes "tail_tip,left_ear,right_ear"
# Only keep poses that matched a SAM3 mask
sam-track track video.mp4 --pose labels.slp --slp --remove-unmatched
# Only output masks/boxes that matched a pose
sam-track track video.mp4 --pose labels.slp --bbox --seg --filter-by-pose
Pose mode features:
- Uses keypoints as point prompts (visible keypoints only)
- Matches poses to SAM3 masks using Hungarian algorithm
- Propagates GT track names (e.g., "mouse1") to all outputs
- Supports multi-frame labeled SLPs (uses nearest GT frame for matching)
- Preserves PredictedInstance types and confidence scores
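The pose-to-mask matching described above can be sketched as an assignment problem. The example below is an illustration under stated assumptions, not sam-track's code: the score (fraction of visible keypoints that fall inside a mask) and the name match_poses_to_masks are hypothetical, and a brute-force search over permutations stands in for the Hungarian algorithm. For a handful of animals both find the same optimal assignment; scipy.optimize.linear_sum_assignment is the usual choice for larger problems.

```python
from itertools import permutations


def match_poses_to_masks(poses, masks):
    """Assign each pose to at most one mask, maximizing the total score.

    poses: list of keypoint lists; each keypoint is (x, y) or None if not visible.
    masks: list of binary masks (rows of 0/1), all the same size.
    Returns {pose_index: (mask_index, score)} for poses with a nonzero match.
    """
    def score(pose, mask):
        visible = [p for p in pose if p is not None]
        if not visible:
            return 0.0
        inside = sum(
            1 for x, y in visible
            if 0 <= int(y) < len(mask) and 0 <= int(x) < len(mask[0])
            and mask[int(y)][int(x)]
        )
        return inside / len(visible)

    scores = [[score(p, m) for m in masks] for p in poses]
    # Pad with None so extra poses can stay unmatched
    columns = list(range(len(masks))) + [None] * max(0, len(poses) - len(masks))
    best, best_total = (), -1.0
    for assignment in permutations(columns, len(poses)):
        total = sum(scores[i][j] for i, j in enumerate(assignment) if j is not None)
        if total > best_total:
            best, best_total = assignment, total
    return {
        i: (j, scores[i][j])
        for i, j in enumerate(best)
        if j is not None and scores[i][j] > 0
    }


# Two toy 4x6 masks: one covers the left half, the other the right half
H, W = 4, 6
left = [[1 if x < 3 else 0 for x in range(W)] for _ in range(H)]
right = [[1 - v for v in row] for row in left]
matches = match_poses_to_masks(
    poses=[[(0, 1), (1, 2), None], [(4, 0), (5, 3), (2, 2)]],
    masks=[right, left],
)
```

Here pose 0 lands on the left mask with a score of 1.0 and pose 1 on the right mask with a score of 2/3, which is the kind of per-instance value a matching confidence could report.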
Output Formats
Bounding Boxes (--bbox)
JSON format with track metadata, per-frame detections, and statistics.
Default path: <video>.bbox.json
{
  "metadata": {
    "version": "1.0",
    "video_source": "video.mp4",
    "width": 1920,
    "height": 1080,
    "fps": 30.0,
    "total_frames": 1000,
    "tracking_model": "facebook/sam3",
    "prompt_type": "text",
    "prompt_info": {"text": "mouse"},
    "created_at": "2025-12-21T12:00:00"
  },
  "tracks": [
    {
      "track_id": 0,
      "name": "mouse1",
      "first_frame": 0,
      "last_frame": 999,
      "avg_confidence": 0.95,
      "detections": [
        {
          "frame_idx": 0,
          "x_min": 100.0,
          "y_min": 200.0,
          "x_max": 300.0,
          "y_max": 400.0,
          "confidence": 0.98,
          "width": 200.0,
          "height": 200.0,
          "area": 40000.0
        }
      ]
    }
  ],
  "statistics": {
    "total_tracks": 2,
    "total_detections": 1998,
    "frames_with_detections": 1000,
    "avg_confidence": 0.94
  }
}
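Since the file stores detections per track, a per-frame lookup is often handier for analysis. The sketch below re-indexes a document shaped like the example above; the helper name boxes_by_frame is hypothetical, and the inline document is a trimmed copy of that example rather than real output.

```python
import json


def boxes_by_frame(doc):
    """Index detections as {frame_idx: {track_id: (x_min, y_min, x_max, y_max)}}."""
    frames = {}
    for track in doc["tracks"]:
        for det in track["detections"]:
            box = (det["x_min"], det["y_min"], det["x_max"], det["y_max"])
            frames.setdefault(det["frame_idx"], {})[track["track_id"]] = box
    return frames


# Trimmed document with the same fields as the example above
doc = json.loads("""
{
  "tracks": [
    {
      "track_id": 0,
      "name": "mouse1",
      "detections": [
        {"frame_idx": 0, "x_min": 100.0, "y_min": 200.0,
         "x_max": 300.0, "y_max": 400.0, "confidence": 0.98,
         "width": 200.0, "height": 200.0, "area": 40000.0}
      ]
    }
  ]
}
""")

frames = boxes_by_frame(doc)
x_min, y_min, x_max, y_max = frames[0][0]  # frame 0, track 0

# In the example, width, height and area follow from the corner coordinates
det = doc["tracks"][0]["detections"][0]
width, height = x_max - x_min, y_max - y_min
```

For a real run, replace the inline string with the contents of the output file, for example json.load on an open handle to video.bbox.json.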
Segmentation Masks (--seg)
HDF5 format with compressed binary masks and per-track metadata.
Default path: <video>.seg.h5
/masks - uint8 (T, N, H, W) binary masks, GZIP compressed
/frame_indices - int32 (T,) frame indices
/track_ids - int32 (T, N) track ID per mask
/confidences - float32 (T, N) detection confidence
/num_objects - int32 (T,) objects per frame
/metadata/
  version - "1.0"
  video_source - "video.mp4"
  width, height - frame dimensions
  fps - video frame rate
  total_frames - frames processed
  compression - "gzip"
  compression_level - 1
/tracks/
  track_0/
    name - "mouse1"
    first_frame - 0
    last_frame - 999
    avg_confidence - 0.95
  track_1/
    ...
Reading masks in Python:
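import h5py
import numpy as np

with h5py.File("video.seg.h5", "r") as f:
    frame_indices = f["frame_indices"][:]  # (T,) video frame index of each row
    track_ids = f["track_ids"][:]  # (T, N) track ID of each mask slot
    # Get mask for video frame 100, track 0 (assumes both are present in the file)
    t = int(np.flatnonzero(frame_indices == 100)[0])  # row holding frame 100
    n = int(np.flatnonzero(track_ids[t] == 0)[0])  # slot holding track 0
    frame_mask = f["masks"][t, n]  # (H, W) binary mask, read without loading all masks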
Tracked Poses (--slp)
SLEAP SLP format with SAM3-assigned track identities. Only available with --pose.
Default path: <pose>.sam-tracked.slp
The output SLP contains:
- All instances from the input with SAM3-assigned tracks
- Track names propagated from GT labels (e.g., "mouse1", "mouse2")
- tracking_score field with pose-mask matching confidence
- Preserved instance types (Instance vs PredictedInstance)
Loading in Python:
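import sleap_io as sio

labels = sio.load_slp("labels.sam-tracked.slp")
for lf in labels:
    for inst in lf.instances:
        # Instances without a matched mask may have no track assigned
        name = inst.track.name if inst.track is not None else "untracked"
        print(f"Frame {lf.frame_idx}: {name}")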
CLI Reference
Main Command
sam-track track VIDEO [OPTIONS]
Prompt Options (exactly one required)
| Option | Description |
|---|---|
| --text, -t | Text description of object to track |
| --roi, -r | Path to labelroi YAML file |
| --pose, -p | Path to SLEAP SLP file |
Output Options (at least one required)
| Option | Description |
|---|---|
| --bbox, -b | Enable bounding box output |
| --bbox-output, -B | Custom bbox output path (implies --bbox) |
| --seg, -s | Enable segmentation mask output |
| --seg-output, -S | Custom seg output path (implies --seg) |
| --slp | Output path for tracked SLP (pose mode only) |
Pose Mode Options
| Option | Description |
|---|---|
| --remove-unmatched | Remove poses without SAM3 mask matches |
| --exclude-nodes | Comma-separated nodes to exclude from matching |
| --filter-by-pose | Only output masks/boxes that matched a pose |
Processing Options
| Option | Description |
|---|---|
| --device, -d | Device for inference (cuda, cuda:0, mps, cpu) |
| --start-frame | Frame index to start from (0-indexed, default: 0) |
| --stop-frame | Frame index to stop at (exclusive) |
| --max-frames, -n | Maximum frames to process from start |
| --preload | Load all frames upfront (uses more memory) |
| --quiet, -q | Suppress progress output |
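The frame-range options combine into a half-open interval. The sketch below shows one plausible reading of the table above; resolve_frame_range is a hypothetical helper, and the rule that the earlier end wins when both --stop-frame and --max-frames are given is an assumption, not documented behavior.

```python
def resolve_frame_range(total_frames, start_frame=0, stop_frame=None, max_frames=None):
    """Return the range of frame indices to process (stop is exclusive)."""
    if not 0 <= start_frame <= total_frames:
        raise ValueError(f"start_frame {start_frame} outside 0..{total_frames}")
    # --stop-frame is exclusive and cannot run past the end of the video
    stop = total_frames if stop_frame is None else min(stop_frame, total_frames)
    # --max-frames counts from the start frame (assumed: the earlier end wins)
    if max_frames is not None:
        stop = min(stop, start_frame + max_frames)
    return range(start_frame, max(stop, start_frame))


# --start-frame 1000 --stop-frame 2000 covers frames 1000 through 1999
by_stop = resolve_frame_range(5000, start_frame=1000, stop_frame=2000)
# --start-frame 1000 --max-frames 500 covers frames 1000 through 1499
by_count = resolve_frame_range(5000, start_frame=1000, max_frames=500)
```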
Other Commands
sam-track auth [--token TOKEN] # Check/set HuggingFace auth
sam-track system # Display GPU/system info
sam-track --version # Show version
Examples
Track mice in a behavioral video
# Simple text tracking
sam-track track experiment.mp4 --text "mouse" --bbox --seg
# Process only frames 1000-1999 (--stop-frame is exclusive)
sam-track track experiment.mp4 --text "mouse" --bbox \
--start-frame 1000 --stop-frame 2000
# Process 500 frames starting from frame 1000
sam-track track experiment.mp4 --text "mouse" --bbox \
--start-frame 1000 --max-frames 500
Track from SLEAP predictions
# Add track identities to untracked predictions
sam-track track video.mp4 --pose predictions.slp --slp
# Get all outputs with consistent track names
sam-track track video.mp4 --pose predictions.slp \
--bbox --seg --slp
# Exclude tail from matching (often occluded)
sam-track track video.mp4 --pose predictions.slp --slp \
--exclude-nodes "tail_tip,tail_mid"
Use specific GPU
# Use second GPU
sam-track track video.mp4 --text "fly" --bbox --device cuda:1
# Force CPU (slow but works without GPU)
sam-track track video.mp4 --text "fly" --bbox --device cpu
Troubleshooting
CUDA out of memory
Try these in order:
- Use streaming mode (default) - don't use --preload
- Process fewer frames: --max-frames 100
- Use a smaller portion: --start-frame 0 --stop-frame 500
- Close other GPU applications
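If a whole video does not fit, processing it in portions can be scripted. This is a minimal sketch that only builds command lines from flags listed in the CLI reference; the helper chunk_commands and the chunk_*.bbox.json naming are hypothetical. Note that each run prompts SAM3 independently, so track IDs are not guaranteed to line up across chunks.

```python
import shlex


def chunk_commands(video, text, total_frames, chunk_size):
    """Build one sam-track command per consecutive block of frames."""
    commands = []
    for start in range(0, total_frames, chunk_size):
        stop = min(start + chunk_size, total_frames)  # --stop-frame is exclusive
        commands.append(shlex.join([
            "sam-track", "track", video,
            "--text", text,
            "--bbox-output", f"chunk_{start:06d}.bbox.json",
            "--start-frame", str(start),
            "--stop-frame", str(stop),
        ]))
    return commands


commands = chunk_commands("experiment.mp4", "mouse", total_frames=1200, chunk_size=500)
for command in commands:
    print(command)
```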
Authentication errors
# Check current status
sam-track auth
# Re-login with new token
sam-track auth --token hf_xxxxx
Driver too old
SAM3 requires CUDA 13.0. Check your driver version:
sam-track system
nvidia-smi
Minimum drivers: Linux 580.65.06, Windows 580.65
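Comparing driver versions by eye is error-prone because they are dotted numbers, not decimals. A minimal sketch of the check, using the minimums stated above; the names MIN_DRIVER, parse_version and driver_ok are hypothetical and this is not how sam-track system performs its check.

```python
# Minimum NVIDIA driver versions for CUDA 13.0, as listed above
MIN_DRIVER = {"linux": "580.65.06", "windows": "580.65"}


def parse_version(text):
    """Turn '580.65.06' into (580, 65, 6) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_ok(installed, platform):
    """Return True if the installed driver meets the platform minimum."""
    return parse_version(installed) >= parse_version(MIN_DRIVER[platform])


new_enough = driver_ok("580.82.07", "linux")
too_old = driver_ok("575.57.08", "linux")
```

The installed version can be read with nvidia-smi --query-gpu=driver_version --format=csv,noheader and passed to driver_ok as a string.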
Contributing
See CONTRIBUTING.md for development setup and guidelines.
License
BSD-3-Clause