Track objects in videos using SAM3 (Segment Anything Model 3).

sam-track

A uv-native CLI for tracking objects in videos using SAM3 (Segment Anything Model 3).

Features

Three prompt modes: Track by text description, ROI polygons, or SLEAP pose keypoints
Three output formats: Bounding boxes (JSON), segmentation masks (HDF5), tracked poses (SLP)
Memory-efficient: Streaming mode processes videos frame-by-frame
SLEAP integration: Link untracked pose predictions to consistent identities

Installation

Requires Python 3.12+ and uv.

As a uv tool (recommended)

# Linux/Windows with NVIDIA GPU (CUDA 13.0)
uv tool install sam-track --prerelease=allow --with "tokenizers==0.22.1" --index https://download.pytorch.org/whl/cu130 --index-strategy unsafe-best-match

# macOS with Apple Silicon (MPS)
uv tool install sam-track --prerelease=allow --with "tokenizers==0.22.1"

After installation, sam-track is available globally.

Ad-hoc with uvx

# Linux/Windows with NVIDIA GPU
uvx --prerelease=allow --with "tokenizers==0.22.1" --index https://download.pytorch.org/whl/cu130 --index-strategy unsafe-best-match sam-track --help

# macOS with Apple Silicon
uvx --prerelease=allow --with "tokenizers==0.22.1" sam-track --help

From source

git clone https://github.com/talmolab/sam-track && cd sam-track
uv sync
uv run sam-track --help

GPU Requirements

Platform	Requirement
Linux	NVIDIA driver 580.65.06+ (CUDA 13.0)
Windows	NVIDIA driver 580.65+ (CUDA 13.0)
macOS	Apple Silicon (MPS, no driver needed)

Check your setup:

sam-track system

First-time Setup

SAM3 is a gated model, so downloading it requires Hugging Face authentication and approved model access.

1. Check status:

sam-track auth

2. If not authenticated, create a token:

Go to https://huggingface.co/settings/tokens
Click Create new token
Name it sam-track, select Read permission
Then log in with the token:

sam-track auth --token hf_...

3. If no model access, request it:

Go to https://huggingface.co/facebook/sam3
Fill out the access request form
Run sam-track auth again to verify

Quick Start

# Track a mouse by text description, output bounding boxes
sam-track track video.mp4 --text "mouse" --bbox

# Track from ROI annotations, output masks
sam-track track video.mp4 --roi annotations.yml --seg

# Track from SLEAP poses, output tracked SLP
sam-track track video.mp4 --pose labels.slp --slp

Prompting Modes

sam-track supports three ways to specify what to track:

Text Prompts (`--text`)

Track objects by natural language description. SAM3 detects matching objects in the first frame and tracks them through the video.

# Track a single object type
sam-track track video.mp4 --text "mouse" --bbox

# Track with a more specific description, output boxes and masks
sam-track track video.mp4 --text "black mouse" --bbox --seg

# Output to custom paths
sam-track track video.mp4 --text "fly" \
  --bbox-output fly_tracks.json \
  --seg-output fly_masks.h5

ROI Prompts (`--roi`)

Track from polygon regions defined in a labelroi YAML file. Polygons are converted to binary masks for SAM3.

# Track from ROI annotations
sam-track track video.mp4 --roi rois.yml --bbox

# Output both formats
sam-track track video.mp4 --roi rois.yml --bbox --seg

ROI YAML format:

video: video.mp4
frame_idx: 0
resolution: [1920, 1080]
rois:
  - id: 0
    name: mouse1
    polygon: [[100, 200], [150, 200], [150, 250], [100, 250]]
  - id: 1
    name: mouse2
    polygon: [[300, 400], [350, 400], [350, 450], [300, 450]]
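
The polygon-to-mask conversion described above can be sketched in plain Python. This is an illustrative even-odd rasterizer, not sam-track's own implementation: the function names `point_in_polygon` and `polygon_to_mask` and the pixel-center sampling convention are assumptions made for the example.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd rule: count polygon edges crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height):
    """Return a height x width grid of 0/1, testing each pixel center."""
    mask = [[0] * width for _ in range(height)]
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    # Only scan the polygon's bounding box, clipped to the frame.
    x_lo, x_hi = max(0, int(min(xs))), min(width - 1, int(max(xs)))
    y_lo, y_hi = max(0, int(min(ys))), min(height - 1, int(max(ys)))
    for row in range(y_lo, y_hi + 1):
        for col in range(x_lo, x_hi + 1):
            if point_in_polygon(col + 0.5, row + 0.5, polygon):
                mask[row][col] = 1
    return mask


# The "mouse1" ROI from the YAML example, on a 1920x1080 frame.
roi = {"id": 0, "name": "mouse1",
       "polygon": [[100, 200], [150, 200], [150, 250], [100, 250]]}
mask = polygon_to_mask(roi["polygon"], width=1920, height=1080)
area = sum(map(sum, mask))
print(area)  # 2500 (a 50 x 50 pixel square)
```

Sampling pixel centers means a pixel counts as inside only when its center lies within the polygon, which is why the 50 x 50 unit square covers exactly 2500 pixels.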

Pose Prompts (`--pose`)

Track from SLEAP pose annotations. Keypoints from labeled frames are used as point prompts for SAM3.

# Track from poses, output tracked SLP
sam-track track video.mp4 --pose labels.slp --slp

# Output all formats
sam-track track video.mp4 --pose labels.slp --bbox --seg --slp

# Exclude body parts from matching
sam-track track video.mp4 --pose labels.slp --slp \
  --exclude-nodes "tail_tip,left_ear,right_ear"

# Only keep poses that matched a SAM3 mask
sam-track track video.mp4 --pose labels.slp --slp --remove-unmatched

# Only output masks/boxes that matched a pose
sam-track track video.mp4 --pose labels.slp --bbox --seg --filter-by-pose

Pose mode features:

Uses keypoints as point prompts (visible keypoints only)
Matches poses to SAM3 masks using the Hungarian algorithm
Propagates ground-truth (GT) track names (e.g., "mouse1") to all outputs
Supports SLP files with multiple labeled frames (the nearest GT frame is used for matching)
Preserves PredictedInstance types and confidence scores
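
To make the pose-to-mask matching step concrete, here is a minimal sketch of a one-to-one assignment. Several things are assumptions rather than sam-track's actual behavior: the score (fraction of visible keypoints that fall on a mask), the representation of masks as sets of pixels, and the names `overlap_score` and `match_poses_to_masks`. For brevity the sketch finds the optimal assignment by exhaustive search over permutations instead of the Hungarian algorithm; both return an optimal assignment, but exhaustive search is only practical for a handful of animals. `scipy.optimize.linear_sum_assignment` solves the same problem in polynomial time.

```python
from itertools import permutations


def overlap_score(keypoints, mask_pixels):
    """Fraction of visible keypoints that land on a mask pixel."""
    visible = [(x, y) for x, y in keypoints if x is not None and y is not None]
    if not visible:
        return 0.0
    hits = sum((int(x), int(y)) in mask_pixels for x, y in visible)
    return hits / len(visible)


def match_poses_to_masks(poses, masks):
    """Best one-to-one assignment of poses to masks by exhaustive search.

    Returns {pose_index: mask_index}. Assumes len(poses) <= len(masks).
    """
    scores = [[overlap_score(p, m) for m in masks] for p in poses]
    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(masks)), len(poses)):
        total = sum(scores[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_perm = total, perm
    # Drop pairs with no overlap at all: those poses stay unmatched.
    return {i: j for i, j in enumerate(best_perm) if scores[i][j] > 0}


# Two toy masks as sets of (x, y) pixels, and two poses with 3 keypoints each.
mask_a = {(x, y) for x in range(0, 10) for y in range(0, 10)}
mask_b = {(x, y) for x in range(20, 30) for y in range(0, 10)}
pose_0 = [(22.0, 3.0), (25.5, 4.0), (None, None)]   # last keypoint not visible
pose_1 = [(1.0, 1.0), (5.0, 5.0), (12.0, 5.0)]      # one keypoint outside mask_a
matches = match_poses_to_masks([pose_0, pose_1], [mask_a, mask_b])
print(matches)  # {0: 1, 1: 0}
```

Poses left out of the returned mapping correspond to the "unmatched" poses that `--remove-unmatched` would drop.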

Output Formats

Bounding Boxes (`--bbox`)

JSON format with track metadata, per-frame detections, and statistics.

Default path: <video>.bbox.json

{
  "metadata": {
    "version": "1.0",
    "video_source": "video.mp4",
    "width": 1920,
    "height": 1080,
    "fps": 30.0,
    "total_frames": 1000,
    "tracking_model": "facebook/sam3",
    "prompt_type": "text",
    "prompt_info": {"text": "mouse"},
    "created_at": "2025-12-21T12:00:00"
  },
  "tracks": [
    {
      "track_id": 0,
      "name": "mouse1",
      "first_frame": 0,
      "last_frame": 999,
      "avg_confidence": 0.95,
      "detections": [
        {
          "frame_idx": 0,
          "x_min": 100.0,
          "y_min": 200.0,
          "x_max": 300.0,
          "y_max": 400.0,
          "confidence": 0.98,
          "width": 200.0,
          "height": 200.0,
          "area": 40000.0
        }
      ]
    }
  ],
  "statistics": {
    "total_tracks": 2,
    "total_detections": 1998,
    "frames_with_detections": 1000,
    "avg_confidence": 0.94
  }
}
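
The JSON above is organized per track, but downstream code often wants boxes per frame. The following sketch regroups the detections using only the fields shown in the example; the helper name `boxes_by_frame` is hypothetical, and the inline document is a trimmed stand-in for a real `<video>.bbox.json` file.

```python
import json
from collections import defaultdict


def boxes_by_frame(bbox_data):
    """Map frame_idx -> list of (track name, x_min, y_min, x_max, y_max)."""
    frames = defaultdict(list)
    for track in bbox_data["tracks"]:
        for det in track["detections"]:
            frames[det["frame_idx"]].append(
                (track["name"], det["x_min"], det["y_min"], det["x_max"], det["y_max"])
            )
    return dict(frames)


# In practice: with open("video.bbox.json") as f: bbox_data = json.load(f)
# A trimmed inline document keeps this sketch self-contained.
bbox_data = json.loads("""
{
  "tracks": [
    {"track_id": 0, "name": "mouse1", "detections": [
      {"frame_idx": 0, "x_min": 100.0, "y_min": 200.0, "x_max": 300.0, "y_max": 400.0}
    ]}
  ]
}
""")
per_frame = boxes_by_frame(bbox_data)
print(per_frame[0])  # [('mouse1', 100.0, 200.0, 300.0, 400.0)]
```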

Segmentation Masks (`--seg`)

HDF5 format with compressed binary masks and per-track metadata.

Default path: <video>.seg.h5

/masks              - uint8 (T, N, H, W) binary masks, GZIP compressed
/frame_indices      - int32 (T,) frame indices
/track_ids          - int32 (T, N) track ID per mask
/confidences        - float32 (T, N) detection confidence
/num_objects        - int32 (T,) objects per frame
/metadata/
  version           - "1.0"
  video_source      - "video.mp4"
  width, height     - frame dimensions
  fps               - video frame rate
  total_frames      - frames processed
  compression       - "gzip"
  compression_level - 1
/tracks/
  track_0/
    name            - "mouse1"
    first_frame     - 0
    last_frame      - 999
    avg_confidence  - 0.95
  track_1/
    ...

Reading masks in Python:

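import h5py
import numpy as np

with h5py.File("video.seg.h5", "r") as f:
    frame_indices = f["frame_indices"][:]  # (T,) video frame index of each row
    track_ids = f["track_ids"][:]          # (T, N) track ID of each mask slot

    # Get the mask for video frame 100, track 0. Look up the row and slot by
    # value: row t is not frame t when tracking started at a later frame.
    t = int(np.flatnonzero(frame_indices == 100)[0])
    n = int(np.flatnonzero(track_ids[t] == 0)[0])
    frame_mask = f["masks"][t, n]          # (H, W) uint8 binary mask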

Tracked Poses (`--slp`)

SLEAP SLP format with SAM3-assigned track identities. Only available with --pose.

Default path: <pose>.sam-tracked.slp

The output SLP contains:

All instances from the input with SAM3-assigned tracks
Track names propagated from GT labels (e.g., "mouse1", "mouse2")
tracking_score field with pose-mask matching confidence
Preserved instance types (Instance vs PredictedInstance)

Loading in Python:

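import sleap_io as sio

labels = sio.load_slp("labels.sam-tracked.slp")
for lf in labels:
    for inst in lf.instances:
        # Poses that matched no SAM3 mask may have no track assigned
        name = inst.track.name if inst.track is not None else "untracked"
        print(f"Frame {lf.frame_idx}: {name}")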

CLI Reference

Main Command

sam-track track VIDEO [OPTIONS]

Prompt Options (exactly one required)

Option	Description
`--text`, `-t`	Text description of object to track
`--roi`, `-r`	Path to labelroi YAML file
`--pose`, `-p`	Path to SLEAP SLP file

Output Options (at least one required)

Option	Description
`--bbox`, `-b`	Enable bounding box output
`--bbox-output`, `-B`	Custom bbox output path (implies `--bbox`)
`--seg`, `-s`	Enable segmentation mask output
`--seg-output`, `-S`	Custom seg output path (implies `--seg`)
`--slp`	Enable tracked SLP output (pose mode only; default path `<pose>.sam-tracked.slp`)

Pose Mode Options

Option	Description
`--remove-unmatched`	Remove poses without SAM3 mask matches
`--exclude-nodes`	Comma-separated nodes to exclude from matching
`--filter-by-pose`	Only output masks/boxes that matched a pose

Processing Options

Option	Description
`--device`, `-d`	Device for inference (cuda, cuda:0, mps, cpu)
`--start-frame`	Frame index to start from (0-indexed, default: 0)
`--stop-frame`	Frame index to stop at (exclusive)
`--max-frames`, `-n`	Maximum number of frames to process, counted from the start frame
`--preload`	Load all frames upfront (uses more memory)
`--quiet`, `-q`	Suppress progress output
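
The frame-selection options can be read as defining a half-open range of frame indices. The sketch below spells out that reading; `resolve_frame_range` is a hypothetical helper, and the rule that the tighter limit wins when both `--stop-frame` and `--max-frames` are given is an assumption, since the table does not say how sam-track combines them.

```python
def resolve_frame_range(total_frames, start_frame=0, stop_frame=None, max_frames=None):
    """Return the range of frame indices selected by the three options."""
    # --stop-frame is exclusive and cannot exceed the video length.
    stop = total_frames if stop_frame is None else min(stop_frame, total_frames)
    if max_frames is not None:
        # Assumption: when both limits are given, the tighter one wins.
        stop = min(stop, start_frame + max_frames)
    # An empty range results if start_frame is at or past the stop.
    return range(start_frame, max(start_frame, stop))


r = resolve_frame_range(5000, start_frame=1000, stop_frame=2000)
print(r[0], r[-1], len(r))   # 1000 1999 1000
r = resolve_frame_range(5000, start_frame=1000, max_frames=500)
print(r[0], r[-1], len(r))   # 1000 1499 500
```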

Other Commands

sam-track auth [--token TOKEN]  # Check/set HuggingFace auth
sam-track system                # Display GPU/system info
sam-track --version             # Show version

Examples

Track mice in a behavioral video

# Simple text tracking
sam-track track experiment.mp4 --text "mouse" --bbox --seg

# Process only frames 1000-1999 (--stop-frame is exclusive)
sam-track track experiment.mp4 --text "mouse" --bbox \
  --start-frame 1000 --stop-frame 2000

# Process 500 frames starting from frame 1000
sam-track track experiment.mp4 --text "mouse" --bbox \
  --start-frame 1000 --max-frames 500

Track from SLEAP predictions

# Add track identities to untracked predictions
sam-track track video.mp4 --pose predictions.slp --slp

# Get all outputs with consistent track names
sam-track track video.mp4 --pose predictions.slp \
  --bbox --seg --slp

# Exclude tail from matching (often occluded)
sam-track track video.mp4 --pose predictions.slp --slp \
  --exclude-nodes "tail_tip,tail_mid"

Use specific GPU

# Use second GPU
sam-track track video.mp4 --text "fly" --bbox --device cuda:1

# Force CPU (slow but works without GPU)
sam-track track video.mp4 --text "fly" --bbox --device cpu

Troubleshooting

CUDA out of memory

Try these in order:

Use streaming mode (the default): do not pass --preload
Process fewer frames: --max-frames 100
Use a smaller portion: --start-frame 0 --stop-frame 500
Close other GPU applications
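
If a whole video does not fit, one option is to process it in windows of frames. The following sketch only builds the command lines, using the flags documented above; `chunk_commands` and the per-chunk output naming are hypothetical. Each run starts from scratch, so it is an assumption that track IDs and names will not necessarily stay consistent from one chunk to the next.

```python
import shlex


def chunk_commands(video, text, total_frames, chunk_size):
    """Build one `sam-track track` command per window of chunk_size frames."""
    commands = []
    for start in range(0, total_frames, chunk_size):
        stop = min(start + chunk_size, total_frames)   # --stop-frame is exclusive
        commands.append([
            "sam-track", "track", video,
            "--text", text,
            "--start-frame", str(start),
            "--stop-frame", str(stop),
            # A per-chunk path, so later chunks do not overwrite earlier output.
            "--bbox-output", f"{video}.{start:06d}-{stop:06d}.bbox.json",
        ])
    return commands


cmds = chunk_commands("experiment.mp4", "mouse", total_frames=1200, chunk_size=500)
for cmd in cmds:
    print(shlex.join(cmd))
    # To execute: subprocess.run(cmd, check=True)
```

With 1200 frames and a chunk size of 500, this yields three commands covering frames 0-499, 500-999, and 1000-1199.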

Authentication errors

# Check current status
sam-track auth

# Re-login with new token
sam-track auth --token hf_xxxxx

Driver too old

SAM3 requires CUDA 13.0. Check your driver version:

sam-track system
nvidia-smi

Minimum drivers: Linux 580.65.06, Windows 580.65
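
A driver check against these minimums can also be scripted. This is a sketch, not part of sam-track: the helper names are hypothetical, and it assumes driver versions are dot-separated integers. The `nvidia-smi` query is wrapped in a function so that the comparison logic can be used on machines without a GPU.

```python
import subprocess

MIN_DRIVER = {"linux": "580.65.06", "windows": "580.65"}  # minimums quoted above


def version_tuple(version):
    """'580.65.06' -> (580, 65, 6), so versions compare numerically."""
    return tuple(int(part) for part in version.strip().split("."))


def driver_is_new_enough(installed, minimum):
    a, b = version_tuple(installed), version_tuple(minimum)
    # Pad the shorter tuple with zeros so (580, 65) compares like (580, 65, 0).
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def installed_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()[0].strip()


print(driver_is_new_enough("580.82.07", MIN_DRIVER["linux"]))   # True
print(driver_is_new_enough("570.133.20", MIN_DRIVER["linux"]))  # False
```

On a machine with an NVIDIA GPU, `driver_is_new_enough(installed_driver_version(), MIN_DRIVER["linux"])` performs the live check.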

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

BSD-3-Clause

Release history

0.1.1 (Dec 24, 2025, this version)
0.1.0 (Dec 22, 2025)

Download files

Source Distribution

sam_track-0.1.1.tar.gz (3.2 MB), uploaded Dec 24, 2025, Source

Built Distribution

sam_track-0.1.1-py3-none-any.whl (48.3 kB), uploaded Dec 24, 2025, Python 3

File details

Details for the file sam_track-0.1.1.tar.gz.

File metadata

Download URL: sam_track-0.1.1.tar.gz
Upload date: Dec 24, 2025
Size: 3.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sam_track-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`f24ae2b945bb8589582a3dd49a8b5d3b2dabe1ed2e38f7a9fefbd029d3ba0476`
MD5	`5e3e7936395e45945199d4490f96f05e`
BLAKE2b-256	`737209bb63d79c22578ac07f34d10a6e7200a7b2356b93b42e7f3beef6747123`

Provenance

The following attestation bundles were made for sam_track-0.1.1.tar.gz:

Publisher: publish.yml on talmolab/sam-track

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sam_track-0.1.1.tar.gz
- Subject digest: f24ae2b945bb8589582a3dd49a8b5d3b2dabe1ed2e38f7a9fefbd029d3ba0476
- Sigstore transparency entry: 779220548
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: talmolab/sam-track@140bcb76d9c2638f1d8f3a408db31d08bd739980
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/talmolab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@140bcb76d9c2638f1d8f3a408db31d08bd739980
- Trigger Event: release

File details

Details for the file sam_track-0.1.1-py3-none-any.whl.

File metadata

Download URL: sam_track-0.1.1-py3-none-any.whl
Upload date: Dec 24, 2025
Size: 48.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sam_track-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`238f74819498a624e61c07fc12c6cac5c58863f6782aab40dfe53449b216190a`
MD5	`55942285ac41abb4c3fba113cd9a03a1`
BLAKE2b-256	`af4a8e506831f25bb25b868bdd36e2b28e9f26c29684c9b6c4fd4ab5e55b7355`

Provenance

The following attestation bundles were made for sam_track-0.1.1-py3-none-any.whl:

Publisher: publish.yml on talmolab/sam-track

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sam_track-0.1.1-py3-none-any.whl
- Subject digest: 238f74819498a624e61c07fc12c6cac5c58863f6782aab40dfe53449b216190a
- Sigstore transparency entry: 779220553
- Sigstore integration time: Dec 24, 2025
Source repository:
- Permalink: talmolab/sam-track@140bcb76d9c2638f1d8f3a408db31d08bd739980
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/talmolab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@140bcb76d9c2638f1d8f3a408db31d08bd739980
- Trigger Event: release
