yolo-frigate: DeepStack-shaped REST API for YOLO object detection with lazy runtime-native export
Source: github.com/a-earthperson/yolo-frigate.
HTTP object detection for NVR-style stacks—most often Frigate, but any client that speaks a DeepStack-compatible multipart POST /detect API. The service wraps Ultralytics YOLO models, optionally lazy-exporting Ultralytics .pt checkpoints into each image’s native runtime (TensorRT, OpenVINO, TFLite, EdgeTPU, ONNX) on first use.
Typical deployments:
- One GPU-class container per inference node (NVIDIA TensorRT or ONNX Runtime, Intel iGPU via OpenVINO, Coral EdgeTPU via TFLite, and so on).
- Shared read-only model trees and writable export caches (bind mounts, named volumes, or NFS when workers are spread across machines).
- Orchestrator placement so NVIDIA-class images (TensorRT, ONNX GPU) land on GPU hosts and OpenVINO on Intel-GPU hosts—Compose on a single box, Swarm/Kubernetes when models and cache live on shared storage.
Runtime images
Each Docker image ships with a fixed YOLO_FRIGATE_RUNTIME and the dependencies for that stack only:
| Variant | Dockerfile | YOLO_FRIGATE_RUNTIME | Native artifacts | Typical devices |
|---|---|---|---|---|
| NVIDIA TensorRT | nvidia-trt.Dockerfile | tensorrt | .engine | NVIDIA: gpu, gpu:0, … |
| NVIDIA ONNX Runtime (GPU) | onnx-gpu.Dockerfile | onnx | .onnx | cpu (CPU EP); NVIDIA: gpu, gpu:0, … |
| Intel GPU (OpenVINO) | intel-gpu.Dockerfile | openvino | *_openvino_model/ | CPU; Intel GPU: gpu, gpu:0, …; NPU where supported |
| Coral EdgeTPU (TFLite) | coral-tpu.Dockerfile | tflite | .tflite | cpu; Coral: usb, pci, … |
Release builds publish four images from .github/workflows/publish.yml. The image name is the GitHub owner/repository name plus a variant suffix that matches the Dockerfile stem (GHCR normalizes names to lowercase):
- ghcr.io/a-earthperson/yolo-frigate-nvidia-trt — from nvidia-trt.Dockerfile
- ghcr.io/a-earthperson/yolo-frigate-onnx-gpu — from onnx-gpu.Dockerfile
- ghcr.io/a-earthperson/yolo-frigate-intel-gpu — from intel-gpu.Dockerfile
- ghcr.io/a-earthperson/yolo-frigate-coral-tpu — from coral-tpu.Dockerfile
Forks and private mirrors use their own owner/repo in place of a-earthperson/yolo-frigate. Version tags (for example v0.1.8, latest) come from the GitHub release via docker/metadata-action. You may retag or mirror under other names if your registry layout requires it.
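A minimal retag-and-push sketch for mirroring, assuming a hypothetical private registry at registry.example.com and the v0.1.8 release tag (substitute your own registry, path, and tag):

# Pull a published variant, retag it for a private registry, and push.
docker pull ghcr.io/a-earthperson/yolo-frigate-nvidia-trt:v0.1.8
docker tag ghcr.io/a-earthperson/yolo-frigate-nvidia-trt:v0.1.8 \
  registry.example.com/yolo-frigate-nvidia-trt:v0.1.8
docker push registry.example.com/yolo-frigate-nvidia-trt:v0.1.8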
API contract
Shared across all runtime profiles:
- POST /detect — multipart form-data with an image file field. Success responses follow the Predictions schema in yolo_frigate.prediction.
- GET /health — liveness (empty body).
- POST /force_save/{state} — toggles forced save behavior for debugging.
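A quick exercise of this contract with curl, assuming an instance listening on localhost:8000 and a sample frame at /path/to/frame.jpg (both are illustrative):

# Liveness: succeeds with an empty body.
curl -f http://localhost:8000/health

# Detection: DeepStack-style multipart POST with the frame in the "image" field.
curl -s -X POST -F "image=@/path/to/frame.jpg" http://localhost:8000/detect

# Debugging: toggle forced save (the "true" state value shown here is illustrative).
curl -s -X POST http://localhost:8000/force_save/true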
Runtime selection and devices
Inside an image, the runtime is determined by YOLO_FRIGATE_RUNTIME (set in the variant Dockerfile). Legacy images may still use YOLOREST_RUNTIME; the application reads both. You normally pass a .pt checkpoint to --model_file and let the image export it lazily.
Overrides when running outside Docker:
uv run yolo-frigate --runtime=tensorrt --device=gpu:0 --model_file=/models/model.pt
uv run yolo-frigate --runtime=onnx --device=cpu --model_file=/models/model.pt
uv run yolo-frigate --runtime=onnx --device=gpu:0 --model_file=/models/model.pt
uv run yolo-frigate --runtime=openvino --device=cpu --model_file=/models/model.pt
uv run yolo-frigate --runtime=tflite --device=cpu --model_file=/models/model.pt
uv run yolo-frigate --runtime=edgetpu --device=pci --model_file=/models/model.pt
The installable project is yolo-frigate; the Python import package is yolo_frigate (for example python -m yolo_frigate).
Pre-exported artifacts can be passed directly to --model_file: TensorRT .engine, ONNX .onnx, OpenVINO *_openvino_model/, TFLite / EdgeTPU .tflite.
Device strings are runtime-specific (TensorRT: gpu, gpu:0, …; ONNX: cpu, gpu, gpu:0, …; OpenVINO: cpu, gpu, npu, …; TFLite: cpu; EdgeTPU: usb, pci, …). --runtime is the only runtime selector; --label_file overrides embedded class names when needed.
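For example, a sketch of serving a pre-exported OpenVINO directory with an explicit label file (paths and file names are illustrative):

uv run yolo-frigate --runtime=openvino --device=gpu \
  --model_file=/models/yolo11n_openvino_model/ \
  --label_file=/models/labelmap.txt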
Lazy export and cache
When --model_file is a .pt checkpoint, export happens on first inference. Cached outputs live under YOLO_FRIGATE_MODEL_CACHE_DIR (default /cache/yolo-frigate in images; legacy YOLOREST_MODEL_CACHE_DIR and /cache/yolorest are still honored). The cache key includes profile, --export_imgsz, --export_half, --export_int8, --export_dynamic, --export_nms, --export_batch, --export_workspace, and calibration inputs.
Operational notes:
- Mount a writable /cache (or a custom YOLO_FRIGATE_MODEL_CACHE_DIR) whenever the container filesystem is read-only.
- --export_int8 requires --export_data for deterministic calibration.
- TensorRT INT8 caches are GPU-generation sensitive; export on hardware representative of production.
- EdgeTPU export expects an x86 Linux exporter environment.
Common export flags:
--export_imgsz=640
--export_half
--export_dynamic
--export_batch=1
--export_workspace=4
--export_int8 --export_data=/models/data.yaml
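Because the export runs on the first inference, it can help to send a warm-up request right after the container starts so the first real detection stays fast; a sketch, assuming the service is reachable at localhost:8000 and any decodable image is at hand:

# Trigger the lazy export and fill the cache before pointing clients at the service.
curl -s -X POST -F "image=@/path/to/any.jpg" http://localhost:8000/detect > /dev/null
# Later requests reuse the cached artifact under YOLO_FRIGATE_MODEL_CACHE_DIR.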
Deployment patterns
What every container needs
- /models — readable tree with checkpoints or pre-exported artifacts and optional labelmap.txt (or pass --label_file).
- /cache — writable export cache (omit only if you never lazy-export or you redirect YOLO_FRIGATE_MODEL_CACHE_DIR elsewhere).
- Devices — NVIDIA Container Toolkit + GPU reservation for TensorRT and ONNX GPU; /dev/dri (and often video/render groups) for Intel GPU OpenVINO; Coral device nodes for EdgeTPU.
- Shared memory — a large tmpfs on /dev/shm can help some workloads when memory pressure appears during export or batching.
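Putting those requirements together, a minimal docker run sketch for the TensorRT image (tag, host paths, and the published port are illustrative):

docker run --rm --gpus all -p 8000:8000 \
  -v "$PWD/models:/models:ro" \
  -v "$PWD/cache:/cache" \
  ghcr.io/a-earthperson/yolo-frigate-nvidia-trt:v1.2.3 \
  --device=gpu:0 \
  --model_file=/models/yolo11n.pt \
  --label_file=/models/labelmap.txt \
  --export_imgsz=640 --export_half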
Docker Compose (single host)
Minimal TensorRT service (image name ends with -nvidia-trt for releases from this repo):
services:
  yolo-frigate-tensorrt:
    image: ghcr.io/a-earthperson/yolo-frigate-nvidia-trt:v1.2.3
    restart: on-failure
    read_only: true
    security_opt:
      - "no-new-privileges=true"
    volumes:
      - ./models:/models:ro
      - ./cache:/cache
    command:
      - "--device=gpu:0"
      - "--label_file=/models/labelmap.txt"
      - "--model_file=/models/yolo11n.pt"
      - "--export_imgsz=640"
      - "--export_half"
      - "--export_dynamic"
    gpus: all
The ONNX GPU image (yolo-frigate-onnx-gpu) uses the same NVIDIA + gpus pattern; runtime is fixed to onnx in the Dockerfile.
For Swarm, use GPU reservations on deploy.resources instead of gpus: all (Compose ignores deploy on non-Swarm setups).
Intel GPU (OpenVINO image; image name ends with -intel-gpu):
services:
  yolo-frigate-intel-gpu:
    image: ghcr.io/a-earthperson/yolo-frigate-intel-gpu:v1.2.3
    restart: on-failure
    read_only: true
    devices:
      - /dev/dri:/dev/dri
    group_add:
      - "${GROUP_RENDER}"
    volumes:
      - ./models:/models:ro
      - ./cache:/cache
    command:
      - "--device=gpu"
      - "--label_file=/models/labelmap.txt"
      - "--model_file=/models/yolo11n.pt"
      - "--export_imgsz=320"
      - "--export_half"
      - "--export_dynamic"
Coral EdgeTPU (image name ends with -coral-tpu):
networks:
  yolo-frigate:
    driver: bridge
    internal: true
services:
  yolo-frigate:
    image: ghcr.io/a-earthperson/yolo-frigate-coral-tpu:v1.2.3
    restart: on-failure
    read_only: true
    group_add:
      - "${GROUP_CORAL}"
    security_opt:
      - "no-new-privileges=true"
    networks:
      - yolo-frigate
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - ./models:/models:ro
      - ./cache:/cache
    devices:
      - /dev/apex_0:/dev/apex_0
    command:
      - "--device=pci"
      - "--model_file=/models/yolo11n.pt"
      - "--export_imgsz=320"
Docker Swarm (multi-node, shared storage)
Swarm is useful when Frigate and workers sit on different nodes or when models and export cache should live on NFS (or another shared filesystem) so any worker can serve the same tree.
Pattern:
- Overlay network attached to Frigate and detector services so http://<service>:8000/detect resolves cluster-wide.
- Named volumes backed by NFS for /models and /cache so exports done on one node are visible to others (same cache key → same artifact).
- Placement constraints so NVIDIA GPU images (TensorRT, ONNX GPU) run only on GPU-labeled nodes and OpenVINO on Intel-GPU-labeled nodes.
- Optional cap_add: [CAP_PERFMON] when you want perf counters on Intel stacks; NVIDIA stacks often set NVIDIA_VISIBLE_DEVICES / NVIDIA_DRIVER_CAPABILITIES for full GPU feature exposure inside the container.
Illustrative stack.yml (replace NFS address, paths, images, and labels with yours):
networks:
  cams:
    external: true
volumes:
  models:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.1.1,rw,nfsvers=4,async
      device: ":/path/on/nfs/to/yolo/models"
  cache:
    driver: local
    driver_opts:
      type: nfs
      o: addr=192.168.1.1,rw,nfsvers=4,async
      device: ":/path/on/nfs/to/yolo/cache"
x-base: &x-base
  cap_add:
    - CAP_PERFMON
  networks:
    - cams
  volumes:
    - /dev/dri:/dev/dri
    - /etc/localtime:/etc/localtime:ro
    - /etc/timezone:/etc/timezone:ro
    - models:/models
    - cache:/cache
    - type: tmpfs
      target: /dev/shm
      tmpfs:
        size: 1073741824
services:
  objectdetector:
    <<: *x-base
    image: ghcr.io/a-earthperson/yolo-frigate-nvidia-trt:0.1.8
    environment:
      NVIDIA_VISIBLE_DEVICES: all
      NVIDIA_DRIVER_CAPABILITIES: all
    command:
      - "--device=gpu"
      - "--export_imgsz=640"
      - "--export_dynamic"
      - "--label_file=/models/labelmap.txt"
      - "--model_file=/models/yolo26l.pt"
      - "--export_half"
    deploy:
      placement:
        constraints:
          - node.labels.nvidia_gpu_compute == true
      resources:
        limits:
          memory: 4G
        reservations:
          memory: 1200M
  yolov9-intel:
    <<: *x-base
    image: ghcr.io/a-earthperson/yolo-frigate-intel-gpu:0.1.7
    command:
      - "--device=gpu"
      - "--export_imgsz=320"
      - "--export_dynamic"
      - "--label_file=/models/labelmap.txt"
      - "--model_file=/models/yolo26s.pt"
      - "--export_half"
    deploy:
      mode: replicated
      replicas: 0
      placement:
        constraints:
          - node.labels.intel_gpu_compute == true
      resources:
        limits:
          memory: 4096M
        reservations:
          memory: 1280M
Keep one writable cache per logical deployment; concurrent writers on the same NFS path can corrupt cache entries—usually you run a single replica per cache volume or partition cache per replica.
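If you do run multiple writers against shared storage, one option is to point each worker at its own subdirectory via the documented YOLO_FRIGATE_MODEL_CACHE_DIR override; a sketch, assuming an already-deployed stack service named yolo_objectdetector (the name and path are illustrative):

# Redirect this worker's export cache to a private subtree on the shared volume.
docker service update \
  --env-add YOLO_FRIGATE_MODEL_CACHE_DIR=/cache/worker-a \
  yolo_objectdetector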
Kubernetes and other orchestrators
The same container arguments and volume mounts apply: mount models and cache, pass GPU device plugins or /dev/dri as your platform requires, and expose port 8000 to clients that call /detect.
Frigate
Frigate’s HTTP detector expects DeepStack-style JSON; point it at this service’s /detect endpoint on the Docker network Frigate shares with the detector (service name resolves under Compose; under Swarm use the stack service name on the overlay).
detectors:
  http_detector:
    type: deepstack
    api_url: http://objectdetector:8000/detect
    api_timeout: 1.0
Tune api_timeout to your SLA: sub-second values work when the model is warm; allow more time for cold lazy export or slow hosts. Align Frigate’s model.labelmap_path with the classes this service returns; this is often the same labelmap.txt you pass to --label_file.
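To pick a sensible api_timeout, you can time a warm request from the Frigate host; a sketch, assuming the detector resolves as objectdetector on the shared network and a sample frame is at hand:

# Warm the model first (the very first request may include lazy export), then time a representative call.
curl -s -o /dev/null -F "image=@/path/to/frame.jpg" \
  -w "detect latency: %{time_total}s\n" \
  http://objectdetector:8000/detect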
Runtime-native scope
The service intentionally does one thing: resolve a runtime profile, lazily export .pt sources when needed, load with Ultralytics, and adapt outputs to the existing Predictions REST contract. It is not a generic server for arbitrary ONNX graphs or custom postprocessing.
Manual export (optional)
Ultralytics model sources: e.g. ultralytics/assets.
Lazy export is the default. Prebuilding is useful for air-gapped or reproducible rollouts.
TensorRT
docker run -it --rm -v .:/models ultralytics/ultralytics:latest \
yolo export model=/models/<name>.pt format=engine half=True dynamic=True
OpenVINO
docker run -it --rm -v .:/models ultralytics/ultralytics:latest \
yolo export model=/models/<name>.pt format=openvino
ONNX
docker run -it --rm -v .:/models ultralytics/ultralytics:latest \
yolo export model=/models/<name>.pt format=onnx
TFLite
docker run -it --rm -v .:/models ultralytics/ultralytics:latest-cpu \
yolo export model=/models/<name>.pt format=tflite
EdgeTPU
docker run -it --rm -v .:/models ultralytics/ultralytics:latest-cpu \
yolo export model=/models/<name>.pt format=edgetpu
You can pass resulting .engine, .onnx, *_openvino_model/, or .tflite paths directly to --model_file.
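For example, a TensorRT engine produced this way can be served directly, skipping lazy export at start-up (file names are illustrative):

uv run yolo-frigate --runtime=tensorrt --device=gpu:0 \
  --model_file=/models/yolo11n.engine \
  --label_file=/models/labelmap.txt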
Development
Install uv. Sync the project (core dependencies plus the dev group for tests and formatters), then run the test suite:
uv sync --group dev
uv run python -m unittest discover -s tests -v
Optional inference extras (install only what you need locally):
uv sync --group dev --extra tflite
uv sync --group dev --extra tensorrt
uv sync --group dev --extra openvino
uv sync --group dev --extra onnx-gpu
Notes:
- Runtime extras are primarily intended for Linux environments that mirror the Docker images.
- Lazy export depends on ultralytics; TFLite export additionally needs TensorFlow, native OpenVINO export needs openvino, and the ONNX GPU image uses the onnx-gpu extra (onnxruntime-gpu).
Format and lint:
uv run ruff check src tests
uv run black src tests
The repo pins the default interpreter with .python-version; uv respects that for uv sync and uv run.