A modular video object detection toolkit with a clean det-v1 JSON schema, pluggable backends, and optional model export.

Project description

detect

A modular video detection toolkit that produces a stable det-v1 JSON output schema, with a pluggable backend (currently Ultralytics) and optional model export.

  • Backend: Ultralytics (YOLO families, RT-DETR, YOLO-World/YOLOE, SAM/FastSAM — depending on your installed ultralytics version)
  • Default behavior: no files are written unless you opt in (JSON / frames / annotated video)

Output schema (det-v1)

Every run returns a det-v1 payload in memory (and the CLI prints it to stdout).

Top-level keys:

  • schema_version: always "det-v1"
  • video: {path, fps, frame_count, width, height}
  • detector: configuration used for the run (name/weights/conf/imgsz/device/half + task + optional prompts/topk)
  • frames: list of per-frame records

Per-frame record:

  • frame: 0-based frame index
  • file: canonical frame filename (e.g. 000000.jpg), present even if frames aren’t saved
  • detections: list of detections

Detection fields (task-dependent):

  • all tasks: det_ind, score, class_id, class_name
  • boxes: bbox = [x1, y1, x2, y2]
  • pose: keypoints = [[x, y, score], ...]
  • segmentation: segments = [[[x, y], ...], ...] (polygons)
  • oriented boxes (best-effort): obb = [cx, cy, w, h, angle_degrees] plus an axis-aligned bbox
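Because the geometry fields vary by task, a consumer should branch on which key is present. A minimal sketch (pure stdlib; the sample records below are illustrative, not library output):

```python
# Sketch: classify a det-v1 detection by its task-dependent geometry fields.
# The sample records are hand-written examples, not produced by the library.

def geometry_kind(det: dict) -> str:
    """Return which kind of geometry a det-v1 detection carries."""
    if "keypoints" in det:
        return "pose"
    if "segments" in det:
        return "segmentation"
    if "obb" in det:
        return "oriented-box"
    if "bbox" in det:
        return "box"
    return "unknown"

samples = [
    {"bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91},
    {"bbox": [0, 0, 10, 10], "keypoints": [[5.0, 5.0, 0.9]]},
    {"segments": [[[0, 0], [10, 0], [10, 10]]]},
    {"bbox": [0, 0, 10, 10], "obb": [5.0, 5.0, 10.0, 10.0, 30.0]},
]

kinds = [geometry_kind(d) for d in samples]
print(kinds)  # ['box', 'pose', 'segmentation', 'oriented-box']
```

Checking keypoints/segments/obb before bbox matters, since pose and obb records also carry an axis-aligned bbox.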

Minimal example

{
  "schema_version": "det-v1",
  "video": {"path": "in.mp4", "fps": 30.0, "frame_count": 120, "width": 1920, "height": 1080},
  "detector": {"name": "ultralytics", "weights": "yolo26n", "conf_thresh": 0.25, "imgsz": 640, "device": "cpu", "half": false, "task": "detect"},
  "frames": [
    {
      "frame": 0,
      "file": "000000.jpg",
      "detections": [
        {"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91, "class_id": 0, "class_name": "person"}
      ]
    }
  ]
}
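The payload is plain JSON, so consuming it needs only the standard library. A sketch that parses the example above and tallies detections per class:

```python
import json

# The minimal det-v1 example from above, as a consumer might read it
# from a detections.json file.
payload = json.loads("""
{
  "schema_version": "det-v1",
  "video": {"path": "in.mp4", "fps": 30.0, "frame_count": 120, "width": 1920, "height": 1080},
  "detector": {"name": "ultralytics", "weights": "yolo26n", "conf_thresh": 0.25,
               "imgsz": 640, "device": "cpu", "half": false, "task": "detect"},
  "frames": [
    {"frame": 0, "file": "000000.jpg",
     "detections": [{"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0],
                     "score": 0.91, "class_id": 0, "class_name": "person"}]}
  ]
}
""")

assert payload["schema_version"] == "det-v1"

# Count detections per class across all frames.
counts: dict[str, int] = {}
for frame in payload["frames"]:
    for det in frame["detections"]:
        counts[det["class_name"]] = counts.get(det["class_name"], 0) + 1

print(counts)  # {'person': 1}
```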

Install

Requires Python 3.11+.

From PyPI

pip install detect-lib

Optional extras (only if you need them):

pip install "detect-lib[export]"      # ONNX / export helpers
pip install "detect-lib[coreml]"      # CoreML export (macOS)
pip install "detect-lib[openvino]"    # OpenVINO export
pip install "detect-lib[tf]"          # TensorFlow export paths (heavy)

From GitHub (uv)

git clone https://github.com/Surya-Rayala/VisionPipeline-detection.git
cd VisionPipeline-detection
uv sync

Extras:

uv sync --extra export
uv sync --extra coreml
uv sync --extra openvino
uv sync --extra tf

CLI

Every CLI command can be run either way:

  • python -m ... (pip install)
  • uv run python -m ... (uv checkout)

Detection

Help:

python -m detect.cli.detect_video -h

List models (registry + installed):

python -m detect.cli.detect_video --list-models

Common patterns

1) Bounding boxes (typical YOLO / RT-DETR)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolo26n \
  --task detect \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yolo26n_detect

2) Instance segmentation (polygons)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolo26n-seg \
  --task segment \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yolo26n_seg

3) Pose (keypoints)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolo26n-pose \
  --task pose \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yolo26n_pose

4) Open-vocabulary (YOLO-World / YOLOE)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolov8s-worldv2 \
  --task openvocab \
  --text "person,car,dog" \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name worldv2_openvocab

5) Open-vocabulary + polygons (YOLOE *-seg)

Use a YOLOE segmentation weight with --task segment when you want polygons.

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yoloe-11s-seg \
  --task segment \
  --text "person,car,dog" \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yoloe_seg_openvocab

Task semantics (important)

  • detect | segment | pose | obb | classify | sam | sam2 | sam3 | fastsam describe the output type you want.
  • openvocab is a prompt mode for YOLO-World/YOLOE. Output type follows the model (boxes vs masks). If you want polygons, use a *-seg model and segment.

Prompts

You can supply prompts via:

  • --text "a,b,c" (open-vocabulary label list)
  • --box "x1,y1,x2,y2" (repeatable)
  • --point "x,y" or --point "x,y,label" (repeatable; label 1=fg, 0=bg)
  • --prompts prompts.json (combined)

Example prompts.json:

{
  "text": ["person", "car", "dog"],
  "boxes": [[100, 100, 500, 500]],
  "points": [[320, 240, 1], [100, 120, 0]],
  "topk": 5
}
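The same prompts file can be generated programmatically with the standard library (only the documented keys are used; the output path is an example):

```python
import json
from pathlib import Path

# Build the prompts payload from the example above and write it where
# --prompts expects a JSON file.
prompts = {
    "text": ["person", "car", "dog"],
    "boxes": [[100, 100, 500, 500]],
    "points": [[320, 240, 1], [100, 120, 0]],
    "topk": 5,
}

path = Path("prompts.json")
path.write_text(json.dumps(prompts, indent=2))

# Round-trip check: the file parses back to the same structure.
assert json.loads(path.read_text()) == prompts
```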

Export note (open-vocab): exported formats (ONNX/CoreML/etc.) may not support changing the vocabulary at runtime. If prompts don’t take effect, run the .pt weights for true open-vocabulary prompting or post-filter detections.

Artifacts (all opt-in)

  • --json writes out/<run-name>/detections.json
  • --frames writes out/<run-name>/frames/*.jpg
  • --save-video NAME.mp4 writes out/<run-name>/NAME.mp4

If you don’t enable any artifacts, no output directory is created.


Python API

Parameter mapping (Python vs CLI)

Python uses snake_case keyword arguments. The CLI uses kebab-case flags. The values are the same, but the names differ.

Common mapping:

  • CLI --video → Python video
  • CLI --detector → Python detector
  • CLI --weights → Python weights
  • CLI --classes "0,2" → Python classes=[0, 2]
  • CLI --conf-thresh → Python conf_thresh
  • CLI --imgsz → Python imgsz
  • CLI --device → Python device
  • CLI --half → Python half=True
  • CLI --task → Python task

Prompts:

  • CLI --text "a,b" → Python prompts={"text": ["a", "b"]}
  • CLI --box "x1,y1,x2,y2" (repeatable) → Python prompts={"boxes": [[x1, y1, x2, y2], ...]}
  • CLI --point "x,y,label" (repeatable) → Python prompts={"points": [[x, y, label], ...]}
  • CLI --topk N → Python topk=N (or prompts={"topk": N})
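If you are porting shell invocations to Python, the string-to-list conversions behind this mapping are simple to replicate (the helper names here are illustrative, not library API):

```python
def parse_classes(s: str) -> list[int]:
    """'0,2' -> [0, 2], mirroring CLI --classes."""
    return [int(tok) for tok in s.split(",") if tok.strip()]

def parse_point(s: str) -> list[float]:
    """'x,y' or 'x,y,label' -> [x, y(, label)], mirroring CLI --point."""
    return [float(tok) for tok in s.split(",")]

# Each repeatable --point flag becomes one entry in prompts["points"].
prompts = {
    "text": "person,car".split(","),
    "points": [parse_point("320,240,1"), parse_point("100,120,0")],
}

print(parse_classes("0,2"))  # [0, 2]
print(prompts["points"][0])  # [320.0, 240.0, 1.0]
```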

Artifacts (all opt-in):

  • CLI --json → Python save_json=True
  • CLI --frames → Python save_frames=True
  • CLI --save-video NAME.mp4 → Python save_video="NAME.mp4"
  • CLI --out-dir DIR → Python out_dir="DIR"
  • CLI --run-name NAME → Python run_name="NAME"
  • CLI --no-progress → Python progress=False
  • CLI --display → Python display=True

Note: the Python API also accepts an advanced artifacts=ArtifactOptions(...) object, but the convenience args above are easiest for most usage.

Detect a video

from detect import detect_video

res = detect_video(
    video="in.mp4",
    detector="ultralytics",
    weights="yolo26n",
    task="detect",
    classes=None,          # e.g. [0, 2] to filter class ids
    conf_thresh=0.25,
    imgsz=640,
    device="auto",
    half=False,
    # prompts={"text": ["person", "car", "dog"]},  # for open-vocabulary models
    save_json=True,
    save_video="annotated.mp4",
    out_dir="out",
    run_name="py_detect",
)

print(res.payload["schema_version"], len(res.payload["frames"]))
print(res.paths)
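Because res.payload is a plain dict in the det-v1 shape, downstream filtering needs no library code. A sketch (the inline payload here stands in for res.payload):

```python
# Keep only confident "person" detections from a det-v1 payload.
# The inline payload stands in for res.payload from detect_video().
payload = {
    "schema_version": "det-v1",
    "frames": [
        {"frame": 0, "file": "000000.jpg", "detections": [
            {"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0],
             "score": 0.91, "class_id": 0, "class_name": "person"},
            {"det_ind": 1, "bbox": [10.0, 10.0, 50.0, 40.0],
             "score": 0.30, "class_id": 2, "class_name": "car"},
        ]},
    ],
}

def filter_detections(payload, class_name, min_score):
    """Yield (frame_index, detection) pairs matching the filter."""
    for frame in payload["frames"]:
        for det in frame["detections"]:
            if det["class_name"] == class_name and det["score"] >= min_score:
                yield frame["frame"], det

hits = list(filter_detections(payload, "person", 0.5))
print(len(hits))  # 1
```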

Note: legacy detector aliases (yolo_bbox, yolo_seg, yolo_pose) are still accepted for backward compatibility, but the docs use ultralytics everywhere.


Export

Export is currently implemented for the Ultralytics backend.

CLI export

python -m detect.cli.export_model -h

python -m detect.cli.export_model \
  --weights yolo26n \
  --formats onnx \
  --out-dir models/exports --run-name y26_onnx

Export from Python

Python export also uses snake_case args (e.g., out_dir, run_name) and accepts formats as a list or comma-separated string.

from detect.backends.ultralytics.export import export_model_ultralytics

res = export_model_ultralytics(
    weights="yolo26n",
    formats=["onnx"],
    imgsz=640,
    out_dir="models/exports",
    run_name="y26_onnx_py",
)

print("run_dir:", res["run_dir"])
for p in res["artifacts"]:
    print("-", p)

Compatibility notes:

  • Some model families do not support export (e.g., MobileSAM and SAM/SAM2/SAM3 per Ultralytics docs). The export CLI will warn and exit.
  • YOLO-World v1 weights (*-world.pt) do not support export; use YOLO-World v2 (*-worldv2.pt) for export.
  • YOLOv10 supports export but only to a restricted set of formats; unsupported formats will warn and exit.

License

MIT License. See LICENSE.

Download files

Source Distribution

detect_lib-0.1.8.tar.gz (31.7 kB)

Built Distribution

detect_lib-0.1.8-py3-none-any.whl (39.1 kB)

File details: detect_lib-0.1.8.tar.gz

  • Size: 31.7 kB
  • Tags: Source
  • Uploaded via: uv/0.10.0 (macOS); Trusted Publishing: No

File hashes

Algorithm     Hash digest
SHA256        c58e14c035c71188497280ae1ee1235e1fba4ca49395a0451c34501f9bd3a7b0
MD5           0a5798a3a184e67d82fdc69b8abeaba7
BLAKE2b-256   4ec9fd5c8c41601e56cfecf93135cde0f4510fb36f53adf43e108dd198a7580a

File details: detect_lib-0.1.8-py3-none-any.whl

  • Size: 39.1 kB
  • Tags: Python 3
  • Uploaded via: uv/0.10.0 (macOS); Trusted Publishing: No

File hashes

Algorithm     Hash digest
SHA256        bfe37583100128f935598261fccf2a0e1d49027a4d4a2c00516b4aa7fe8e92fc
MD5           acfec8d9b56df3eedaa55fb921f8d003
BLAKE2b-256   147ca199f2d862ba6f5d671e1e0160baa0a68fee74f857ebee7232c6353b7707
