A modular video object detection toolkit with a clean det-v1 JSON schema, pluggable backends, and optional model export.

detect

A modular video detection toolkit that produces a stable det-v1 JSON output schema, with a pluggable backend (currently Ultralytics) and optional model export.

  • Backend: Ultralytics (YOLO families, RT-DETR, YOLO-World/YOLOE, SAM/FastSAM — depending on your installed ultralytics version)
  • Default behavior: no files are written unless you opt in (JSON / frames / annotated video)

Output schema (det-v1)

Every run returns a det-v1 payload in memory (and the CLI prints it to stdout).

Top-level keys:

  • schema_version: always "det-v1"
  • video: {path, fps, frame_count, width, height}
  • detector: configuration used for the run (name/weights/conf_thresh/imgsz/device/half + task + optional prompts/topk)
  • frames: list of per-frame records

Per-frame record:

  • frame: 0-based frame index
  • file: standard frame filename (e.g. 000000.jpg), present even if frames aren’t saved
  • detections: list of detections

Detection fields:

  • boxes: bbox = [x1, y1, x2, y2]
  • pose: keypoints = [[x, y, score], ...]
  • segmentation: segments = [[[x, y], ...], ...] (polygons)
  • oriented boxes (best-effort): obb = [cx, cy, w, h, angle_degrees] plus an axis-aligned bbox
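
The relationship between an obb and its axis-aligned bbox can be sketched as follows. This is a minimal illustration, not the library's code: obb_to_corners and obb_to_bbox are hypothetical helpers, and the sketch assumes the angle is a rotation about the box centre. The enclosing box comes out the same whichever rotation direction the backend uses; only the corner order changes.

```python
import math

def obb_to_corners(obb):
    """Corner points of an oriented box [cx, cy, w, h, angle_degrees] (hypothetical helper)."""
    cx, cy, w, h, angle_deg = obb
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = []
    # Offsets of the four corners from the centre, before rotation.
    for dx, dy in ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)):
        # Standard 2D rotation about the centre (assumed convention).
        corners.append((cx + dx * cos_t - dy * sin_t, cy + dx * sin_t + dy * cos_t))
    return corners

def obb_to_bbox(obb):
    """Axis-aligned [x1, y1, x2, y2] box that encloses the oriented box."""
    xs, ys = zip(*obb_to_corners(obb))
    return [min(xs), min(ys), max(xs), max(ys)]

print(obb_to_bbox([100.0, 100.0, 40.0, 20.0, 0.0]))
print(obb_to_bbox([100.0, 100.0, 40.0, 20.0, 30.0]))
```

With a zero angle the result is simply the box spanned by the centre plus or minus half the width and height; any other angle widens the enclosing box.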

Minimal example

{
  "schema_version": "det-v1",
  "video": {"path": "in.mp4", "fps": 30.0, "frame_count": 120, "width": 1920, "height": 1080},
  "detector": {"name": "ultralytics", "weights": "yolo26n", "conf_thresh": 0.25, "imgsz": 640, "device": "cpu", "half": false, "task": "detect"},
  "frames": [
    {
      "frame": 0,
      "file": "000000.jpg",
      "detections": [
        {"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91, "class_id": 0, "class_name": "person"}
      ]
    }
  ]
}
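
A payload like the one above can be consumed with the standard library alone. The sketch below is illustrative: summarize is a hypothetical helper, the detector block is omitted from the embedded JSON for brevity, and the timestamp assumes a constant frame rate.

```python
import json
from collections import Counter

# The minimal det-v1 example, embedded as a string (detector block omitted).
PAYLOAD_JSON = """
{
  "schema_version": "det-v1",
  "video": {"path": "in.mp4", "fps": 30.0, "frame_count": 120, "width": 1920, "height": 1080},
  "frames": [
    {"frame": 0, "file": "000000.jpg", "detections": [
      {"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91,
       "class_id": 0, "class_name": "person"}
    ]}
  ]
}
"""

def summarize(payload):
    """Hypothetical helper: flatten detections into rows and count classes."""
    if payload.get("schema_version") != "det-v1":
        raise ValueError("expected a det-v1 payload")
    fps = payload["video"]["fps"]
    rows, counts = [], Counter()
    for frame in payload["frames"]:
        # Frame indices are 0-based, so frame / fps is seconds from the start
        # (assuming a constant frame rate).
        timestamp_s = frame["frame"] / fps if fps else None
        for det in frame["detections"]:
            rows.append((frame["frame"], timestamp_s, det["class_name"], det["score"]))
            counts[det["class_name"]] += 1
    return rows, counts

rows, counts = summarize(json.loads(PAYLOAD_JSON))
print(rows)
print(dict(counts))
```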

Install

Requires Python 3.11+.

From PyPI

pip install detect-lib

Optional extras (only if you need them):

pip install "detect-lib[export]"      # ONNX / export helpers
pip install "detect-lib[coreml]"      # CoreML export (macOS)
pip install "detect-lib[openvino]"    # OpenVINO export
pip install "detect-lib[tf]"          # TensorFlow export paths (heavy)

From GitHub (uv)

git clone https://github.com/Surya-Rayala/VisionPipeline-detection.git
cd VisionPipeline-detection
uv sync

Extras:

uv sync --extra export
uv sync --extra coreml
uv sync --extra openvino
uv sync --extra tf

CLI

Every CLI command runs as a Python module. The prefix depends on how you installed the package:

  • python -m ... (pip install)
  • uv run python -m ... (uv checkout)

Detection

Help:

python -m detect.cli.detect_video -h

List models (registry + installed):

python -m detect.cli.detect_video --list-models

Common patterns

1) Bounding boxes (typical YOLO / RT-DETR)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolo26n \
  --task detect \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yolo26n_detect

2) Instance segmentation (polygons)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolo26n-seg \
  --task segment \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yolo26n_seg

3) Pose (keypoints)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolo26n-pose \
  --task pose \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yolo26n_pose

4) Open-vocabulary (YOLO-World / YOLOE)

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yolov8s-worldv2 \
  --task openvocab \
  --text "person,car,dog" \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name worldv2_openvocab

Open-vocabulary + polygons (YOLOE *-seg)

Use a YOLOE segmentation weight with --task segment when you want polygons.

python -m detect.cli.detect_video \
  --video in.mp4 \
  --detector ultralytics \
  --weights yoloe-11s-seg \
  --task segment \
  --text "person,car,dog" \
  --json \
  --save-video annotated.mp4 \
  --out-dir out --run-name yoloe_seg_openvocab

Task semantics (important)

  • detect | segment | pose | obb | classify | sam | sam2 | sam3 | fastsam select the output type you want.
  • openvocab is a prompt mode for YOLO-World/YOLOE, not an output type. The output follows the model (boxes vs. masks). If you want polygons, use a *-seg model with --task segment.

Prompts

You can supply prompts via:

  • --text "a,b,c" (open-vocabulary label list)
  • --box "x1,y1,x2,y2" (repeatable)
  • --point "x,y" or --point "x,y,label" (repeatable; label 1=fg, 0=bg)
  • --prompts prompts.json (combined)

Example prompts.json:

{
  "text": ["person", "car", "dog"],
  "boxes": [[100, 100, 500, 500]],
  "points": [[320, 240, 1], [100, 120, 0]],
  "topk": 5
}

Export note (open-vocab): exported formats (ONNX/CoreML/etc.) may not support changing the vocabulary at runtime. If prompts don’t take effect, run the .pt weights for true open-vocabulary prompting or post-filter detections.
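
Post-filtering can be done directly on the det-v1 payload. The sketch below assumes each detection carries a class_name field, as in the minimal example; filter_by_class_name is a hypothetical helper, not part of the package.

```python
import copy

def filter_by_class_name(payload, keep):
    """Return a copy of a det-v1 payload with only detections whose class_name is in keep."""
    keep = set(keep)
    out = copy.deepcopy(payload)  # leave the caller's payload untouched
    for frame in out["frames"]:
        # det_ind values are kept as-is; whether to renumber them is up to you.
        frame["detections"] = [
            det for det in frame["detections"] if det.get("class_name") in keep
        ]
    return out

sample = {
    "schema_version": "det-v1",
    "frames": [
        {"frame": 0, "file": "000000.jpg", "detections": [
            {"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91,
             "class_id": 0, "class_name": "person"},
            {"det_ind": 1, "bbox": [400.0, 300.0, 600.0, 420.0], "score": 0.55,
             "class_id": 56, "class_name": "chair"},
        ]},
    ],
}

filtered = filter_by_class_name(sample, ["person", "car", "dog"])
print([d["class_name"] for d in filtered["frames"][0]["detections"]])
```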

Artifacts (all opt-in)

  • --json writes out/<run-name>/detections.json
  • --frames writes out/<run-name>/frames/*.jpg
  • --save-video NAME.mp4 writes out/<run-name>/NAME.mp4

If you don’t enable any artifacts, no output directory is created.
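
The layout above can be mirrored in a few lines when you need to locate artifacts from a script. This is a sketch under stated assumptions: expected_artifacts is a hypothetical helper, its keyword names follow the Python API's save_json / save_frames / save_video arguments, and the paths are derived from the list above rather than returned by the library.

```python
from pathlib import Path

def expected_artifacts(out_dir, run_name, *, save_json=False, save_frames=False, save_video=None):
    """Map each enabled artifact to the path documented for it (hypothetical helper)."""
    run_dir = Path(out_dir) / run_name
    paths = {}
    if save_json:
        paths["json"] = run_dir / "detections.json"
    if save_frames:
        paths["frames"] = run_dir / "frames"
    if save_video:
        paths["video"] = run_dir / save_video
    # An empty dict corresponds to a run that creates no output directory.
    return paths

print(expected_artifacts("out", "yolo26n_detect", save_json=True, save_video="annotated.mp4"))
print(expected_artifacts("out", "yolo26n_detect"))
```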


Python API

Parameter mapping (Python vs CLI)

Python uses snake_case keyword arguments. The CLI uses kebab-case flags. The values are the same, but the names differ.

Common mapping:

  • CLI --video → Python video
  • CLI --detector → Python detector
  • CLI --weights → Python weights
  • CLI --classes "0,2" → Python classes=[0, 2]
  • CLI --conf-thresh → Python conf_thresh
  • CLI --imgsz → Python imgsz
  • CLI --device → Python device
  • CLI --half → Python half=True
  • CLI --task → Python task

Prompts:

  • CLI --text "a,b" → Python prompts={"text": ["a", "b"]}
  • CLI --box "x1,y1,x2,y2" (repeatable) → Python prompts={"boxes": [[x1, y1, x2, y2], ...]}
  • CLI --point "x,y,label" (repeatable) → Python prompts={"points": [[x, y, label], ...]}
  • CLI --topk N → Python topk=N (or prompts={"topk": N})
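
To make the prompt mapping concrete, the sketch below turns CLI-style strings into the prompts dict shape shown above. build_prompts is a hypothetical helper, not part of the package, and treating a two-value point as foreground (label 1) is an assumption made for illustration.

```python
def build_prompts(text=None, boxes=(), points=(), topk=None):
    """Convert CLI-style prompt strings into a prompts dict (hypothetical helper)."""
    prompts = {}
    if text:
        # "a, b" -> ["a", "b"]
        prompts["text"] = [t.strip() for t in text.split(",") if t.strip()]
    if boxes:
        parsed_boxes = []
        for box in boxes:
            values = [float(v) for v in box.split(",")]
            if len(values) != 4:
                raise ValueError(f"box needs x1,y1,x2,y2: {box!r}")
            parsed_boxes.append(values)
        prompts["boxes"] = parsed_boxes
    if points:
        parsed_points = []
        for point in points:
            values = point.split(",")
            if len(values) not in (2, 3):
                raise ValueError(f"point needs x,y or x,y,label: {point!r}")
            # Assumption: a missing label means foreground (1).
            label = int(values[2]) if len(values) == 3 else 1
            parsed_points.append([float(values[0]), float(values[1]), label])
        prompts["points"] = parsed_points
    if topk is not None:
        prompts["topk"] = int(topk)
    return prompts

print(build_prompts(
    text="person, car",
    boxes=["100,100,500,500"],
    points=["320,240,1", "100,120,0"],
    topk=5,
))
```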

Artifacts (all opt-in):

  • CLI --json → Python save_json=True
  • CLI --frames → Python save_frames=True
  • CLI --save-video NAME.mp4 → Python save_video="NAME.mp4"
  • CLI --out-dir DIR → Python out_dir="DIR"
  • CLI --run-name NAME → Python run_name="NAME"
  • CLI --no-progress → Python progress=False
  • CLI --display → Python display=True

Note: the Python API also accepts an advanced artifacts=ArtifactOptions(...) object, but the convenience args above are easiest for most usage.

Detect a video

from detect import detect_video

res = detect_video(
    video="in.mp4",
    detector="ultralytics",
    weights="yolo26n",
    task="detect",
    classes=None,          # e.g. [0, 2] to filter class ids
    conf_thresh=0.25,
    imgsz=640,
    device="auto",
    half=False,
    # prompts={"text": ["person", "car", "dog"]},  # for open-vocabulary models
    save_json=True,
    save_video="annotated.mp4",
    out_dir="out",
    run_name="py_detect",
)

print(res.payload["schema_version"], len(res.payload["frames"]))
print(res.paths)

Note: legacy detector aliases (yolo_bbox, yolo_seg, yolo_pose) are still accepted for backward compatibility, but the docs use ultralytics everywhere.


Export

Export is currently implemented for the Ultralytics backend.

CLI export

python -m detect.cli.export_model -h

python -m detect.cli.export_model \
  --weights yolo26n \
  --formats onnx \
  --out-dir models/exports --run-name y26_onnx

Export from Python

Python export also uses snake_case args (e.g., out_dir, run_name) and accepts formats as a list or comma-separated string.

from detect.backends.ultralytics.export import export_model_ultralytics

res = export_model_ultralytics(
    weights="yolo26n",
    formats=["onnx"],
    imgsz=640,
    out_dir="models/exports",
    run_name="y26_onnx_py",
)

print("run_dir:", res["run_dir"])
for p in res["artifacts"]:
    print("-", p)

Compatibility notes:

  • Some model families do not support export (e.g., MobileSAM and SAM/SAM2/SAM3 per Ultralytics docs). The export CLI will warn and exit.
  • YOLO-World v1 weights (*-world.pt) do not support export; use YOLO-World v2 (*-worldv2.pt) for export.
  • YOLOv10 supports export but only to a restricted set of formats; unsupported formats will warn and exit.

License

MIT License. See LICENSE.
