detect
A modular video detection toolkit that produces a stable det-v1 JSON output schema, with a pluggable backend (currently Ultralytics) and optional model export.
- Backend: Ultralytics (YOLO families, RT-DETR, YOLO-World/YOLOE, SAM/FastSAM, depending on your installed ultralytics version)
- Default behavior: no files are written unless you opt in (JSON / frames / annotated video)
Output schema (det-v1)
Every run returns a det-v1 payload in memory (and the CLI prints it to stdout).
Top-level keys:
- schema_version: always "det-v1"
- video: {path, fps, frame_count, width, height}
- detector: configuration used for the run (name/weights/conf/imgsz/device/half + task + optional prompts/topk)
- frames: list of per-frame records
Per-frame record:
- frame: 0-based frame index
- file: standard frame filename (e.g. 000000.jpg), assigned even if frames aren't saved
- detections: list of detections
Detection fields:
- boxes: bbox = [x1, y1, x2, y2]
- pose: keypoints = [[x, y, score], ...]
- segmentation: segments = [[[x, y], ...], ...] (polygons)
- oriented boxes (best-effort): obb = [cx, cy, w, h, angle_degrees], plus an axis-aligned bbox
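For oriented boxes, the accompanying axis-aligned bbox is simply the bounding rectangle of the rotated box. A minimal sketch of that conversion (illustrative geometry only, not the library's internal code):

```python
import math

def obb_to_bbox(obb):
    """Convert [cx, cy, w, h, angle_degrees] to an axis-aligned [x1, y1, x2, y2]."""
    cx, cy, w, h, angle = obb
    theta = math.radians(angle)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Half-extents of the rotated rectangle projected onto the x and y axes.
    half_w = (abs(w * cos_t) + abs(h * sin_t)) / 2
    half_h = (abs(w * sin_t) + abs(h * cos_t)) / 2
    return [cx - half_w, cy - half_h, cx + half_w, cy + half_h]

print(obb_to_bbox([10, 20, 4, 2, 0]))  # [8.0, 19.0, 12.0, 21.0]
```

At angle 0 the result coincides with the unrotated rectangle; at 90 degrees width and height swap roles.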
Minimal example
{
"schema_version": "det-v1",
"video": {"path": "in.mp4", "fps": 30.0, "frame_count": 120, "width": 1920, "height": 1080},
"detector": {"name": "ultralytics", "weights": "yolo26n", "conf_thresh": 0.25, "imgsz": 640, "device": "cpu", "half": false, "task": "detect"},
"frames": [
{
"frame": 0,
"file": "000000.jpg",
"detections": [
{"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91, "class_id": 0, "class_name": "person"}
]
}
]
}
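A det-v1 payload can be consumed with plain dict access. A small illustrative walk over the example above (field names as documented; nothing here is library API):

```python
# Illustrative consumer of a det-v1 payload (shape as documented above).
payload = {
    "schema_version": "det-v1",
    "video": {"path": "in.mp4", "fps": 30.0, "frame_count": 120,
              "width": 1920, "height": 1080},
    "frames": [
        {"frame": 0, "file": "000000.jpg", "detections": [
            {"det_ind": 0, "bbox": [100.0, 50.0, 320.0, 240.0], "score": 0.91,
             "class_id": 0, "class_name": "person"},
        ]},
    ],
}

assert payload["schema_version"] == "det-v1"
for frame in payload["frames"]:
    # Frame filenames follow the 6-digit zero-padded convention even when
    # frames are not written to disk.
    assert frame["file"] == f"{frame['frame']:06d}.jpg"
    for det in frame["detections"]:
        x1, y1, x2, y2 = det["bbox"]
        print(f"{frame['frame']:>4} {det['class_name']:<8} {det['score']:.2f} "
              f"area={(x2 - x1) * (y2 - y1):.0f}")
```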
Install
Requires Python 3.11+.
From PyPI
pip install detect-lib
Optional extras (only if you need them):
pip install "detect-lib[export]" # ONNX / export helpers
pip install "detect-lib[coreml]" # CoreML export (macOS)
pip install "detect-lib[openvino]" # OpenVINO export
pip install "detect-lib[tf]" # TensorFlow export paths (heavy)
From GitHub (uv)
git clone https://github.com/Surya-Rayala/VisionPipeline-detection.git
cd VisionPipeline-detection
uv sync
Extras:
uv sync --extra export
uv sync --extra coreml
uv sync --extra openvino
uv sync --extra tf
CLI
All CLI commands are:
- pip: python -m ...
- uv: uv run python -m ...
Detection
Help:
python -m detect.cli.detect_video -h
List models (registry + installed):
python -m detect.cli.detect_video --list-models
Common patterns
1) Bounding boxes (typical YOLO / RT-DETR)
python -m detect.cli.detect_video \
--video in.mp4 \
--detector ultralytics \
--weights yolo26n \
--task detect \
--json \
--save-video annotated.mp4 \
--out-dir out --run-name yolo26n_detect
2) Instance segmentation (polygons)
python -m detect.cli.detect_video \
--video in.mp4 \
--detector ultralytics \
--weights yolo26n-seg \
--task segment \
--json \
--save-video annotated.mp4 \
--out-dir out --run-name yolo26n_seg
3) Pose (keypoints)
python -m detect.cli.detect_video \
--video in.mp4 \
--detector ultralytics \
--weights yolo26n-pose \
--task pose \
--json \
--save-video annotated.mp4 \
--out-dir out --run-name yolo26n_pose
4) Open-vocabulary (YOLO-World / YOLOE)
python -m detect.cli.detect_video \
--video in.mp4 \
--detector ultralytics \
--weights yolov8s-worldv2 \
--task openvocab \
--text "person,car,dog" \
--json \
--save-video annotated.mp4 \
--out-dir out --run-name worldv2_openvocab
5) Open-vocabulary + polygons (YOLOE -seg)
Use a YOLOE segmentation weight with --task segment when you want polygons.
python -m detect.cli.detect_video \
--video in.mp4 \
--detector ultralytics \
--weights yoloe-11s-seg \
--task segment \
--text "person,car,dog" \
--json \
--save-video annotated.mp4 \
--out-dir out --run-name yoloe_seg_openvocab
Task semantics (important)
- detect | segment | pose | obb | classify | sam | sam2 | sam3 | fastsam describe the output type you want.
- openvocab is a prompt mode for YOLO-World/YOLOE. The output type follows the model (boxes vs. masks). If you want polygons, use a *-seg model and segment.
Prompts
You can supply prompts via:
- --text "a,b,c" (open-vocabulary label list)
- --box "x1,y1,x2,y2" (repeatable)
- --point "x,y" or --point "x,y,label" (repeatable; label 1=fg, 0=bg)
- --prompts prompts.json (combined)
Example prompts.json:
{
"text": ["person", "car", "dog"],
"boxes": [[100, 100, 500, 500]],
"points": [[320, 240, 1], [100, 120, 0]],
"topk": 5
}
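The prompts file can also be built programmatically; its keys mirror the CLI flags above. A sketch (plain json module usage, not library API):

```python
import json

# Same structure as the prompts.json example above; keys mirror the CLI flags
# (--text / --box / --point / --topk).
prompts = {
    "text": ["person", "car", "dog"],
    "boxes": [[100, 100, 500, 500]],           # each box is [x1, y1, x2, y2]
    "points": [[320, 240, 1], [100, 120, 0]],  # [x, y, label]; label 1=fg, 0=bg
    "topk": 5,
}

blob = json.dumps(prompts, indent=2)
# Written to disk (e.g. open("prompts.json", "w").write(blob)), this file can
# then be passed to the CLI via --prompts prompts.json.
assert json.loads(blob)["topk"] == 5
```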
Export note (open-vocab): exported formats (ONNX/CoreML/etc.) may not support changing the vocabulary at runtime. If prompts don’t take effect, run the .pt weights for true open-vocabulary prompting or post-filter detections.
Artifacts (all opt-in)
- --json writes out/<run-name>/detections.json
- --frames writes out/<run-name>/frames/*.jpg
- --save-video NAME.mp4 writes out/<run-name>/NAME.mp4
If you don’t enable any artifacts, no output directory is created.
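The flag-to-path mapping above can be sketched with a small helper (hypothetical; expected_artifacts and its argument names are made up for illustration, not part of the library):

```python
from pathlib import Path

def expected_artifacts(out_dir, run_name, json_=False, frames=False, save_video=None):
    """Map the opt-in artifact flags to the paths they would produce."""
    run = Path(out_dir) / run_name
    paths = {}
    if json_:
        paths["json"] = run / "detections.json"
    if frames:
        paths["frames"] = run / "frames"
    if save_video:
        paths["video"] = run / save_video
    return paths  # empty dict -> no output directory is needed at all

print(expected_artifacts("out", "yolo26n_detect", json_=True, save_video="annotated.mp4"))
```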
Python API
Parameter mapping (Python vs CLI)
Python uses snake_case keyword arguments. The CLI uses kebab-case flags. The values are the same, but the names differ.
Common mapping:
- CLI --video → Python video
- CLI --detector → Python detector
- CLI --weights → Python weights
- CLI --classes "0,2" → Python classes=[0, 2]
- CLI --conf-thresh → Python conf_thresh
- CLI --imgsz → Python imgsz
- CLI --device → Python device
- CLI --half → Python half=True
- CLI --task → Python task
Prompts:
- CLI --text "a,b" → Python prompts={"text": ["a", "b"]}
- CLI --box "x1,y1,x2,y2" (repeatable) → Python prompts={"boxes": [[x1, y1, x2, y2], ...]}
- CLI --point "x,y,label" (repeatable) → Python prompts={"points": [[x, y, label], ...]}
- CLI --topk N → Python topk=N (or prompts={"topk": N})
Artifacts (all opt-in):
- CLI --json → Python save_json=True
- CLI --frames → Python save_frames=True
- CLI --save-video NAME.mp4 → Python save_video="NAME.mp4"
- CLI --out-dir DIR → Python out_dir="DIR"
- CLI --run-name NAME → Python run_name="NAME"
- CLI --no-progress → Python progress=False
- CLI --display → Python display=True
Note: the Python API also accepts an advanced artifacts=ArtifactOptions(...) object, but the convenience args above are easiest for most usage.
Detect a video
from detect import detect_video
res = detect_video(
video="in.mp4",
detector="ultralytics",
weights="yolo26n",
task="detect",
classes=None, # e.g. [0, 2] to filter class ids
conf_thresh=0.25,
imgsz=640,
device="auto",
half=False,
# prompts={"text": ["person", "car", "dog"]}, # for open-vocabulary models
save_json=True,
save_video="annotated.mp4",
out_dir="out",
run_name="py_detect",
)
print(res.payload["schema_version"], len(res.payload["frames"]))
print(res.paths)
Note: legacy detector aliases (yolo_bbox, yolo_seg, yolo_pose) are still accepted for backward compatibility, but the docs use ultralytics everywhere.
Export
Export is currently implemented for the Ultralytics backend.
CLI export
python -m detect.cli.export_model -h
python -m detect.cli.export_model \
--weights yolo26n \
--formats onnx \
--out-dir models/exports --run-name y26_onnx
Export from Python
Python export also uses snake_case args (e.g., out_dir, run_name) and accepts formats as a list or comma-separated string.
from detect.backends.ultralytics.export import export_model_ultralytics
res = export_model_ultralytics(
weights="yolo26n",
formats=["onnx"],
imgsz=640,
out_dir="models/exports",
run_name="y26_onnx_py",
)
print("run_dir:", res["run_dir"])
for p in res["artifacts"]:
print("-", p)
Compatibility notes:
- Some model families do not support export (e.g., MobileSAM and SAM/SAM2/SAM3 per Ultralytics docs). The export CLI will warn and exit.
- YOLO-World v1 weights (*-world.pt) do not support export; use YOLO-World v2 (*-worldv2.pt) for export.
- YOLOv10 supports export, but only to a restricted set of formats; unsupported formats will warn and exit.
License
MIT License. See LICENSE.