High-level Python SDK for computer vision inference with ONNX Runtime.

Maintainers

mauriciorocha70

ort-vision-sdk

A high-level Python SDK for computer-vision inference on top of ONNX Runtime.

ort-vision-sdk wraps the low-level InferenceSession API behind task-oriented classes — Classifier, Detector, Segmenter — that handle preprocessing, execution-provider selection, and postprocessing for you. You go from a raw image (path, bytes, NumPy array, or PIL image) to a typed result in one call, with an output shape that matches the Ultralytics idiom (boxes.xyxy, cls, conf, names, ...) so existing code ports over with minimal edits.

from ort_vision_sdk import Detector

det = Detector("yolov8n.onnx")           # any anchor-free YOLO export (v8/v9/v10/v11/v12)
result = det.predict("street.jpg")[0]    # list[DetectionResults], length 1 per image
print(result.boxes.xyxy)                 # (N, 4) float64 array, original-image pixels
print(result.boxes.cls, result.boxes.conf)
for d in result:                          # per-instance dataclasses
    print(d.name, d.conf, d.box.xyxy)

Why this SDK

Using onnxruntime directly means you have to:

- pick and configure execution providers (CPU / CUDA / TensorRT / ...),
- letterbox / resize / normalize / to_chw / batch your image,
- decode the model output (anchor grids, NMS, mask prototypes),
- map boxes back from the letterboxed input to the original image,
- resolve class indices to human-readable labels,
- repeat all of the above per task family.

ort-vision-sdk does all of that and gives you a typed, dataclass-based result. The internals are explicit and overridable — you can pass your own mean / std, input_size, conf_threshold, iou_threshold, providers, or a pre-built ort.SessionOptions.
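
The letterbox and box-mapping steps listed above can be sketched in a few lines of plain Python. This is an illustration only, not the SDK's code: the helper names letterbox_params and unletterbox_box are hypothetical, and the sketch assumes an aspect-preserving resize with centered padding, which is the usual letterbox convention for YOLO exports.

```python
def letterbox_params(orig_w, orig_h, dst_w=640, dst_h=640):
    """Return (scale, pad_x, pad_y) for an aspect-preserving resize with centered padding."""
    scale = min(dst_w / orig_w, dst_h / orig_h)
    new_w, new_h = round(orig_w * scale), round(orig_h * scale)
    pad_x = (dst_w - new_w) / 2  # horizontal padding on each side
    pad_y = (dst_h - new_h) / 2  # vertical padding on each side
    return scale, pad_x, pad_y


def unletterbox_box(box, scale, pad_x, pad_y, orig_w, orig_h):
    """Map an (x1, y1, x2, y2) box from letterboxed pixels back to original-image pixels."""
    x1, y1, x2, y2 = box
    # Undo the padding first, then undo the scaling.
    coords = [(x1 - pad_x) / scale, (y1 - pad_y) / scale,
              (x2 - pad_x) / scale, (y2 - pad_y) / scale]
    # Clip to the original image bounds.
    limits = [orig_w, orig_h, orig_w, orig_h]
    return tuple(min(max(c, 0.0), float(lim)) for c, lim in zip(coords, limits))


# A 1280x720 frame letterboxed into a 640x640 model input.
scale, pad_x, pad_y = letterbox_params(1280, 720)
box = unletterbox_box((0.0, 140.0, 640.0, 500.0), scale, pad_x, pad_y, 1280, 720)
print(scale, pad_x, pad_y)  # 0.5 0.0 140.0
print(box)                  # (0.0, 0.0, 1280.0, 720.0)
```

A box that covers exactly the non-padded region of the model input maps back to the full original frame, which is a handy sanity check for any letterbox implementation.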

What's in the box

Task	Class	Models supported
Classification	`Classifier`	Any ONNX classifier with output shape `(1, num_classes)` (torchvision-style)
Object detection	`Detector`	Anchor-free YOLO heads: v8, v9, v10, v11, v12, v26 (`(1, 4 + nc, N)`)
Instance segmentation	`Segmenter`	YOLO seg heads: v8-seg, v11-seg, v26-seg (`(1, 4 + nc + nm, N)` + prototypes)

All three return the same envelope shape — a list[Results] of length 1 per image — so you can switch between tasks without rewriting your downstream code.

Installation

pip install ort-vision-sdk             # CPU only (default)
pip install "ort-vision-sdk[gpu]"      # adds onnxruntime-gpu for CUDA / TensorRT
pip install "ort-vision-sdk[opencv]"   # adds OpenCV image backend
pip install "ort-vision-sdk[dev]"      # ruff, mypy, pytest, build, twine

Requires Python 3.10+.

Quick start

Classification

from ort_vision_sdk import Classifier

clf = Classifier(
    "resnet50.onnx",
    labels="imagenet_labels.txt",   # one class per line, or pass a list/dict
    input_size=(224, 224),          # default
    apply_softmax=True,             # set False if your model already outputs probs
)

results = clf.predict("dog.jpg")
r = results[0]

print(r.cls, r.conf, r.name)        # top-1 — Ultralytics-style
print(r.probs.top5)                 # array of top-5 class indices
print(r.probs.top5conf)             # corresponding probabilities
print(r.probabilities[:5])          # tuple of ClassProbability dataclasses
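
For intuition about what apply_softmax and the top-5 fields represent, here is a minimal pure-Python sketch of softmax followed by top-k selection. The function names softmax and top_k are illustrative and are not taken from the SDK, which may well do this with NumPy internally.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, k=5):
    """Return (indices, confidences) of the k most probable classes, best first."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return order, [probs[i] for i in order]


logits = [2.0, 0.5, 3.0, -1.0]   # made-up scores for a 4-class model
probs = softmax(logits)
indices, confs = top_k(probs, k=2)
print(indices)                   # [2, 0]
```

If a model already ends in a softmax layer, applying it a second time flattens the distribution, which is presumably why the apply_softmax=False switch exists.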

Object detection

from ort_vision_sdk import Detector

det = Detector(
    "yolov8n.onnx",
    labels="coco",                   # default — 80-class COCO preset
    input_size=(640, 640),
    conf_threshold=0.25,
    iou_threshold=0.45,
)

result = det.predict("street.jpg")[0]

# Bulk numpy view — Ultralytics' Boxes interface
print(result.boxes.xyxy.shape)       # (N, 4) absolute pixels
print(result.boxes.xywhn)            # (N, 4) normalized (cx, cy, w, h)
print(result.boxes.cls)              # (N,) int64
print(result.boxes.conf)             # (N,) float64
print(result.boxes.data)             # (N, 6) [x1, y1, x2, y2, conf, cls]

# Per-instance dataclasses
for d in result:
    print(d.name, d.conf, d.box.xyxy)
    # d.cropped_image is an HWC uint8 RGB ndarray of the box crop
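
The conf_threshold and iou_threshold parameters control confidence filtering and non-maximum suppression (NMS). The sketch below shows a greedy, class-agnostic NMS in plain Python to make their roles concrete. It is an assumption-laden illustration, not the SDK's implementation: the names iou and nms are hypothetical, and a real detector may run NMS per class or in vectorized form.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, conf_threshold=0.25, iou_threshold=0.45):
    """Return indices of kept boxes, highest score first."""
    # 1) Drop low-confidence candidates.
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    # 2) Visit the rest from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # 3) Keep a box only if it does not overlap a kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300), (0, 0, 50, 50)]
scores = [0.90, 0.80, 0.70, 0.10]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

Box 1 is suppressed because it overlaps box 0 with an IoU of about 0.92, and box 3 is dropped by the confidence filter. Raising iou_threshold keeps more overlapping boxes; raising conf_threshold discards more weak candidates.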

Instance segmentation

from ort_vision_sdk import Segmenter

seg = Segmenter(
    "yolov8n-seg.onnx",
    labels="coco",
    mask_threshold=0.5,              # cutoff for soft → binary mask
)

result = seg.predict("street.jpg")[0]

# Same Boxes view as the detector …
print(result.boxes.xyxy, result.boxes.cls, result.boxes.conf)

# … plus per-instance binary masks
for inst in result:
    print(inst.name, inst.conf, inst.box.xyxy)
    print(inst.mask.shape)           # (h, w) uint8 ∈ {0, 255}, cropped to bbox
    print(inst.segmented_image.shape) # (h, w, 3) RGB with background zeroed out
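
The mask_threshold cutoff turns a soft mask (per-pixel values in [0, 1]) into the binary {0, 255} mask described above. A minimal sketch using nested lists instead of arrays follows; binarize_mask is a hypothetical helper, and the choice of a strict > comparison is an assumption rather than something the SDK documents.

```python
def binarize_mask(soft_mask, threshold=0.5):
    """Map per-pixel probabilities to 255 (foreground) or 0 (background)."""
    return [[255 if p > threshold else 0 for p in row] for row in soft_mask]


soft = [
    [0.1, 0.6, 0.9],
    [0.4, 0.5, 0.7],
]
binary = binarize_mask(soft)
print(binary)  # [[0, 255, 255], [0, 0, 255]]
```

A lower threshold grows the mask outward along soft edges, while a higher one keeps only the pixels the model is most sure about.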

Inputs

Every predict() accepts the same set of image inputs:

from pathlib import Path
from PIL import Image
import numpy as np

clf.predict("dog.jpg")               # str path
clf.predict(Path("dog.jpg"))         # pathlib
clf.predict(Path("dog.jpg").read_bytes())   # raw bytes (PNG, JPEG, ...)
clf.predict(Image.open("dog.jpg"))   # PIL — any mode, converted to RGB
clf.predict(np.zeros((480, 640, 3), dtype=np.uint8))  # HWC uint8 RGB ndarray

Need to load an image once and reuse it? Use the same loader the SDK uses internally:

from ort_vision_sdk import load_image
img = load_image("dog.jpg")          # HWC uint8 RGB
clf.predict(img)

Labels

Tasks resolve labels at construction time via resolve_labels:

from ort_vision_sdk import Classifier, Detector, COCO_CLASSES, resolve_labels

# 1) Built-in preset (currently: "coco")
det = Detector("yolov8n.onnx", labels="coco")

# 2) Explicit list / tuple
clf = Classifier("model.onnx", labels=["cat", "dog", "fox"])

# 3) Sparse dict — gaps filled with "class_<id>"
clf = Classifier("model.onnx", labels={0: "cat", 2: "fox"})

# 4) File path — one class per line
clf = Classifier("model.onnx", labels="imagenet_labels.txt")

# 5) None — auto-generates "class_0", "class_1", ... (only works when the
#    model's output shape is statically known)
clf = Classifier("model.onnx", labels=None)

The names attribute on every result is the canonical dict[int, str] mapping (it mirrors Ultralytics' model.names).
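
To illustrate the sparse-dict case, the sketch below fills missing class ids with "class_<id>" placeholders and returns a dense dict[int, str]. The helper fill_sparse_labels is hypothetical and is not the SDK's resolve_labels; in particular, inferring the class count from the largest key is an assumption made only for this example.

```python
def fill_sparse_labels(labels, num_classes=None):
    """Return a dense {id: name} mapping, using 'class_<id>' for any missing id."""
    if num_classes is None:
        # Assumption: with no model metadata, use the largest key to size the mapping.
        num_classes = max(labels) + 1 if labels else 0
    return {i: labels.get(i, f"class_{i}") for i in range(num_classes)}


names = fill_sparse_labels({0: "cat", 2: "fox"})
print(names)  # {0: 'cat', 1: 'class_1', 2: 'fox'}
```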

Execution providers

By default the SDK picks the first available provider in ORT's preference order. To pin a specific backend, pass providers= with either short aliases or canonical ORT names:

det = Detector("yolov8n.onnx", providers=["cuda", "cpu"])
det = Detector("yolov8n.onnx", providers=["tensorrt", "cuda", "cpu"])
det = Detector("yolov8n.onnx", providers=["CUDAExecutionProvider"])  # canonical name

Aliases supported: "cpu", "cuda", "tensorrt", "directml", "coreml", "openvino", "rocm". Anything else is forwarded verbatim to ORT.
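
The alias behaviour described above can be pictured as a simple lookup table. The sketch below maps each short alias to the canonical ONNX Runtime provider name and passes unknown strings through untouched. The table and the function canonical_providers are illustrative, not the SDK's internals, and exact-match (case-sensitive) lookup is an assumption.

```python
# Short alias -> canonical ONNX Runtime execution provider name.
PROVIDER_ALIASES = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "tensorrt": "TensorrtExecutionProvider",
    "directml": "DmlExecutionProvider",
    "coreml": "CoreMLExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
    "rocm": "ROCMExecutionProvider",
}


def canonical_providers(providers):
    """Expand known aliases; forward anything else verbatim."""
    return [PROVIDER_ALIASES.get(p, p) for p in providers]


resolved = canonical_providers(["tensorrt", "cuda", "CPUExecutionProvider"])
print(resolved)
# ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider']
```

To check which of these names your installed build can actually use, compare against onnxruntime.get_available_providers().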

For fine-grained control (graph optimization, threading, profiling) pass an ort.SessionOptions instance:

import onnxruntime as ort

opts = ort.SessionOptions()
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.intra_op_num_threads = 4

det = Detector("yolov8n.onnx", session_options=opts)

Result objects

Each predict() call returns list[Results] of length 1 (per image), so the typical pattern is results[0]. The envelope is iterable and indexable — iterating yields per-instance dataclasses, so legacy "list of detections" code works with one extra [0].

Envelope	Bulk view	Iterating yields	Notable fields
`ClassificationResults`	`probs`	n/a (single result)	`cls`, `conf`, `name`, `probabilities`
`DetectionResults`	`boxes`	`DetectionResult`	`cls`, `conf`, `box.xyxy`, `cropped_image`
`SegmentationResults`	`boxes`, `masks`	`SegmentationResult`	`cls`, `conf`, `box.xyxy`, `mask`, `segmented_image`

Every envelope also exposes names, orig_img, orig_shape, path, and an optional speed timings dict.

The bulk views (Boxes, Probs, Masks) match Ultralytics one-to-one: boxes.xyxy, boxes.xywh, boxes.xyxyn, boxes.xywhn, boxes.cls, boxes.conf, boxes.data; probs.top1, probs.top5, probs.top1conf, probs.top5conf, probs.data; masks.data, masks.xyxy.
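
The four box layouts are related by simple arithmetic. The sketch below converts a single box by hand, assuming the Ultralytics convention in which xywh means (center x, center y, width, height) and the n suffix means division by the original image width and height. The helper names are hypothetical.

```python
def xyxy_to_xywh(box):
    """(x1, y1, x2, y2) corners -> (cx, cy, w, h) center format."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def normalize(box, img_w, img_h):
    """Divide x-like values by the image width and y-like values by the image height."""
    a, b, c, d = box
    return (a / img_w, b / img_h, c / img_w, d / img_h)


box_xyxy = (100.0, 50.0, 300.0, 250.0)      # absolute pixels in a 640x480 image
box_xywh = xyxy_to_xywh(box_xyxy)
box_xyxyn = normalize(box_xyxy, 640, 480)
box_xywhn = normalize(box_xywh, 640, 480)
print(box_xywh)   # (200.0, 150.0, 200.0, 200.0)
```

The same normalize helper works for both layouts because positions 0 and 2 are always horizontal quantities and positions 1 and 3 are always vertical ones.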

Per-instance dataclasses (DetectionResult, SegmentationResult, ClassProbability, ClassificationResult) carry the verbose names (class_id, class_name, confidence, bbox) as canonical fields and expose Ultralytics aliases (cls, name, conf, box) as read-only properties — pick whichever style your codebase already uses.
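
The "canonical fields plus read-only aliases" pattern can be sketched with a frozen dataclass and properties. ExampleDetection is a hypothetical stand-in, not the SDK's DetectionResult: here bbox is a plain tuple for brevity, whereas the SDK's box exposes attributes such as xyxy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExampleDetection:
    # Canonical, verbose field names.
    class_id: int
    class_name: str
    confidence: float
    bbox: tuple[float, float, float, float]

    # Ultralytics-style aliases, exposed as read-only properties.
    @property
    def cls(self) -> int:
        return self.class_id

    @property
    def name(self) -> str:
        return self.class_name

    @property
    def conf(self) -> float:
        return self.confidence

    @property
    def box(self) -> tuple[float, float, float, float]:
        return self.bbox


d = ExampleDetection(0, "person", 0.91, (10.0, 20.0, 110.0, 220.0))
print(d.class_name, d.name)   # person person
```

Because the aliases are properties without setters, both spellings always agree and neither can drift out of sync with the other.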

Common patterns

Iterate detections only

for d in det.predict("img.jpg")[0]:
    print(d.name, d.conf, d.box.xyxy)

Filter by class

result = det.predict("img.jpg")[0]
people = [d for d in result if d.name == "person"]

Save crops

from PIL import Image
for i, d in enumerate(det.predict("img.jpg")[0]):
    Image.fromarray(d.cropped_image).save(f"crop_{i}.png")

Batch over a folder

from pathlib import Path
for path in Path("images").glob("*.jpg"):
    result = det.predict(path)[0]
    print(path.name, len(result), "detections")

Override per-call thresholds (detector)

result = det.predict("img.jpg", conf_threshold=0.4, iou_threshold=0.5)[0]

Status

This project is alpha — the public API is stable enough to build against, but minor versions may introduce breaking changes during the pre-1.0 phase. Pin the version range you build against.

Source code & issues: https://github.com/mauriciobenjamin700/ort-vision-sdk
Changelog: https://github.com/mauriciobenjamin700/ort-vision-sdk/blob/main/sdk-python/CHANGELOG.md
Browser counterpart (TypeScript): @mauriciobenjamin700/ort-vision-sdk-web

License

MIT — see LICENSE.

Release history

- 0.3.0: May 3, 2026
- 0.2.1: May 3, 2026 (this version)
- 0.2.0: May 3, 2026

Download files

Source Distribution

ort_vision_sdk-0.2.1.tar.gz (40.5 kB)

Uploaded May 3, 2026 Source

Built Distribution

ort_vision_sdk-0.2.1-py3-none-any.whl (43.8 kB)

Uploaded May 3, 2026 Python 3

File details

Details for the file ort_vision_sdk-0.2.1.tar.gz.

File metadata

Download URL: ort_vision_sdk-0.2.1.tar.gz
Upload date: May 3, 2026
Size: 40.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ort_vision_sdk-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`ce07acfed967f60b678472792203e713e775780f81a13cbe738caa8866e3742a`
MD5	`20fd1e1b3671023f5d844e428c7657d5`
BLAKE2b-256	`9ab96f0488946e35c858245564c707bc2121f9938f6bb365e2cdee3a431e2862`

Provenance

The following attestation bundles were made for ort_vision_sdk-0.2.1.tar.gz:

Publisher: release-pypi.yml on mauriciobenjamin700/ort-vision-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ort_vision_sdk-0.2.1.tar.gz
- Subject digest: ce07acfed967f60b678472792203e713e775780f81a13cbe738caa8866e3742a
- Sigstore transparency entry: 1435949400
- Sigstore integration time: May 3, 2026
Source repository:
- Permalink: mauriciobenjamin700/ort-vision-sdk@0f8d392bbc7feaeee3ce35d67140ff2cf7f85d94
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/mauriciobenjamin700
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi.yml@0f8d392bbc7feaeee3ce35d67140ff2cf7f85d94
- Trigger Event: push

File details

Details for the file ort_vision_sdk-0.2.1-py3-none-any.whl.

File metadata

Download URL: ort_vision_sdk-0.2.1-py3-none-any.whl
Upload date: May 3, 2026
Size: 43.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ort_vision_sdk-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8f6eed24caf5c13afcff0b331c66dadc02addbd7a0414e6c0c0110a3e803a2e9`
MD5	`04e4c24c48a6415d7e7e1b31e7105c67`
BLAKE2b-256	`3b3a7a6524507818cb9f5606686c56f01a298cf50b07fb331f71c80f5f9eb5f9`

Provenance

The following attestation bundles were made for ort_vision_sdk-0.2.1-py3-none-any.whl:

Publisher: release-pypi.yml on mauriciobenjamin700/ort-vision-sdk

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ort_vision_sdk-0.2.1-py3-none-any.whl
- Subject digest: 8f6eed24caf5c13afcff0b331c66dadc02addbd7a0414e6c0c0110a3e803a2e9
- Sigstore transparency entry: 1435949410
- Sigstore integration time: May 3, 2026
Source repository:
- Permalink: mauriciobenjamin700/ort-vision-sdk@0f8d392bbc7feaeee3ce35d67140ff2cf7f85d94
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/mauriciobenjamin700
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi.yml@0f8d392bbc7feaeee3ce35d67140ff2cf7f85d94
- Trigger Event: push
