A permissive-license-aware framework for serving modern computer vision models locally and over Cloudflare Tunnel.
VisionServeX
Accuracy-aware computer vision model gateway — honest, local-first, and privacy-respecting.
Serve modern CV models on your machine. Local-only by default. No data retained.
What is VisionServeX?
VisionServeX is an open-source, permissive-license-aware Python framework for running modern computer vision models locally and exposing them through a stable HTTP API. It works as a local model gateway: start it once, call any supported model through one clean API.
Accuracy-aware and scientifically usable:
Every model carries an explicit accuracy taxonomy label: demo_fast, production_recommended, accuracy_grade, experimental_sota, expert_sidecar, external_api, or unavailable_with_reason. The recommender, benchmark tools, and registry are aligned to these labels so you always know what tier you are running. Real AP50/mAP50:95 is computed when you provide an annotated dataset.
Honesty policy:
VisionServeX does not claim to beat Ultralytics globally. The benchmark-competitiveness tool is designed to reveal the honest truth. If YOLO wins, it will say so.
Privacy-first design:
- Binds to 127.0.0.1 by default, so nothing leaves your machine.
- Images are decoded in memory for inference and never written to disk by default.
- No data is retained between requests by default.
- Log redaction removes tokens, base64, and API keys from all output.
⚠️ No end-to-end encryption claimed. VisionServeX cannot provide E2E encryption in the cryptographic sense — the inference server must see plaintext image tensors to run models. We provide local-first processing, no-retention defaults, optional encryption-at-rest for job metadata, and auth for public mode. See docs/privacy.md.
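To make the log-redaction bullet above concrete, here is a minimal sketch of one way such a filter could work with the standard `logging` module. The patterns, the `redact` helper, and the `RedactingFilter` class are illustrative assumptions and are not taken from the VisionServeX source; the real rules may be stricter or different.

```python
import logging
import re

# Hypothetical patterns for illustration; VisionServeX's actual rules may differ.
_PATTERNS = [
    # "Bearer <token>" authorization values
    (re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+"), "Bearer [REDACTED]"),
    # key=value or key: value pairs whose key looks like a secret
    (re.compile(r"(?i)\b(api[_-]?key|token|secret)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    # long base64-looking runs, such as inline image payloads
    (re.compile(r"[A-Za-z0-9+/]{64,}={0,2}"), "[BASE64 REDACTED]"),
]


def redact(text: str) -> str:
    """Apply every redaction pattern to the text, in order."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Rewrite each log record so the formatted message is redacted."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first, then redact, so secrets passed as args are covered too.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True
```

Attaching the filter with `handler.addFilter(RedactingFilter())` on each handler would apply it to everything that handler emits.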
Quickstart (CPU, 5 minutes)
pip install 'visionservex[server,hf,rfdetr]'
visionservex getting-started # personalized guide
visionservex pull dfine-s-o365-coco # accuracy-grade detection, CPU-capable
visionservex serve # http://127.0.0.1:8080
curl -F "image=@image.jpg" -F "model_id=dfine-s-o365-coco" \
http://127.0.0.1:8080/detect | jq
For a quick demo (smallest model):
visionservex pull rfdetr-nano # demo_fast, CPU-capable
visionservex predict rfdetr-nano image.jpg
Ultralytics-Like Workflow
Same mental model — different backends, all permissive-license.
from visionservex import VisionModel
model = VisionModel("dfine-x-o365-coco")
model.pull() # download checkpoint
model.info() # show registry metadata
results = model.predict("image.jpg", conf=0.25)
results.save("outputs/") # save annotated image
results.plot() # returns PIL Image
results.to_json() # JSON string
results.to_csv() # CSV string
results.debug() # detailed debug string
# Check what operations are supported
model.supports("val") # {"supported": True, ...}
model.supports("train") # {"supported": False, "reason": "..."}
model.training_info() # per-family training capabilities
model.export_info() # per-family export capabilities
model.val(dataset="yolo:/data/coco128", max_images=100) # AP50/mAP50:95
Note: Not all operations exist for all models. Use model.supports("operation") and visionservex model-card show MODEL to check capabilities. VisionServeX mirrors the Ultralytics workflow but does not depend on the Ultralytics package.
# CLI task aliases
visionservex detect dfine-x-o365-coco image.jpg --conf 0.25 --device cuda
visionservex segment rfdetr-seg-medium image.jpg --save-image out.jpg
visionservex classify swinv2-base image.jpg --top-k 5
visionservex open-vocab grounding-dino-swin-b image.jpg --prompt "car,person"
visionservex val dfine-x-o365-coco --dataset yolo:/path/to/coco128 --max-images 128
# Model lifecycle
visionservex model pull dfine-x-o365-coco --dry-run
visionservex model info dfine-x-o365-coco
visionservex model checkpoint-info dfine-x-o365-coco
visionservex training capabilities --model rfdetr-large
visionservex export-cmd capabilities --model dfine-x-o365-coco
Output Normalization
The built-in normalizer handles all common detection serialization formats:
from visionservex import normalize_detections, parse_api_response
# Accepts all these formats:
dets = normalize_detections([
{"xyxy": [10, 20, 100, 200], "score": 0.9, "label": "cat"},
{"box": {"x1": 10, "y1": 20, "x2": 100, "y2": 200}, "confidence": 0.8, "category": "dog"},
{"bbox": [10, 20, 90, 180], "bbox_format": "xywh", "conf": 0.7, "class_id": 0},
])
# Parse VisionServeX HTTP API responses directly:
import requests
resp = requests.post("http://127.0.0.1:8080/detect", ...)  # POST, matching the Quickstart curl -F upload
dets = parse_api_response(resp.json())
Never silently drops all predictions — emits AllPredictionsDroppedWarning if normalization fails.
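As a rough illustration of what this kind of normalization involves, the sketch below converts the three input shapes shown above into a single xyxy form and warns when nothing survives. It is a standalone approximation, not the library's implementation: `to_xyxy`, `normalize_all`, and `AllDroppedWarning` are hypothetical names, and treating a `bbox` without `bbox_format` as xyxy is an assumption.

```python
import warnings
from typing import Any


class AllDroppedWarning(UserWarning):
    """Local stand-in for the library's AllPredictionsDroppedWarning."""


def to_xyxy(det: dict[str, Any]) -> dict[str, Any]:
    """Normalize one detection dict to {"xyxy", "score", "label"}."""
    if "xyxy" in det:
        box = [float(v) for v in det["xyxy"]]
    elif "box" in det:
        b = det["box"]
        box = [float(b[k]) for k in ("x1", "y1", "x2", "y2")]
    elif "bbox" in det:
        x, y, w, h = (float(v) for v in det["bbox"])
        # Assumption: bbox without an explicit format is already xyxy.
        if det.get("bbox_format", "xyxy") == "xywh":
            box = [x, y, x + w, y + h]
        else:
            box = [x, y, w, h]
    else:
        raise ValueError(f"no box field in {sorted(det)}")

    score = next((det[k] for k in ("score", "confidence", "conf") if k in det), None)
    label = next((det[k] for k in ("label", "category", "class_id") if k in det), None)
    return {"xyxy": box, "score": score, "label": label}


def normalize_all(dets: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Normalize a list, skipping malformed items but never dropping all silently."""
    out = []
    for det in dets:
        try:
            out.append(to_xyxy(det))
        except (ValueError, KeyError, TypeError):
            continue
    if dets and not out:
        warnings.warn("all predictions were dropped during normalization", AllDroppedWarning)
    return out
```

With the three example detections above, all three boxes come out as `[10.0, 20.0, 100.0, 200.0]`, since the xywh entry is 90 wide and 180 tall starting at (10, 20).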
Python Client
from visionservex import Client, VisionModel
# Direct inference (local, no server needed)
result = VisionModel("dfine-s-o365-coco").predict("image.jpg") # accuracy_grade
result = VisionModel("rfdetr-nano").predict("image.jpg") # demo_fast
# Via local gateway
client = Client("http://127.0.0.1:8080")
result = client.detect("dfine-s-o365-coco", "image.jpg")
result = client.grounded_segment("grounded-sam2", "image.jpg", prompt="car, person")
result = client.classify("swinv2-tiny", "image.jpg")
Model Taxonomy
Every model in the registry now carries an explicit model_category label.
| Category | Meaning | Examples |
|---|---|---|
| demo_fast | Quick demo, small, not for accuracy benchmarks | dfine-n, rfdetr-nano, rfdetr-seg-nano, grounding-dino-tiny |
| production_recommended | Solid accuracy, ready for real use | rfdetr-small, rfdetr-seg-small, swinv2-tiny, sam-vit-base |
| accuracy_grade | Tracked for AP benchmarks; explicitly wired | dfine-s-o365-coco, dfine-m/l/x-o365-coco, rfdetr-medium/large, grounding-dino-swin-b |
| experimental_sota | Claims SOTA but not fully verified in this build | deim-s/m, deimv2-s/m, rtdetrv4-s/m/l/x, maskdino-r50-coco |
| expert_sidecar | Requires expert setup (OpenMMLab, custom ops) | rtmpose-*, internimage-*, co-dino-* |
| external_api | API-gated upstream; not self-hostable | grounding-dino-1.5/1.6 |
| unavailable_with_reason | Blocked; honest reason documented | rfdetr-seg-large/xlarge/2xlarge |
| utility | Mock / built-in / test helpers | mock-detect, mock-classify, … |
Key rule: demo_fast models are not used to claim competitiveness with YOLO. Use accuracy_grade variants for AP benchmarks.
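The key rule can be pictured as a simple filter over category labels. The sketch below is only an illustration under assumptions: the `ModelCategory` enum and the `exclude_demo_models` helper are hypothetical names, and the registry itself may represent categories differently. The label strings and the example model-to-category pairs are taken from the table above.

```python
from enum import Enum


class ModelCategory(str, Enum):
    """Category labels as listed in the taxonomy table."""

    DEMO_FAST = "demo_fast"
    PRODUCTION_RECOMMENDED = "production_recommended"
    ACCURACY_GRADE = "accuracy_grade"
    EXPERIMENTAL_SOTA = "experimental_sota"
    EXPERT_SIDECAR = "expert_sidecar"
    EXTERNAL_API = "external_api"
    UNAVAILABLE_WITH_REASON = "unavailable_with_reason"
    UTILITY = "utility"


def exclude_demo_models(models: dict[str, str]) -> list[str]:
    """Return model IDs whose category is not demo_fast, preserving input order.

    Unknown category strings raise ValueError, so typos are caught early.
    """
    return [
        model_id
        for model_id, category in models.items()
        if ModelCategory(category) is not ModelCategory.DEMO_FAST
    ]


candidates = {
    "dfine-n": "demo_fast",
    "dfine-s-o365-coco": "accuracy_grade",
    "rfdetr-nano": "demo_fast",
    "rfdetr-small": "production_recommended",
}
benchmark_set = exclude_demo_models(candidates)
```

Here `benchmark_set` keeps only `dfine-s-o365-coco` and `rfdetr-small`, the two models recommended elsewhere in this README for AP comparison.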
What works today
Detection (wired, runnable)
| Model ID | Category | Checkpoint | Install |
|---|---|---|---|
| dfine-n / dfine-n-coco | demo_fast | ustc-community/dfine-nano-coco | [hf] |
| dfine-s-o365-coco ★ | accuracy_grade | ustc-community/dfine-small-obj2coco | [hf] |
| dfine-m-o365-coco | accuracy_grade | ustc-community/dfine-medium-obj2coco | [hf] |
| dfine-l-o365-coco | accuracy_grade | ustc-community/dfine-large-obj2coco-e25 | [hf] |
| dfine-x-o365-coco | accuracy_grade | ustc-community/dfine-xlarge-obj2coco | [hf] |
| rfdetr-nano | demo_fast | rfdetr pkg | [rfdetr] |
| rfdetr-small ★ | production_recommended | rfdetr pkg | [rfdetr] |
| rfdetr-medium | accuracy_grade | rfdetr pkg | [rfdetr] |
| rfdetr-large | accuracy_grade | rfdetr pkg | [rfdetr] |
★ Recommended accuracy entry points: dfine-s-o365-coco (CPU-capable) and rfdetr-small (GPU-preferred).
Segmentation
| Family | Models | Category | Install |
|---|---|---|---|
| RF-DETR-Seg | rfdetr-seg-nano/small/medium | demo_fast / production_recommended / accuracy_grade | [rfdetr] |
| SAM v1 | sam-vit-base/large/huge | production_recommended / accuracy_grade | [hf] |
| SAM 2 | sam2-hiera-tiny/small/base-plus/large | production_recommended / accuracy_grade | [hf] |
| Grounded SAM | grounded-sam, grounded-sam2 | production_recommended | [hf] |
| OneFormer | oneformer-swin-large/dinat-large/convnext-large | accuracy_grade | [hf] |
Classification
| Family | Models | Category | Install |
|---|---|---|---|
| SwinV2 | swinv2-tiny/small | production_recommended | [hf] |
| SwinV2 | swinv2-base/large | accuracy_grade | [hf] |
| InternImage | internimage-t/s/b/l/h | expert_sidecar | OpenMMLab |
Open-Vocabulary Detection
| Model | Category | Install |
|---|---|---|
| grounding-dino-tiny | demo_fast | [hf] |
| grounding-dino-swin-b | accuracy_grade | [hf] |
| grounding-dino-1.5/1.6 | external_api | API token required |
Experimental SOTA (stub — not runnable yet)
| Family | Models | Blocker |
|---|---|---|
| DEIM | deim-s/m, deimv2-s/m | No HF/pip path; custom loader + license verification needed |
| RT-DETRv4 | rtdetrv4-s/m/l/x | No official release numbering; checkpoint source unclear |
| MaskDINO | maskdino-r50-coco/panoptic | detectron2 environment required |
Competitiveness Benchmark
# Synthetic mode (latency + detection health, no ground truth needed)
visionservex benchmark benchmark-competitiveness \
--models dfine-s-o365-coco,rfdetr-small \
--max-images 20 --device auto
# Real AP mode (AP50/mAP50:95 with YOLO-format annotated dataset)
visionservex benchmark benchmark-competitiveness \
--models dfine-s-o365-coco,rfdetr-small,ultralytics:yolo11n \
--dataset yolo:/path/to/coco128 \
--max-images 100 \
--out reports/ap_benchmark
# COCO JSON format
visionservex benchmark benchmark-competitiveness \
--models dfine-s-o365-coco,rfdetr-small \
--dataset coco-json:/data/coco/images:/data/coco/annotations/instances_val2017.json \
--max-images 500
Real AP/mAP is computed with COCO-style 101-point interpolated PR curves when --dataset is provided. Results are exported as JSON + CSV. The tool is honest — if YOLO wins, it will say so.
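For readers unfamiliar with the 101-point scheme mentioned above, the following is a minimal pure-Python sketch of it: precision is first made monotonically non-increasing (the envelope), then sampled at recall thresholds 0.00, 0.01, ..., 1.00 and averaged. The function name `average_precision_101` is hypothetical and this is not the tool's own code; it assumes the PR curve is already built from score-sorted detections at a single IoU threshold. mAP50:95 would then be the mean of such AP values over IoU thresholds 0.50 to 0.95 in steps of 0.05.

```python
import bisect


def average_precision_101(recalls: list[float], precisions: list[float]) -> float:
    """AP from a PR curve sorted by increasing recall, sampled at 101 recall points."""
    # Precision envelope: each point takes the best precision at any higher recall.
    envelope = list(precisions)
    for i in range(len(envelope) - 2, -1, -1):
        envelope[i] = max(envelope[i], envelope[i + 1])

    total = 0.0
    for step in range(101):
        threshold = step / 100
        # First curve point whose recall reaches the threshold; precision 0 if none does.
        idx = bisect.bisect_left(recalls, threshold)
        total += envelope[idx] if idx < len(envelope) else 0.0
    return total / 101


# Two-point curve: precision 1.0 up to recall 0.5, then 0.5 up to recall 1.0.
ap = average_precision_101([0.5, 1.0], [1.0, 0.5])
```

In this example 51 thresholds (0.00 through 0.50) sample precision 1.0 and the remaining 50 sample 0.5, so `ap` equals 76/101, roughly 0.752.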
Note: Accuracy-grade models are separate from demo models. Do not judge VisionServeX by dfine-n or rfdetr-nano — use dfine-s-o365-coco or rfdetr-small for AP comparison.
Detection, segmentation, classification, pose, OBB, and open-vocabulary tasks need different metrics. Non-detection benchmark tools return BENCHMARK_NOT_IMPLEMENTED with honest roadmap information (roadmap: v1.4).
Capabilities Report
# What can VisionServeX do on this machine right now?
visionservex capabilities report
visionservex capabilities report --format markdown --out docs/capabilities.md
visionservex capabilities report --json
Covers: devices, installed extras, model counts by task/category, runnable models, unavailable blockers, goal-based recommendations, security status, and known limitations.
Model Cards
# Structured per-model documentation
visionservex model-card show dfine-s-o365-coco
visionservex model-card show dfine-s-o365-coco --format markdown
visionservex model-card list --task detect
visionservex model-card export --out docs/model_cards.md
Every card includes: recommended_for, not_recommended_for, competes_with, hardware requirements, official benchmark note, and VisionServeX benchmark status.
Replacement Map
# Which VisionServeX models replace each Ultralytics/YOLO task?
visionservex replacement-map map --task detect
visionservex replacement-map map --task segment
visionservex replacement-map map --task classify
visionservex replacement-map map --task pose
visionservex replacement-map map --format markdown
Honest and task-specific. Does not claim "better" unless AP evidence exists.
Debug Output
Before declaring a checkpoint weak, run the postprocessing audit:
visionservex debug-output dfine-s-o365-coco image.jpg
visionservex debug-output dfine-s-o365-coco image.jpg --threshold 0.01 --json
Reports: score histogram, label histogram, first 10 boxes, invalid boxes, unmapped labels, preprocessing notes.
Model Recommender
# By goal (v1.2.0)
visionservex recommend --task detect --goal accuracy
visionservex recommend --task detect --goal fastest_demo
visionservex recommend --goal best_segmentation
visionservex recommend --goal best_open_vocab
# By task and hardware
visionservex recommend --task detect --device cpu
visionservex recommend --task detect --device cuda --vram 8
For --goal accuracy --task detect, the recommender surfaces dfine-s/m-o365-coco and rfdetr-small/medium, not nano variants.
Security and Privacy
visionservex security audit --json
visionservex security mode cloudflare_private --apply
visionservex gateway token
visionservex security test-redaction
visionservex privacy inspect-cache
visionservex privacy cleanup --dry-run
Security modes:
| Mode | Binding | Auth | Notes |
|---|---|---|---|
| local_private | 127.0.0.1 | Optional | Default, safest |
| lan_private | LAN | Required | TLS recommended |
| cloudflare_private | 127.0.0.1 + tunnel | Required | Cloudflare Access recommended |
| production_multi_user | 127.0.0.1 + proxy | Required | Encrypted job store, audit logs |
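The table above amounts to two rules per mode: whether the server must stay on loopback, and whether auth is mandatory. The sketch below encodes those rules as a small pre-flight check. It is an illustration only: `ModePolicy`, `POLICIES`, and `check_config` are hypothetical names, not the logic behind `visionservex security audit`, and the host is assumed to be an IP literal rather than a hostname.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    loopback_only: bool   # server must bind to a loopback address
    auth_required: bool   # API key / auth must be enabled


# Derived from the security modes table; tunnel and proxy modes still bind locally.
POLICIES = {
    "local_private": ModePolicy(loopback_only=True, auth_required=False),
    "lan_private": ModePolicy(loopback_only=False, auth_required=True),
    "cloudflare_private": ModePolicy(loopback_only=True, auth_required=True),
    "production_multi_user": ModePolicy(loopback_only=True, auth_required=True),
}


def check_config(mode: str, host: str, auth_enabled: bool) -> list[str]:
    """Return a list of human-readable problems; empty means the config fits the mode."""
    policy = POLICIES[mode]
    problems = []
    # ip_address() expects an IP literal such as "127.0.0.1", not "localhost".
    if policy.loopback_only and not ipaddress.ip_address(host).is_loopback:
        problems.append(f"{mode} expects a loopback binding, got {host}")
    if policy.auth_required and not auth_enabled:
        problems.append(f"{mode} requires auth to be enabled")
    return problems
```

For example, `check_config("cloudflare_private", "0.0.0.0", False)` reports both problems, while the default `local_private` on 127.0.0.1 without auth reports none.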
Safe Cloudflare Tunnel
export VISIONSERVEX_AUTH__ENABLED=true
export VISIONSERVEX_AUTH__API_KEY=$(visionservex gateway token 2>&1 | grep "API key:" | awk '{print $NF}')
visionservex tunnel config --domain api.yourdomain.com --out tunnel.yaml
visionservex serve &
visionservex tunnel run tunnel.yaml --i-understand-this-is-public
VRAM Lifecycle Safety
VisionServeX manages GPU memory to prevent stepwise VRAM accumulation during repeated model loads.
# Context manager — GPU cleanup on exit
with VisionModel("dfine-x-o365-coco", device="cuda") as model:
result = model.predict("image.jpg")
# GPU memory flushed automatically after context exit
# Explicit cleanup
model = VisionModel("rfdetr-large", device="cuda")
result = model.predict("image.jpg")
model.unload() # full cleanup: engine.unload + GC + CUDA empty_cache + ipc_collect
# One-shot predict with immediate unload
result = model.predict("image.jpg", unload_after=True)
# VRAM diagnostics
visionservex gpu explain-memory # allocated vs reserved breakdown
visionservex gpu cleanup-cache # flush CUDA allocator cache
visionservex gpu memory-test dfine-s-o365-coco --runs 5 # check VRAM growth
visionservex gpu memory-test-suite --models dfine-s-o365-coco,rfdetr-small
# Process-isolated benchmark (full CUDA context released after each model)
visionservex benchmark benchmark-competitiveness \
--models dfine-x-o365-coco,rfdetr-large \
--dataset yolo:/path/to/coco128 \
--isolate-process \
--out reports/ap_benchmark
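The comment on model.unload() above names a cleanup sequence: engine unload, GC, CUDA empty_cache, and ipc_collect. The generic PyTorch part of that sequence can be sketched as follows. This is an assumption-labeled approximation, not VisionServeX's code: `release_gpu_memory` is a hypothetical helper, and it only has an effect once the caller has dropped every reference to the model (for example with `del model`).

```python
import gc


def release_gpu_memory() -> bool:
    """Run Python GC and, if CUDA is usable, flush PyTorch's caching allocator.

    Returns True when the CUDA cleanup steps ran, False otherwise.
    """
    # Collect unreachable Python objects first so their tensors can be freed.
    gc.collect()
    try:
        import torch
    except ImportError:
        # PyTorch is not installed; nothing GPU-related to clean up.
        return False
    if not torch.cuda.is_available():
        return False
    # Return cached, unused blocks from the allocator to the driver.
    torch.cuda.empty_cache()
    # Release GPU memory held for CUDA IPC after the owning process let go of it.
    torch.cuda.ipc_collect()
    return True
```

Note that `empty_cache()` lowers reserved memory, not memory still held by live tensors, which is why the allocated-versus-reserved breakdown from `visionservex gpu explain-memory` is useful when diagnosing growth.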
Segmentation Evaluation
# Latency-only (no ground truth needed)
visionservex benchmark benchmark-segmentation \
--models rfdetr-seg-medium --max-images 20
# Real mask AP with COCO JSON annotations
visionservex benchmark benchmark-segmentation \
--models rfdetr-seg-medium \
--dataset coco-json:/data/coco/images:/data/coco/annotations/instances_val2017.json \
--max-images 200 --out reports/seg_ap
Note: Mask AP uses binary mask IoU — NOT the same as detection box AP50. Do not mix these metrics.
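A small worked example shows why the two metrics must not be mixed. The sketch below defines box IoU and binary mask IoU in plain Python; the helper names are illustrative and not part of VisionServeX. Two masks can share an identical bounding box and still have no pixels in common.

```python
def box_iou(a: tuple[float, float, float, float], b: tuple[float, float, float, float]) -> float:
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a: list[list[int]], b: list[list[int]]) -> float:
    """IoU of two same-sized binary masks given as rows of 0/1 values."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


# Two 4x4 masks: one along the main diagonal, one along the anti-diagonal.
diagonal = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
anti_diagonal = [[1 if r + c == 3 else 0 for c in range(4)] for r in range(4)]
```

Both masks span the full 4x4 grid, so their bounding boxes are both (0, 0, 4, 4) and the box IoU is 1.0, yet `mask_iou(diagonal, anti_diagonal)` is 0.0 because the two diagonals never overlap on an even-sized grid.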
GPU Safety
visionservex gpu guard-status
visionservex gpu processes
visionservex gpu cleanup --dry-run
visionservex gpu cleanup --yes
See docs/gpu_safety.md and docs/parallel_safety.md.
Temporary Colab GPU Worker (optional)
Run VisionServeX on a Google Colab GPU as a short-lived remote worker. Good for demos and benchmarks, not for production — Colab sessions can disconnect at any time.
# Inside a Colab notebook:
!pip install -U 'visionservex[server,hf,rfdetr]'
!visionservex colab doctor
!visionservex gateway start --profile colab-gpu-worker
A copy-paste notebook lives at examples/colab/VisionServeX_Colab_GPU_Worker.ipynb. Full guide: docs/colab_gpu_worker.md.
Installation
pip install visionservex # base (no heavy deps)
pip install 'visionservex[server]' # + HTTP API server
pip install 'visionservex[hf]' # + HF Transformers (D-FINE, GD, SwinV2, SAM, SAM2, OneFormer)
pip install 'visionservex[rfdetr]' # + RF-DETR and RF-DETR-Seg
pip install 'visionservex[server,hf,rfdetr]' # full recommended
OpenMMLab (RTMPose, RTMDet-R, Co-DINO, InternImage): Docker sidecar or pip install openmim && mim install mmengine mmcv mmpose. See docs/openmmlab_expert_models.md.
Known Limitations
- D-FINE COCO-only variants (dfine-s-coco etc.): Point to HF repos that may not exist yet. Use dfine-s-o365-coco (Objects365+COCO) for guaranteed availability.
- DEIM / RT-DETRv4: Registered as experimental_sota but not wired. Blockers documented per-model in the registry.
- AP50/mAP benchmark: Without --dataset, the benchmark-competitiveness tool reports latency and detection health only. Full AP evaluation requires ground-truth annotations (YOLO or COCO JSON format), which are not bundled with VisionServeX.
- OpenMMLab (RTMPose, RTMDet-R/R2, Co-DINO, InternImage): Requires the OpenMMLab toolchain and manually obtained checkpoints. Returns a CHECKPOINT_REQUIRED structured error, never fake output.
- TensorRT: ONNX export works for SwinV2. TensorRT engine build requires trtexec.
- Apple MPS: Implemented but not maintainer-verified.
GPU: CUDA verified on RTX 5080 for 6+ model families. Run visionservex gpu smoke-test on your hardware.
MPS (Apple Silicon): Implemented, not maintainer-verified. See docs/gpu_validation.md.
VRAM safety: Desktop GPU guard reserves 3 GB for GUI/system. GPU tests run serially by default. See docs/gpu_safety.md.
Syntax Contract
All documented CLI/Python/API examples are covered and verified. No example is allowed to silently fail or return a raw traceback.
visionservex syntax audit # verify examples, failing must be 0
visionservex validation run release # run full CI test suite
Documentation
| Guide | Contents |
|---|---|
| Beginner quickstart | 5-minute guide |
| Local gateway | Gateway commands and Python client |
| Security | Threat model, modes, configuration |
| Privacy | No E2E claim, retention policy, encryption |
| Model zoo | All 87 models with current status and taxonomy |
| Model cards | Structured per-model cards with honest benchmark notes |
| Replacement map | Ultralytics/YOLO → VisionServeX replacement guide |
| Benchmark competitiveness | AP/mAP evaluation guide |
| Evaluation metrics | AP50, mAP50:95, and metric definitions |
| Model downloads | Download system, auto-pull |
| GPU safety | VRAM guard, cleanup, emergency recovery |
| Parallel safety | Model concurrency policies, benchmarks |
| Colab GPU worker | Run VisionServeX on a Colab GPU for demos |
| OpenMMLab expert | RTMPose, RTMDet-R, Co-DINO, InternImage |
| Cloudflare Tunnel | Public mode safely |
| GPU validation | CPU/CUDA/MPS status |
| TensorRT | ONNX export and TensorRT roadmap |
| Benchmarks | Latency numbers |
| Troubleshooting | Common errors |
| About | Author, citation |
License and Model Licenses
Apache-2.0. See LICENSE and NOTICE.
Each integrated model retains its own upstream license. Review model, checkpoint, and dataset licenses before commercial use. See docs/model_licenses.md.
Citation
@software{sajjadi2026visionservex,
author = {Arash Sajjadi},
title = {{VisionServeX: A permissive-license-aware framework for local CV model serving}},
year = {2026},
url = {https://github.com/arashsajjadi/VisionServeX},
note = {Developed under the supervision of Prof. Mark Eramian, University of Saskatchewan.}
}
Author: Arash Sajjadi — PhD Candidate, Department of Computer Science, University of Saskatchewan
Supervision: Prof. Mark Eramian, Computer Vision Lab
(This project is not an official product of the University of Saskatchewan.)