
A permissive-license-aware framework for serving modern computer vision models locally and over Cloudflare Tunnel.


VisionServeX

Accuracy-aware computer vision model gateway — honest, local-first, and privacy-respecting.
Serve modern CV models on your machine. Local-only by default. No data retained.

Apache-2.0 · Python 3.10+ · CI · v1.3.0 · ruff


What is VisionServeX?

VisionServeX is an open-source, permissive-license-aware Python framework for running modern computer vision models locally and exposing them through a stable HTTP API. It works as a local model gateway: start it once, call any supported model through one clean API.

Accuracy-aware and scientifically usable:
Every model carries an explicit accuracy taxonomy label: demo_fast, production_recommended, accuracy_grade, experimental_sota, expert_sidecar, external_api, or unavailable_with_reason. The recommender, benchmark tools, and registry are aligned to these labels so you always know what tier you are running. Real AP50/mAP50:95 is computed when you provide an annotated dataset.

Honesty policy:
VisionServeX does not claim to beat Ultralytics globally. The benchmark-competitiveness tool reports results as measured: if YOLO wins, it says so.

Privacy-first design:

  • Binds to 127.0.0.1 by default — nothing leaves your machine.
  • Images are decoded in memory for inference and never written to disk by default.
  • No data is retained between requests by default.
  • Log redaction removes tokens, base64, and API keys from all output.
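The log-redaction idea in the last bullet can be illustrated with the standard library alone. This is a minimal sketch, not VisionServeX's actual implementation: the regular expressions, the `redact` helper, and the `RedactingFilter` class are hypothetical names chosen for illustration, and the real rules may differ.

```python
import logging
import re

# Hypothetical patterns for illustration; the real redaction rules are not
# documented here and may be broader or stricter.
_PATTERNS = [
    # "Bearer <token>" as found in Authorization headers
    (re.compile(r"(?i)(bearer\s+)[A-Za-z0-9._~+/=-]+"), r"\1[REDACTED]"),
    # key=value or key: value pairs for API keys and tokens
    (re.compile(r"(?i)\b(api[_-]?key|token)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
    # long base64-looking runs, e.g. an inlined image payload
    (re.compile(r"[A-Za-z0-9+/]{64,}={0,2}"), "[REDACTED_BASE64]"),
]


def redact(text: str) -> str:
    """Apply every pattern in order and return the scrubbed text."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text


class RedactingFilter(logging.Filter):
    """Logging filter that scrubs the fully formatted message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format first so that secrets passed as %-style args are covered too.
        record.msg = redact(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to a handler with `handler.addFilter(RedactingFilter())` scrubs every record that handler emits.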

⚠️ No end-to-end encryption claimed. VisionServeX cannot provide E2E encryption in the cryptographic sense — the inference server must see plaintext image tensors to run models. We provide local-first processing, no-retention defaults, optional encryption-at-rest for job metadata, and auth for public mode. See docs/privacy.md.


Quickstart (CPU, 5 minutes)

pip install 'visionservex[server,hf,rfdetr]'

visionservex getting-started      # personalized guide
visionservex pull dfine-s-o365-coco   # accuracy-grade detection, CPU-capable
visionservex serve                     # http://127.0.0.1:8080
curl -F "image=@image.jpg" -F "model_id=dfine-s-o365-coco" \
     http://127.0.0.1:8080/detect | jq

For a quick demo (smallest model):

visionservex pull rfdetr-nano          # demo_fast, CPU-capable
visionservex predict rfdetr-nano image.jpg

Python Client

from visionservex import Client, VisionModel

# Direct inference (local, no server needed)
result = VisionModel("dfine-s-o365-coco").predict("image.jpg")   # accuracy_grade
result = VisionModel("rfdetr-nano").predict("image.jpg")          # demo_fast

# Via local gateway
client = Client("http://127.0.0.1:8080")
result = client.detect("dfine-s-o365-coco", "image.jpg")
result = client.grounded_segment("grounded-sam2", "image.jpg", prompt="car, person")
result = client.classify("swinv2-tiny", "image.jpg")
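
The gateway can also be called without the client library. The sketch below mirrors the quickstart's curl request (multipart fields `image` and `model_id`, posted to `/detect`) using only the standard library. The helper names `build_multipart` and `detect` are hypothetical, and the response is assumed to be JSON because the quickstart pipes it to jq; its schema is not documented here.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields: dict[str, str], file_field: str,
                    file_name: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    mime = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{file_name}"\r\nContent-Type: {mime}\r\n\r\n'.encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def detect(image_path: str, model_id: str,
           base_url: str = "http://127.0.0.1:8080"):
    """POST an image to /detect, as the quickstart curl command does."""
    path = Path(image_path)
    body, content_type = build_multipart(
        {"model_id": model_id}, "image", path.name, path.read_bytes()
    )
    request = urllib.request.Request(
        f"{base_url}/detect", data=body,
        headers={"Content-Type": content_type}, method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)  # assumed JSON; schema not shown in this README
```

With the server running, `detect("image.jpg", "dfine-s-o365-coco")` sends the same request as the curl example.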

Model Taxonomy

Every model in the registry now carries an explicit model_category label.

| Category | Meaning | Examples |
| --- | --- | --- |
| demo_fast | Quick demo, small, not for accuracy benchmarks | dfine-n, rfdetr-nano, rfdetr-seg-nano, grounding-dino-tiny |
| production_recommended | Solid accuracy, ready for real use | rfdetr-small, rfdetr-seg-small, swinv2-tiny, sam-vit-base |
| accuracy_grade | Tracked for AP benchmarks; explicitly wired | dfine-s-o365-coco, dfine-m/l/x-o365-coco, rfdetr-medium/large, grounding-dino-swin-b |
| experimental_sota | Claims SOTA but not fully verified in this build | deim-s/m, deimv2-s/m, rtdetrv4-s/m/l/x, maskdino-r50-coco |
| expert_sidecar | Requires expert setup (OpenMMLab, custom ops) | rtmpose-*, internimage-*, co-dino-* |
| external_api | API-gated upstream; not self-hostable | grounding-dino-1.5/1.6 |
| unavailable_with_reason | Blocked; honest reason documented | rfdetr-seg-large/xlarge/2xlarge |
| utility | Mock / built-in / test helpers | mock-detect, mock-classify, … |

Key rule: demo_fast models are not used to claim competitiveness with YOLO. Use accuracy_grade variants for AP benchmarks.
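
As a rough illustration of the key rule, the taxonomy can be modelled as an enum and used to screen a model list before an AP comparison. This is a sketch under assumptions: `ModelCategory`, `REGISTRY_EXCERPT`, and `split_for_ap_benchmark` are hypothetical names, and the excerpt hard-codes a few entries from this README rather than reading the real registry.

```python
from enum import Enum


class ModelCategory(str, Enum):
    """The model_category labels listed in the taxonomy table."""
    DEMO_FAST = "demo_fast"
    PRODUCTION_RECOMMENDED = "production_recommended"
    ACCURACY_GRADE = "accuracy_grade"
    EXPERIMENTAL_SOTA = "experimental_sota"
    EXPERT_SIDECAR = "expert_sidecar"
    EXTERNAL_API = "external_api"
    UNAVAILABLE_WITH_REASON = "unavailable_with_reason"
    UTILITY = "utility"


# Hypothetical excerpt; the real registry is queried through the CLI.
REGISTRY_EXCERPT = {
    "rfdetr-nano": ModelCategory.DEMO_FAST,
    "rfdetr-small": ModelCategory.PRODUCTION_RECOMMENDED,
    "dfine-s-o365-coco": ModelCategory.ACCURACY_GRADE,
}


def split_for_ap_benchmark(model_ids):
    """Separate models usable for AP claims from demo_fast ones."""
    usable, excluded = [], []
    for model_id in model_ids:
        category = REGISTRY_EXCERPT[model_id]  # KeyError for unknown IDs
        target = excluded if category is ModelCategory.DEMO_FAST else usable
        target.append(model_id)
    return usable, excluded
```

For example, passing `["rfdetr-nano", "dfine-s-o365-coco"]` keeps only the D-FINE model in the usable list.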


What works today

Detection (wired, runnable)

| Model ID | Category | Checkpoint | Install |
| --- | --- | --- | --- |
| dfine-n / dfine-n-coco | demo_fast | ustc-community/dfine-nano-coco | [hf] |
| dfine-s-o365-coco | accuracy_grade | ustc-community/dfine-small-obj2coco | [hf] |
| dfine-m-o365-coco | accuracy_grade | ustc-community/dfine-medium-obj2coco | [hf] |
| dfine-l-o365-coco | accuracy_grade | ustc-community/dfine-large-obj2coco-e25 | [hf] |
| dfine-x-o365-coco | accuracy_grade | ustc-community/dfine-xlarge-obj2coco | [hf] |
| rfdetr-nano | demo_fast | rfdetr pkg | [rfdetr] |
| rfdetr-small | production_recommended | rfdetr pkg | [rfdetr] |
| rfdetr-medium | accuracy_grade | rfdetr pkg | [rfdetr] |
| rfdetr-large | accuracy_grade | rfdetr pkg | [rfdetr] |

★ Recommended accuracy entry points: dfine-s-o365-coco (CPU-capable) and rfdetr-small (GPU-preferred).

Segmentation

| Family | Models | Category | Install |
| --- | --- | --- | --- |
| RF-DETR-Seg | rfdetr-seg-nano/small/medium | demo_fast / production_recommended / accuracy_grade | [rfdetr] |
| SAM v1 | sam-vit-base/large/huge | production_recommended / accuracy_grade | [hf] |
| SAM 2 | sam2-hiera-tiny/small/base-plus/large | production_recommended / accuracy_grade | [hf] |
| Grounded SAM | grounded-sam, grounded-sam2 | production_recommended | [hf] |
| OneFormer | oneformer-swin-large/dinat-large/convnext-large | accuracy_grade | [hf] |

Classification

| Family | Models | Category | Install |
| --- | --- | --- | --- |
| SwinV2 | swinv2-tiny/small | production_recommended | [hf] |
| SwinV2 | swinv2-base/large | accuracy_grade | [hf] |
| InternImage | internimage-t/s/b/l/h | expert_sidecar | OpenMMLab |

Open-Vocabulary Detection

| Model | Category | Install |
| --- | --- | --- |
| grounding-dino-tiny | demo_fast | [hf] |
| grounding-dino-swin-b | accuracy_grade | [hf] |
| grounding-dino-1.5/1.6 | external_api | API token required |

Experimental SOTA (stub — not runnable yet)

| Family | Models | Blocker |
| --- | --- | --- |
| DEIM | deim-s/m, deimv2-s/m | No HF/pip path; custom loader + license verification needed |
| RT-DETRv4 | rtdetrv4-s/m/l/x | No official release numbering; checkpoint source unclear |
| MaskDINO | maskdino-r50-coco/panoptic | detectron2 environment required |

Competitiveness Benchmark

# Synthetic mode (latency + detection health, no ground truth needed)
visionservex benchmark benchmark-competitiveness \
  --models dfine-s-o365-coco,rfdetr-small \
  --max-images 20 --device auto

# Real AP mode (AP50/mAP50:95 with YOLO-format annotated dataset)
visionservex benchmark benchmark-competitiveness \
  --models dfine-s-o365-coco,rfdetr-small,ultralytics:yolo11n \
  --dataset yolo:/path/to/coco128 \
  --max-images 100 \
  --out reports/ap_benchmark

# COCO JSON format
visionservex benchmark benchmark-competitiveness \
  --models dfine-s-o365-coco,rfdetr-small \
  --dataset coco-json:/data/coco/images:/data/coco/annotations/instances_val2017.json \
  --max-images 500

Real AP/mAP is computed with COCO-style 101-point interpolated PR curves when --dataset is provided. Results are exported as JSON + CSV. The tool is honest — if YOLO wins, it will say so.
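
For readers unfamiliar with the metric, here is a minimal sketch of COCO-style 101-point interpolated AP for a single class at a single IoU threshold. It is not the project's evaluation code. The function name is hypothetical, and it assumes each detection has already been matched against ground truth, so every detection arrives with a true-positive flag. mAP50:95 then averages this value over IoU thresholds from 0.50 to 0.95 in steps of 0.05 and over classes.

```python
from bisect import bisect_left


def average_precision_101(scores, is_true_positive, num_ground_truth):
    """101-point interpolated AP for one class at one IoU threshold.

    scores[i] is the confidence of detection i; is_true_positive[i] says
    whether that detection was matched to a ground-truth box upstream.
    """
    if num_ground_truth == 0:
        return float("nan")  # AP is undefined without ground truth

    # Walk detections from most to least confident, accumulating TP/FP.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    recall, precision = [], []
    tp = fp = 0
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recall.append(tp / num_ground_truth)
        precision.append(tp / (tp + fp))

    # Precision envelope: make precision non-increasing as recall grows.
    for k in range(len(precision) - 2, -1, -1):
        precision[k] = max(precision[k], precision[k + 1])

    # Sample the envelope at recall = 0.00, 0.01, ..., 1.00.
    total = 0.0
    for step in range(101):
        idx = bisect_left(recall, step / 100)
        if idx < len(precision):
            total += precision[idx]  # unreachable recall levels add 0
    return total / 101
```

With two detections of which only the higher-scoring one is correct, and two ground-truth boxes, recall never exceeds 0.5, so only the 51 sample points from 0.00 to 0.50 contribute and the result is 51/101.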

Note: Accuracy-grade models are separate from demo models. Do not judge VisionServeX by dfine-n or rfdetr-nano — use dfine-s-o365-coco or rfdetr-small for AP comparison.

Detection, segmentation, classification, pose, OBB, and open-vocabulary tasks need different metrics. Benchmark tools for non-detection tasks currently return BENCHMARK_NOT_IMPLEMENTED together with roadmap information (roadmap: v1.4).


Capabilities Report

# What can VisionServeX do on this machine right now?
visionservex capabilities report
visionservex capabilities report --format markdown --out docs/capabilities.md
visionservex capabilities report --json

Covers: devices, installed extras, model counts by task/category, runnable models, unavailable blockers, goal-based recommendations, security status, and known limitations.


Model Cards

# Structured per-model documentation
visionservex model-card show dfine-s-o365-coco
visionservex model-card show dfine-s-o365-coco --format markdown
visionservex model-card list --task detect
visionservex model-card export --out docs/model_cards.md

Every card includes: recommended_for, not_recommended_for, competes_with, hardware requirements, official benchmark note, and VisionServeX benchmark status.


Replacement Map

# Which VisionServeX models replace each Ultralytics/YOLO task?
visionservex replacement-map map --task detect
visionservex replacement-map map --task segment
visionservex replacement-map map --task classify
visionservex replacement-map map --task pose
visionservex replacement-map map --format markdown

The map is task-specific and does not call a model "better" unless AP evidence exists.


Debug Output

Before declaring a checkpoint weak, run the postprocessing audit:

visionservex debug-output dfine-s-o365-coco image.jpg
visionservex debug-output dfine-s-o365-coco image.jpg --threshold 0.01 --json

Reports: score histogram, label histogram, first 10 boxes, invalid boxes, unmapped labels, preprocessing notes.
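
The kind of audit described above can be approximated in a few lines. This sketch is illustrative only and does not reproduce the debug-output command. The detection schema (dicts with `score`, `label`, and an `(x1, y1, x2, y2)` pixel `box`) and the function name `audit_detections` are assumptions.

```python
from collections import Counter


def audit_detections(detections, image_width, image_height, bins=10):
    """Summarise raw detections before blaming the checkpoint.

    Assumed schema: {"score": float in [0, 1], "label": str,
                     "box": (x1, y1, x2, y2) in pixels}.
    """
    score_histogram = [0] * bins
    label_histogram = Counter()
    invalid_boxes = []

    for index, det in enumerate(detections):
        # Clamp so that a score of exactly 1.0 lands in the last bucket.
        bucket = max(0, min(int(det["score"] * bins), bins - 1))
        score_histogram[bucket] += 1
        label_histogram[det["label"]] += 1

        x1, y1, x2, y2 = det["box"]
        degenerate = x2 <= x1 or y2 <= y1
        out_of_bounds = x1 < 0 or y1 < 0 or x2 > image_width or y2 > image_height
        if degenerate or out_of_bounds:
            invalid_boxes.append(index)

    return {
        "score_histogram": score_histogram,
        "label_histogram": dict(label_histogram),
        "invalid_boxes": invalid_boxes,
        "first_boxes": [det["box"] for det in detections[:10]],
    }
```

A histogram concentrated in the lowest buckets, or many invalid boxes, usually points to a thresholding or coordinate-scaling problem in postprocessing rather than a weak model.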


Model Recommender

# By goal (v1.2.0)
visionservex recommend --task detect --goal accuracy
visionservex recommend --task detect --goal fastest_demo
visionservex recommend --goal best_segmentation
visionservex recommend --goal best_open_vocab

# By task and hardware
visionservex recommend --task detect --device cpu
visionservex recommend --task detect --device cuda --vram 8

For --goal accuracy --task detect, the recommender surfaces dfine-s/m-o365-coco and rfdetr-small/medium, not nano variants.


Security and Privacy

visionservex security audit --json
visionservex security mode cloudflare_private --apply
visionservex gateway token
visionservex security test-redaction
visionservex privacy inspect-cache
visionservex privacy cleanup --dry-run

Security modes:

| Mode | Binding | Auth | Notes |
| --- | --- | --- | --- |
| local_private | 127.0.0.1 | Optional | Default, safest |
| lan_private | LAN | Required | TLS recommended |
| cloudflare_private | 127.0.0.1 + tunnel | Required | Cloudflare Access recommended |
| production_multi_user | 127.0.0.1 + proxy | Required | Encrypted job store, audit logs |
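
The table can be read as a small policy: every mode other than local_private requires auth. A minimal sketch of that rule follows. `SecurityMode`, `SECURITY_MODES`, and `check_mode` are hypothetical names that only restate the table; they are not the project's configuration API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecurityMode:
    binding: str
    auth_required: bool
    notes: str


# Values copied from the security modes table.
SECURITY_MODES = {
    "local_private": SecurityMode("127.0.0.1", False, "Default, safest"),
    "lan_private": SecurityMode("LAN", True, "TLS recommended"),
    "cloudflare_private": SecurityMode(
        "127.0.0.1 + tunnel", True, "Cloudflare Access recommended"),
    "production_multi_user": SecurityMode(
        "127.0.0.1 + proxy", True, "Encrypted job store, audit logs"),
}


def check_mode(mode_name: str, auth_enabled: bool) -> SecurityMode:
    """Return the mode, or raise if it needs auth and auth is off."""
    try:
        mode = SECURITY_MODES[mode_name]
    except KeyError:
        raise ValueError(f"unknown security mode: {mode_name!r}") from None
    if mode.auth_required and not auth_enabled:
        raise ValueError(f"{mode_name} requires auth to be enabled")
    return mode
```

Failing fast on a mode that needs auth when auth is disabled is the point of the check: a gateway reachable beyond 127.0.0.1 should not start unauthenticated.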

Safe Cloudflare Tunnel

export VISIONSERVEX_AUTH__ENABLED=true
export VISIONSERVEX_AUTH__API_KEY=$(visionservex gateway token 2>&1 | grep "API key:" | awk '{print $NF}')

visionservex tunnel config --domain api.yourdomain.com --out tunnel.yaml
visionservex serve &
visionservex tunnel run tunnel.yaml --i-understand-this-is-public

GPU Safety

visionservex gpu guard-status
visionservex gpu processes
visionservex gpu cleanup --dry-run
visionservex gpu cleanup --yes

See docs/gpu_safety.md and docs/parallel_safety.md.


Temporary Colab GPU Worker (optional)

Run VisionServeX on a Google Colab GPU as a short-lived remote worker. Good for demos and benchmarks, not for production — Colab sessions can disconnect at any time.

# Inside a Colab notebook:
!pip install -U 'visionservex[server,hf,rfdetr]'
!visionservex colab doctor
!visionservex gateway start --profile colab-gpu-worker

A copy-paste notebook lives at examples/colab/VisionServeX_Colab_GPU_Worker.ipynb. Full guide: docs/colab_gpu_worker.md.


Installation

pip install visionservex                        # base (no heavy deps)
pip install 'visionservex[server]'              # + HTTP API server
pip install 'visionservex[hf]'                  # + HF Transformers (D-FINE, Grounding DINO, SwinV2, SAM, SAM2, OneFormer)
pip install 'visionservex[rfdetr]'              # + RF-DETR and RF-DETR-Seg
pip install 'visionservex[server,hf,rfdetr]'    # full recommended

OpenMMLab models (RTMPose, RTMDet-R, Co-DINO, InternImage) need either the Docker sidecar or a manual install: pip install openmim && mim install mmengine mmcv mmpose. See docs/openmmlab_expert_models.md.


Known Limitations

  • D-FINE COCO-only variants (dfine-s-coco etc.): Point to HF repos that may not exist yet. Use dfine-s-o365-coco (Objects365+COCO) for guaranteed availability.
  • DEIM / RT-DETRv4: Registered as experimental_sota but not wired. Blockers documented per-model in the registry.
  • AP50/mAP benchmark: Without --dataset, the benchmark-competitiveness tool reports latency and detection health only. Real AP evaluation requires ground-truth annotations (YOLO or COCO JSON format) that you supply; none are bundled with VisionServeX.
  • OpenMMLab (RTMPose, RTMDet-R/R2, Co-DINO, InternImage): Requires the OpenMMLab toolchain and manually-obtained checkpoints. Returns CHECKPOINT_REQUIRED structured error — no fake output.
  • TensorRT: ONNX export works for SwinV2. TensorRT engine build requires trtexec.
  • Apple MPS: Implemented but not maintainer-verified.

GPU: CUDA verified on RTX 5080 for 6+ model families. Run visionservex gpu smoke-test on your hardware.
MPS (Apple Silicon): Implemented, not maintainer-verified. See docs/gpu_validation.md.
VRAM safety: Desktop GPU guard reserves 3 GB for GUI/system. GPU tests run serially by default. See docs/gpu_safety.md.


Syntax Contract

All documented CLI, Python, and API examples are checked by the syntax audit. No example may fail silently or return a raw traceback.

visionservex syntax audit             # verify examples; the failing count must be 0
visionservex validation run release   # run the full CI test suite

Documentation

Beginner quickstart: 5-minute guide
Local gateway: Gateway commands and Python client
Security: Threat model, modes, configuration
Privacy: No E2E claim, retention policy, encryption
Model zoo: All 87 models with current status and taxonomy
Model cards: Structured per-model cards with honest benchmark notes
Replacement map: Ultralytics/YOLO → VisionServeX replacement guide
Benchmark competitiveness: AP/mAP evaluation guide
Evaluation metrics: AP50, mAP50:95, and metric definitions
Model downloads: Download system, auto-pull
GPU safety: VRAM guard, cleanup, emergency recovery
Parallel safety: Model concurrency policies, benchmarks
Colab GPU worker: Run VisionServeX on a Colab GPU for demos
OpenMMLab expert: RTMPose, RTMDet-R, Co-DINO, InternImage
Cloudflare Tunnel: Public mode safely
GPU validation: CPU/CUDA/MPS status
TensorRT: ONNX export and TensorRT roadmap
Benchmarks: Latency numbers
Troubleshooting: Common errors
About: Author, citation

License and Model Licenses

Apache-2.0. See LICENSE and NOTICE.

Each integrated model retains its own upstream license. Review model, checkpoint, and dataset licenses before commercial use. See docs/model_licenses.md.


Citation

@software{sajjadi2026visionservex,
  author = {Arash Sajjadi},
  title  = {{VisionServeX: A permissive-license-aware framework for local CV model serving}},
  year   = {2026},
  url    = {https://github.com/arashsajjadi/VisionServeX},
  note   = {Developed under the supervision of Prof. Mark Eramian, University of Saskatchewan.}
}

Author: Arash Sajjadi — PhD Candidate, Department of Computer Science, University of Saskatchewan
Supervision: Prof. Mark Eramian, Computer Vision Lab
(This project is not an official product of the University of Saskatchewan.)
