A permissive-license-aware framework for serving modern computer vision models locally and over Cloudflare Tunnel.
VisionServeX
Secure, beginner-friendly Python API serving for permissive computer vision models
Local inference · Cloudflare Tunnel · Stable JSON API · LLM-agent-friendly
Note on the CI badge: it turns green after the workflow runs successfully on GitHub for the first time. Until then the badge shows "no status" — this is expected for a new repository.
VisionServeX is a permissive-license-aware Python framework for running modern computer vision models locally, exposing them through a clean HTTP API, and optionally sharing them securely over Cloudflare Tunnel.
- No CUDA expertise required. `visionservex doctor` tells you what your machine can run. The GPU is preferred automatically when it is available and healthy; broken CUDA runtimes fall back to CPU with a clear warning.
- One download command. `visionservex pull rfdetr-nano` caches and verifies the weights.
- Stable contracts. Every prediction returns the same JSON envelope, whether it comes from the CLI, Python, or curl.
- Honest. Registry entries say `wired`, `partial`, or `stub`. Stubs never silently fake results.
- Secure defaults. Binds to `127.0.0.1`, requires auth for public mode, and keeps SSRF and decompression-bomb guards on.
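The GPU-first, CPU-fallback behavior described above can be sketched as follows. This is not VisionServeX's implementation. It assumes a PyTorch backend, and `default_cuda_probe` and `select_device` are hypothetical names. The sketch separates two cases:

- "no GPU" is a normal outcome, so it falls back silently.
- A CUDA runtime that fails a health check is reported with a warning and a reason string, similar in spirit to the `fallback_reason` result field.

```python
import warnings
from typing import Callable, Optional, Tuple


def default_cuda_probe() -> bool:
    """Hypothetical health check: False if CUDA is absent, raises if it is broken."""
    try:
        import torch  # imported lazily so CPU-only installs still work
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # A tiny allocation catches runtimes that report CUDA but fail on first use.
    torch.zeros(1, device="cuda").add_(1)
    return True


def select_device(
    probe: Callable[[], bool] = default_cuda_probe,
) -> Tuple[str, Optional[str]]:
    """Return (device, fallback_reason); the reason is None unless CUDA looked broken."""
    try:
        healthy = probe()
    except Exception as exc:  # driver mismatch, OOM, corrupted runtime, ...
        reason = f"CUDA runtime failed a health check; using CPU ({exc})"
        warnings.warn(reason, RuntimeWarning, stacklevel=2)
        return "cpu", reason
    return ("cuda", None) if healthy else ("cpu", None)
```

Injecting the probe keeps the selection logic testable on machines without a GPU.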
Quickstart (works on CPU, ~5 minutes)
```bash
pip install 'visionservex[server,hf,rfdetr]'
visionservex getting-started   # personalized guide for your machine

# RF-DETR detection: real, fast
visionservex pull rfdetr-nano
visionservex predict rfdetr-nano examples/images/street.jpg --save outputs/out.jpg

# Grounding DINO: text-prompted detection
visionservex pull grounding-dino-tiny
visionservex predict grounding-dino-tiny examples/images/street.jpg \
  --prompt "car,person" --save outputs/gd.jpg

# D-FINE: detection via HF Transformers
visionservex pull dfine-s
visionservex predict dfine-s examples/images/street.jpg --save outputs/dfine.jpg

# Start the API (runs in the foreground; send the curl request from a second terminal)
visionservex serve
curl -F "image=@examples/images/street.jpg" \
  -F "model_id=rfdetr-nano" \
  http://127.0.0.1:8080/detect | jq
```
Recommendation engine:

```bash
visionservex recommend --task detect --simple
```
What works today
| Family | Model IDs | Task | Status | Install |
|---|---|---|---|---|
| Mock (built-in) | mock-* | All tasks | stable | base |
| RF-DETR | rfdetr-nano/small/base/medium/large | detect | beta | [rfdetr] |
| RF-DETR-Seg | rfdetr-seg-nano/small/medium | segment | beta | [rfdetr] |
| D-FINE | dfine-n/s/m/l/x | detect | beta | [hf] |
| Grounding DINO | grounding-dino-tiny/swin-t/swin-b | open-vocab detect | beta | [hf] |
| SwinV2 | swinv2-tiny/small/base/large | classify | beta | [hf] |
| SAM v1 | sam-vit-base/large/huge | foundation segment | beta | [hf] |
| SAM 2 | sam2-hiera-tiny/small/base-plus/large | foundation segment | beta | [hf] |
| Grounded SAM | grounded-sam | grounded segment | beta | [hf] |
| OneFormer | oneformer-swin-large/dinat-large/convnext-large | segment (semantic/instance/panoptic) | beta | [hf] |
Not yet wired
| Family | Why | Alternative |
|---|---|---|
| RTMPose | Requires OpenMMLab toolchain | mock-pose for schema |
| RTMDet-R/R2 (OBB) | Requires OpenMMLab + mmrotate | mock-obb for schema |
| Co-DINO-Inst | Requires heavy OpenMMLab | rfdetr-seg-* for instance seg |
| InternImage | Custom CUDA ops, build required | swinv2-* for classification |
| SEEM | Expert manual install | oneformer-swin-large |
| Grounded-SAM-2 | Needs upstream sam2 package | grounded-sam (works today) |
| ONNX export | CLI exists; engine-quality varies | Use HF model repos for ONNX |
| TensorRT | Future roadmap | — |
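The "Honest" registry rule says entries are `wired`, `partial`, or `stub`, and stubs never fake results. A runner could enforce that rule with a check like the one below. This is an illustrative sketch only. `Status`, `RegistryEntry`, `StubModelError`, and `run` are hypothetical names, not VisionServeX's API. The check does two things:

- A stub raises an error that points at the alternative instead of returning placeholder output.
- A partial entry still runs but attaches a warning.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Dict, Optional


class Status(Enum):
    WIRED = "wired"
    PARTIAL = "partial"
    STUB = "stub"


@dataclass(frozen=True)
class RegistryEntry:
    model_id: str
    status: Status
    # Stubs have no real backend, so runner stays None.
    runner: Optional[Callable[[str], Dict[str, Any]]] = None
    alternative: Optional[str] = None


class StubModelError(RuntimeError):
    """Raised instead of returning fake predictions for a stub entry."""


def run(entry: RegistryEntry, image_path: str) -> Dict[str, Any]:
    if entry.status is Status.STUB or entry.runner is None:
        hint = f" Try {entry.alternative!r} instead." if entry.alternative else ""
        raise StubModelError(f"{entry.model_id!r} is not wired yet.{hint}")
    result = entry.runner(image_path)
    if entry.status is Status.PARTIAL:
        # Surface partial support in the result rather than hiding it.
        result.setdefault("warnings", []).append(
            f"{entry.model_id} is only partially wired"
        )
    return result
```

For example, a hypothetical stub entry for RTMPose with `alternative="mock-pose"` would fail loudly and name `mock-pose`. It would not emit invented keypoints.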
We make no benchmark claims. Pick by task, license, and hardware. See docs/model_zoo.md.
Which model to start with?
| I want | Start with | CPU? |
|---|---|---|
| Fast detection | rfdetr-nano | yes |
| More accurate detection | dfine-s | yes (slower) |
| Text-prompted detection | grounding-dino-tiny | yes (slower) |
| Instance segmentation | rfdetr-seg-nano | yes |
| SAM-style masking | sam-vit-base or sam2-hiera-tiny | yes (slow) |
| Text + mask together | grounded-sam | yes (slow) |
| Image classification | swinv2-tiny | yes |
| Semantic scene parsing | oneformer-swin-large | yes (slow) |
| Just testing/CI | mock-detect | yes (instant) |
| I have no GPU | Any `*-nano` or `*-tiny` model | yes |
| I have an NVIDIA GPU | Run `visionservex doctor` first; the GPU is used automatically when available | n/a |
Installation
```bash
pip install visionservex                       # base: CLI, registry, mock
pip install 'visionservex[server]'             # + FastAPI HTTP server
pip install 'visionservex[hf]'                 # + D-FINE, GD, SwinV2, SAM, SAM2, OneFormer
pip install 'visionservex[rfdetr]'             # + RF-DETR and RF-DETR-Seg
pip install 'visionservex[server,hf,rfdetr]'   # full recommended install
```

For OpenMMLab models (RTMPose, RTMDet-R, Co-DINO):

```bash
pip install openmim
mim install mmengine mmcv mmpose mmdet mmrotate
```
See docs/installation.md for platform-specific notes.
Python API
```python
from visionservex import VisionModel

# Object detection
m = VisionModel("rfdetr-nano")
result = m.predict("image.jpg")
for det in result.detections:
    print(det.label, f"{det.score:.2f}", det.box.to_xyxy())
result.save("annotated.jpg")

# D-FINE detection (HF Transformers)
m = VisionModel("dfine-s")
result = m.predict("image.jpg")

# SAM 2 (point prompt); x, y are example pixel coordinates, label 1 marks a foreground point
x, y = 320, 240
m = VisionModel("sam2-hiera-tiny")
result = m.predict("image.jpg", points=[[x, y]], point_labels=[1])

# SAM 2 (box prompt); example box corners in pixels
x1, y1, x2, y2 = 100, 80, 400, 360
result = m.predict("image.jpg", boxes=[[x1, y1, x2, y2]])

# OneFormer (choose task)
m = VisionModel("oneformer-swin-large")
result = m.predict("image.jpg", task="semantic")  # or "instance", "panoptic"

# Grounding DINO
m = VisionModel("grounding-dino-tiny")
result = m.predict("image.jpg", prompts=["red car", "person walking"])

# Auto-pull on first use
m = VisionModel("dfine-s", auto_pull=True)
result = m.predict("image.jpg")
```
Stable result fields: kind, model_id, task, device, precision, backend,
latency_ms, model_loaded_from, fallback_reason, warnings.
HTTP API
Stable response envelope:
```json
{
  "request_id": "...",
  "status": "completed",
  "model_id": "dfine-s",
  "task": "detect",
  "backend": "huggingface_dfine",
  "device": "cpu",
  "precision": "fp32",
  "latency_ms": 187.4,
  "results": [{"box": {...}, "score": 0.72, "label": "person", "class_id": 0}],
  "warnings": [],
  "metadata": {}
}
```
Error envelope:
```json
{
  "request_id": "...",
  "error": {
    "code": "MODEL_MISSING",
    "message": "Model weights for 'dfine-s' are not cached.",
    "hint": "Run: visionservex pull dfine-s",
    "details": {}
  }
}
```
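Because the two envelopes differ in their top-level keys, a client can branch on whether an `error` object is present. The standard-library sketch below relies only on the fields shown above, plus the assumption that a success envelope never contains `error`. `VisionServeError`, `parse_envelope`, and `labels_above` are hypothetical helper names.

```python
import json
from typing import Any, Dict, List


class VisionServeError(RuntimeError):
    """Raised when the server returns an error envelope."""

    def __init__(self, code: str, message: str, hint: str = "", request_id: str = "") -> None:
        self.code = code
        self.hint = hint
        self.request_id = request_id
        suffix = f" (hint: {hint})" if hint else ""
        super().__init__(f"[{code}] {message}{suffix}")


def parse_envelope(body: str) -> Dict[str, Any]:
    """Return a success envelope as a dict, or raise VisionServeError for an error envelope."""
    payload = json.loads(body)
    if "error" in payload:
        err = payload["error"]
        raise VisionServeError(
            code=err.get("code", "UNKNOWN"),
            message=err.get("message", ""),
            hint=err.get("hint", ""),
            request_id=payload.get("request_id", ""),
        )
    return payload


def labels_above(payload: Dict[str, Any], min_score: float) -> List[str]:
    """Labels of results whose score is at least min_score."""
    return [r["label"] for r in payload.get("results", []) if r.get("score", 0.0) >= min_score]
```

Keeping the `hint` on the exception lets a caller (or an LLM agent) print the suggested fix, such as `visionservex pull dfine-s`, directly.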
Key endpoints: GET /health, GET /devices, GET /models,
POST /detect, POST /segment, POST /classify, POST /open-vocab/detect,
POST /grounded-segment, GET /jobs/{id}, GET /metrics.
Full reference: docs/api_reference.md.
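The quickstart's `curl -F` call sends a `multipart/form-data` POST with an `image` file part and a `model_id` field. The sketch below builds the same request with only the Python standard library. It assumes the default local address `http://127.0.0.1:8080` from the quickstart. It sends no authentication, since the header used when auth is enabled is not specified here. `build_multipart` and `detect` are hypothetical helper names.

```python
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Dict, List, Tuple


def build_multipart(
    fields: Dict[str, str], files: Dict[str, Tuple[str, bytes, str]]
) -> Tuple[bytes, str]:
    """Encode text fields and (filename, data, mime) files; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts: List[bytes] = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            value.encode(),
        ]
    for name, (filename, data, mime) in files.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {mime}".encode(),
            b"",
            data,
        ]
    parts += [f"--{boundary}--".encode(), b""]  # closing boundary plus trailing CRLF
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def detect(image_path: str, model_id: str, base_url: str = "http://127.0.0.1:8080") -> bytes:
    """POST an image to /detect and return the raw JSON response body."""
    path = Path(image_path)
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    body, content_type = build_multipart(
        {"model_id": model_id}, {"image": (path.name, path.read_bytes(), mime)}
    )
    req = urllib.request.Request(
        f"{base_url}/detect", data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```

With the server running, `detect("examples/images/street.jpg", "rfdetr-nano")` should return the same envelope as the curl command.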
Security defaults
| Setting | Default |
|---|---|
| Server bind | 127.0.0.1 only |
| Public mode | disabled (explicit opt-in) |
| Authentication | disabled — enable before exposing |
| Remote URL inputs | disabled (SSRF protection) |
| CORS | disabled |
| Upload limit | 20 MiB |
| Image pixel limit | ~33 MP (decompression-bomb guard) |
| Rate limit | 120 req/min per IP |
| Token redaction | enabled in all logs |
See docs/security.md and SECURITY.md.
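The per-IP rate limit in the table (120 requests per minute) is the kind of policy a sliding-window log implements. The class below is a generic sketch of that technique, not VisionServeX's actual limiter. The name `SlidingWindowRateLimiter` and the injectable clock are assumptions made so the behavior is easy to test.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class SlidingWindowRateLimiter:
    """Allow at most `limit` requests per client key within any `window_s`-second window."""

    def __init__(
        self,
        limit: int = 120,
        window_s: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client_ip: str) -> bool:
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would typically answer HTTP 429
        hits.append(now)
        return True
```

A sliding log is exact but stores one timestamp per request. A fixed-window counter uses less memory at the cost of allowing short bursts at window boundaries.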
Safe Cloudflare Tunnel
```bash
export VISIONSERVEX_AUTH__ENABLED=true
export VISIONSERVEX_AUTH__API_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(48))")
visionservex tunnel doctor
visionservex tunnel create visionservex
visionservex tunnel route visionservex api.yourdomain.com
visionservex tunnel config api.yourdomain.com --out tunnel.yaml
visionservex serve &
visionservex tunnel run tunnel.yaml --i-understand-this-is-public
```
The CLI refuses without auth enabled and the explicit confirmation flag.
The generated config always ends with a catch-all http_status:404 rule.
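The two guarantees above can be sketched in plain Python: the CLI refuses to go public without auth and the confirmation flag, and the config ends with a catch-all rule. The config keys (`tunnel`, `credentials-file`, `ingress`, `hostname`, `service`) and the `http_status:404` catch-all follow cloudflared's ingress config format. Several details are illustrative assumptions:

- the function and exception names;
- the credentials path;
- the service address `127.0.0.1:8080`, taken from the quickstart URL.

This is not the exact file VisionServeX emits.

```python
from typing import List


class PublicExposureRefused(RuntimeError):
    """Raised when a public tunnel run is attempted without the required safeguards."""


def check_public_run(auth_enabled: bool, confirmed: bool) -> None:
    """Refuse to start a public tunnel unless auth is on and the user explicitly confirmed."""
    if not auth_enabled:
        raise PublicExposureRefused("enable authentication before exposing the API")
    if not confirmed:
        raise PublicExposureRefused("pass --i-understand-this-is-public to confirm")


def build_tunnel_config(tunnel: str, credentials_file: str, hostname: str, port: int = 8080) -> str:
    """Render a minimal cloudflared config whose last ingress rule is a 404 catch-all."""
    lines: List[str] = [
        f"tunnel: {tunnel}",
        f"credentials-file: {credentials_file}",
        "ingress:",
        f"  - hostname: {hostname}",
        f"    service: http://127.0.0.1:{port}",
        # Catch-all: requests for any other hostname get a 404 instead of reaching the API.
        "  - service: http_status:404",
    ]
    return "\n".join(lines) + "\n"
```

cloudflared itself expects the final ingress rule to match everything, so ending with the 404 rule also keeps the generated file valid.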
See docs/cloudflare_tunnel.md.
Documentation
| Guide | Covers |
|---|---|
| Beginner quickstart | First prediction in 5 min |
| Device check | GPU/CPU/MPS diagnostics |
| Model zoo | All models, license table, "which model?" |
| Model downloads | Download system, auto-pull |
| Model licenses | Per-model license details |
| Cloudflare Tunnel | Safe public exposure |
| Security | Threat model, all protections |
| HTTP API reference | Endpoints, error codes |
| Python API | VisionModel, result types |
| CLI reference | Every command |
| Troubleshooting | Common errors |
| LLM agent guide | Stable CLI/JSON for agents |
| About | Author, citation, acknowledgment |
License and upstream models
VisionServeX is Apache-2.0 (SPDX-License-Identifier: Apache-2.0).
See LICENSE and NOTICE.
Each integrated model retains its own upstream license. Review the model, checkpoint, and training-data licenses before commercial use. VisionServeX does not provide legal advice. See docs/model_licenses.md.
Citation
```bibtex
@software{sajjadi2026visionservex,
  author = {Arash Sajjadi},
  title  = {{VisionServeX: A permissive-license-aware framework for
             local computer vision model serving}},
  year   = {2026},
  url    = {https://github.com/arashsajjadi/VisionServeX},
  note   = {Developed under the supervision of Prof. Mark Eramian,
            Department of Computer Science, University of Saskatchewan.}
}
```
Author: Arash Sajjadi — PhD Candidate, Department of Computer Science,
University of Saskatchewan
Supervision: Prof. Mark Eramian, Computer Vision Lab, University of Saskatchewan
(This project is not an official product of the University of Saskatchewan.)
Download files
File details
Details for the file visionservex-1.0.0rc1.tar.gz.
File metadata
- Download URL: visionservex-1.0.0rc1.tar.gz
- Upload date:
- Size: 127.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | df2662c23dcca09833e7c64a885ad64fbfd8c35d5ec803a6804cae3f86685e02 |
| MD5 | d8ab6a16736c9fb24537cce044b3190c |
| BLAKE2b-256 | f272ad8b3dfb9162d0a3f563c30b503a694bd71d743f2f94b9bb3f1f63cfc583 |
Provenance
The following attestation bundles were made for visionservex-1.0.0rc1.tar.gz:
Publisher: publish.yml on arashsajjadi/VisionServeX
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: visionservex-1.0.0rc1.tar.gz
- Subject digest: df2662c23dcca09833e7c64a885ad64fbfd8c35d5ec803a6804cae3f86685e02
- Sigstore transparency entry: 1549671662
- Sigstore integration time:
- Permalink: arashsajjadi/VisionServeX@02815ba2601f5aa5d51b29d229dc81118e0bc852
- Branch / Tag: refs/tags/v1.0.0rc1
- Owner: https://github.com/arashsajjadi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@02815ba2601f5aa5d51b29d229dc81118e0bc852
- Trigger Event: push
File details
Details for the file visionservex-1.0.0rc1-py3-none-any.whl.
File metadata
- Download URL: visionservex-1.0.0rc1-py3-none-any.whl
- Upload date:
- Size: 161.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7aef94a959fef1718877176a18f68dfc7fe99334ea8697d7f295bddc8b5a54a3 |
| MD5 | 1324a3fecaa2542a55f78ce0b3fa8fa5 |
| BLAKE2b-256 | 64616cb4defcf6ae0291cb9c7cf4832a21dad5480478cecc6facb264f0fada97 |
Provenance
The following attestation bundles were made for visionservex-1.0.0rc1-py3-none-any.whl:
Publisher: publish.yml on arashsajjadi/VisionServeX
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: visionservex-1.0.0rc1-py3-none-any.whl
- Subject digest: 7aef94a959fef1718877176a18f68dfc7fe99334ea8697d7f295bddc8b5a54a3
- Sigstore transparency entry: 1549671681
- Sigstore integration time:
- Permalink: arashsajjadi/VisionServeX@02815ba2601f5aa5d51b29d229dc81118e0bc852
- Branch / Tag: refs/tags/v1.0.0rc1
- Owner: https://github.com/arashsajjadi
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@02815ba2601f5aa5d51b29d229dc81118e0bc852
- Trigger Event: push