Mozo
Universal computer vision model server with automatic memory management and multi-framework support.
Mozo provides HTTP access to 35+ pre-configured models across 11 model families, spanning Detectron2, HuggingFace Transformers, PaddleOCR, EasyOCR, and other frameworks. Models load on demand and clean up automatically.
Quick Start
```shell
pip install mozo
mozo start
```
Server starts on http://localhost:8000 with all models available via REST API.
Examples
Object detection:

```shell
curl -X POST "http://localhost:8000/predict/detectron2/mask_rcnn_R_50_FPN_3x" \
  -F "file=@image.jpg"
```

Depth estimation:

```shell
curl -X POST "http://localhost:8000/predict/depth_anything/small" \
  -F "file=@image.jpg" --output depth.png
```

Vision-language Q&A:

```shell
curl -X POST "http://localhost:8000/predict/qwen2.5_vl/7b-instruct?prompt=What%20is%20in%20this%20image" \
  -F "file=@image.jpg"
```

List available models:

```shell
curl http://localhost:8000/models
```
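The same calls can be made from Python. Below is a minimal stdlib-only client sketch; the endpoint paths come from this README, while the helper name `build_predict_request` is our own invention:

```python
import urllib.parse
import urllib.request
import uuid

BASE_URL = "http://localhost:8000"  # default mozo server address

def build_predict_request(family, variant, image_bytes,
                          filename="image.jpg", prompt=None):
    """Build a multipart/form-data POST for /predict/{family}/{variant}."""
    url = f"{BASE_URL}/predict/{family}/{variant}"
    if prompt:
        # VLM models accept the prompt as a query parameter
        url += "?prompt=" + urllib.parse.quote(prompt)
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + image_bytes + f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return req

# Usage (requires a running server):
# with open("image.jpg", "rb") as f:
#     req = build_predict_request("detectron2", "mask_rcnn_R_50_FPN_3x", f.read())
#     print(urllib.request.urlopen(req).read())
```

Libraries such as `requests` make the multipart encoding a one-liner; the sketch above avoids third-party dependencies.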
Features
- **35+ Pre-configured Models** - 11 model families including Detectron2, HuggingFace Transformers, PaddleOCR, EasyOCR, Florence-2, BLIP VQA, and more
- **Automatic Memory Management** - Lazy loading, usage tracking, automatic cleanup
- **Multi-Framework Support** - Unified API across different ML frameworks
- **PixelFlow Integration** - Detection models return a unified format for filtering and annotation
- **Thread-Safe** - Concurrent request handling with per-model locks
- **Production Ready** - Multiple workers, configurable timeouts, health checks
Installation
```shell
# Basic installation
pip install mozo

# Framework dependencies (install as needed)
pip install transformers torch torchvision
pip install 'git+https://github.com/facebookresearch/detectron2.git'
```
Available Models
Detectron2 (17 variants)
Object detection, instance segmentation, and keypoint detection models trained on the COCO dataset.
Popular variants:
- `mask_rcnn_R_50_FPN_3x` - Instance segmentation
- `faster_rcnn_R_50_FPN_3x` - Object detection
- `faster_rcnn_X_101_32x8d_FPN_3x` - High-accuracy detection
- `keypoint_rcnn_R_50_FPN_3x` - Keypoint detection
- `retinanet_R_50_FPN_3x` - Single-stage detector
Output: JSON with bounding boxes, class names, confidence scores (80 COCO classes)
Depth Anything (3 variants)
Monocular depth estimation.
- `small` - Fastest, lowest memory
- `base` - Balanced performance
- `large` - Best accuracy
Output: PNG grayscale depth map
Qwen2.5-VL (1 variant)
Vision-language understanding for VQA, captioning, and image analysis.
- `7b-instruct` - 7B-parameter model (requires 16GB+ RAM)
Output: JSON with text response
Server
```shell
# Start with defaults (0.0.0.0:8000, auto-reload enabled)
mozo start

# Custom port
mozo start --port 8080

# Production mode with multiple workers
mozo start --workers 4

# Check version
mozo version
```
API Reference
Run Prediction
```
POST /predict/{family}/{variant}
Content-Type: multipart/form-data
```
Parameters:
- `family` - Model family (e.g., `detectron2`, `depth_anything`, `qwen2.5_vl`)
- `variant` - Model variant (e.g., `mask_rcnn_R_50_FPN_3x`, `small`, `7b-instruct`)
- `file` - Image file
- `prompt` - Text prompt (VLM models only)
Health Check
```
GET /
```
Returns server status and loaded models.
List Models
```
GET /models
```
Returns all available model families and variants.
List Loaded Models
```
GET /models/loaded
```
Returns currently loaded models with usage information.
Get Model Info
```
GET /models/{family}/{variant}/info
```
Returns detailed information about a specific model variant.
Unload Model
```
POST /models/{family}/{variant}/unload
```
Manually unload a model to free memory.
Cleanup Inactive Models
```
POST /models/cleanup?inactive_seconds=600
```
Unloads models that have been inactive for the specified duration (default: 600 seconds).
How It Works
**Lazy Loading.** Models load on first request, not at server startup. Startup time stays instant no matter how many models are configured.

**Smart Caching.** Loaded models stay in memory and are reused across requests. The first request is slower (model download + load); subsequent requests are fast.

**Usage Tracking.** Each model access updates a timestamp. Models inactive for 10+ minutes are automatically unloaded.

**Thread Safety.** Per-model locks ensure only one thread loads a given model. Other threads wait, then reuse the loaded instance.
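The combination of lazy loading, per-model locks, and usage timestamps can be illustrated with a simplified sketch. This is an illustration of the pattern only, not Mozo's actual `ModelManager` implementation:

```python
import threading
import time

class LazyModelCache:
    """Sketch of lazy loading with per-model locks and usage timestamps."""

    def __init__(self):
        self._models = {}      # key -> loaded model instance
        self._last_used = {}   # key -> timestamp of last access
        self._locks = {}       # key -> per-model lock
        self._registry_lock = threading.Lock()

    def _lock_for(self, key):
        # Registry lock guards creation of the per-model lock itself
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_model(self, key, loader):
        with self._lock_for(key):  # only one thread loads a given model
            if key not in self._models:
                self._models[key] = loader()  # expensive load happens once
            self._last_used[key] = time.time()
            return self._models[key]

    def cleanup(self, inactive_seconds=600):
        """Unload models idle longer than inactive_seconds; return count."""
        now = time.time()
        removed = 0
        for key in list(self._models):
            with self._lock_for(key):
                if now - self._last_used[key] > inactive_seconds:
                    del self._models[key]
                    removed += 1
        return removed
```

The per-model (rather than global) lock is what lets two different models load concurrently while still preventing a duplicate load of the same model.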
Example flow:
```shell
# Server starts instantly (no models loaded)
mozo start

# First request loads model
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" -F "file=@test.jpg"
# Output: [ModelManager] Loading model: detectron2/faster_rcnn_R_50_FPN_3x...

# Subsequent requests reuse loaded model
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" -F "file=@test2.jpg"
# Output: [ModelManager] Model already loaded, reusing existing instance.

# After 10 minutes of inactivity, model auto-unloads
# Output: [ModelManager] Cleanup: Unloaded 1 inactive model(s).
```
Python SDK
For direct integration in Python applications:
```python
from mozo import ModelManager
import cv2

manager = ModelManager()
model = manager.get_model('detectron2', 'mask_rcnn_R_50_FPN_3x')

image = cv2.imread('image.jpg')
detections = model.predict(image)

# Filter results
high_confidence = detections.filter_by_confidence(0.8)

# Manual memory management
manager.unload_model('detectron2', 'mask_rcnn_R_50_FPN_3x')
manager.cleanup_inactive_models(inactive_seconds=300)
```
PixelFlow Integration
Detection models return PixelFlow Detections objects - a unified format across all ML frameworks:
```python
import pixelflow as pf

# Works the same for Detectron2, YOLO, or custom models
detections = model.predict(image)

# Filter and annotate
filtered = detections.filter_by_confidence(0.8).filter_by_class_id([0, 2])
annotated = pf.annotate.box(image, filtered)
annotated = pf.annotate.label(annotated, filtered)

# Export
json_output = filtered.to_json()
```
Learn more: PixelFlow
Configuration
Environment Variables
```shell
# Enable MPS fallback on macOS (Apple Silicon)
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Configure HuggingFace cache location
export HF_HOME=~/.cache/huggingface
```
Memory Management
Models automatically unload after 10 minutes of inactivity. Adjust the threshold via the cleanup endpoint:

```shell
curl -X POST "http://localhost:8000/models/cleanup?inactive_seconds=300"
```
Or in Python:

```python
manager.cleanup_inactive_models(inactive_seconds=300)
```
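In a long-running process, that cleanup call could be scheduled periodically. Below is a hypothetical background-timer sketch, not a built-in Mozo feature; `schedule_cleanup` and its wiring are our own:

```python
import threading

def schedule_cleanup(cleanup_fn, interval_seconds=600):
    """Call cleanup_fn every interval_seconds on a daemon timer thread."""
    def run():
        cleanup_fn()
        schedule_cleanup(cleanup_fn, interval_seconds)  # reschedule
    timer = threading.Timer(interval_seconds, run)
    timer.daemon = True  # don't block process exit
    timer.start()
    return timer

# Usage with a ModelManager instance:
# schedule_cleanup(lambda: manager.cleanup_inactive_models(inactive_seconds=300))
```

When running under the HTTP server this is unnecessary, since the server already unloads inactive models on its own.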
Extending Mozo
Add new models in three steps:

1. Create an adapter in `mozo/adapters/your_model.py`
2. Register it in `mozo/registry.py`
3. Use it via the HTTP or Python API

See CLAUDE.md for a detailed implementation guide.
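The adapter from step 1 might look roughly like this. The class shape and method names here are assumptions for illustration; consult CLAUDE.md for the actual interface Mozo expects:

```python
class YourModelAdapter:
    """Hypothetical adapter sketch: wraps a framework-specific model
    behind a uniform load/predict/unload interface."""

    def __init__(self, variant: str):
        self.variant = variant
        self.model = None  # loaded lazily on first predict

    def load(self):
        # Load framework weights here (e.g. a HuggingFace checkpoint
        # selected by self.variant).
        self.model = object()  # placeholder for the real model handle

    def unload(self):
        # Release the model so memory can be reclaimed.
        self.model = None

    def predict(self, image):
        if self.model is None:
            self.load()
        # Run inference and convert the raw framework output into the
        # unified detections format.
        return {"variant": self.variant, "detections": []}
```

Keeping load/unload separate from predict is what lets the manager unload idle models and transparently reload them on the next request.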
Architecture
```
HTTP Request → FastAPI Server → ModelManager → ModelFactory → Adapter → Framework
                                      ↓
                              Thread-safe cache
                              Usage tracking
                              Auto cleanup
```
Components:
- **Server** - FastAPI REST API
- **Manager** - Lifecycle management, caching, cleanup
- **Factory** - Dynamic adapter instantiation
- **Registry** - Central catalog of models
- **Adapters** - Framework-specific implementations
Development
```shell
# Install in development mode
pip install -e .

# Start server with auto-reload
mozo start
```
License
MIT License