
Universal computer vision model serving library with dynamic model management and PixelFlow integration


Mozo

Universal computer vision model server with automatic memory management and multi-framework support.

Mozo provides HTTP access to 35+ pre-configured models across 11 model families, drawn from Detectron2, HuggingFace Transformers, PaddleOCR, EasyOCR, and other frameworks. Models load on demand and are unloaded automatically.

Quick Start

pip install mozo
mozo start

The server starts on http://localhost:8000 with all models available via the REST API.

Examples

Object detection:

curl -X POST "http://localhost:8000/predict/detectron2/mask_rcnn_R_50_FPN_3x" \
  -F "file=@image.jpg"

Depth estimation:

curl -X POST "http://localhost:8000/predict/depth_anything/small" \
  -F "file=@image.jpg" --output depth.png

Vision-language Q&A:

curl -X POST "http://localhost:8000/predict/qwen2.5_vl/7b-instruct?prompt=What%20is%20in%20this%20image" \
  -F "file=@image.jpg"

List available models:

curl http://localhost:8000/models

Features

  • 35+ Pre-configured Models - 11 model families including Detectron2, HuggingFace Transformers, PaddleOCR, EasyOCR, Florence-2, BLIP VQA, SAM3, and more
  • Automatic Memory Management - Lazy loading, usage tracking, automatic cleanup
  • Multi-Framework Support - Unified API across different ML frameworks
  • PixelFlow Integration - Detection models return unified format for filtering and annotation
  • Thread-Safe - Concurrent request handling with per-model locks
  • Production Ready - Multiple workers, configurable timeouts, health checks

Installation

# Basic installation
pip install mozo

# Framework dependencies (install as needed)
pip install transformers torch torchvision
pip install 'git+https://github.com/facebookresearch/detectron2.git'

Available Models

Detectron2 (17 variants)

Object detection, instance segmentation, keypoint detection trained on COCO dataset.

Popular variants:

  • mask_rcnn_R_50_FPN_3x - Instance segmentation
  • faster_rcnn_R_50_FPN_3x - Object detection
  • faster_rcnn_X_101_32x8d_FPN_3x - High-accuracy detection
  • keypoint_rcnn_R_50_FPN_3x - Keypoint detection
  • retinanet_R_50_FPN_3x - Single-stage detector

Output: JSON with bounding boxes, class names, confidence scores (80 COCO classes)

Depth Anything (3 variants)

Monocular depth estimation.

  • small - Fastest, lowest memory
  • base - Balanced performance
  • large - Best accuracy

Output: PNG grayscale depth map

Qwen2.5-VL (1 variant)

Vision-language understanding for VQA, captioning, and image analysis.

  • 7b-instruct - 7B parameter model (requires 16GB+ RAM)

Output: JSON with text response

Server

# Start with defaults (0.0.0.0:8000, auto-reload enabled)
mozo start

# Custom port
mozo start --port 8080

# Production mode with multiple workers
mozo start --workers 4

# Check version
mozo version

API Reference

Run Prediction

POST /predict/{family}/{variant}
Content-Type: multipart/form-data

Parameters:

  • family - Model family (e.g., detectron2, depth_anything, qwen2.5_vl)
  • variant - Model variant (e.g., mask_rcnn_R_50_FPN_3x, small, 7b-instruct)
  • file - Image file
  • prompt - Text prompt (VLM models only)

Health Check

GET /

Returns server status and loaded models.

List Models

GET /models

Returns all available model families and variants.

List Loaded Models

GET /models/loaded

Returns currently loaded models with usage information.

Get Model Info

GET /models/{family}/{variant}/info

Returns detailed information about a specific model variant.

Unload Model

POST /models/{family}/{variant}/unload

Manually unload a model to free memory.

Cleanup Inactive Models

POST /models/cleanup?inactive_seconds=600

Unload models inactive for specified duration (default: 600 seconds).

How It Works

Lazy Loading: Models load on first request, not at server startup, so startup stays instant no matter how many models are available.

Smart Caching: Loaded models stay in memory and are reused across requests. The first request is slower (model download + load); subsequent requests are fast.

Usage Tracking: Each model access updates a timestamp. Models inactive for 10+ minutes are unloaded automatically.

Thread Safety: Per-model locks ensure only one thread loads a given model; other threads wait and then reuse the loaded instance.

Example flow:

# Server starts instantly (no models loaded)
mozo start

# First request loads model
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" -F "file=@test.jpg"
# Output: [ModelManager] Loading model: detectron2/faster_rcnn_R_50_FPN_3x...

# Subsequent requests reuse loaded model
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" -F "file=@test2.jpg"
# Output: [ModelManager] Model already loaded, reusing existing instance.

# After 10 minutes of inactivity, model auto-unloads
# Output: [ModelManager] Cleanup: Unloaded 1 inactive model(s).
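The lazy-loading, per-model-lock, and usage-tracking behavior described above can be sketched in a few lines. This is an illustrative toy, not Mozo's actual ModelManager implementation:

```python
import threading
import time

class LazyModelCache:
    """Illustrative sketch of lazy loading with per-model locks and usage
    tracking; not Mozo's actual ModelManager."""

    def __init__(self):
        self._models = {}       # (family, variant) -> loaded model
        self._last_used = {}    # (family, variant) -> monotonic timestamp
        self._locks = {}        # (family, variant) -> per-model lock
        self._registry_lock = threading.Lock()

    def _lock_for(self, key):
        # Creating the per-model lock must itself be thread-safe.
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_model(self, family, variant, loader):
        key = (family, variant)
        with self._lock_for(key):  # only one thread loads a given model
            if key not in self._models:
                self._models[key] = loader()  # first request pays the load cost
            self._last_used[key] = time.monotonic()
            return self._models[key]

    def cleanup_inactive(self, inactive_seconds=600):
        now = time.monotonic()
        for key in list(self._models):
            with self._lock_for(key):
                if key in self._models and now - self._last_used[key] > inactive_seconds:
                    del self._models[key]  # free memory; reloads on next request
```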

Python SDK

For direct integration in Python applications:

from mozo import ModelManager
import cv2

manager = ModelManager()
model = manager.get_model('detectron2', 'mask_rcnn_R_50_FPN_3x')

image = cv2.imread('image.jpg')
detections = model.predict(image)

# Filter results
high_confidence = detections.filter_by_confidence(0.8)

# Manual memory management
manager.unload_model('detectron2', 'mask_rcnn_R_50_FPN_3x')
manager.cleanup_inactive_models(inactive_seconds=300)

PixelFlow Integration

Detection models return PixelFlow Detections objects, a unified format across all ML frameworks:

# Works the same for Detectron2, YOLO, or custom models
detections = model.predict(image)

# Filter and annotate
import pixelflow as pf
filtered = detections.filter_by_confidence(0.8).filter_by_class_id([0, 2])
annotated = pf.annotate.box(image, filtered)
annotated = pf.annotate.label(annotated, filtered)

# Export
json_output = filtered.to_json()

Learn more: PixelFlow

Configuration

Environment Variables

# Enable MPS fallback for macOS (Apple Silicon)
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Configure HuggingFace cache location
export HF_HOME=~/.cache/huggingface

Memory Management

Models automatically unload after 10 minutes of inactivity. To trigger a cleanup with a different threshold:

curl -X POST "http://localhost:8000/models/cleanup?inactive_seconds=300"

Or in Python:

manager.cleanup_inactive_models(inactive_seconds=300)

Extending Mozo

Add new models in 3 steps:

  1. Create adapter in mozo/adapters/your_model.py
  2. Register in mozo/registry.py
  3. Use via HTTP or Python API

See CLAUDE.md for a detailed implementation guide.
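As an illustration only, an adapter typically wraps framework-specific loading and inference behind a uniform interface. The class and method names below are hypothetical, not Mozo's actual adapter API; consult mozo/adapters/ for the real shape:

```python
class MyModelAdapter:
    """Hypothetical adapter shape: wraps a framework-specific model behind
    a uniform predict() call. See mozo/adapters/ for real examples."""

    def __init__(self, variant: str):
        self.variant = variant
        self.model = None  # loaded lazily

    def load(self):
        # Framework-specific loading would go here, e.g.:
        # self.model = some_framework.load_pretrained(self.variant)
        self.model = object()

    def predict(self, image):
        if self.model is None:
            self.load()
        # Run inference and convert raw outputs to the unified format here.
        return {"variant": self.variant, "detections": []}
```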

Architecture

HTTP Request → FastAPI Server → ModelManager → ModelFactory → Adapter → Framework
                                      ↓
                               Thread-safe cache
                               Usage tracking
                               Auto cleanup

Components:

  • Server - FastAPI REST API
  • Manager - Lifecycle management, caching, cleanup
  • Factory - Dynamic adapter instantiation
  • Registry - Central catalog of models
  • Adapters - Framework-specific implementations

Development

# Install in development mode
pip install -e .

# Start server with auto-reload
mozo start


License

MIT License
