
Universal computer vision model serving library with dynamic model management and PixelFlow integration


Mozo

Universal computer vision model server with automatic memory management and multi-framework support.

Mozo provides HTTP access to 35+ pre-configured models across 11 model families, drawn from Detectron2, HuggingFace Transformers, PaddleOCR, EasyOCR, and other frameworks. Models load on demand and are unloaded automatically.

Quick Start

pip install mozo
mozo start

The server starts on http://localhost:8000 with all models available via the REST API.

Examples

Object detection:

curl -X POST "http://localhost:8000/predict/detectron2/mask_rcnn_R_50_FPN_3x" \
  -F "file=@image.jpg"

Depth estimation:

curl -X POST "http://localhost:8000/predict/depth_anything/small" \
  -F "file=@image.jpg" --output depth.png

Vision-language Q&A:

curl -X POST "http://localhost:8000/predict/qwen2.5_vl/7b-instruct?prompt=What%20is%20in%20this%20image" \
  -F "file=@image.jpg"

List available models:

curl http://localhost:8000/models

Features

  • 35+ Pre-configured Models - 11 model families including Detectron2, HuggingFace Transformers, PaddleOCR, EasyOCR, Florence-2, BLIP VQA, SAM3, and more
  • Automatic Memory Management - Lazy loading, usage tracking, automatic cleanup
  • Multi-Framework Support - Unified API across different ML frameworks
  • PixelFlow Integration - Detection models return unified format for filtering and annotation
  • Thread-Safe - Concurrent request handling with per-model locks
  • Production Ready - Multiple workers, configurable timeouts, health checks

Installation

# Basic installation
pip install mozo

# Framework dependencies (install as needed)
pip install transformers torch torchvision
pip install 'git+https://github.com/facebookresearch/detectron2.git'

Available Models

Detectron2 (17 variants)

Object detection, instance segmentation, keypoint detection trained on COCO dataset.

Popular variants:

  • mask_rcnn_R_50_FPN_3x - Instance segmentation
  • faster_rcnn_R_50_FPN_3x - Object detection
  • faster_rcnn_X_101_32x8d_FPN_3x - High-accuracy detection
  • keypoint_rcnn_R_50_FPN_3x - Keypoint detection
  • retinanet_R_50_FPN_3x - Single-stage detector

Output: JSON with bounding boxes, class names, confidence scores (80 COCO classes)

Depth Anything (3 variants)

Monocular depth estimation.

  • small - Fastest, lowest memory
  • base - Balanced performance
  • large - Best accuracy

Output: PNG grayscale depth map

Qwen2.5-VL (1 variant)

Vision-language understanding for VQA, captioning, and image analysis.

  • 7b-instruct - 7B parameter model (requires 16GB+ RAM)

Output: JSON with text response

Server

# Start with defaults (0.0.0.0:8000, auto-reload enabled)
mozo start

# Custom port
mozo start --port 8080

# Production mode with multiple workers
mozo start --workers 4

# Check version
mozo version

API Reference

Run Prediction

POST /predict/{family}/{variant}
Content-Type: multipart/form-data

Parameters:

  • family - Model family (e.g., detectron2, depth_anything, qwen2.5_vl)
  • variant - Model variant (e.g., mask_rcnn_R_50_FPN_3x, small, 7b-instruct)
  • file - Image file
  • prompt - Text prompt (VLM models only)

Health Check

GET /

Returns server status and loaded models.

List Models

GET /models

Returns all available model families and variants.

List Loaded Models

GET /models/loaded

Returns currently loaded models with usage information.

Get Model Info

GET /models/{family}/{variant}/info

Returns detailed information about a specific model variant.

Unload Model

POST /models/{family}/{variant}/unload

Manually unload a model to free memory.

Cleanup Inactive Models

POST /models/cleanup?inactive_seconds=600

Unload models inactive for specified duration (default: 600 seconds).

How It Works

Lazy Loading: Models load on first request, not at server startup, so startup stays instant no matter how many models are available.

Smart Caching: Loaded models stay in memory and are reused across requests. The first request is slower (model download + load); subsequent requests are fast.

Usage Tracking: Each model access updates a timestamp. Models inactive for 10+ minutes are unloaded automatically.

Thread Safety: Per-model locks ensure only one thread loads a given model; other threads wait and then reuse the loaded instance.

Example flow:

# Server starts instantly (no models loaded)
mozo start

# First request loads model
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" -F "file=@test.jpg"
# Output: [ModelManager] Loading model: detectron2/faster_rcnn_R_50_FPN_3x...

# Subsequent requests reuse loaded model
curl -X POST "http://localhost:8000/predict/detectron2/faster_rcnn_R_50_FPN_3x" -F "file=@test2.jpg"
# Output: [ModelManager] Model already loaded, reusing existing instance.

# After 10 minutes of inactivity, model auto-unloads
# Output: [ModelManager] Cleanup: Unloaded 1 inactive model(s).
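The lazy-loading, per-model-lock, and usage-tracking behavior described above can be sketched in a few lines. This is an illustrative toy, not Mozo's actual ModelManager implementation:

```python
import threading
import time

class LazyModelCache:
    """Illustrative sketch of lazy loading with per-model locks and usage
    tracking; not Mozo's actual ModelManager."""

    def __init__(self):
        self._models = {}       # (family, variant) -> loaded model
        self._last_used = {}    # (family, variant) -> monotonic timestamp
        self._locks = {}        # (family, variant) -> per-model lock
        self._registry_lock = threading.Lock()

    def _lock_for(self, key):
        # Creating the per-model lock must itself be thread-safe.
        with self._registry_lock:
            return self._locks.setdefault(key, threading.Lock())

    def get_model(self, family, variant, loader):
        key = (family, variant)
        with self._lock_for(key):  # only one thread loads a given model
            if key not in self._models:
                self._models[key] = loader()  # first request pays the load cost
            self._last_used[key] = time.monotonic()
            return self._models[key]

    def cleanup_inactive(self, inactive_seconds=600):
        now = time.monotonic()
        for key in list(self._models):
            with self._lock_for(key):
                if key in self._models and now - self._last_used[key] > inactive_seconds:
                    del self._models[key]  # free memory; reloads on next request
```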

Python SDK

For direct integration in Python applications:

from mozo import ModelManager
import cv2

manager = ModelManager()
model = manager.get_model('detectron2', 'mask_rcnn_R_50_FPN_3x')

image = cv2.imread('image.jpg')
detections = model.predict(image)

# Filter results
high_confidence = detections.filter_by_confidence(0.8)

# Manual memory management
manager.unload_model('detectron2', 'mask_rcnn_R_50_FPN_3x')
manager.cleanup_inactive_models(inactive_seconds=300)

PixelFlow Integration

Detection models return PixelFlow Detections objects, a unified format across all ML frameworks:

# Works the same for Detectron2, YOLO, or custom models
detections = model.predict(image)

# Filter and annotate
import pixelflow as pf
filtered = detections.filter_by_confidence(0.8).filter_by_class_id([0, 2])
annotated = pf.annotate.box(image, filtered)
annotated = pf.annotate.label(annotated, filtered)

# Export
json_output = filtered.to_json()

Learn more: PixelFlow

Configuration

Environment Variables

# Enable MPS fallback for macOS (Apple Silicon)
export PYTORCH_ENABLE_MPS_FALLBACK=1

# Configure HuggingFace cache location
export HF_HOME=~/.cache/huggingface

Memory Management

Models automatically unload after 10 minutes of inactivity. To trigger a cleanup with a different threshold:

curl -X POST "http://localhost:8000/models/cleanup?inactive_seconds=300"

Or in Python:

manager.cleanup_inactive_models(inactive_seconds=300)

Extending Mozo

Add new models in 3 steps:

  1. Create adapter in mozo/adapters/your_model.py
  2. Register in mozo/registry.py
  3. Use via HTTP or Python API

See CLAUDE.md for a detailed implementation guide.
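As an illustration only, an adapter typically wraps framework-specific loading and inference behind a uniform interface. The class and method names below are hypothetical, not Mozo's actual adapter API; consult mozo/adapters/ for the real shape:

```python
class MyModelAdapter:
    """Hypothetical adapter shape: wraps a framework-specific model behind
    a uniform predict() call. See mozo/adapters/ for real examples."""

    def __init__(self, variant: str):
        self.variant = variant
        self.model = None  # loaded lazily

    def load(self):
        # Framework-specific loading would go here, e.g.:
        # self.model = some_framework.load_pretrained(self.variant)
        self.model = object()

    def predict(self, image):
        if self.model is None:
            self.load()
        # Run inference and convert raw outputs to the unified format here.
        return {"variant": self.variant, "detections": []}
```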

Architecture

HTTP Request → FastAPI Server → ModelManager → ModelFactory → Adapter → Framework
                                      ↓
                               Thread-safe cache
                               Usage tracking
                               Auto cleanup

Components:

  • Server - FastAPI REST API
  • Manager - Lifecycle management, caching, cleanup
  • Factory - Dynamic adapter instantiation
  • Registry - Central catalog of models
  • Adapters - Framework-specific implementations

Development

# Install in development mode
pip install -e .

# Start server with auto-reload
mozo start


License

MIT License
