
anydeploy

Deploy ML models anywhere

Export ML models to production formats (ONNX, TFLite, TorchScript) and deploy them locally or at the edge.

anydeploy makes model deployment easy. Convert your trained models to optimized inference formats, benchmark performance, validate correctness, generate serving code, and containerize everything -- all from a single CLI or Python API.

Edge-first deployment. Supports ONNX Runtime (CPU/GPU/edge), TFLite (mobile/edge), and llama.cpp (local LLM serving). All deployment targets work completely offline.

Built and maintained by Viet-Anh Nguyen at NRL.ai.

Installation

# Core (CLI + config + benchmarking)
pip install anydeploy

# With specific framework support
pip install anydeploy[torch]      # PyTorch + TorchScript
pip install anydeploy[onnx]       # ONNX + ONNX Runtime
pip install anydeploy[tflite]     # TensorFlow Lite
pip install anydeploy[serve]      # FastAPI serving

# Everything
pip install anydeploy[all]

Quick Start

CLI

# Export a PyTorch model to ONNX
anydeploy export model.pt --format onnx --input-shape 1,3,224,224

# Export to TFLite
anydeploy export model.pt --format tflite --input-shape 1,3,224,224

# Benchmark an exported model
anydeploy benchmark model.onnx --runs 100

# Serve a model with FastAPI
anydeploy serve model.onnx --backend fastapi --port 8000

# Generate a Docker container for deployment
anydeploy dockerize model.onnx --base python:3.11-slim

Python API

import anydeploy

# Export a model
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))

# Benchmark performance
result = anydeploy.benchmark("model.onnx", runs=100)
print(f"Mean latency: {result.mean_latency_ms:.2f} ms")
print(f"P95 latency:  {result.p95_latency_ms:.2f} ms")
print(f"Throughput:   {result.throughput:.1f} inferences/sec")

# Validate exported model against original
report = anydeploy.validate(original_model, "model.onnx", test_input)
print(f"Max difference: {report.max_diff}")
print(f"Passed: {report.passed}")

# Generate Dockerfile and serving code
from anydeploy.config import DockerConfig
docker_cfg = DockerConfig(base_image="python:3.11-slim")
anydeploy.dockerize("model.onnx", docker_cfg)

# Register a custom exporter
from anydeploy.export.base import BaseExporter
class MyExporter(BaseExporter):
    def export(self, model, output_path, config=None):
        ...
anydeploy.register_exporter("myformat", MyExporter)

Export Format Comparison

Format       Framework               Hardware          Optimization          File Size
ONNX         Any (via ONNX Runtime)  CPU, GPU, Edge    Graph optimization    Medium
TFLite       TensorFlow              Mobile, Edge      Quantization          Small
TorchScript  PyTorch                 CPU, GPU          JIT compilation       Large
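
To see these trade-offs on your own model, you can export to two formats and benchmark both. A sketch using only the calls shown above; the output filenames are assumptions, so check where export actually writes:

import anydeploy

# model: a trained model object, as in the Quick Start above.
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))
anydeploy.export(model, format="tflite", input_shape=(1, 3, 224, 224))

# Benchmark each exported artifact and compare.
for path in ("model.onnx", "model.tflite"):
    result = anydeploy.benchmark(path, runs=100)
    print(f"{path}: {result.mean_latency_ms:.2f} ms mean, "
          f"{result.throughput:.1f} inferences/sec")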

Serving

anydeploy generates production-ready serving code for multiple backends:

# FastAPI server for ONNX/TFLite/TorchScript models
anydeploy serve model.onnx --backend fastapi --port 8000

# llama.cpp server for GGUF language models (edge LLM deployment)
anydeploy serve model.gguf --backend llamacpp --port 8080

FastAPI Backend

Creates a FastAPI application with:

  • /predict endpoint accepting JSON or binary input
  • /health health check endpoint
  • Automatic input validation
  • Configurable batch size
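
A minimal client sketch against the generated server; the endpoints match the list above, but the payload shape is an assumption (the real schema lives in the generated serve.py):

import requests

# Health check -- should return 200 once the model is loaded.
print(requests.get("http://localhost:8000/health").status_code)

# Prediction request. The "inputs" key and nested-list layout are
# assumptions; consult the generated serve.py for the actual schema.
payload = {"inputs": [[0.1, 0.2, 0.3]]}
response = requests.post("http://localhost:8000/predict", json=payload)
print(response.json())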

llama.cpp Backend

Creates deployment scripts for serving GGUF language models locally:

  • Shell script to launch llama.cpp server
  • Dockerfile for containerized LLM serving
  • OpenAI-compatible /v1/chat/completions endpoint
  • Works on CPU, GPU, and edge devices
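
Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A sketch with plain requests; for a single-model llama.cpp server the model field is informational:

import requests

# Chat completion against the llama.cpp server started above.
response = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "local",  # single-model server; the name is informational
        "messages": [
            {"role": "user", "content": "Summarize ONNX in one sentence."}
        ],
        "max_tokens": 64,
    },
)
print(response.json()["choices"][0]["message"]["content"])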

Docker Deployment

Generate a complete Docker setup for your model:

anydeploy dockerize model.onnx --base python:3.11-slim --port 8000

This creates:

  • Dockerfile with optimized layers
  • serve.py FastAPI application
  • requirements.txt with pinned dependencies

Extensibility

anydeploy uses a plugin architecture. You can register custom exporters and serving backends:

import anydeploy
from anydeploy.export.base import BaseExporter

class CoreMLExporter(BaseExporter):
    format_name = "coreml"

    def export(self, model, output_path, config=None):
        # Your export logic
        ...

    def validate_model(self, model):
        return True

anydeploy.register_exporter("coreml", CoreMLExporter)
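
Once registered, the custom format should be usable like the built-ins. A usage sketch, assuming register_exporter wires the format into export's lookup (which is what the plugin architecture implies):

# Export through the custom exporter registered above.
anydeploy.export(model, format="coreml", input_shape=(1, 3, 224, 224))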

See CONTRIBUTING.md for details on adding new exporters and backends.

Local-First / Edge AI

This package is designed for edge and local deployment. All export formats (ONNX, TFLite, TorchScript) produce models that run completely offline. The llama.cpp backend enables local LLM serving without any cloud dependencies.

# Export for edge deployment
anydeploy export model.pt --format onnx     # ONNX Runtime (CPU/GPU/edge)
anydeploy export model.pt --format tflite   # TFLite (mobile/edge)

# Serve an LLM locally
anydeploy serve model.gguf --backend llamacpp

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE for details.
