
CLI tool and library to export ML models to production formats and containerize them with Docker


anydeploy

Deploy ML models anywhere


Export ML models to production formats (ONNX, TFLite, TorchScript) and deploy them locally or at the edge.

anydeploy makes model deployment easy. Convert your trained models to optimized inference formats, benchmark performance, validate correctness, generate serving code, and containerize everything -- all from a single CLI or Python API.

Edge-first deployment. Supports ONNX Runtime (CPU/GPU/edge), TFLite (mobile/edge), and llama.cpp (local LLM serving). All deployment targets work completely offline.

Built and maintained by Viet-Anh Nguyen at NRL.ai.

Installation

# Core (CLI + config + benchmarking)
pip install anydeploy

# With specific framework support
pip install anydeploy[torch]      # PyTorch + TorchScript
pip install anydeploy[onnx]       # ONNX + ONNX Runtime
pip install anydeploy[tflite]     # TensorFlow Lite
pip install anydeploy[serve]      # FastAPI serving

# Everything
pip install anydeploy[all]

Quick Start

CLI

# Export a PyTorch model to ONNX
anydeploy export model.pt --format onnx --input-shape 1,3,224,224

# Export to TFLite
anydeploy export model.pt --format tflite --input-shape 1,3,224,224

# Benchmark an exported model
anydeploy benchmark model.onnx --runs 100

# Serve a model with FastAPI
anydeploy serve model.onnx --backend fastapi --port 8000

# Generate a Docker container for deployment
anydeploy dockerize model.onnx --base python:3.11-slim

Python API

import anydeploy

# Export a model
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))

# Benchmark performance
result = anydeploy.benchmark("model.onnx", runs=100)
print(f"Mean latency: {result.mean_latency_ms:.2f} ms")
print(f"P95 latency:  {result.p95_latency_ms:.2f} ms")
print(f"Throughput:   {result.throughput:.1f} inferences/sec")

# Validate exported model against original
report = anydeploy.validate(original_model, "model.onnx", test_input)
print(f"Max difference: {report.max_diff}")
print(f"Passed: {report.passed}")

# Generate Dockerfile and serving code
from anydeploy.config import DockerConfig
docker_cfg = DockerConfig(base_image="python:3.11-slim")
anydeploy.dockerize("model.onnx", docker_cfg)

# Register a custom exporter
from anydeploy.export.base import BaseExporter
class MyExporter(BaseExporter):
    def export(self, model, output_path, config=None):
        ...
anydeploy.register_exporter("myformat", MyExporter)
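
A validation step like the one above boils down to comparing the original and exported outputs element-wise against a tolerance. Here is a framework-agnostic sketch of that check; the helper names and default tolerance are assumptions for illustration, not anydeploy's actual implementation:

```python
# Hypothetical sketch of the element-wise check a validation step performs.
# Names and the default tolerance are assumptions, not anydeploy's API.

def max_abs_diff(original, exported):
    """Largest element-wise absolute difference between two output vectors."""
    return max(abs(a - b) for a, b in zip(original, exported))

def outputs_match(original, exported, atol=1e-4):
    """True when every element agrees within the absolute tolerance."""
    return max_abs_diff(original, exported) <= atol

orig_out = [0.12, 0.88, 0.00031]
onnx_out = [0.12, 0.88, 0.00032]

print(max_abs_diff(orig_out, onnx_out))
print(outputs_match(orig_out, onnx_out))  # True at the default tolerance
```

In practice export can introduce small numeric drift (operator fusion, reduced precision), which is why validation reports a maximum difference rather than demanding exact equality.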

Export Format Comparison

Format       Framework                Hardware         Optimization         File Size
ONNX         Any (via ONNX Runtime)   CPU, GPU, Edge   Graph optimization   Medium
TFLite       TensorFlow               Mobile, Edge     Quantization         Small
TorchScript  PyTorch                  CPU, GPU         JIT compilation      Large

Serving

anydeploy generates production-ready serving code for multiple backends:

# FastAPI server for ONNX/TFLite/TorchScript models
anydeploy serve model.onnx --backend fastapi --port 8000

# llama.cpp server for GGUF language models (edge LLM deployment)
anydeploy serve model.gguf --backend llamacpp --port 8080

FastAPI Backend

Creates a FastAPI application with:

  • /predict endpoint accepting JSON or binary input
  • /health health check endpoint
  • Automatic input validation
  • Configurable batch size
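
Once the server is running, the /predict endpoint can be exercised from any HTTP client. A standard-library sketch follows; the exact payload schema is an assumption, so check the generated serve.py for the real field names:

```python
import json
import urllib.request

# Hypothetical payload shape -- the generated serve.py defines the real schema.
payload = {"inputs": [[0.0] * 224]}  # placeholder input data
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=body,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.loads(resp.read()))
except OSError as exc:  # URLError subclasses OSError
    print(f"server not reachable: {exc}")
```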

llama.cpp Backend

Creates deployment scripts for serving GGUF language models locally:

  • Shell script to launch llama.cpp server
  • Dockerfile for containerized LLM serving
  • OpenAI-compatible /v1/chat/completions endpoint
  • Works on CPU, GPU, and edge devices
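
Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal standard-library sketch, assuming the server from the example above is listening on port 8080 (the model name field is illustrative):

```python
import json
import urllib.request

# OpenAI-style chat payload accepted on /v1/chat/completions.
payload = {
    "model": "model.gguf",  # illustrative name
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
except OSError as exc:  # server not running / unreachable
    print(f"server not reachable: {exc}")
```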

Docker Deployment

Generate a complete Docker setup for your model:

anydeploy dockerize model.onnx --base python:3.11-slim --port 8000

This creates:

  • Dockerfile with optimized layers
  • serve.py FastAPI application
  • requirements.txt with pinned dependencies
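
The generated Dockerfile will vary with your base image and model format, but for an ONNX model on python:3.11-slim it looks roughly like this (a hand-written sketch, not anydeploy's literal output):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install pinned dependencies first so this layer is cached across rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model and the generated serving code.
COPY model.onnx serve.py ./

EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying requirements.txt before the model keeps the dependency layer stable, so editing or re-exporting the model does not force a reinstall.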

Extensibility

anydeploy uses a plugin architecture. You can register custom exporters and serving backends:

import anydeploy
from anydeploy.export.base import BaseExporter

class CoreMLExporter(BaseExporter):
    format_name = "coreml"

    def export(self, model, output_path, config=None):
        # Your export logic
        ...

    def validate_model(self, model):
        return True

anydeploy.register_exporter("coreml", CoreMLExporter)

See CONTRIBUTING.md for details on adding new exporters and backends.

Local-First / Edge AI

This package is designed for edge and local deployment. All export formats (ONNX, TFLite, TorchScript) produce models that run completely offline. The llama.cpp backend enables local LLM serving without any cloud dependencies.

# Export for edge deployment
anydeploy export model.pt --format onnx     # ONNX Runtime (CPU/GPU/edge)
anydeploy export model.pt --format tflite   # TFLite (mobile/edge)

# Serve an LLM locally
anydeploy serve model.gguf --backend llamacpp

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

anydeploy-0.2.1.tar.gz (37.2 kB)


Built Distribution


anydeploy-0.2.1-py3-none-any.whl (36.1 kB)


File details

Details for the file anydeploy-0.2.1.tar.gz.

File metadata

  • Download URL: anydeploy-0.2.1.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anydeploy-0.2.1.tar.gz
Algorithm Hash digest
SHA256 6d0db06b3c3a06d0097d58bc5d1c3fb49d9c9ee698d9ae28d56b4a3907c9b2f2
MD5 4e2eb60bdbbd67eefea80e1ca996a158
BLAKE2b-256 68125feedc04a7fbd743a06822c1dd0593390b1d26cf65197c4a1e8950f552b2


File details

Details for the file anydeploy-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: anydeploy-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 36.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anydeploy-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f47c4471a1e8b4638fe5a8defa72e339347b0bcec2bede551761e977bb1c4e8a
MD5 04f332467f59408fb417484ff6dc3395
BLAKE2b-256 4c88cef81b153f8c56faeaf23747351f3344793f1fbf2cbe170246de600fd0c2

