anydeploy

Export ML models to production formats (ONNX, TFLite, TorchScript) and deploy them locally or at the edge.

anydeploy makes model deployment easy. Convert your trained models to optimized inference formats, benchmark performance, validate correctness, generate serving code, and containerize everything -- all from a single CLI or Python API.

Edge-first deployment. Supports ONNX Runtime (CPU/GPU/edge), TFLite (mobile/edge), and llama.cpp (local LLM serving). All deployment targets work completely offline.

Built and maintained by Viet-Anh Nguyen at NRL.ai.

Installation

# Core (CLI + config + benchmarking)
pip install anydeploy

# With specific framework support (quote the extras so the brackets survive shells like zsh)
pip install "anydeploy[torch]"      # PyTorch + TorchScript
pip install "anydeploy[onnx]"       # ONNX + ONNX Runtime
pip install "anydeploy[tflite]"     # TensorFlow Lite
pip install "anydeploy[serve]"      # FastAPI serving

# Everything
pip install "anydeploy[all]"

Quick Start

CLI

# Export a PyTorch model to ONNX
anydeploy export model.pt --format onnx --input-shape 1,3,224,224

# Export to TFLite
anydeploy export model.pt --format tflite --input-shape 1,3,224,224

# Benchmark an exported model
anydeploy benchmark model.onnx --runs 100

# Serve a model with FastAPI
anydeploy serve model.onnx --backend fastapi --port 8000

# Generate a Docker container for deployment
anydeploy dockerize model.onnx --base python:3.11-slim

Python API

import anydeploy

# Export a model
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))

# Benchmark performance
result = anydeploy.benchmark("model.onnx", runs=100)
print(f"Mean latency: {result.mean_latency_ms:.2f} ms")
print(f"P95 latency:  {result.p95_latency_ms:.2f} ms")
print(f"Throughput:   {result.throughput:.1f} inferences/sec")

# Validate exported model against original
report = anydeploy.validate(original_model, "model.onnx", test_input)
print(f"Max difference: {report.max_diff}")
print(f"Passed: {report.passed}")

# Generate Dockerfile and serving code
from anydeploy.config import DockerConfig
docker_cfg = DockerConfig(base_image="python:3.11-slim")
anydeploy.dockerize("model.onnx", docker_cfg)

# Register a custom exporter
from anydeploy.export.base import BaseExporter
class MyExporter(BaseExporter):
    def export(self, model, output_path, config=None):
        ...

anydeploy.register_exporter("myformat", MyExporter)

Export Format Comparison

Format       Framework                Hardware        Optimization        File Size
-----------  -----------------------  --------------  ------------------  ---------
ONNX         Any (via ONNX Runtime)   CPU, GPU, Edge  Graph optimization  Medium
TFLite       TensorFlow               Mobile, Edge    Quantization        Small
TorchScript  PyTorch                  CPU, GPU        JIT compilation     Large
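
These trade-offs are workload-dependent, so it is worth measuring on your own model. Here is a quick sketch using the export and benchmark commands from the Quick Start (the torchscript format name and the model.tflite output name are assumptions; the onnx and tflite invocations appear above):

# Export the same model to each candidate format, then benchmark each
anydeploy export model.pt --format onnx --input-shape 1,3,224,224
anydeploy export model.pt --format tflite --input-shape 1,3,224,224
anydeploy export model.pt --format torchscript --input-shape 1,3,224,224

anydeploy benchmark model.onnx --runs 100
anydeploy benchmark model.tflite --runs 100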

Serving

anydeploy generates production-ready serving code for multiple backends:

# FastAPI server for ONNX/TFLite/TorchScript models
anydeploy serve model.onnx --backend fastapi --port 8000

# llama.cpp server for GGUF language models (edge LLM deployment)
anydeploy serve model.gguf --backend llamacpp --port 8080

FastAPI Backend

Creates a FastAPI application with:

  • /predict endpoint accepting JSON or binary input
  • /health health check endpoint
  • Automatic input validation
  • Configurable batch size
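
As a quick smoke test, you can hit the generated endpoints from the command line. The /predict payload below is only an assumed example; the exact JSON schema depends on the serving code anydeploy generates for your model:

# Health check
curl http://localhost:8000/health

# Prediction request (the JSON body shown here is illustrative)
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": [[0.1, 0.2, 0.3]]}'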

llama.cpp Backend

Creates deployment scripts for serving GGUF language models locally:

  • Shell script to launch llama.cpp server
  • Dockerfile for containerized LLM serving
  • OpenAI-compatible /v1/chat/completions endpoint
  • Works on CPU, GPU, and edge devices
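
Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to the server started on port 8080 above. A minimal curl sketch (llama.cpp servers typically serve a single model, so no model field is sent here; adjust if yours expects one):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize ONNX in one sentence."}]}'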

Docker Deployment

Generate a complete Docker setup for your model:

anydeploy dockerize model.onnx --base python:3.11-slim --port 8000

This creates:

  • Dockerfile with optimized layers
  • serve.py FastAPI application
  • requirements.txt with pinned dependencies
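
From there, building and running the container uses the standard Docker CLI; the anydeploy-model image tag below is just an illustrative name:

# Build the image from the generated Dockerfile
docker build -t anydeploy-model .

# Run it, mapping the serving port chosen above
docker run --rm -p 8000:8000 anydeploy-model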

Extensibility

anydeploy uses a plugin architecture. You can register custom exporters and serving backends:

import anydeploy
from anydeploy.export.base import BaseExporter

class CoreMLExporter(BaseExporter):
    format_name = "coreml"

    def export(self, model, output_path, config=None):
        # Your export logic
        ...

    def validate_model(self, model):
        return True

anydeploy.register_exporter("coreml", CoreMLExporter)
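
Once registered, the new format name can presumably be passed to the same anydeploy.export(...) entry point shown in the Quick Start, e.g. format="coreml".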

See CONTRIBUTING.md for details on adding new exporters and backends.

Local-First / Edge AI

This package is designed for edge and local deployment. All export formats (ONNX, TFLite, TorchScript) produce models that run completely offline. The llama.cpp backend enables local LLM serving without any cloud dependencies.

# Export for edge deployment
anydeploy export model.pt --format onnx     # ONNX Runtime (CPU/GPU/edge)
anydeploy export model.pt --format tflite   # TFLite (mobile/edge)

# Serve an LLM locally
anydeploy serve model.gguf --backend llamacpp

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE for details.
