# anydeploy

*Deploy ML models anywhere.*
Export ML models to production formats (ONNX, TFLite, TorchScript) and deploy them locally or at the edge.

anydeploy makes model deployment easy. Convert your trained models to optimized inference formats, benchmark performance, validate correctness, generate serving code, and containerize everything -- all from a single CLI or Python API.

**Edge-first deployment.** Supports ONNX Runtime (CPU/GPU/edge), TFLite (mobile/edge), and llama.cpp (local LLM serving). All deployment targets work completely offline.

Built and maintained by Viet-Anh Nguyen at NRL.ai.
## Installation

```bash
# Core (CLI + config + benchmarking)
pip install anydeploy

# With specific framework support
pip install anydeploy[torch]   # PyTorch + TorchScript
pip install anydeploy[onnx]    # ONNX + ONNX Runtime
pip install anydeploy[tflite]  # TensorFlow Lite
pip install anydeploy[serve]   # FastAPI serving

# Everything
pip install anydeploy[all]
```
## Quick Start

### CLI

```bash
# Export a PyTorch model to ONNX
anydeploy export model.pt --format onnx --input-shape 1,3,224,224

# Export to TFLite
anydeploy export model.pt --format tflite --input-shape 1,3,224,224

# Benchmark an exported model
anydeploy benchmark model.onnx --runs 100

# Serve a model with FastAPI
anydeploy serve model.onnx --backend fastapi --port 8000

# Generate a Docker container for deployment
anydeploy dockerize model.onnx --base python:3.11-slim
```
### Python API

```python
import anydeploy

# Export a model (here, a trained torch.nn.Module)
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))

# Benchmark performance
result = anydeploy.benchmark("model.onnx", runs=100)
print(f"Mean latency: {result.mean_latency_ms:.2f} ms")
print(f"P95 latency: {result.p95_latency_ms:.2f} ms")
print(f"Throughput: {result.throughput:.1f} inferences/sec")

# Validate exported model against original
report = anydeploy.validate(original_model, "model.onnx", test_input)
print(f"Max difference: {report.max_diff}")
print(f"Passed: {report.passed}")

# Generate Dockerfile and serving code
from anydeploy.config import DockerConfig

docker_cfg = DockerConfig(base_image="python:3.11-slim")
anydeploy.dockerize("model.onnx", docker_cfg)

# Register a custom exporter
from anydeploy.export.base import BaseExporter

class MyExporter(BaseExporter):
    def export(self, model, output_path, config=None):
        ...

anydeploy.register_exporter("myformat", MyExporter)
```
## Export Format Comparison

| Format | Framework | Hardware | Optimization | File Size |
|---|---|---|---|---|
| ONNX | Any (via ONNX Runtime) | CPU, GPU, Edge | Graph optimization | Medium |
| TFLite | TensorFlow | Mobile, Edge | Quantization | Small |
| TorchScript | PyTorch | CPU, GPU | JIT compilation | Large |
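
To make the comparison concrete, the sketch below exports one model to each format through the Python API shown above. It assumes the format strings match the CLI's `--format` values and that the matching extras (`anydeploy[torch]`, `anydeploy[onnx]`, `anydeploy[tflite]`) are installed:

```python
import anydeploy
import torch.nn as nn

# A toy network standing in for a real trained model
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())

# Export the same model once per target format from the table above
for fmt in ("onnx", "tflite", "torchscript"):
    anydeploy.export(model, format=fmt, input_shape=(1, 3, 224, 224))
```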
## Serving

anydeploy generates production-ready serving code for multiple backends:

```bash
# FastAPI server for ONNX/TFLite/TorchScript models
anydeploy serve model.onnx --backend fastapi --port 8000

# llama.cpp server for GGUF language models (edge LLM deployment)
anydeploy serve model.gguf --backend llamacpp --port 8080
```

### FastAPI Backend

Creates a FastAPI application with:

- `/predict` endpoint accepting JSON or binary input
- `/health` health check endpoint
- Automatic input validation
- Configurable batch size
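
For a quick end-to-end check you can call the generated endpoints directly. The request below is hypothetical: the exact JSON schema for `/predict` depends on the generated `serve.py`, so treat the payload as a placeholder:

```python
import requests

# Against a server started with:
#   anydeploy serve model.onnx --backend fastapi --port 8000
payload = {"input": [[[[0.0] * 224] * 224] * 3]}  # placeholder 1x3x224x224 tensor
resp = requests.post("http://localhost:8000/predict", json=payload)
resp.raise_for_status()
print(resp.json())

# Liveness check
print(requests.get("http://localhost:8000/health").json())
```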

### llama.cpp Backend

Creates deployment scripts for serving GGUF language models locally:

- Shell script to launch the llama.cpp server
- Dockerfile for containerized LLM serving
- OpenAI-compatible `/v1/chat/completions` endpoint
- Works on CPU, GPU, and edge devices
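
Because the endpoint is OpenAI-compatible, any OpenAI-style client should work against it; a minimal sketch with `requests`:

```python
import requests

# Against a server started with:
#   anydeploy serve model.gguf --backend llamacpp --port 8080
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "Summarize ONNX in one sentence."}],
        "max_tokens": 64,
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```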

## Docker Deployment

Generate a complete Docker setup for your model:

```bash
anydeploy dockerize model.onnx --base python:3.11-slim --port 8000
```

This creates:

- `Dockerfile` with optimized layers
- `serve.py` FastAPI application
- `requirements.txt` with pinned dependencies
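
After building and starting the container (for example, `docker build -t my-model .` followed by `docker run -p 8000:8000 my-model`), a minimal smoke test against the `/health` endpoint described above:

```python
import requests

# Assumes the container was started with the port mapping -p 8000:8000
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print("Container healthy:", resp.json())
```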

## Extensibility

anydeploy uses a plugin architecture. You can register custom exporters and serving backends:

```python
import anydeploy
from anydeploy.export.base import BaseExporter

class CoreMLExporter(BaseExporter):
    format_name = "coreml"

    def export(self, model, output_path, config=None):
        # Your export logic
        ...

    def validate_model(self, model):
        return True

anydeploy.register_exporter("coreml", CoreMLExporter)
```
See CONTRIBUTING.md for details on adding new exporters and backends.

## Local-First / Edge AI

This package is designed for edge and local deployment. All export formats (ONNX, TFLite, TorchScript) produce models that run completely offline. The llama.cpp backend enables local LLM serving without any cloud dependencies.

```bash
# Export for edge deployment
anydeploy export model.pt --format onnx    # ONNX Runtime (CPU/GPU/edge)
anydeploy export model.pt --format tflite  # TFLite (mobile/edge)

# Serve an LLM locally
anydeploy serve model.gguf --backend llamacpp
```
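
Exported artifacts need nothing from anydeploy at inference time. For example, an ONNX export runs fully offline with plain ONNX Runtime:

```python
import numpy as np
import onnxruntime as ort

# Load the exported model with the CPU provider; no network access needed
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Dummy input matching the export shape (1, 3, 224, 224)
x = np.random.rand(1, 3, 224, 224).astype(np.float32)

outputs = sess.run(None, {input_name: x})
print(outputs[0].shape)
```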

## Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

## License

MIT License. See LICENSE for details.