
CLI tool and library to export ML models to production formats and containerize them with Docker


anydeploy

Deploy ML models anywhere


Export ML models to production formats (ONNX, TFLite, TorchScript) and deploy them locally or at the edge.

anydeploy makes model deployment easy. Convert your trained models to optimized inference formats, benchmark performance, validate correctness, generate serving code, and containerize everything -- all from a single CLI or Python API.

Edge-first deployment. Supports ONNX Runtime (CPU/GPU/edge), TFLite (mobile/edge), and llama.cpp (local LLM serving). All deployment targets work completely offline.

Built and maintained by Viet-Anh Nguyen at NRL.ai.

Installation

# Core (CLI + config + benchmarking)
pip install anydeploy

# With specific framework support
pip install anydeploy[torch]      # PyTorch + TorchScript
pip install anydeploy[onnx]       # ONNX + ONNX Runtime
pip install anydeploy[tflite]     # TensorFlow Lite
pip install anydeploy[serve]      # FastAPI serving

# Everything
pip install anydeploy[all]

Quick Start

CLI

# Export a PyTorch model to ONNX
anydeploy export model.pt --format onnx --input-shape 1,3,224,224

# Export to TFLite
anydeploy export model.pt --format tflite --input-shape 1,3,224,224

# Benchmark an exported model
anydeploy benchmark model.onnx --runs 100

# Serve a model with FastAPI
anydeploy serve model.onnx --backend fastapi --port 8000

# Generate a Docker container for deployment
anydeploy dockerize model.onnx --base python:3.11-slim

Python API

import anydeploy

# Export a model
anydeploy.export(model, format="onnx", input_shape=(1, 3, 224, 224))

# Benchmark performance
result = anydeploy.benchmark("model.onnx", runs=100)
print(f"Mean latency: {result.mean_latency_ms:.2f} ms")
print(f"P95 latency:  {result.p95_latency_ms:.2f} ms")
print(f"Throughput:   {result.throughput:.1f} inferences/sec")

# Validate exported model against original
report = anydeploy.validate(original_model, "model.onnx", test_input)
print(f"Max difference: {report.max_diff}")
print(f"Passed: {report.passed}")

# Generate Dockerfile and serving code
from anydeploy.config import DockerConfig
docker_cfg = DockerConfig(base_image="python:3.11-slim")
anydeploy.dockerize("model.onnx", docker_cfg)

# Register a custom exporter
from anydeploy.export.base import BaseExporter
class MyExporter(BaseExporter):
    def export(self, model, output_path, config=None):
        ...
anydeploy.register_exporter("myformat", MyExporter)
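
A validation step like the one above boils down to comparing the original and exported outputs element-wise against a tolerance. Here is a framework-agnostic sketch of that check; the helper names and default tolerance are assumptions for illustration, not anydeploy's actual implementation:

```python
# Hypothetical sketch of the element-wise check a validation step performs.
# Names and the default tolerance are assumptions, not anydeploy's API.

def max_abs_diff(original, exported):
    """Largest element-wise absolute difference between two output vectors."""
    return max(abs(a - b) for a, b in zip(original, exported))

def outputs_match(original, exported, atol=1e-4):
    """True when every element agrees within the absolute tolerance."""
    return max_abs_diff(original, exported) <= atol

orig_out = [0.12, 0.88, 0.00031]
onnx_out = [0.12, 0.88, 0.00032]

print(max_abs_diff(orig_out, onnx_out))
print(outputs_match(orig_out, onnx_out))  # True at the default tolerance
```

In practice export can introduce small numeric drift (operator fusion, reduced precision), which is why validation reports a maximum difference rather than demanding exact equality.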

Export Format Comparison

Format       Framework                Hardware         Optimization         File Size
ONNX         Any (via ONNX Runtime)   CPU, GPU, Edge   Graph optimization   Medium
TFLite       TensorFlow               Mobile, Edge     Quantization         Small
TorchScript  PyTorch                  CPU, GPU         JIT compilation      Large

Serving

anydeploy generates production-ready serving code for multiple backends:

# FastAPI server for ONNX/TFLite/TorchScript models
anydeploy serve model.onnx --backend fastapi --port 8000

# llama.cpp server for GGUF language models (edge LLM deployment)
anydeploy serve model.gguf --backend llamacpp --port 8080

FastAPI Backend

Creates a FastAPI application with:

  • /predict endpoint accepting JSON or binary input
  • /health health check endpoint
  • Automatic input validation
  • Configurable batch size
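
Once the server is running, the /predict endpoint can be exercised from any HTTP client. A standard-library sketch follows; the exact payload schema is an assumption, so check the generated serve.py for the real field names:

```python
import json
import urllib.request

# Hypothetical payload shape -- the generated serve.py defines the real schema.
payload = {"inputs": [[0.0] * 224]}  # placeholder input data
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=body,
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=5) as resp:
        print(json.loads(resp.read()))
except OSError as exc:  # URLError subclasses OSError
    print(f"server not reachable: {exc}")
```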

llama.cpp Backend

Creates deployment scripts for serving GGUF language models locally:

  • Shell script to launch llama.cpp server
  • Dockerfile for containerized LLM serving
  • OpenAI-compatible /v1/chat/completions endpoint
  • Works on CPU, GPU, and edge devices
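
Because the endpoint is OpenAI-compatible, any OpenAI-style client can talk to it. A minimal standard-library sketch, assuming the server from the example above is listening on port 8080 (the model name field is illustrative):

```python
import json
import urllib.request

# OpenAI-style chat payload accepted on /v1/chat/completions.
payload = {
    "model": "model.gguf",  # illustrative name
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

try:
    with urllib.request.urlopen(req, timeout=10) as resp:
        reply = json.loads(resp.read())
        print(reply["choices"][0]["message"]["content"])
except OSError as exc:  # server not running / unreachable
    print(f"server not reachable: {exc}")
```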

Docker Deployment

Generate a complete Docker setup for your model:

anydeploy dockerize model.onnx --base python:3.11-slim --port 8000

This creates:

  • Dockerfile with optimized layers
  • serve.py FastAPI application
  • requirements.txt with pinned dependencies
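
The generated Dockerfile will vary with your base image and model format, but for an ONNX model on python:3.11-slim it looks roughly like this (a hand-written sketch, not anydeploy's literal output):

```dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install pinned dependencies first so this layer is cached across rebuilds.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model and the generated serving code.
COPY model.onnx serve.py ./

EXPOSE 8000
CMD ["uvicorn", "serve:app", "--host", "0.0.0.0", "--port", "8000"]
```

Copying requirements.txt before the model keeps the dependency layer stable, so editing or re-exporting the model does not force a reinstall.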

Extensibility

anydeploy uses a plugin architecture. You can register custom exporters and serving backends:

import anydeploy
from anydeploy.export.base import BaseExporter

class CoreMLExporter(BaseExporter):
    format_name = "coreml"

    def export(self, model, output_path, config=None):
        # Your export logic
        ...

    def validate_model(self, model):
        return True

anydeploy.register_exporter("coreml", CoreMLExporter)

See CONTRIBUTING.md for details on adding new exporters and backends.

Local-First / Edge AI

This package is designed for edge and local deployment. All export formats (ONNX, TFLite, TorchScript) produce models that run completely offline. The llama.cpp backend enables local LLM serving without any cloud dependencies.

# Export for edge deployment
anydeploy export model.pt --format onnx     # ONNX Runtime (CPU/GPU/edge)
anydeploy export model.pt --format tflite   # TFLite (mobile/edge)

# Serve an LLM locally
anydeploy serve model.gguf --backend llamacpp

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

MIT License. See LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

anydeploy-0.2.1.tar.gz (37.2 kB)


Built Distribution


anydeploy-0.2.1-py3-none-any.whl (36.1 kB)


File details

Details for the file anydeploy-0.2.1.tar.gz.

File metadata

  • Download URL: anydeploy-0.2.1.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anydeploy-0.2.1.tar.gz
Algorithm Hash digest
SHA256 6d0db06b3c3a06d0097d58bc5d1c3fb49d9c9ee698d9ae28d56b4a3907c9b2f2
MD5 4e2eb60bdbbd67eefea80e1ca996a158
BLAKE2b-256 68125feedc04a7fbd743a06822c1dd0593390b1d26cf65197c4a1e8950f552b2


File details

Details for the file anydeploy-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: anydeploy-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 36.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.9

File hashes

Hashes for anydeploy-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 f47c4471a1e8b4638fe5a8defa72e339347b0bcec2bede551761e977bb1c4e8a
MD5 04f332467f59408fb417484ff6dc3395
BLAKE2b-256 4c88cef81b153f8c56faeaf23747351f3344793f1fbf2cbe170246de600fd0c2

