# aiSSEMBLE Inference Deploy

Deployment tooling for aiSSEMBLE Inference - generates deployment configurations for OIP-compatible models.

## Overview
`aissemble-inference-deploy` provides CLI tooling to generate deployment configurations for any OIP-compatible model. Users run a command, get version-controlled configs in their project, and can re-run it later to update the configs while preserving customizations.
**Key Value:** Not just "possible to deploy" but "easy to deploy" - enterprise-ready, repeatable, version-controlled.

**Extensibility:** Generators are discovered via Python entry points, allowing custom deployment targets (OpenShift, AWS SageMaker, air-gapped registries) to be added as separate packages.
## Installation

```shell
pip install aissemble-inference-deploy
```

Or with uv:

```shell
uv add aissemble-inference-deploy
```
## Quick Start

Navigate to your project directory (the one containing a `models/` directory with your model configurations), then:

```shell
# Generate local deployment scripts
inference deploy init --target local

# Start MLServer locally
cd deploy/local && ./run-mlserver.sh
```
Or for containerized deployment:

```shell
# Generate Docker deployment configs
inference deploy init --target docker

# Build and run with Docker Compose
cd deploy/docker && docker-compose up --build
```
Or for Kubernetes:

```shell
# Generate Kubernetes manifests (uses Docker image from above)
inference deploy init --target docker --target kubernetes

# Build Docker image, then deploy to K8s
docker build -t my-app:latest -f deploy/docker/Dockerfile .
kubectl apply -k deploy/kubernetes/base
```
Or for KServe (serverless ML with scale-to-zero):

```shell
# Generate KServe manifests (uses Docker image from above)
inference deploy init --target kserve

# Build and push Docker image, then deploy to KServe
docker build -t my-registry/my-app:v1.0.0 -f deploy/docker/Dockerfile .
docker push my-registry/my-app:v1.0.0
kubectl apply -f deploy/kserve/serving-runtime.yaml
kubectl apply -f deploy/kserve/inference-service.yaml
```
## CLI Reference

### inference deploy init

Initialize deployment configurations for your models.

```shell
inference deploy init [OPTIONS]
```
Options:

- `--target`, `-t` - Deployment target(s) to generate (default: `local`). Can be specified multiple times.
- `--model-dir`, `-m` - Path to the models directory (default: `./models`)
- `--output-dir`, `-o` - Output directory for generated configs (default: `./deploy`)
- `--project-dir`, `-p` - Project root directory (default: current directory)
Examples:

```shell
# Generate local deployment only
inference deploy init --target local

# Generate Docker deployment
inference deploy init --target docker

# Generate Kubernetes manifests
inference deploy init --target kubernetes

# Generate KServe manifests (serverless ML)
inference deploy init --target kserve

# Generate multiple targets
inference deploy init --target local --target docker --target kubernetes --target kserve

# Generate for all available targets
inference deploy init --target all
```
### inference deploy list-targets

List available deployment targets. Generators are discovered via entry points.

```shell
inference deploy list-targets
```
## Built-in Generators

| Target | Description | Status |
|---|---|---|
| `local` | Local MLServer scripts for development | Available |
| `docker` | Containerized deployment with Docker Compose | Available |
| `kubernetes` | Standard K8s Deployment + Service with Kustomize | Available |
| `kserve` | KServe InferenceService with scale-to-zero | Available |
## Generated Output Structure

After running `inference deploy init`, your project will have:

```
your-project/
├── models/
│   └── your-model/
│       └── model-settings.json
└── deploy/
    ├── .inference-deploy.yaml        # Tracks generation metadata
    ├── local/
    │   ├── run-mlserver.sh           # Start MLServer locally
    │   └── README.md                 # Local deployment instructions
    ├── docker/
    │   ├── Dockerfile                # Multi-stage build for MLServer
    │   ├── docker-compose.yml        # Local container testing
    │   ├── .dockerignore             # Build context exclusions
    │   └── README.md                 # Docker deployment instructions
    ├── kubernetes/
    │   ├── base/
    │   │   ├── deployment.yaml       # K8s Deployment with health checks
    │   │   ├── service.yaml          # ClusterIP Service
    │   │   └── kustomization.yaml    # Kustomize base config
    │   ├── overlays/
    │   │   ├── dev/
    │   │   │   └── kustomization.yaml  # Dev overlay (1 replica, lower resources)
    │   │   └── prod/
    │   │       └── kustomization.yaml  # Prod overlay (2 replicas, higher resources)
    │   └── README.md                 # Kubernetes deployment instructions
    └── kserve/
        ├── serving-runtime.yaml      # KServe ServingRuntime (shared runtime config)
        ├── inference-service.yaml    # KServe InferenceService with scale-to-zero
        └── README.md                 # KServe deployment instructions
```
**Note:** The Kubernetes and KServe generators use the Docker image built by the Docker generator. This keeps things DRY - the Dockerfile is defined once and reused across Docker Compose, Kubernetes, and KServe deployments.
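Because the manifests reference the shared image by name, environment-specific registries and tags can be swapped in a Kustomize overlay instead of editing the base. A hypothetical prod overlay (the image name `my-app` and the registry/tag values are placeholders; match them to whatever your generated base manifests actually use):

```yaml
# deploy/kubernetes/overlays/prod/kustomization.yaml (illustrative sketch)
resources:
  - ../../base
images:
  - name: my-app              # must match the image name in base/deployment.yaml
    newName: my-registry/my-app
    newTag: v1.0.0
```

This uses Kustomize's standard `images` transformer, so `kubectl apply -k deploy/kubernetes/overlays/prod` would deploy the retagged image without touching the base.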
## Configuration Tracking

The `.inference-deploy.yaml` file tracks:

- Generator version used
- When configs were generated
- Which targets were generated
- Checksums of generated files (for future update/merge functionality)
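The exact schema is internal to the tool, but based on the fields listed above the tracking file could look roughly like this (every field name and value here is illustrative, not the tool's actual format):

```yaml
# deploy/.inference-deploy.yaml (hypothetical example)
generator_version: "1.5.0"
generated_at: "2025-01-01T00:00:00Z"
targets:
  - local
  - docker
files:
  local/run-mlserver.sh: "sha256:..."   # checksum enables safe update/merge later
  docker/Dockerfile: "sha256:..."
```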
## Creating Custom Generators

Custom generators can be added via the `inference.generators` entry point. This is useful for:

- Air-gapped environments with internal registries
- Platform-specific deployments (OpenShift, AWS SageMaker, etc.)
- Organization-specific deployment patterns
### Step 1: Create Your Generator

```python
# my_org_deploy/openshift.py
from pathlib import Path

from aissemble_inference_deploy import Generator, ModelInfo


class OpenShiftGenerator(Generator):
    """Generator for OpenShift deployments."""

    name = "openshift"

    def generate(self, models: list[ModelInfo] | None = None) -> list[Path]:
        if models is None:
            models = self.detect_models()

        generated_files = []
        target_dir = self.output_dir / "openshift"

        # Generate OpenShift-specific configs
        content = self.render_template(
            "openshift/deployment-config.yaml.j2",
            {"models": models, "registry": "my-internal-registry.example.com"},
        )
        path = self.write_file(target_dir / "deployment-config.yaml", content)
        generated_files.append(path)

        return generated_files
```
### Step 2: Register via Entry Point

```toml
# pyproject.toml
[project.entry-points."inference.generators"]
openshift = "my_org_deploy.openshift:OpenShiftGenerator"
```
### Step 3: Install and Use

```shell
pip install my-org-deploy

inference deploy list-targets              # Shows 'openshift' alongside built-in targets
inference deploy init --target openshift
```
## License
Apache 2.0
## File details

Details for the file `aissemble_inference_deploy-1.5.0.tar.gz`.

File metadata:

- Download URL: aissemble_inference_deploy-1.5.0.tar.gz
- Upload date:
- Size: 42.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.24

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `c4cdb87d069d4a63d4101a8ba4cfd7cbdcd0c6f8398250dfa28bcf6d0133674f` |
| MD5 | `ed559dddec8f90d656b176b2672538fc` |
| BLAKE2b-256 | `af7f1e39f98e92ca8b900e0d0540e897cfb9e1a56036261b3909abfa48d0c81c` |
## File details

Details for the file `aissemble_inference_deploy-1.5.0-py3-none-any.whl`.

File metadata:

- Download URL: aissemble_inference_deploy-1.5.0-py3-none-any.whl
- Upload date:
- Size: 40.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.24

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `371f5baccb416477e625a2906ff71f699136b44a66a3d050fb5a78336da20eca` |
| MD5 | `54a014eeb218ddff4be6dd914157bfa8` |
| BLAKE2b-256 | `2d173f05e49426d4e533af2ef526523bd8d03ed6bed4b58cb712514c6b1d07c3` |