
aiSSEMBLE Inference Deploy

Deployment tooling for aiSSEMBLE Inference - generates deployment configurations for OIP-compatible models.

Overview

aissemble-inference-deploy provides CLI tooling to generate deployment configurations for any OIP-compatible model. Users run a single command to get version-controlled configs in their project, and can re-run it later to pick up updates while preserving their customizations.

Key Value: Not just "possible to deploy" but "easy to deploy" - enterprise-ready, repeatable, version-controlled.

Extensibility: Generators are discovered via Python entry points, allowing custom deployment targets (OpenShift, AWS SageMaker, air-gapped registries) to be added as separate packages.

Installation

pip install aissemble-inference-deploy

Or with uv:

uv add aissemble-inference-deploy

Quick Start

Navigate to your project directory (containing a models/ directory with your model configurations), then:

# Generate local deployment scripts
inference deploy init --target local

# Start MLServer locally
cd deploy/local && ./run-mlserver.sh
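
Each model subdirectory holds an MLServer model-settings.json. A minimal example, assuming a scikit-learn model served via the mlserver-sklearn runtime (model name and artifact path are illustrative):

```json
{
  "name": "your-model",
  "implementation": "mlserver_sklearn.SKLearnModel",
  "parameters": {
    "uri": "./model.joblib"
  }
}
```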

Or for containerized deployment:

# Generate Docker deployment configs
inference deploy init --target docker

# Build and run with Docker Compose
cd deploy/docker && docker-compose up --build

Or for Kubernetes:

# Generate Kubernetes manifests (uses Docker image from above)
inference deploy init --target docker --target kubernetes

# Build Docker image, then deploy to K8s
docker build -t my-app:latest -f deploy/docker/Dockerfile .
kubectl apply -k deploy/kubernetes/base

Or for KServe (serverless ML with scale-to-zero):

# Generate KServe manifests (plus the Docker target that builds the image they reference)
inference deploy init --target docker --target kserve

# Build and push Docker image, then deploy to KServe
docker build -t my-registry/my-app:v1.0.0 -f deploy/docker/Dockerfile .
docker push my-registry/my-app:v1.0.0
kubectl apply -f deploy/kserve/serving-runtime.yaml
kubectl apply -f deploy/kserve/inference-service.yaml
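
Once the InferenceService is ready, OIP-compatible models accept Open Inference Protocol (v2) requests at /v2/models/&lt;name&gt;/infer. A sketch of building a request body (tensor name, shape, and data are illustrative):

```python
import json

# Open Inference Protocol v2 request body; the field names ("inputs",
# "name", "shape", "datatype", "data") come from the OIP spec.
payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [1, 4],
            "datatype": "FP32",
            "data": [[5.1, 3.5, 1.4, 0.2]],
        }
    ]
}
body = json.dumps(payload)  # POST this to /v2/models/<model-name>/infer
```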

CLI Reference

inference deploy init

Initialize deployment configurations for your models.

inference deploy init [OPTIONS]

Options:

  • --target, -t - Deployment target(s) to generate (default: local). Can be specified multiple times.
  • --model-dir, -m - Path to models directory (default: ./models)
  • --output-dir, -o - Output directory for generated configs (default: ./deploy)
  • --project-dir, -p - Project root directory (default: current directory)

Examples:

# Generate local deployment only
inference deploy init --target local

# Generate Docker deployment
inference deploy init --target docker

# Generate Kubernetes manifests
inference deploy init --target kubernetes

# Generate KServe manifests (serverless ML)
inference deploy init --target kserve

# Generate multiple targets
inference deploy init --target local --target docker --target kubernetes --target kserve

# Generate for all available targets
inference deploy init --target all

inference deploy list-targets

List available deployment targets. Generators are discovered via entry points.

inference deploy list-targets

Built-in Generators

Target      Description                                        Status
----------  -------------------------------------------------  ---------
local       Local MLServer scripts for development              Available
docker      Containerized deployment with Docker Compose        Available
kubernetes  Standard K8s Deployment + Service with Kustomize    Available
kserve      KServe InferenceService with scale-to-zero          Available

Generated Output Structure

After running inference deploy init, your project will have:

your-project/
  models/
    your-model/
      model-settings.json
  deploy/
    .inference-deploy.yaml          # Tracks generation metadata
    local/
      run-mlserver.sh         # Start MLServer locally
      README.md               # Local deployment instructions
    docker/
      Dockerfile              # Multi-stage build for MLServer
      docker-compose.yml      # Local container testing
      .dockerignore           # Build context exclusions
      README.md               # Docker deployment instructions
    kubernetes/
      base/
        deployment.yaml       # K8s Deployment with health checks
        service.yaml          # ClusterIP Service
        kustomization.yaml    # Kustomize base config
      overlays/
        dev/
          kustomization.yaml  # Dev overlay (1 replica, lower resources)
        prod/
          kustomization.yaml  # Prod overlay (2 replicas, higher resources)
      README.md               # Kubernetes deployment instructions
    kserve/
      serving-runtime.yaml    # KServe ServingRuntime (shared runtime config)
      inference-service.yaml  # KServe InferenceService with scale-to-zero
      README.md               # KServe deployment instructions

Note: The Kubernetes and KServe generators use the Docker image built by the Docker generator. This keeps things DRY - the Dockerfile is defined once and reused across Docker Compose, Kubernetes, and KServe deployments.
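
An overlay like the generated dev one typically just points at the base and overrides the replica count. A hypothetical sketch (the Deployment name is illustrative):

```yaml
# deploy/kubernetes/overlays/dev/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
replicas:
  - name: my-app   # must match the Deployment name in base/deployment.yaml
    count: 1
```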

Configuration Tracking

The .inference-deploy.yaml file tracks:

  • Generator version used
  • When configs were generated
  • Which targets were generated
  • Checksums of generated files (for future update/merge functionality)
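
Checksum tracking of this kind is typically a SHA-256 digest per generated file; a hypothetical sketch, not the package's actual implementation:

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """SHA-256 hex digest of a generated file, suitable for detecting
    local edits before a future update/merge step."""
    return hashlib.sha256(path.read_bytes()).hexdigest()
```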

Creating Custom Generators

Custom generators can be added via the inference.generators entry point. This is useful for:

  • Air-gapped environments with internal registries
  • Platform-specific deployments (OpenShift, AWS SageMaker, etc.)
  • Organization-specific deployment patterns

Step 1: Create Your Generator

# my_org_deploy/openshift.py
from aissemble_inference_deploy import Generator, ModelInfo
from pathlib import Path

class OpenShiftGenerator(Generator):
    """Generator for OpenShift deployments."""

    name = "openshift"

    def generate(self, models: list[ModelInfo] | None = None) -> list[Path]:
        if models is None:
            models = self.detect_models()

        generated_files = []
        target_dir = self.output_dir / "openshift"

        # Generate OpenShift-specific configs
        content = self.render_template(
            "openshift/deployment-config.yaml.j2",
            {"models": models, "registry": "my-internal-registry.example.com"}
        )
        path = self.write_file(target_dir / "deployment-config.yaml", content)
        generated_files.append(path)

        return generated_files
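
The render_template call above expects a Jinja2 template bundled with your package. A minimal sketch of what openshift/deployment-config.yaml.j2 might contain, assuming ModelInfo exposes a name attribute (all field values are illustrative):

```yaml
# templates/openshift/deployment-config.yaml.j2 (hypothetical)
apiVersion: apps.openshift.io/v1
kind: DeploymentConfig
metadata:
  name: {{ models[0].name }}
spec:
  replicas: 1
  template:
    spec:
      containers:
        - name: mlserver
          image: {{ registry }}/{{ models[0].name }}:latest
```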

Step 2: Register via Entry Point

# pyproject.toml
[project.entry-points."inference.generators"]
openshift = "my_org_deploy.openshift:OpenShiftGenerator"

Step 3: Install and Use

pip install my-org-deploy
inference deploy list-targets  # Shows 'openshift' alongside built-in targets
inference deploy init --target openshift

License

Apache 2.0
