MLflow deployment plugin for Modal serverless GPU infrastructure (actively maintained)

mlflow-modal-deploy

Deploy MLflow models to Modal's serverless GPU infrastructure with a single command.

Installation

pip install mlflow-modal-deploy

Features

  • One-command deployment: Deploy any MLflow model to Modal's serverless infrastructure
  • GPU support: T4, L4, A10G, A100, A100-80GB, H100
  • Auto-scaling: Configure minimum and maximum container counts and the idle scale-down window
  • Dynamic batching: Built-in request batching for high-throughput workloads
  • Automatic dependency detection: Extracts requirements from model artifacts
  • Wheel file support: Handles private dependencies packaged as wheel files
  • MLflow CLI integration: Use familiar mlflow deployments commands

Quick Start

Python API

from mlflow.deployments import get_deploy_client

# Get the Modal deployment client
client = get_deploy_client("modal")

# Deploy a model
deployment = client.create_deployment(
    name="my-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "T4",
        "memory": 2048,
        "min_containers": 1,
    }
)

print(f"Deployed to: {deployment['endpoint_url']}")

# Make predictions
predictions = client.predict(
    deployment_name="my-classifier",
    inputs={"feature1": [1, 2, 3], "feature2": [4, 5, 6]}
)

CLI

# Deploy a model
mlflow deployments create -t modal -m runs:/abc123/model --name my-model

# Deploy with GPU
mlflow deployments create -t modal -m runs:/abc123/model --name gpu-model \
    -C gpu=T4 -C memory=4096

# List deployments
mlflow deployments list -t modal

# Get deployment info
mlflow deployments get -t modal --name my-model

# Delete deployment
mlflow deployments delete -t modal --name my-model
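
When the deployment options already live in a dictionary, the repeated -C key=value flags can be generated instead of typed by hand. The sketch below assembles the same create command shown above. `build_create_command` is a hypothetical helper written for this example and is not part of the plugin.

```python
import shlex


def build_create_command(name, model_uri, config=None, target="modal"):
    """Return the argv list for `mlflow deployments create` (hypothetical helper)."""
    argv = [
        "mlflow", "deployments", "create",
        "-t", target,
        "-m", model_uri,
        "--name", name,
    ]
    # Each config entry becomes one `-C key=value` pair, as in the CLI examples.
    for key, value in (config or {}).items():
        argv += ["-C", f"{key}={value}"]
    return argv


cmd = build_create_command(
    "gpu-model", "runs:/abc123/model", {"gpu": "T4", "memory": 4096}
)
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems.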

Configuration Options

Option                  Type   Default  Description
gpu                     str    None     GPU type: T4, L4, A10G, A100, A100-80GB, H100
memory                  int    512      Memory allocation in MB
cpu                     float  1.0      CPU cores
timeout                 int    300      Request timeout in seconds
container_idle_timeout  int    60       Container idle timeout in seconds
min_containers          int    0        Minimum warm containers
max_containers          int    None     Maximum containers
enable_batching         bool   False    Enable dynamic batching
max_batch_size          int    8        Max batch size when batching enabled
batch_wait_ms           int    100      Batch wait time in milliseconds
python_version          str    auto     Python version (auto-detected from model)
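
To catch typos in option names before a deployment starts, a config dictionary can be checked against the options listed above. This is a minimal sketch: `resolve_config` is a hypothetical helper, the defaults are copied from the table, and the min/max consistency check is an assumption for this example rather than documented plugin behavior.

```python
# Defaults copied from the Configuration Options table.
DEFAULTS = {
    "gpu": None,
    "memory": 512,
    "cpu": 1.0,
    "timeout": 300,
    "container_idle_timeout": 60,
    "min_containers": 0,
    "max_containers": None,
    "enable_batching": False,
    "max_batch_size": 8,
    "batch_wait_ms": 100,
    "python_version": "auto",
}
SUPPORTED_GPUS = {"T4", "L4", "A10G", "A100", "A100-80GB", "H100"}


def resolve_config(overrides):
    """Merge overrides onto the documented defaults (hypothetical helper)."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"Unknown option(s): {unknown}")
    config = {**DEFAULTS, **overrides}
    if config["gpu"] is not None and config["gpu"] not in SUPPORTED_GPUS:
        raise ValueError(f"Unsupported GPU type: {config['gpu']}")
    # Assumption: a maximum below the minimum is treated as a mistake.
    max_c, min_c = config["max_containers"], config["min_containers"]
    if max_c is not None and max_c < min_c:
        raise ValueError("max_containers must be >= min_containers")
    return config


config = resolve_config({"gpu": "T4", "memory": 2048, "min_containers": 1})
print(config["memory"], config["timeout"])
```

The resolved dictionary can then be passed as the `config` argument of `create_deployment`.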

Authentication

Configure Modal authentication before deploying:

# Interactive setup
modal setup

# Or use environment variables
export MODAL_TOKEN_ID=your-token-id
export MODAL_TOKEN_SECRET=your-token-secret
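
In CI jobs, where interactive setup is not available, it can help to fail early if the two environment variables above are missing. The sketch below only inspects the environment. `missing_modal_env` is a hypothetical helper, and an empty result does not prove the token is valid. Credentials saved by `modal setup` are not detected here, so a non-empty result is only a problem when environment variables are the intended mechanism.

```python
import os

REQUIRED_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_env(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_modal_env()
if missing:
    print("Not set:", ", ".join(missing))
else:
    print("Modal token variables are present")
```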

Advanced Usage

Deploy to Specific Workspace

from mlflow.deployments import get_deploy_client

# Use a workspace-specific target URI
client = get_deploy_client("modal:/production")

Or via CLI:

mlflow deployments create -t modal:/production -m runs:/abc123/model --name my-model
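
The target URI above combines the plugin name with a workspace. The sketch below shows one way to read such a string, assuming the part after `modal:/` names the workspace as the examples suggest. `split_target_uri` is a hypothetical helper and may not match how the plugin parses the URI internally.

```python
def split_target_uri(target_uri):
    """Split 'modal:/production' into ('modal', 'production') (hypothetical helper)."""
    scheme, _, rest = target_uri.partition(":")
    # A bare target such as 'modal' carries no workspace.
    workspace = rest.strip("/") or None
    return scheme, workspace


print(split_target_uri("modal:/production"))
print(split_target_uri("modal"))
```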

High-Throughput Deployment with Batching

client.create_deployment(
    name="batch-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "A100",
        "enable_batching": True,
        "max_batch_size": 32,
        "batch_wait_ms": 50,
        "min_containers": 2,
        "max_containers": 20,
    }
)
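
To build intuition for how `max_batch_size` and `batch_wait_ms` interact, the following pure-Python simulation groups request arrival times into batches. It assumes a batch is flushed either when it is full or when the wait window, measured from the first request in the batch, has expired. That flush rule is an assumption for illustration, and the actual scheduling done by Modal and the plugin may differ.

```python
def simulate_batches(arrival_ms, max_batch_size, batch_wait_ms):
    """Group arrival times (ms) into batches under the assumed flush rule."""
    batches = []
    current = []
    for t in sorted(arrival_ms):
        # Flush the open batch if its wait window has already expired.
        if current and t - current[0] > batch_wait_ms:
            batches.append(current)
            current = []
        current.append(t)
        # Flush immediately once the batch is full.
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches


arrivals = [0, 10, 20, 30, 120, 125, 300]
batches = simulate_batches(arrivals, max_batch_size=3, batch_wait_ms=50)
print(batches)  # [[0, 10, 20], [30], [120, 125], [300]]
```

A larger `batch_wait_ms` tends to produce fuller batches at the cost of added latency for the first request in each batch.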

Models with Private Dependencies

If your model includes wheel files in the code/ directory, they are automatically detected and installed:

model/
├── MLmodel
├── requirements.txt
├── code/
│   └── my_private_package-1.0.0-py3-none-any.whl  # Auto-detected
└── ...
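
Before deploying, it can be useful to confirm which wheel files are actually present in a downloaded model directory. The sketch below lists `*.whl` files under `code/` using the layout shown above. `find_wheels` is a hypothetical helper for inspection only and is not the plugin's own detection logic.

```python
from pathlib import Path


def find_wheels(model_dir):
    """Return sorted wheel file names found in <model_dir>/code."""
    code_dir = Path(model_dir) / "code"
    if not code_dir.is_dir():
        return []
    return sorted(path.name for path in code_dir.glob("*.whl"))


# "model" is a placeholder path to a local copy of the model artifacts.
print(find_wheels("model"))
```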

Local Development

Test your deployment locally before deploying to Modal:

from mlflow_modal import run_local

run_local(
    target_uri="modal",
    name="test-model",
    model_uri="runs:/abc123/model",
    config={"gpu": "T4"}
)

Requirements

  • Python 3.10+
  • MLflow 2.10.0+
  • Modal 0.64.0+
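
The minimum versions above can be checked in the current environment with the standard library. This is a rough sketch: `parse_version` and `check_requirements` are hypothetical helpers, the distribution names `mlflow` and `modal` are assumed to be the installed package names, and the simplified parser treats a pre-release such as 2.10.0rc1 the same as 2.10.0.

```python
import sys
from importlib import metadata

MINIMUMS = {"mlflow": (2, 10, 0), "modal": (0, 64, 0)}


def parse_version(text):
    """Parse the leading numeric release segment, e.g. '2.10.0rc1' -> (2, 10, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_requirements():
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    for dist, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(dist))
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(map(str, minimum))
            problems.append(f"{dist} is older than {wanted}")
    return problems


print(check_requirements())
```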

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/debu-sinha/mlflow-modal-deploy.git
cd mlflow-modal-deploy

# Install with dev dependencies
uv sync --extra dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest tests/ -v

License

Apache License 2.0

Acknowledgments

  • MLflow - Open source platform for the ML lifecycle
  • Modal - Serverless cloud for AI/ML
