MLflow deployment plugin for Modal serverless GPU infrastructure (actively maintained)
mlflow-modal-deploy
Deploy MLflow models to Modal's serverless GPU infrastructure with a single command.
Installation
pip install mlflow-modal-deploy
Features
- One-command deployment: Deploy any MLflow model to Modal's serverless infrastructure
- GPU support: T4, L4, A10G, A100, A100-80GB, H100
- Auto-scaling: Configure min/max containers, scale-down windows
- Dynamic batching: Built-in request batching for high-throughput workloads
- Automatic dependency detection: Extracts requirements from model artifacts
- Wheel file support: Handles private dependencies packaged as wheel files
- MLflow CLI integration: Use familiar mlflow deployments commands
Quick Start
Python API
from mlflow.deployments import get_deploy_client
# Get the Modal deployment client
client = get_deploy_client("modal")
# Deploy a model
deployment = client.create_deployment(
    name="my-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "T4",
        "memory": 2048,
        "min_containers": 1,
    },
)
print(f"Deployed to: {deployment['endpoint_url']}")
# Make predictions
predictions = client.predict(
    deployment_name="my-classifier",
    inputs={"feature1": [1, 2, 3], "feature2": [4, 5, 6]},
)
CLI
# Deploy a model
mlflow deployments create -t modal -m runs:/abc123/model --name my-model
# Deploy with GPU
mlflow deployments create -t modal -m runs:/abc123/model --name gpu-model \
    -C gpu=T4 -C memory=4096
# List deployments
mlflow deployments list -t modal
# Get deployment info
mlflow deployments get -t modal --name my-model
# Delete deployment
mlflow deployments delete -t modal --name my-model
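Each -C key=value flag carries one entry of the config dict used in the Python API. As a minimal sketch of that correspondence, the snippet below builds the create command from a dict. build_create_command is a hypothetical helper written for illustration and is not part of the plugin; values passed with -C reach the plugin as strings, and how it converts them is not covered here.

```python
import shlex


def build_create_command(name, model_uri, config=None, target="modal"):
    """Return the argument list for `mlflow deployments create` (hypothetical helper)."""
    args = [
        "mlflow", "deployments", "create",
        "-t", target,
        "-m", model_uri,
        "--name", name,
    ]
    for key, value in (config or {}).items():
        # Every config entry becomes its own -C key=value pair.
        args += ["-C", f"{key}={value}"]
    return args


cmd = build_create_command(
    "gpu-model", "runs:/abc123/model", {"gpu": "T4", "memory": 4096}
)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting issues.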
Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| gpu | str | None | GPU type: T4, L4, A10G, A100, A100-80GB, H100 |
| memory | int | 512 | Memory allocation in MB |
| cpu | float | 1.0 | CPU cores |
| timeout | int | 300 | Request timeout in seconds |
| container_idle_timeout | int | 60 | Container idle timeout in seconds |
| min_containers | int | 0 | Minimum warm containers |
| max_containers | int | None | Maximum containers |
| enable_batching | bool | False | Enable dynamic batching |
| max_batch_size | int | 8 | Max batch size when batching enabled |
| batch_wait_ms | int | 100 | Batch wait time in milliseconds |
| python_version | str | auto | Python version (auto-detected from model) |
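To catch typos before a deployment is attempted, the options above can be checked on the client side. The following is a rough sketch only: resolve_config is a hypothetical helper, the defaults are copied from the table, and the plugin's own validation may behave differently. Treating "auto" as the literal default for python_version is an assumption.

```python
# Defaults copied from the configuration table; "auto" as a literal value is an assumption.
DEFAULTS = {
    "gpu": None,
    "memory": 512,
    "cpu": 1.0,
    "timeout": 300,
    "container_idle_timeout": 60,
    "min_containers": 0,
    "max_containers": None,
    "enable_batching": False,
    "max_batch_size": 8,
    "batch_wait_ms": 100,
    "python_version": "auto",
}
GPU_TYPES = {"T4", "L4", "A10G", "A100", "A100-80GB", "H100"}


def resolve_config(overrides):
    """Merge user overrides onto the documented defaults (hypothetical helper)."""
    unknown = sorted(set(overrides) - set(DEFAULTS))
    if unknown:
        raise ValueError(f"unknown config options: {unknown}")
    config = {**DEFAULTS, **overrides}
    if config["gpu"] is not None and config["gpu"] not in GPU_TYPES:
        raise ValueError(f"unsupported GPU type: {config['gpu']!r}")
    if (
        config["max_containers"] is not None
        and config["max_containers"] < config["min_containers"]
    ):
        raise ValueError("max_containers must be >= min_containers")
    return config


config = resolve_config({"gpu": "T4", "memory": 2048, "min_containers": 1})
print(config["gpu"], config["memory"], config["timeout"])
```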
Authentication
Configure Modal authentication before deploying:
# Interactive setup
modal setup
# Or use environment variables
export MODAL_TOKEN_ID=your-token-id
export MODAL_TOKEN_SECRET=your-token-secret
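In CI or other non-interactive environments it can help to fail early when the token variables are absent. A small sketch, assuming the environment-variable route is the one in use; missing_modal_credentials is a hypothetical helper and does not detect credentials saved by modal setup.

```python
import os

REQUIRED_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_credentials(env=None):
    """Return the names of required token variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_modal_credentials()
if missing:
    print("Missing Modal credentials:", ", ".join(missing))
else:
    print("Modal token variables are set")
```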
Advanced Usage
Deploy to Specific Workspace
# Use workspace-specific URI
client = get_deploy_client("modal:/production")
Or via CLI:
mlflow deployments create -t modal:/production -m runs:/abc123/model --name my-model
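The target URI therefore has two forms: plain modal, and modal:/ followed by a workspace name. The sketch below shows one way to split such a string. split_target is a hypothetical helper for illustration and is not the plugin's own parser.

```python
def split_target(target_uri):
    """Split 'modal' or 'modal:/workspace' into (scheme, workspace or None)."""
    scheme, _, rest = target_uri.partition(":")
    # Drop the leading slash; an empty remainder means no workspace was given.
    workspace = rest.lstrip("/") or None
    return scheme, workspace


print(split_target("modal"))
print(split_target("modal:/production"))
```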
High-Throughput Deployment with Batching
client.create_deployment(
    name="batch-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "A100",
        "enable_batching": True,
        "max_batch_size": 32,
        "batch_wait_ms": 50,
        "min_containers": 2,
        "max_containers": 20,
    },
)
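To build intuition for how max_batch_size and batch_wait_ms interact, here is a toy simulation of a generic size-or-timeout batching policy: a batch closes when it is full, or when a request arrives after the wait window that started with the batch's first request. This is an assumption about the general technique, not the plugin's or Modal's implementation, and group_into_batches is a hypothetical name.

```python
def group_into_batches(arrivals_ms, max_batch_size, batch_wait_ms):
    """Group sorted arrival times (ms) into batches under a size-or-timeout policy."""
    batches = []
    current = []
    for t in arrivals_ms:
        window_expired = bool(current) and t - current[0] > batch_wait_ms
        if len(current) >= max_batch_size or window_expired:
            # Close the open batch before starting a new one with this request.
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


arrivals = [0, 10, 20, 45, 60, 200]
print(group_into_batches(arrivals, max_batch_size=32, batch_wait_ms=50))
print(group_into_batches(arrivals, max_batch_size=2, batch_wait_ms=50))
```

In this model, a larger batch_wait_ms produces fuller batches at the cost of added latency for the first request in each batch.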
Models with Private Dependencies
If your model includes wheel files in the code/ directory, they are automatically detected and installed:
model/
├── MLmodel
├── requirements.txt
├── code/
│ └── my_private_package-1.0.0-py3-none-any.whl # Auto-detected
└── ...
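To preview which wheel files sit in a downloaded model directory, a lookup along these lines can be used. It is only a sketch of the idea, assuming wheels are placed directly under code/; find_wheels is a hypothetical helper and the plugin's actual detection logic may differ.

```python
import tempfile
from pathlib import Path


def find_wheels(model_dir):
    """Return the sorted names of .whl files directly under <model_dir>/code."""
    code_dir = Path(model_dir) / "code"
    if not code_dir.is_dir():
        return []
    return sorted(p.name for p in code_dir.glob("*.whl"))


# Demo on a throwaway directory that mimics the layout shown above.
with tempfile.TemporaryDirectory() as tmp:
    code = Path(tmp) / "code"
    code.mkdir()
    (code / "my_private_package-1.0.0-py3-none-any.whl").touch()
    (code / "helpers.py").touch()
    found = find_wheels(tmp)

print(found)
```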
Local Development
Test your deployment locally before deploying to Modal:
from mlflow_modal import run_local
run_local(
    target_uri="modal",
    name="test-model",
    model_uri="runs:/abc123/model",
    config={"gpu": "T4"},
)
Requirements
- Python 3.10+
- MLflow 2.10.0+
- Modal 0.64.0+
Contributing
Contributions welcome! Please see CONTRIBUTING.md for guidelines.
Development Setup
# Clone the repository
git clone https://github.com/debu-sinha/mlflow-modal-deploy.git
cd mlflow-modal-deploy
# Install with dev dependencies
uv sync --extra dev
# Install pre-commit hooks
uv run pre-commit install
# Run tests
uv run pytest tests/ -v
License
Apache License 2.0
Acknowledgments
Support
- GitHub Issues - Bug reports and feature requests
- MLflow Slack - Community discussion