MLflow deployment plugin for Modal serverless GPU infrastructure (actively maintained)

mlflow-modal-deploy

Deploy MLflow models to Modal's serverless GPU infrastructure with a single command.

Installation

pip install mlflow-modal-deploy

Features

One-command deployment: Deploy any MLflow model to Modal's serverless infrastructure
GPU support: T4, L4, A10G, A100, A100-80GB, H100
Auto-scaling: Configure min/max containers, scale-down windows
Dynamic batching: Built-in request batching for high-throughput workloads
Automatic dependency detection: Extracts requirements from model artifacts
Wheel file support: Handles private dependencies packaged as wheel files
MLflow CLI integration: Use familiar mlflow deployments commands

Quick Start

Python API

from mlflow.deployments import get_deploy_client

# Get the Modal deployment client
client = get_deploy_client("modal")

# Deploy a model
deployment = client.create_deployment(
    name="my-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "T4",
        "memory": 2048,
        "min_containers": 1,
    }
)

print(f"Deployed to: {deployment['endpoint_url']}")

# Make predictions
predictions = client.predict(
    deployment_name="my-classifier",
    inputs={"feature1": [1, 2, 3], "feature2": [4, 5, 6]}
)

CLI

# Deploy a model
mlflow deployments create -t modal -m runs:/abc123/model --name my-model

# Deploy with GPU
mlflow deployments create -t modal -m runs:/abc123/model --name gpu-model \
    -C gpu=T4 -C memory=4096

# List deployments
mlflow deployments list -t modal

# Get deployment info
mlflow deployments get -t modal --name my-model

# Delete deployment
mlflow deployments delete -t modal --name my-model
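
When deployments are scripted, it can help to assemble the create command from a dictionary rather than by hand. The sketch below mirrors the flags used in the examples above (-t, -m, --name, -C key=value). The helper name build_create_command is hypothetical and is not part of this plugin or of MLflow.

```python
import shlex
from typing import Mapping, Optional


def build_create_command(
    name: str,
    model_uri: str,
    target: str = "modal",
    config: Optional[Mapping[str, object]] = None,
) -> list:
    """Assemble the argv for `mlflow deployments create` (hypothetical helper)."""
    argv = [
        "mlflow", "deployments", "create",
        "-t", target,
        "-m", model_uri,
        "--name", name,
    ]
    # Each config entry becomes its own `-C key=value` pair,
    # matching the "Deploy with GPU" example.
    for key, value in (config or {}).items():
        argv += ["-C", f"{key}={value}"]
    return argv


if __name__ == "__main__":
    command = build_create_command(
        "gpu-model",
        "runs:/abc123/model",
        config={"gpu": "T4", "memory": 4096},
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues.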

Configuration Options

Option	Type	Default	Description
`gpu`	str	None	GPU type: T4, L4, A10G, A100, A100-80GB, H100
`memory`	int	512	Memory allocation in MB
`cpu`	float	1.0	CPU cores
`timeout`	int	300	Request timeout in seconds
`container_idle_timeout`	int	60	Container idle timeout in seconds
`min_containers`	int	0	Minimum warm containers
`max_containers`	int	None	Maximum containers
`enable_batching`	bool	False	Enable dynamic batching
`max_batch_size`	int	8	Max batch size when batching enabled
`batch_wait_ms`	int	100	Batch wait time in milliseconds
`python_version`	str	auto	Python version (auto-detected from model)
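
To see how the defaults in the table combine with user-supplied values, here is a minimal sketch of a pre-flight check that can run before create_deployment is called. The option names, types, and defaults are copied from the table. The function resolve_config and its validation rules are assumptions made for illustration, not the plugin's own validation logic.

```python
# Option name -> (documented type, documented default), copied from the table.
OPTIONS = {
    "gpu": (str, None),
    "memory": (int, 512),
    "cpu": (float, 1.0),
    "timeout": (int, 300),
    "container_idle_timeout": (int, 60),
    "min_containers": (int, 0),
    "max_containers": (int, None),
    "enable_batching": (bool, False),
    "max_batch_size": (int, 8),
    "batch_wait_ms": (int, 100),
    "python_version": (str, "auto"),
}
GPU_TYPES = {"T4", "L4", "A10G", "A100", "A100-80GB", "H100"}


def _matches(value, expected):
    """Type check that keeps bool and int apart (bool is a subclass of int)."""
    if expected is bool:
        return isinstance(value, bool)
    if isinstance(value, bool):
        return False
    if expected is float:
        return isinstance(value, (int, float))  # accept 2 where 2.0 is documented
    return isinstance(value, expected)


def resolve_config(user_config):
    """Merge user values over the documented defaults (hypothetical helper)."""
    unknown = sorted(set(user_config) - set(OPTIONS))
    if unknown:
        raise ValueError(f"Unknown option(s): {unknown}")
    resolved = {key: default for key, (_, default) in OPTIONS.items()}
    for key, value in user_config.items():
        expected, default = OPTIONS[key]
        if value is None:
            # None is only accepted where the documented default is None.
            if default is not None:
                raise TypeError(f"{key} cannot be None")
        elif not _matches(value, expected):
            raise TypeError(f"{key} expects {expected.__name__}, got {value!r}")
        resolved[key] = value
    if resolved["gpu"] is not None and resolved["gpu"] not in GPU_TYPES:
        raise ValueError(f"Unsupported GPU type: {resolved['gpu']!r}")
    return resolved


if __name__ == "__main__":
    print(resolve_config({"gpu": "T4", "memory": 2048, "min_containers": 1}))
```

Catching a misspelled key or an unsupported GPU name locally is cheaper than discovering it after a container build has started.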

Authentication

Configure Modal authentication before deploying:

# Interactive setup
modal setup

# Or use environment variables
export MODAL_TOKEN_ID=your-token-id
export MODAL_TOKEN_SECRET=your-token-secret
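
In CI, where the environment-variable route is the usual choice, a quick check before deploying can give a clearer error than a failed deployment. The sketch below only inspects the two variables named above. The helper name is hypothetical, and the check says nothing about credentials saved by modal setup.

```python
import os

REQUIRED_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_modal_credentials()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("Modal token variables are set.")
```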

Advanced Usage

Deploy to Specific Workspace

from mlflow.deployments import get_deploy_client
# Use a workspace-specific target URI
client = get_deploy_client("modal:/production")

Or via CLI:

mlflow deployments create -t modal:/production -m runs:/abc123/model --name my-model
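
The two target forms used in this document, modal and modal:/production, differ only in the part after the colon. As a rough illustration of how such a URI splits into a target name and a workspace, here is a sketch built on the standard library. The function split_target_uri is hypothetical, and how the plugin itself interprets the path is an assumption.

```python
from urllib.parse import urlparse


def split_target_uri(target_uri):
    """Split 'modal' or 'modal:/workspace' into (target, workspace or None)."""
    parsed = urlparse(target_uri)
    if not parsed.scheme:
        # A bare name such as "modal" has no scheme; urlparse puts it in .path.
        return parsed.path, None
    workspace = parsed.path.strip("/") or None
    return parsed.scheme, workspace


if __name__ == "__main__":
    for uri in ("modal", "modal:/production"):
        print(uri, "->", split_target_uri(uri))
```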

High-Throughput Deployment with Batching

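from mlflow.deployments import get_deploy_client

client = get_deploy_client("modal")

# Larger batches and a shorter wait than the defaults (8 and 100 ms)
client.create_deployment(
    name="batch-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "A100",
        "enable_batching": True,
        "max_batch_size": 32,
        "batch_wait_ms": 50,
        "min_containers": 2,
        "max_containers": 20,
    }
)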
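
To build intuition for how max_batch_size and batch_wait_ms interact, the sketch below groups request arrival times into batches under one common dynamic-batching rule: a batch closes when it is full, or when the wait window since its first request has elapsed. This is a simplified simulation written for illustration. It is not the plugin's or Modal's implementation, and the exact semantics of the wait window there may differ.

```python
def group_into_batches(arrival_ms, max_batch_size=8, batch_wait_ms=100):
    """Group request arrival times (in ms) into batches.

    A batch opens with its first request and closes when it holds
    max_batch_size requests or when a request arrives more than
    batch_wait_ms after the batch opened, whichever happens first.
    """
    batches = []
    current = []
    for t in sorted(arrival_ms):
        batch_full = len(current) >= max_batch_size
        window_over = bool(current) and t - current[0] > batch_wait_ms
        if batch_full or window_over:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    arrivals = [0, 10, 20, 200]
    print(group_into_batches(arrivals, max_batch_size=32, batch_wait_ms=50))
```

Under this rule, a shorter wait window bounds the latency added to the first request in a batch, while a larger batch size only helps when requests arrive fast enough to fill the window.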

Models with Private Dependencies

If your model includes wheel files in the code/ directory, they are automatically detected and installed:

model/
├── MLmodel
├── requirements.txt
├── code/
│   └── my_private_package-1.0.0-py3-none-any.whl  # Auto-detected
└── ...
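
To check in advance which wheel files a model directory contains, a short listing like the one below can help. It assumes the layout shown above, with wheels placed directly in code/. The function find_wheels is a hypothetical helper and does not reproduce the plugin's detection logic.

```python
import tempfile
from pathlib import Path


def find_wheels(model_dir):
    """Return the sorted names of .whl files directly inside <model_dir>/code."""
    code_dir = Path(model_dir) / "code"
    if not code_dir.is_dir():
        return []
    return sorted(path.name for path in code_dir.glob("*.whl"))


if __name__ == "__main__":
    # Build a throwaway model directory that mimics the layout above.
    with tempfile.TemporaryDirectory() as tmp:
        code = Path(tmp) / "code"
        code.mkdir()
        (code / "my_private_package-1.0.0-py3-none-any.whl").touch()
        (code / "helpers.py").touch()
        print(find_wheels(tmp))
```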

Local Development

Test your deployment locally before deploying to Modal:

from mlflow_modal import run_local

run_local(
    target_uri="modal",
    name="test-model",
    model_uri="runs:/abc123/model",
    config={"gpu": "T4"}
)

Requirements

Python 3.10+
MLflow 2.10.0+
Modal 0.64.0+
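
The minimum versions above can be checked programmatically before installing the plugin. The sketch below is one way to do it with the standard library, assuming the PyPI distribution names mlflow and modal. The version comparison is deliberately simplified: it compares leading numeric components only and ignores pre-release suffixes.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the Requirements list.
MINIMUMS = {"mlflow": "2.10.0", "modal": "0.64.0"}


def as_tuple(version_string):
    """Turn '2.10.0' into (2, 10, 0); stops at the first non-numeric part."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    # Tuple comparison avoids the string pitfall where "2.9" sorts after "2.10".
    return as_tuple(installed) >= as_tuple(minimum)


def check_requirements():
    """Return a list of human-readable problems; empty means all good."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    for package, minimum in MINIMUMS.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```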

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/debu-sinha/mlflow-modal-deploy.git
cd mlflow-modal-deploy

# Install with dev dependencies
uv sync --extra dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest tests/ -v

License

Apache License 2.0

Acknowledgments

MLflow - Open source platform for the ML lifecycle
Modal - Serverless cloud for AI/ML

Support

GitHub Issues - Bug reports and feature requests
MLflow Slack - Community discussion

Release history

Version	Released
0.6.2	Mar 9, 2026
0.6.1	Feb 20, 2026
0.6.0	Jan 16, 2026
0.5.1	Jan 16, 2026
0.5.0	Jan 16, 2026
0.4.0	Jan 16, 2026
0.3.1	Jan 16, 2026
0.3.0	Jan 16, 2026
0.2.5	Jan 15, 2026
0.2.4	Jan 15, 2026
0.2.3	Jan 15, 2026
0.2.2	Jan 15, 2026
0.2.1	Jan 15, 2026
0.2.0 (this version)	Jan 15, 2026

Download files

Download the file for your platform.

Source Distribution

mlflow_modal_deploy-0.2.0.tar.gz (21.0 kB)

Uploaded Jan 15, 2026 Source

Built Distribution

mlflow_modal_deploy-0.2.0-py3-none-any.whl (15.6 kB)

Uploaded Jan 15, 2026 Python 3

File details

Details for the file mlflow_modal_deploy-0.2.0.tar.gz.

File metadata

Download URL: mlflow_modal_deploy-0.2.0.tar.gz
Upload date: Jan 15, 2026
Size: 21.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for mlflow_modal_deploy-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`af942705c466913ebeb5ad4c73f72108d23397a7a97ee3542e893a325c1c1380`
MD5	`6e1bf96b31d84fed18ec7db6a3c74e12`
BLAKE2b-256	`72f108fa1ada23bebf42051ae6f49a91877b27191d9d71c8b0e9371635304cab`

File details

Details for the file mlflow_modal_deploy-0.2.0-py3-none-any.whl.

File metadata

Download URL: mlflow_modal_deploy-0.2.0-py3-none-any.whl
Upload date: Jan 15, 2026
Size: 15.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for mlflow_modal_deploy-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`574140235c3b7308eae35ac47635b0ad023dd7f40c66a9de89ef62290d37093f`
MD5	`487881c81d52d4c845df3b9070fcdcb9`
BLAKE2b-256	`152ca1383fe499982e1d46bb4f2f5c5c4aee8a63e5490d201d9f3e6d2457aadc`
