MLflow deployment plugin for Modal serverless GPU infrastructure (actively maintained)

Maintainers

debu-sinha

mlflow-modal-deploy

Deploy MLflow models to Modal's serverless GPU infrastructure with a single command.

Installation

pip install mlflow-modal-deploy

Features

One-command deployment: Deploy any MLflow model to Modal's serverless infrastructure
GPU support: T4, L4, L40S, A10, A100, A100-40GB, A100-80GB, H100, H200, B200
Auto-scaling: Configure min/max containers, scale-down windows
Dynamic batching: Built-in request batching for high-throughput workloads
Automatic dependency detection: Extracts requirements from model artifacts
Wheel file support: Handles private dependencies packaged as wheel files
MLflow CLI integration: Use familiar mlflow deployments commands

Quick Start

Python API

from mlflow.deployments import get_deploy_client

# Get the Modal deployment client
client = get_deploy_client("modal")

# Deploy a model
deployment = client.create_deployment(
    name="my-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "T4",
        "memory": 2048,
        "min_containers": 1,
    }
)

print(f"Deployed to: {deployment['endpoint_url']}")

# Make predictions
predictions = client.predict(
    deployment_name="my-classifier",
    inputs={"feature1": [1, 2, 3], "feature2": [4, 5, 6]}
)
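
Before calling create_deployment, it can help to sanity-check the config dict locally. The sketch below is a hypothetical pre-flight check and not part of the plugin: `check_config` and its rules (positive memory, non-negative min_containers, max_containers not below min_containers) are assumptions chosen for illustration, and the plugin may validate differently.

```python
def check_config(config):
    """Return a list of human-readable problems found in a config dict.

    Hypothetical helper; the rules below are illustrative assumptions,
    not the plugin's own validation.
    """
    problems = []

    memory = config.get("memory")
    if memory is not None and (not isinstance(memory, int) or memory <= 0):
        problems.append("memory must be a positive integer (MB)")

    min_c = config.get("min_containers", 0)
    max_c = config.get("max_containers")
    if min_c < 0:
        problems.append("min_containers must be >= 0")
    if max_c is not None and max_c < min_c:
        problems.append("max_containers must be >= min_containers")

    return problems


# The quick-start config from above passes the check.
print(check_config({"gpu": "T4", "memory": 2048, "min_containers": 1}))
```

An empty list means no problems were found; anything else can be printed or raised before a deployment is attempted.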

CLI

# Deploy a model
mlflow deployments create -t modal -m runs:/abc123/model --name my-model

# Deploy with GPU
mlflow deployments create -t modal -m runs:/abc123/model --name gpu-model \
    -C gpu=T4 -C memory=4096

# List deployments
mlflow deployments list -t modal

# Get deployment info
mlflow deployments get -t modal --name my-model

# Delete deployment
mlflow deployments delete -t modal --name my-model
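
Each `-C key=value` flag above contributes one entry to the deployment config. The sketch below shows one way such pairs could be turned into a typed dict. `parse_config_flags` and its coercion rules are hypothetical and are not taken from MLflow or the plugin, which may handle value types differently.

```python
def _coerce(raw):
    """Best-effort conversion of a CLI string to bool, int, or float."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw  # Leave anything else (e.g. "T4") as a string.


def parse_config_flags(pairs):
    """Turn ["gpu=T4", "memory=4096"] into {"gpu": "T4", "memory": 4096}.

    Hypothetical helper for illustration only.
    """
    config = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        config[key] = _coerce(raw)
    return config


print(parse_config_flags(["gpu=T4", "memory=4096"]))
```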

Configuration Options

Option	Type	Default	Description
`gpu`	str/list	None	GPU type (T4, L4, L40S, A10, A100, A100-40GB, A100-80GB, H100, H200, B200), multi-GPU (`H100:8`), dedicated (`H100!`), or fallback list (`["H100", "A100"]`)
`memory`	int	512	Memory allocation in MB
`cpu`	float	1.0	CPU cores
`timeout`	int	300	Request timeout in seconds
`startup_timeout`	int	None	Container startup timeout (overrides timeout during model loading)
`scaledown_window`	int	60	Seconds before idle container scales down
`concurrent_inputs`	int	1	Max concurrent requests per container
`target_inputs`	int	None	Target concurrency for autoscaler (enables smarter scaling)
`min_containers`	int	0	Minimum warm containers
`max_containers`	int	None	Maximum containers
`buffer_containers`	int	None	Extra idle containers to maintain under load
`enable_batching`	bool	False	Enable dynamic batching
`max_batch_size`	int	8	Max batch size when batching enabled
`batch_wait_ms`	int	100	Batch wait time in milliseconds
`python_version`	str	auto	Python version (auto-detected from model)
`extra_pip_packages`	list	[]	Additional pip packages to install at deployment time
`pip_index_url`	str	None	Custom PyPI index URL for private packages
`pip_extra_index_url`	str	None	Additional PyPI index URL (fallback)
`modal_secret`	str	None	Modal secret name containing pip credentials
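
The `gpu` option accepts several shapes: a plain type, a count suffix (`H100:8`), a dedicated marker (`H100!`), or a fallback list. The sketch below normalizes all of them into one list of requests. `GpuRequest`, `parse_gpu_spec`, and `parse_gpu_option` are hypothetical names, and the parsing rules are an assumption based only on the table above, not on the plugin's source.

```python
from dataclasses import dataclass

# GPU types listed in the table above.
KNOWN_GPUS = {
    "T4", "L4", "L40S", "A10", "A100", "A100-40GB",
    "A100-80GB", "H100", "H200", "B200",
}


@dataclass(frozen=True)
class GpuRequest:
    gpu_type: str
    count: int = 1
    dedicated: bool = False


def parse_gpu_spec(spec):
    """Parse one spec such as "T4", "H100:8", or "H100!"."""
    name, sep, count_text = spec.partition(":")
    dedicated = name.endswith("!")
    if dedicated:
        name = name[:-1]
    if name not in KNOWN_GPUS:
        raise ValueError(f"unknown GPU type: {name!r}")
    count = int(count_text) if sep else 1
    if count < 1:
        raise ValueError("GPU count must be at least 1")
    return GpuRequest(name, count, dedicated)


def parse_gpu_option(value):
    """Accept None, a single spec string, or a fallback list of specs."""
    if value is None:
        return []
    specs = [value] if isinstance(value, str) else list(value)
    return [parse_gpu_spec(spec) for spec in specs]


print(parse_gpu_option(["H100:8", "A100"]))
```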

Authentication

Configure Modal authentication before deploying:

# Interactive setup
modal setup

# Or use environment variables
export MODAL_TOKEN_ID=your-token-id
export MODAL_TOKEN_SECRET=your-token-secret
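
In CI, where the environment-variable route is typical, a small check can fail fast when a token variable is missing. This is a sketch under that assumption: `missing_modal_token_vars` is a hypothetical helper, and it only inspects environment variables, so credentials stored by `modal setup` are not detected.

```python
import os

REQUIRED_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_token_vars(environ=None):
    """Return the names of required token variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_modal_token_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```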

Advanced Usage

Deploy to Specific Workspace

from mlflow.deployments import get_deploy_client

# Use workspace-specific URI
client = get_deploy_client("modal:/production")

Or via CLI:

mlflow deployments create -t modal:/production -m runs:/abc123/model --name my-model
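
To see how a target URI such as `modal:/production` decomposes, the sketch below splits it into a target name and an optional workspace using the standard library. `split_target_uri` is a hypothetical helper; how the plugin itself parses the URI is not shown in this document.

```python
from urllib.parse import urlparse


def split_target_uri(target_uri):
    """Split "modal:/production" into ("modal", "production").

    A bare target such as "modal" yields ("modal", None).
    """
    parsed = urlparse(target_uri)
    if not parsed.scheme:
        # No "scheme:" prefix, so the whole string is the target name.
        return target_uri, None
    workspace = parsed.path.strip("/") or None
    return parsed.scheme, workspace


print(split_target_uri("modal:/production"))
```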

High-Throughput Deployment with Batching

client.create_deployment(
    name="batch-classifier",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "A100",
        "enable_batching": True,
        "max_batch_size": 32,
        "batch_wait_ms": 50,
        "min_containers": 2,
        "max_containers": 20,
    }
)
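
To build intuition for how `max_batch_size` and `batch_wait_ms` interact, the sketch below groups request arrival times into batches. It is a simplified model written for this README, not the plugin's or Modal's implementation: a batch closes when it is full or when a request arrives more than `batch_wait_ms` after the first request in the batch. `group_into_batches` is a hypothetical name.

```python
def group_into_batches(arrival_times_ms, max_batch_size=8, batch_wait_ms=100):
    """Group request arrival times (in ms) into batches.

    Simplified model: a batch is flushed when it reaches max_batch_size,
    or when the next request arrives after the wait window that started
    with the first request in the batch.
    """
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")

    batches = []
    current = []
    for arrival in sorted(arrival_times_ms):
        window_expired = bool(current) and arrival - current[0] > batch_wait_ms
        if window_expired or len(current) == max_batch_size:
            batches.append(current)
            current = []
        current.append(arrival)
    if current:
        batches.append(current)
    return batches


# Three requests close together, then one straggler.
print(group_into_batches([0, 10, 20, 200], max_batch_size=32, batch_wait_ms=50))
```

In this model a larger `batch_wait_ms` produces fuller batches at the cost of added latency for the first request in each batch.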

Adding Extra Packages at Deployment Time

Use extra_pip_packages when the model's auto-detected requirements are incomplete or you need production-specific packages:

client.create_deployment(
    name="my-model",
    model_uri="runs:/abc123/model",
    config={
        "gpu": "A100",
        "extra_pip_packages": [
            "accelerate>=0.24",      # GPU inference optimization
            "prometheus_client",     # Monitoring
            "structlog",             # Production logging
        ],
    }
)

Common use cases:

Missing transitive dependencies: Packages MLflow didn't auto-detect
Inference optimizations: accelerate, bitsandbytes, onnxruntime-gpu
Production monitoring: prometheus_client, opentelemetry-api
Version overrides: Pin specific versions for compatibility
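
The version-override use case implies that an extra package can replace an auto-detected requirement with the same name. The sketch below shows one plausible way to merge the two lists. `package_name` and `merge_requirements` are hypothetical, and whether the plugin de-duplicates this way is an assumption; it may simply pass both lists to pip.

```python
import re

_NAME_RE = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def package_name(requirement):
    """Extract a normalized package name from a requirement string."""
    match = _NAME_RE.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    # PEP 503 normalization: lowercase, runs of "-", "_", "." become "-".
    return re.sub(r"[-_.]+", "-", match.group(1)).lower()


def merge_requirements(detected, extras):
    """Merge two requirement lists; entries in extras win on name clashes."""
    merged = {package_name(req): req for req in detected}
    for req in extras:
        merged[package_name(req)] = req
    return list(merged.values())


print(merge_requirements(["torch==2.1.0", "numpy"], ["numpy<2", "prometheus_client"]))
```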

Deploying with Private Packages

For private PyPI servers or authenticated package repositories:

Step 1: Create a Modal secret with your credentials:

# Create a secret with your private PyPI credentials
modal secret create pypi-auth \
    PIP_INDEX_URL="https://user:token@pypi.my-company.com/simple/" \
    PIP_EXTRA_INDEX_URL="https://pypi.org/simple/"

Step 2: Reference the secret in your deployment:

client.create_deployment(
    name="my-model",
    model_uri="runs:/abc123/model",
    config={
        # Option 1: Use Modal secret for authenticated access
        "modal_secret": "pypi-auth",
        "extra_pip_packages": ["my-private-package>=1.0"],

        # Option 2: Direct URL (for unauthenticated private repos)
        # "pip_index_url": "https://pypi.my-company.com/simple/",
        # "pip_extra_index_url": "https://pypi.org/simple/",
    }
)

Supported private package sources:

Private PyPI servers: Artifactory, CodeArtifact, DevPI, Nexus
Authenticated indexes: Any pip-compatible index with auth tokens
Wheel files: Already supported via the code/ directory in model artifacts
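
When credentials are embedded in an index URL as in Step 1, any `@`, `/`, or `:` inside the user name or token must be percent-encoded or the URL will be misparsed. The sketch below builds such a URL and produces a redacted form that is safe to log. Both helpers are hypothetical, and the host name is the placeholder from the example above.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def authenticated_index_url(host_and_path, user, token):
    """Build an https index URL with percent-encoded credentials."""
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host_and_path}"


def redact_credentials(url):
    """Replace any user:password part of a URL with "***" for logging."""
    parts = urlsplit(url)
    if parts.username is None:
        return url
    host = parts.hostname + (f":{parts.port}" if parts.port else "")
    return urlunsplit(
        (parts.scheme, f"***@{host}", parts.path, parts.query, parts.fragment)
    )


url = authenticated_index_url("pypi.my-company.com/simple/", "user", "p@ss/word")
print(redact_credentials(url))
```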

Models with Private Dependencies

If your model includes wheel files in the code/ directory, they are automatically detected and installed:

model/
├── MLmodel
├── requirements.txt
├── code/
│   └── my_private_package-1.0.0-py3-none-any.whl  # Auto-detected
└── ...
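
To preview which wheel files would be found in a model directory laid out as above, a few lines of pathlib are enough. This is a sketch, not the plugin's detection code: `find_bundled_wheels` is a hypothetical helper, and it assumes wheels sit directly under code/ rather than in nested folders.

```python
import tempfile
from pathlib import Path


def find_bundled_wheels(model_dir):
    """Return wheel files directly under <model_dir>/code, sorted by name."""
    code_dir = Path(model_dir) / "code"
    if not code_dir.is_dir():
        return []
    return sorted(code_dir.glob("*.whl"))


# Build a throwaway model directory that mirrors the layout above.
with tempfile.TemporaryDirectory() as tmp:
    code_dir = Path(tmp) / "code"
    code_dir.mkdir()
    (code_dir / "my_private_package-1.0.0-py3-none-any.whl").touch()
    (Path(tmp) / "requirements.txt").touch()
    found = [path.name for path in find_bundled_wheels(tmp)]

print(found)
```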

Local Development

Test your deployment locally before deploying to Modal:

from mlflow_modal import run_local

run_local(
    target_uri="modal",
    name="test-model",
    model_uri="runs:/abc123/model",
    config={"gpu": "T4"}
)

Requirements

Python 3.10+
MLflow 2.10.0+
Modal 1.0.0+

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/debu-sinha/mlflow-modal-deploy.git
cd mlflow-modal-deploy

# Install with dev dependencies
uv sync --extra dev

# Install pre-commit hooks
uv run pre-commit install

# Run tests
uv run pytest tests/ -v

License

Apache License 2.0

Acknowledgments

MLflow - Open source platform for the ML lifecycle
Modal - Serverless cloud for AI/ML

Support

GitHub Issues - Bug reports and feature requests
MLflow Slack - Community discussion

Release history

0.6.2: Mar 9, 2026
0.6.1: Feb 20, 2026
0.6.0: Jan 16, 2026
0.5.1: Jan 16, 2026 (this version)
0.5.0: Jan 16, 2026
0.4.0: Jan 16, 2026
0.3.1: Jan 16, 2026
0.3.0: Jan 16, 2026
0.2.5: Jan 15, 2026
0.2.4: Jan 15, 2026
0.2.3: Jan 15, 2026
0.2.2: Jan 15, 2026
0.2.1: Jan 15, 2026
0.2.0: Jan 15, 2026

Download files

Source Distribution

mlflow_modal_deploy-0.5.1.tar.gz (25.1 kB)

Uploaded Jan 16, 2026 Source

Built Distribution

mlflow_modal_deploy-0.5.1-py3-none-any.whl (18.2 kB)

Uploaded Jan 16, 2026 Python 3

File details

Details for the file mlflow_modal_deploy-0.5.1.tar.gz.

File metadata

Download URL: mlflow_modal_deploy-0.5.1.tar.gz
Upload date: Jan 16, 2026
Size: 25.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlflow_modal_deploy-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`d0aef20a6b5529a2288ce11d39d8ddc4fae74966f9b2d0975ecffb9093ae374f`
MD5	`6ecc1e47853a286f78936d9620674568`
BLAKE2b-256	`515e4357774f830958e469dbaad88a90e787447d2aa3ca00ce4f94e5f77d88d0`
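
A downloaded file can be checked against the SHA256 digest listed above with the standard library. The sketch below is a minimal example: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the demo uses a temporary file so that it runs without the real archive.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Return True when the file's SHA256 equals the published digest."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Demo on a temporary file instead of the real download.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"example payload")
    expected = hashlib.sha256(b"example payload").hexdigest()
    ok = matches_published_hash(sample, expected)
    tampered = matches_published_hash(sample, "0" * 64)

print(ok, tampered)
```

For the real archive, pass the path of the downloaded mlflow_modal_deploy-0.5.1.tar.gz and the SHA256 value from the table above.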

Provenance

The following attestation bundles were made for mlflow_modal_deploy-0.5.1.tar.gz:

Publisher: release.yml on debu-sinha/mlflow-modal-deploy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlflow_modal_deploy-0.5.1.tar.gz
- Subject digest: d0aef20a6b5529a2288ce11d39d8ddc4fae74966f9b2d0975ecffb9093ae374f
- Sigstore transparency entry: 830271975
- Sigstore integration time: Jan 16, 2026
Source repository:
- Permalink: debu-sinha/mlflow-modal-deploy@3fafa0079bc9d2b58966eff1d3e99bbcc5a32a36
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/debu-sinha
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3fafa0079bc9d2b58966eff1d3e99bbcc5a32a36
- Trigger Event: push

File details

Details for the file mlflow_modal_deploy-0.5.1-py3-none-any.whl.

File metadata

Download URL: mlflow_modal_deploy-0.5.1-py3-none-any.whl
Upload date: Jan 16, 2026
Size: 18.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlflow_modal_deploy-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`cd1e87bfa5d1f18dfc7885d22080d9dbe5c0da26509523b59d1d66a08f1a76e6`
MD5	`a5ec1f75b679548d70562f28b41c98e7`
BLAKE2b-256	`80c826933646635efcf790b645e4c7fa5cdc287ef80332ef3c2ba60ef06412ec`

Provenance

The following attestation bundles were made for mlflow_modal_deploy-0.5.1-py3-none-any.whl:

Publisher: release.yml on debu-sinha/mlflow-modal-deploy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlflow_modal_deploy-0.5.1-py3-none-any.whl
- Subject digest: cd1e87bfa5d1f18dfc7885d22080d9dbe5c0da26509523b59d1d66a08f1a76e6
- Sigstore transparency entry: 830271982
- Sigstore integration time: Jan 16, 2026
Source repository:
- Permalink: debu-sinha/mlflow-modal-deploy@3fafa0079bc9d2b58966eff1d3e99bbcc5a32a36
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/debu-sinha
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@3fafa0079bc9d2b58966eff1d3e99bbcc5a32a36
- Trigger Event: push

mlflow-modal-deploy 0.5.1
