Michelangelo is an end-to-end system for managing the model lifecycle at large scale
Michelangelo
An end-to-end ML platform for building, training, and registering machine learning models at scale.
Michelangelo gives ML engineers and data scientists a unified Python SDK for the entire model lifecycle — from data preparation and distributed training to model registration and production deployment. Define your ML workflows as Python functions using simple decorators, and Michelangelo handles orchestration, caching, and scaling across Ray and Spark clusters.
Key Features
- UniFlow Pipeline Framework: Define ML workflows with @task and @workflow decorators. Write plain Python functions and Michelangelo handles distributed execution, data passing between tasks, and result caching.
- Distributed Execution: Scale tasks across Ray or Spark clusters with a single config change. Specify CPU, memory, GPU, and worker resources per task, with no changes to your business logic.
- Built-in Caching and Resume: Tasks cache results automatically based on inputs. If a pipeline fails partway through, resume from where it left off instead of rerunning everything.
- Python API Client: Programmatically manage projects, pipelines, model registry, and pipeline runs through a gRPC-based Python client.
- CLI (ma): Register pipelines, manage triggers, run sandboxes, and interact with the Michelangelo platform from your terminal.
- Flexible Storage: Read and write data across S3, GCS, HDFS, and local filesystems using the fsspec-based storage layer.
Installation
Install the core package:
pip install michelangelo
Install with distributed execution plugins (Ray and Spark):
pip install michelangelo[plugin]
Install Extras
| Extra | What it includes | When to use it |
|---|---|---|
| michelangelo[plugin] | Ray, PySpark | You want to run tasks on distributed Ray or Spark clusters |
| michelangelo[vllm] | vLLM, Ray, PyTorch, Transformers | You're serving or fine-tuning large language models |
| michelangelo[example] | All ML libraries for examples | You want to run the included example projects |
| michelangelo[dev] | pytest, ruff, pre-commit, Ray | You're contributing to Michelangelo itself |
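If you are unsure which extras are present in an environment, a quick check is to see whether their key dependencies are importable. The sketch below uses only the standard library. The `EXTRA_MODULES` mapping and the `available_extras` helper are hypothetical, and the import names (`ray`, `pyspark`, `vllm`, `torch`, `transformers`) are assumed from the table above rather than taken from the package metadata.

```python
from importlib.util import find_spec

# Import names assumed for each extra's key dependencies (not read from package metadata).
EXTRA_MODULES = {
    "plugin": ["ray", "pyspark"],
    "vllm": ["vllm", "ray", "torch", "transformers"],
}


def available_extras(extra_modules=EXTRA_MODULES):
    """Return {extra: bool}, True when every listed module can be located."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        status = "available" if ok else "missing"
        print(f"michelangelo[{extra}]: {status}")
```

`find_spec` only locates a module without importing it, so the check stays fast even for heavy packages.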
Quickstart
Here's a minimal pipeline that loads data and trains a model using Ray for distributed execution:
import michelangelo.uniflow.core as uniflow
from michelangelo.uniflow.plugins.ray import RayTask


@uniflow.task(config=RayTask(head_cpu=1, head_memory="2Gi"))
def load_data(path: str):
    """Load and preprocess data."""
    # Your data loading logic here
    print(f"Loading data from {path}")
    return {"train": [1, 2, 3], "test": [4, 5]}


@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
def train_model(data):
    """Train a model on the prepared data."""
    print(f"Training on {len(data['train'])} samples")
    return {"accuracy": 0.95}


@uniflow.workflow()
def training_pipeline(data_path: str):
    """A simple training pipeline."""
    data = load_data(data_path)
    result = train_model(data)
    return result


if __name__ == "__main__":
    ctx = uniflow.create_context()
    ctx.run(training_pipeline, data_path="s3://my-bucket/data")
Run locally:
python my_pipeline.py
Want to use Spark instead of Ray? Just swap the task config:
from michelangelo.uniflow.plugins.spark import SparkTask


@uniflow.task(config=SparkTask(driver_cpu=2, executor_cpu=4, executor_instances=3))
def process_data(df):
    # Your Spark processing logic
    return df
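The reason the swap is cheap is that the execution config lives in the decorator, not in the function body. The toy sketch below illustrates that separation in plain Python. Everything in it is hypothetical: `toy_task`, `ToyRayConfig`, and `ToySparkConfig` are stand-ins invented for this example and are not Michelangelo's `RayTask`, `SparkTask`, or `@uniflow.task`.

```python
from dataclasses import dataclass
from functools import wraps


@dataclass(frozen=True)
class ToyRayConfig:  # hypothetical stand-in, not RayTask
    head_cpu: int = 1


@dataclass(frozen=True)
class ToySparkConfig:  # hypothetical stand-in, not SparkTask
    executor_instances: int = 1


def toy_task(config):
    """Attach an execution config to a function without touching its body."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # A real orchestrator would dispatch to a cluster here;
            # this sketch simply runs the function in-process.
            return func(*args, **kwargs)

        wrapper.config = config
        return wrapper

    return decorator


def double_all(values):
    """Business logic: knows nothing about where it runs."""
    return [v * 2 for v in values]


# The same function wrapped with two different backends' configs.
on_ray = toy_task(ToyRayConfig(head_cpu=2))(double_all)
on_spark = toy_task(ToySparkConfig(executor_instances=3))(double_all)
```

Both wrapped functions return the same result for the same input; only the attached `config` differs, which is the property that makes a backend swap a one-line change.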
For complete working examples, see the examples directory, including:
- BERT fine-tuning on CoLA — Text classification with distributed GPU training
- XGBoost on Boston Housing — Tabular regression with distributed training
- GPT fine-tuning with LoRA — Large language model fine-tuning
Using the Python API Client
Manage platform resources programmatically:
from michelangelo.api.v2.client import APIClient
from michelangelo.gen.api.v2.project_pb2 import Project

APIClient.set_caller("my-client")

# List projects
projects = APIClient.ProjectService.list_project(namespace="default")

# Create a new project
proj = Project()
proj.metadata.namespace = "default"
proj.metadata.name = "my-project"
proj.spec.description = "My ML project"
APIClient.ProjectService.create_project(proj)
Set the API server address via environment variable:
export MICHELANGELO_API_SERVER="localhost:12345"
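A malformed address is easier to diagnose before the client attempts a connection. The following sketch validates the variable in plain Python. It assumes the value is always in `host:port` form, as in the example above. `parse_api_server` and `api_server_from_env` are hypothetical helpers and not part of the Michelangelo client.

```python
import os


def parse_api_server(value):
    """Split a 'host:port' string into (host, port), validating the port."""
    host, sep, port_text = value.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected 'host:port', got {value!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


def api_server_from_env(environ=os.environ):
    """Read MICHELANGELO_API_SERVER, returning None when it is unset or empty."""
    value = environ.get("MICHELANGELO_API_SERVER")
    return parse_api_server(value) if value else None
```

Passing a plain dict as `environ` keeps the helper easy to test without modifying the real process environment.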
Documentation
Full documentation is available at michelangelo-ai.github.io/michelangelo/docs.
- User Guides — Step-by-step guides for data preparation, training, and deployment
- ML Pipelines — Deep dive into the UniFlow pipeline framework
- Set Up Triggers — Automate pipeline execution with cron and backfill triggers
- CLI Reference — Full command-line interface documentation
Contributing
We welcome contributions! To get started:
git clone https://github.com/michelangelo-ai/michelangelo.git
cd michelangelo/python
pip install -e ".[dev]"
Run the test suite:
pytest
Format your code:
ruff format .
ruff check .
Requirements
- Python 3.9+
License
See LICENSE for details.