Michelangelo is an end-to-end system for managing the machine learning model lifecycle at large scale.

Michelangelo

An end-to-end ML platform for building, training, and registering machine learning models at scale.

Documentation GitHub

Michelangelo gives ML engineers and data scientists a unified Python SDK for the entire model lifecycle — from data preparation and distributed training to model registration and production deployment. Define your ML workflows as Python functions using simple decorators, and Michelangelo handles orchestration, caching, and scaling across Ray and Spark clusters.

Key Features

  • UniFlow Pipeline Framework — Define ML workflows with @task and @workflow decorators. Write plain Python functions and Michelangelo handles distributed execution, data passing between tasks, and result caching.

  • Distributed Execution — Scale tasks across Ray or Spark clusters with a single config change. Specify CPU, memory, GPU, and worker resources per task — no changes to your business logic required.

  • Built-in Caching and Resume — Tasks cache results automatically based on inputs. If a pipeline fails partway through, resume from where it left off instead of rerunning everything.

  • Python API Client — Programmatically manage projects, pipelines, model registry, and pipeline runs through a gRPC-based Python client.

  • CLI (ma) — Register pipelines, manage triggers, run sandboxes, and interact with the Michelangelo platform from your terminal.

  • Flexible Storage — Read and write data across S3, GCS, HDFS, and local filesystems using the fsspec-based storage layer.
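
The caching behavior described above can be pictured with a small standalone sketch. This is not Michelangelo's implementation: the cache_key and run_cached helpers and the in-memory dictionary are hypothetical, and inputs are assumed to be JSON-serializable. It only illustrates the idea of keying a task result on its inputs so that a rerun skips work that already succeeded.

```python
import hashlib
import json

# Hypothetical in-memory cache; a real platform would persist results.
_cache = {}


def cache_key(task_name, inputs):
    """Derive a deterministic key from the task name and its inputs."""
    payload = json.dumps({"task": task_name, "inputs": inputs}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_cached(task_name, func, **inputs):
    """Run func(**inputs) once per distinct input set, then reuse the result."""
    key = cache_key(task_name, inputs)
    if key not in _cache:
        _cache[key] = func(**inputs)
    return _cache[key]


calls = []  # records every real execution of load_data


def load_data(path):
    calls.append(path)
    return {"train": [1, 2, 3], "test": [4, 5]}


first = run_cached("load_data", load_data, path="s3://my-bucket/data")
second = run_cached("load_data", load_data, path="s3://my-bucket/data")
```

In this sketch the second call returns the stored result without executing load_data again, which is the same property that lets a failed pipeline resume from its last completed task.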

Installation

Install the core package:

pip install michelangelo

Install with distributed execution plugins (Ray and Spark):

pip install "michelangelo[plugin]"

Install Extras

Extra                 | What it includes                 | When to use it
michelangelo[plugin]  | Ray, PySpark                     | You want to run tasks on distributed Ray or Spark clusters
michelangelo[vllm]    | vLLM, Ray, PyTorch, Transformers | You're serving or fine-tuning large language models
michelangelo[example] | All ML libraries for examples    | You want to run the included example projects
michelangelo[dev]     | pytest, ruff, pre-commit, Ray    | You're contributing to Michelangelo itself
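
To check which optional backends are importable in the current environment, a small standard-library sketch like the following may help. The available_backends helper is hypothetical and not part of Michelangelo. It assumes the plugin extra provides packages under their usual import names, ray and pyspark.

```python
from importlib.util import find_spec


def available_backends(modules=("ray", "pyspark")):
    """Report which top-level modules can be imported, without importing them."""
    return {name: find_spec(name) is not None for name in modules}


print(available_backends())
```

If either entry is reported as False, installing the plugin extra shown above is the likely fix.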

Quickstart

Here's a minimal pipeline that loads data and trains a model using Ray for distributed execution:

import michelangelo.uniflow.core as uniflow
from michelangelo.uniflow.plugins.ray import RayTask


@uniflow.task(config=RayTask(head_cpu=1, head_memory="2Gi"))
def load_data(path: str):
    """Load and preprocess data."""
    # Your data loading logic here
    print(f"Loading data from {path}")
    return {"train": [1, 2, 3], "test": [4, 5]}


@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
def train_model(data):
    """Train a model on the prepared data."""
    print(f"Training on {len(data['train'])} samples")
    return {"accuracy": 0.95}


@uniflow.workflow()
def training_pipeline(data_path: str):
    """A simple training pipeline."""
    data = load_data(data_path)
    result = train_model(data)
    return result


if __name__ == "__main__":
    ctx = uniflow.create_context()
    ctx.run(training_pipeline, data_path="s3://my-bucket/data")

Run locally:

python my_pipeline.py

Want to use Spark instead of Ray? Just swap the task config:

import michelangelo.uniflow.core as uniflow
from michelangelo.uniflow.plugins.spark import SparkTask


@uniflow.task(config=SparkTask(driver_cpu=2, executor_cpu=4, executor_instances=3))
def process_data(df):
    # Your Spark processing logic
    return df
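
Both snippets above attach resource settings through the decorator's config argument and leave the function body alone. The toy sketch below shows that general pattern in plain Python. It is not the UniFlow implementation: ToyResources and toy_task are hypothetical names, and nothing here schedules work on a cluster.

```python
import functools
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyResources:
    """Hypothetical stand-in for a per-task resource config."""
    cpu: int = 1
    memory: str = "1Gi"


def toy_task(config):
    """Attach a resource config to a function without changing its logic."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # A real framework would schedule the call on a cluster here;
            # this toy version simply runs the function in-process.
            return func(*args, **kwargs)
        wrapper.resources = config
        return wrapper
    return decorator


@toy_task(config=ToyResources(cpu=2, memory="4Gi"))
def train_model(data):
    return {"samples": len(data["train"])}


result = train_model({"train": [1, 2, 3], "test": [4, 5]})
```

Because the config lives on the decorator, changing resources or swapping the backend touches only the decorator line, which is the property the Ray and Spark examples rely on.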

For complete working examples, see the examples directory.

Using the Python API Client

Manage platform resources programmatically:

from michelangelo.api.v2.client import APIClient
from michelangelo.gen.api.v2.project_pb2 import Project

APIClient.set_caller("my-client")

# List projects
projects = APIClient.ProjectService.list_project(namespace="default")

# Create a new project
proj = Project()
proj.metadata.namespace = "default"
proj.metadata.name = "my-project"
proj.spec.description = "My ML project"
APIClient.ProjectService.create_project(proj)

Set the API server address via environment variable:

export MICHELANGELO_API_SERVER="localhost:12345"
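
A script can validate this variable before making any client calls. The sketch below uses only the standard library and assumes the value has the host:port form shown above. The helper names are hypothetical and not part of the Michelangelo client.

```python
import os

ENV_VAR = "MICHELANGELO_API_SERVER"


def parse_server_address(value):
    """Split 'host:port' into (host, port) and validate the port."""
    host, sep, port_text = value.rpartition(":")
    if not sep or not host or not port_text.isdigit():
        raise ValueError(f"expected 'host:port', got {value!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return host, port


def server_address_from_env(environ=os.environ):
    """Read and parse the API server address from the environment."""
    value = environ.get(ENV_VAR)
    if value is None:
        raise RuntimeError(f"{ENV_VAR} is not set")
    return parse_server_address(value)
```

Calling server_address_from_env() at startup turns a missing or malformed address into an immediate, readable error rather than a connection failure later on.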

Documentation

Full documentation is available at michelangelo-ai.github.io/michelangelo/docs.

  • User Guides — Step-by-step guides for data preparation, training, and deployment
  • ML Pipelines — Deep dive into the UniFlow pipeline framework
  • Set Up Triggers — Automate pipeline execution with cron and backfill triggers
  • CLI Reference — Full command-line interface documentation

Contributing

We welcome contributions! To get started:

git clone https://github.com/michelangelo-ai/michelangelo.git
cd michelangelo/python
pip install -e ".[dev]"

Run the test suite:

pytest

Format and lint your code:

ruff format .
ruff check .

Requirements

  • Python 3.9+

License

See LICENSE for details.
