# MLSys·im

ML Systems Infrastructure Modeling Engine: a first-principles analytical framework for ML workloads.
> [!NOTE]
> 📌 **Early release (2026).** MLSys·im shipped with the 2026 MLSysBook refresh. The modeling platform, APIs, and lab integrations are under active iteration as we harden the simulator and teaching workflows.
>
> Feedback is welcome via GitHub issues or pull requests.
## 🚀 MLSys·im: The Modeling Platform

The physics-grounded analytical simulator powering the Machine Learning Systems ecosystem. It provides a unified single source of truth (SSoT) for modeling systems from sub-watt microcontrollers to exaflop-scale global fleets.
## 🏗 The 5-Layer Analytical Stack

`mlsysim` implements a "Progressive Lowering" architecture, separating high-level workloads from the physical infrastructure that executes them.
| Layer | Domain | Key Components |
|---|---|---|
| Layer A | Workload Representation (`mlsysim.models`) | FLOPs, parameters, and arithmetic intensity. e.g., `Llama3_70B`, `ResNet50` |
| Layer B | Hardware Registry (`mlsysim.hardware`) | Concrete specs for real-world silicon. e.g., `H100`, `TPUv5p`, `Jetson` |
| Layer C | Infrastructure (`mlsysim.infra`) | Grid profiles and datacenter sustainability. e.g., PUE, carbon intensity, WUE |
| Layer D | Systems & Topology (`mlsysim.systems`) | Fleet configurations and network fabrics. e.g., `Doorbell`, `AutoDrive` scenarios |
| Layer E | Execution & Resolvers (`mlsysim.core.solver`) | The 3-tier math engine: Models, Solvers, and Optimizers (design-space search). |
## 🚀 Quick Usage: The Agent-Ready CLI

`mlsysim` is a first-principles analytical calculator for ML systems. It provides a terminal UI for humans and a strict JSON API for CI/CD pipelines and AI agents.

> **Accuracy note:** `mlsysim` predictions are typically within 2–5× of measured performance for well-characterized workloads. For production capacity planning, always validate with benchmarks. This tool formalizes the back-of-envelope math that senior engineers do intuitively; it is not a substitute for profiling or load testing.
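To make "formalized back-of-envelope math" concrete, here is a minimal roofline sketch of the kind of first-principles arithmetic involved. It has no `mlsysim` dependency, and the peak-compute and bandwidth figures are illustrative H100-class numbers, not authoritative specs:

```python
# Roofline sketch: attainable throughput is capped by either peak compute
# or memory bandwidth times arithmetic intensity (FLOPs per byte moved).
PEAK_FLOPS = 989e12   # illustrative H100-class FP16 dense peak, FLOP/s
PEAK_BW = 3.35e12     # illustrative HBM bandwidth, bytes/s

def attainable_flops(arithmetic_intensity: float) -> float:
    """Roofline model: min(compute roof, bandwidth * intensity)."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Intensity where the roofline bends from memory- to compute-bound (~295 FLOP/B)
ridge = PEAK_FLOPS / PEAK_BW

for ai in (2, 50, 500):
    bound = "memory" if ai < ridge else "compute"
    print(f"AI={ai:>3} FLOP/B -> {attainable_flops(ai) / 1e12:6.1f} TFLOP/s ({bound}-bound)")
```

This two-line model is what separates the "memory-bound" and "compute-bound" regimes discussed throughout the scorecards.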
### 1. Explore the Registry (The Zoo)

Discover built-in hardware, models, and infrastructure without reading source code:

```shell
mlsysim zoo hardware
mlsysim zoo models
```
### 2. Quick Evaluation (CLI Flags)

Evaluate the physics of a workload on a specific hardware node instantly:

```shell
mlsysim eval Llama3_8B H100 --batch-size 32
```
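Why does `--batch-size` matter so much? During decode, model weights are read once per step regardless of batch size, so arithmetic intensity grows roughly linearly with batch. The sketch below is a first-order illustration with Llama-3-8B-class numbers, not `mlsysim`'s internal model (KV-cache and activation traffic are ignored):

```python
# Decode arithmetic intensity vs. batch size (first-order sketch).
# Assumes ~2 FLOPs per parameter per token and fp16 weights.
PARAMS = 8e9          # Llama-3-8B-class parameter count
BYTES_PER_PARAM = 2   # fp16

def decode_intensity(batch_size: int) -> float:
    flops = 2 * PARAMS * batch_size          # compute grows with batch
    bytes_moved = PARAMS * BYTES_PER_PARAM   # weight read is batch-invariant
    return flops / bytes_moved

for b in (1, 32, 256):
    print(f"batch={b:>3}: ~{decode_intensity(b):.0f} FLOP/byte")
```

At batch 1 the intensity is far below any modern accelerator's ridge point, which is why small-batch decode sits at the bottom of the efficiency table later in this README.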
### 3. Deep Simulation (Infrastructure as Code)

Define your entire cluster and SLA constraints in a declarative `mlsys.yaml` file:

```yaml
# example_cluster.yaml
version: "1.0"

workload:
  name: "Llama3_70B"
  batch_size: 4096

hardware:
  name: "H100"
  nodes: 64

ops:
  region: "Quebec"
  duration_days: 14.0

constraints:
  assert:
    - metric: "performance.latency"
      max: 50.0
```

Then compile and evaluate the 3-lens scorecard (Feasibility, Performance, Macro):

```shell
mlsysim eval example_cluster.yaml
```
### 4. CI/CD & Agentic Automation

Every command supports strict, schema-validated JSON output. If an `assert` constraint is violated, the CLI returns a semantic exit code 3.

```shell
# Export the JSON Schema for your IDE or AI Agent
mlsysim schema > schema.json

# Run an evaluation in a CI pipeline
tco=$(mlsysim --output json eval example_cluster.yaml | jq .macro.metrics.tco_usd)
```
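A CI gate typically branches on that semantic exit code. The sketch below uses a stand-in `run_eval` function that simulates the documented behavior (JSON on stdout, exit code 3 on a violated constraint) so the control flow is runnable anywhere; in a real pipeline you would call `mlsysim --output json eval …` directly:

```shell
# Stand-in for `mlsysim --output json eval example_cluster.yaml`:
# emits JSON and simulates a violated `assert` (semantic exit code 3).
run_eval() {
  echo '{"macro": {"metrics": {"tco_usd": 1234567}}}'
  return 3
}

if out=$(run_eval); then
  echo "constraints satisfied"
else
  status=$?
  if [ "$status" -eq 3 ]; then
    echo "SLA constraint violated (exit 3) -- failing the gate"
  else
    echo "evaluation error (exit $status)"
  fi
fi
```

The point of a dedicated exit code is that the pipeline can distinguish "the design fails its SLA" from "the tool itself crashed" without parsing output.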
### 5. Design Space Search (Optimizers)

Use the Tier 3 Engineering Engine to automatically find the optimal configuration:

```shell
mlsysim optimize parallelism example_cluster.yaml
mlsysim optimize placement example_cluster.yaml --carbon-tax 150
```
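Conceptually, an optimizer of this kind is a search over a discrete design space against an analytical cost model. The toy sketch below is not `mlsysim`'s solver; it only illustrates the shape of such a search, with a made-up step-time model over tensor-parallel (TP) and pipeline-parallel (PP) degrees:

```python
# Toy design-space search: enumerate (TP, PP) splits and pick the
# configuration minimizing a deliberately simplistic step-time model.
import itertools

NODES = 64

def toy_step_time(tp: int, pp: int) -> float:
    compute = 1.0 / (tp * pp)                 # assume perfect compute scaling
    comm = 0.02 * (tp - 1) + 0.05 * (pp - 1)  # made-up communication penalties
    return compute + comm

candidates = [(tp, pp)
              for tp, pp in itertools.product([1, 2, 4, 8], repeat=2)
              if tp * pp <= NODES]
best = min(candidates, key=lambda c: toy_step_time(*c))
print(f"best (TP, PP) under the toy model: {best}")
```

Even this toy version shows the characteristic trade-off: more parallelism shrinks compute time but adds communication cost, so the optimum sits in the interior of the space rather than at maximum parallelism.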
## 🛡 Stability & Integrity

Because this core powers a printed textbook, we enforce strict invariant verification. Every physical constant is traceable to a primary source (datasheet or paper), and dimensional integrity is enforced via `pint`.
## ⚠️ What This Tool Does Not Model

`mlsysim` is an analytical hardware calculator, not a production deployment simulator. Its 22 "walls" model the physical and economic constraints that bound ML system performance. Several critical production concerns are deliberately out of scope:
| Concern | Why it matters | Where to learn more |
|---|---|---|
| Data drift / distribution shift | The #1 cause of production ML failures — model accuracy degrades silently as input distributions change | Sculley et al. (2015), "Hidden Technical Debt in ML Systems" |
| Model versioning & rollback | Production requires running multiple versions, A/B testing, and safe rollback | Huyen (2022), Designing Machine Learning Systems |
| Monitoring & observability | You cannot manage what you cannot measure — prediction distributions, latency percentiles, error rates | Google SRE Book (2016); Huyen (2022) |
| Feature store freshness | Stale features silently degrade real-time models (recommendations, fraud detection) | Uber Michelangelo (2017) |
| Software bugs & misconfigurations | Most outages are caused by software, not hardware | Barroso et al. (2018) |
| Human factors | Team velocity, on-call burden, and organizational alignment often dominate outcomes | Brooks (1975), The Mythical Man-Month |
Passing all 22 walls is necessary but not sufficient for a successful production deployment.
Students using this tool should understand that infrastructure physics (what mlsysim models) is one dimension of a multi-dimensional engineering challenge.
## 📖 How to Cite

If you use `mlsysim` in your research or teaching, please cite:

```bibtex
@software{mlsysim2026,
  author      = {Janapa Reddi, Vijay},
  title       = {{MLSys$\cdot$im}: First-Principles Infrastructure Modeling for Machine Learning Systems},
  year        = {2026},
  url         = {https://mlsysbook.ai/mlsysim},
  version     = {0.1.1},
  institution = {Harvard University}
}
```
## 🛠 Installation

MLSys·im is designed to be highly modular. Install only what you need:

```shell
# Core physics engine only (fastest, smallest footprint)
pip install mlsysim

# Install with the beautiful Terminal UI & YAML support
pip install "mlsysim[cli]"

# Install with dependencies for interactive labs (Marimo, Plotly)
pip install "mlsysim[labs]"
```
## 🐍 Python API Usage

The framework is just as powerful inside a Python script or Jupyter Notebook. The `SystemEvaluator` provides a clean, unified entry point for full-stack analysis:

```python
import mlsysim

# 1. Define the scenario
model = mlsysim.Models.Language.Llama3_8B
hardware = mlsysim.Hardware.Cloud.H100

# 2. Run the evaluation
evaluation = mlsysim.SystemEvaluator.evaluate(
    scenario_name="Llama-3 8B on H100",
    model_obj=model,
    hardware_obj=hardware,
    batch_size=32,
    precision="fp16",
    efficiency=0.45,
)

# 3. View the beautifully formatted scorecard
print(evaluation.scorecard())
```
### Efficiency Parameter Guide

The `efficiency` parameter (0.0–1.0) captures the gap between peak hardware performance and what your software stack actually achieves. Use these guidelines:
| Scenario | Efficiency | Rationale |
|---|---|---|
| Training (Megatron-LM, large Transformer) | 0.40–0.55 | Well-optimized GEMM + FlashAttention |
| Training (PyTorch eager, small model) | 0.08–0.15 | Kernel launch overhead dominates |
| Inference decode, batch=1 | 0.01–0.05 | Memory-bound; compute nearly idle |
| Inference decode, batch=32+ | 0.15–0.35 | Batch amortizes weight loading |
| Inference prefill, long context | 0.30–0.50 | Compute-bound GEMM + attention |
| TinyML (TFLite Micro on ESP32) | 0.05–0.15 | Interpreter overhead, no tensor cores |
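To see how the table above translates into delivered throughput, here is a small sketch that applies the `efficiency` correction to a peak compute figure. The ~2 FLOPs-per-parameter-per-token rule of thumb and the H100-class peak are illustrative assumptions, not `mlsysim` internals:

```python
# Delivered throughput = peak * efficiency, converted to tokens/s via
# the ~2 FLOPs per parameter per token rule of thumb for dense forward passes.
PEAK_TFLOPS = 989.0   # illustrative H100-class FP16 dense peak
PARAMS = 8e9          # Llama-3-8B-class model

def tokens_per_second(peak_tflops: float, efficiency: float) -> float:
    assert 0.0 <= efficiency <= 1.0
    delivered = peak_tflops * 1e12 * efficiency   # effective FLOP/s
    return delivered / (2 * PARAMS)               # FLOPs per token

for eff in (0.03, 0.25, 0.45):
    print(f"efficiency={eff:.2f}: ~{tokens_per_second(PEAK_TFLOPS, eff):,.0f} tokens/s")
```

The order-of-magnitude spread between rows of the table (0.01 vs. 0.50) dominates any hardware spec difference, which is why picking a realistic efficiency matters more than quoting the datasheet peak.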
## Contributors

Thanks to these wonderful people for helping improve MLSys·im!

Legend: 🪲 Bug Hunter · ⚡ Code Warrior · 📚 Documentation Hero · 🎨 Design Artist · 🧠 Idea Generator · 🔎 Code Reviewer · 🧪 Test Engineer · 🛠️ Tool Builder

- Vijay Janapa Reddi: 🧑‍💻 🎨 ✍️ 🧠 (maintenance)
- Peter Koellner: 🪲 ✍️
- Zeljko Hrcek: 🧑‍💻
- Rocky: 🧑‍💻

Recognize a contributor by commenting on any issue or PR:

```
@all-contributors please add @username for code, doc, ideas, or bug
```
## License

- **Code:** Apache License 2.0, free for commercial and non-commercial use, with patent grant and attribution requirement.
- **Documentation and textbook prose:** Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC-BY-NC-SA-4.0). The tutorials and prose on mlsysbook.ai/mlsysim are part of the Machine Learning Systems textbook and carry its license.

The two licenses are intentionally separate: the Python package is permissively licensed so engineers and researchers can use it anywhere (including commercially), while the textbook prose retains its non-commercial protection to prevent republication as a derivative textbook.

Copyright © 2026 Vijay Janapa Reddi and MLSys·im contributors.
## File details

### Source distribution: `mlsysim-0.1.1.tar.gz`

- Size: 142.5 kB
- Tags: Source
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `aa52d8dd870e08c09bf6cdeb97c11c4f839e57cf45dff1dd643d07052edb0b53` |
| MD5 | `778cccc73ff9d1705273c80736e1132e` |
| BLAKE2b-256 | `9434f3115de764bc01c653661a6683921b0d6d5e2a9fad1c8aef1c4ca6c4343d` |
#### Provenance

The following attestation bundle was made for `mlsysim-0.1.1.tar.gz`:

- Publisher: `mlsysim-pypi-publish.yml` on `harvard-edge/cs249r_book`
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `mlsysim-0.1.1.tar.gz`
- Subject digest: `aa52d8dd870e08c09bf6cdeb97c11c4f839e57cf45dff1dd643d07052edb0b53`
- Sigstore transparency entry: 1373409983
- Permalink: `harvard-edge/cs249r_book@2732a26740ce87cce5c0d68b1de4203170cf2745`
- Branch / Tag: `refs/tags/mlsysim-v0.1.1`
- Owner: https://github.com/harvard-edge
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `mlsysim-pypi-publish.yml@2732a26740ce87cce5c0d68b1de4203170cf2745`
- Trigger Event: push
### Built distribution: `mlsysim-0.1.1-py3-none-any.whl`

- Size: 163.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing: Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `1c0f321ae2821d62984c0a1024674dabe6270a086c75680732a28044028cf7b9` |
| MD5 | `03ba3acd9c876968cfd7ab70b12f138b` |
| BLAKE2b-256 | `23a4e0010c2705a8b22e7cae37d4d7b6a5bf910eab468bbcd913b588b44257ba` |
#### Provenance

The following attestation bundle was made for `mlsysim-0.1.1-py3-none-any.whl`:

- Publisher: `mlsysim-pypi-publish.yml` on `harvard-edge/cs249r_book`
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: `mlsysim-0.1.1-py3-none-any.whl`
- Subject digest: `1c0f321ae2821d62984c0a1024674dabe6270a086c75680732a28044028cf7b9`
- Sigstore transparency entry: 1373410049
- Permalink: `harvard-edge/cs249r_book@2732a26740ce87cce5c0d68b1de4203170cf2745`
- Branch / Tag: `refs/tags/mlsysim-v0.1.1`
- Owner: https://github.com/harvard-edge
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: `mlsysim-pypi-publish.yml@2732a26740ce87cce5c0d68b1de4203170cf2745`
- Trigger Event: push