Energy observability for AI workloads
matcha
Measure GPU energy consumption of any training run. Zero overhead. Zero code changes.
Install
pip install usematcha
Requires an NVIDIA GPU with drivers installed.
Quick Start
Prefix your training command with matcha run:
matcha run torchrun --standalone --nproc_per_node=1 train_gpt.py
Your training runs at full speed. Matcha appends one line at the end:
matcha_energy gpus:NVIDIA H100 80GB HBM3 total:364722J (101.31Wh) duration:746.0s avg_power:489W peak_power:700W samples:7449
No code changes. No config files. Works with any training script.
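If you want to log the summary alongside other run metadata, the line can be parsed with a regular expression. The following is a minimal sketch that assumes the summary always has the field order shown in the sample above; `parse_summary` and `SUMMARY_RE` are hypothetical names for illustration and are not part of Matcha.

```python
import re

# Assumption: the summary line always follows the layout of the sample above.
SUMMARY_RE = re.compile(
    r"^matcha_energy gpus:(?P<gpus>.+?) "
    r"total:(?P<total_j>[\d.]+)J \((?P<total_wh>[\d.]+)Wh\) "
    r"duration:(?P<duration_s>[\d.]+)s "
    r"avg_power:(?P<avg_power_w>[\d.]+)W "
    r"peak_power:(?P<peak_power_w>[\d.]+)W "
    r"samples:(?P<samples>\d+)$"
)


def parse_summary(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "gpus": fields["gpus"],
        "total_j": float(fields["total_j"]),
        "total_wh": float(fields["total_wh"]),
        "duration_s": float(fields["duration_s"]),
        "avg_power_w": float(fields["avg_power_w"]),
        "peak_power_w": float(fields["peak_power_w"]),
        "samples": int(fields["samples"]),
    }


example_line = (
    "matcha_energy gpus:NVIDIA H100 80GB HBM3 total:364722J (101.31Wh) "
    "duration:746.0s avg_power:489W peak_power:700W samples:7449"
)
summary = parse_summary(example_line)
print(summary)
```

The fields are internally consistent: joules divided by 3600 gives watt-hours, and joules divided by the duration gives the average power.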
Commands
matcha run - Total energy, zero overhead
Launches your command, polls GPU power in the background, prints a summary when it finishes. Your training runs natively - no stdout interception, no performance impact.
matcha run python train.py
matcha run torchrun --standalone --nproc_per_node=1 train_gpt.py
matcha run deepspeed --num_gpus=4 train.py --deepspeed ds_config.json
matcha wrap - Per-step energy breakdown
Parses stdout for step markers (step 10, iter 10, step:10/1000, [10/1000], etc.) and appends energy data to each step line.
matcha wrap torchrun --standalone --nproc_per_node=1 train_gpt.py
Output:
step:1/20000 train_loss:6.9357 train_time:438ms step_avg:438.01ms energy:106.7J/step avg_power:354W peak_power:427W
step:2/20000 train_loss:16.7414 train_time:833ms step_avg:416.60ms energy:154.0J/step avg_power:508W peak_power:533W
step:3/20000 train_loss:8.7524 train_time:1258ms step_avg:419.23ms energy:221.8J/step avg_power:551W peak_power:565W
...
matcha_energy gpus:NVIDIA H100 80GB HBM3 total:97271J (27.02Wh) duration:202.9s avg_power:479W peak_power:701W samples:2025
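To check whether your own log lines would be picked up as steps, the marker styles listed for matcha wrap (step 10, iter 10, step:10/1000, [10/1000]) can be approximated with a single regular expression. This is an illustrative sketch only: `STEP_RE` and `extract_step` are hypothetical names, and the patterns Matcha actually uses may differ.

```python
import re

# Assumption: these two alternatives cover the marker styles listed above.
STEP_RE = re.compile(
    r"""
    \b(?:step|iter)\s*[:=]?\s*(?P<named>\d+)(?:/\d+)?   # step 10, iter 10, step:10/1000
    |
    \[(?P<bracket>\d+)/\d+\]                            # [10/1000]
    """,
    re.IGNORECASE | re.VERBOSE,
)


def extract_step(line):
    """Return the step number found in a log line, or None if there is none."""
    match = STEP_RE.search(line)
    if match is None:
        return None
    return int(match.group("named") or match.group("bracket"))


for sample in ("step 10", "iter 10", "step:10/1000", "[10/1000]", "loss: 0.5"):
    print(repr(sample), "->", extract_step(sample))
```

Fields such as step_avg are not mistaken for a step marker, because the pattern requires a number directly after the keyword and an optional separator.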
matcha monitor - Live GPU power
matcha monitor
matcha monitor --gpus 0 --window 2.0
Multi-GPU
Matcha auto-detects all GPUs and sums power across them. No flags needed.
# 8xH100 - automatically polls all 8 GPUs
matcha run torchrun --standalone --nproc_per_node=8 train_gpt.py
# Specific GPUs only
matcha run --gpus 0,1,2,3 torchrun ...
# Single GPU
matcha run --gpus 0 torchrun ...
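The summing across GPUs described above can be sketched with the NVML Python bindings. This is an assumption-laden illustration, not Matcha's code: it assumes the `pynvml` module (from the nvidia-ml-py package) is installed, and `parse_gpu_list` and `total_power_watts` are hypothetical helper names.

```python
def parse_gpu_list(spec):
    """Turn a string such as "0,1,2,3" into [0, 1, 2, 3]."""
    return [int(part) for part in spec.split(",") if part.strip()]


def total_power_watts(gpu_indices=None):
    """Sum the instantaneous power draw of the selected GPUs, in watts.

    With gpu_indices=None, every GPU that NVML reports is included.
    """
    import pynvml  # imported lazily so parse_gpu_list works without a GPU

    pynvml.nvmlInit()
    try:
        if gpu_indices is None:
            gpu_indices = range(pynvml.nvmlDeviceGetCount())
        total_milliwatts = 0
        for index in gpu_indices:
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            # NVML reports power in milliwatts.
            total_milliwatts += pynvml.nvmlDeviceGetPowerUsage(handle)
        return total_milliwatts / 1000.0
    finally:
        pynvml.nvmlShutdown()


if __name__ == "__main__":
    # Requires an NVIDIA GPU with drivers installed.
    print(total_power_watts(parse_gpu_list("0")))
```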
How It Works
Matcha runs a background thread that polls GPU power via NVML at 100 ms intervals. Energy is computed by trapezoidal integration of the instantaneous power readings. Under matcha run, your training process runs natively: Matcha never touches your stdout, your model, or your training loop. Only matcha wrap reads stdout, to find step markers.
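The polling-plus-integration approach can be sketched as follows. This is a minimal illustration under stated assumptions, not Matcha's implementation: `trapezoid_energy_joules` and `PowerPoller` are hypothetical names, and the power reader is passed in as a callable so the sketch runs without a GPU. On real hardware the reader would wrap an NVML power query.

```python
import threading
import time


def trapezoid_energy_joules(samples):
    """Integrate (timestamp_seconds, power_watts) samples with the trapezoidal rule."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: mean power over the interval times its length.
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


class PowerPoller:
    """Collect power samples on a background thread at a fixed interval."""

    def __init__(self, read_watts, interval_s=0.1):
        self._read_watts = read_watts      # callable returning watts (assumption)
        self._interval_s = interval_s      # 0.1 s matches the 100 ms interval above
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self.samples = []

    def _run(self):
        while not self._stop.is_set():
            self.samples.append((time.monotonic(), self._read_watts()))
            self._stop.wait(self._interval_s)
        # Closing sample so the final partial interval is counted.
        self.samples.append((time.monotonic(), self._read_watts()))

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._stop.set()
        self._thread.join()


if __name__ == "__main__":
    # Fake reader reporting a constant 100 W, standing in for an NVML query.
    with PowerPoller(lambda: 100.0) as poller:
        time.sleep(1.0)
    print(f"{trapezoid_energy_joules(poller.samples):.1f} J")
```

Because the samples carry their own timestamps, the integration stays correct even when the polling thread wakes up slightly late.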
Tested On
- NVIDIA H100 80GB HBM3 - verified zero overhead across 4 benchmark modes
- Works with torchrun, deepspeed, accelerate, or plain python
- Compatible with PyTorch and any framework that runs on NVIDIA GPUs
Why
10-minute H100 training run:
Energy cost: ~$0.012 (101 Wh @ $0.12/kWh)
Compute cost: ~$0.48 (10 min on RunPod @ $2.90/hr)
→ Compute is roughly 40x the energy cost
→ Optimizing energy/step = faster training = less rental time
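The cost comparison above is simple arithmetic and can be reproduced for your own run. The sketch below uses the figures from this section (101 Wh, $0.12/kWh, 10 minutes, $2.90/hr); the function names are hypothetical. Dividing the unrounded figures gives a ratio near 40x, while rounding the energy cost to $0.01 before dividing gives 48x.

```python
def energy_cost_usd(energy_wh, usd_per_kwh):
    """Electricity cost for a given energy in watt-hours."""
    return energy_wh / 1000.0 * usd_per_kwh


def compute_cost_usd(duration_s, usd_per_hour):
    """Rental cost for a given wall-clock duration in seconds."""
    return duration_s / 3600.0 * usd_per_hour


energy = energy_cost_usd(101.0, 0.12)     # 101 Wh at $0.12/kWh
compute = compute_cost_usd(600.0, 2.90)   # 10 minutes at $2.90/hr
ratio = compute / energy

print(f"energy:  ${energy:.4f}")
print(f"compute: ${compute:.4f}")
print(f"compute is {ratio:.0f}x the energy cost")
```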
License
Apache 2.0