GPU energy observability for AI training workloads
Measure energy per training run and per step — from NVML's hardware counter, not sampled power. Zero-code CLI, Python API, and HuggingFace Trainer callback. Structured output for any observability stack.
Install
pip install usematcha
Requirements: Linux, Python 3.9+, and an NVIDIA GPU with drivers installed.
Quickstart
matcha run torchrun --standalone --nproc_per_node=8 train_gpt.py
matcha_energy gpus:8x NVIDIA H100 80GB HBM3 total:778168J (216.16Wh) duration:203.1s avg_power:3832W peak_power:4120W samples:2031
No code changes. No config files. Works with any training script.
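The figures in the sample summary line are related by simple arithmetic, which the short sketch below reproduces. It is only an illustration using the numbers printed above, not matcha's own code. Because the printed duration is rounded to 0.1 s, the recomputed average power can differ from the reported 3832 W by about a watt.

```python
# Values copied from the sample summary line above.
total_j = 778168      # total energy in joules
duration_s = 203.1    # wall-clock duration in seconds (rounded in the output)
num_gpus = 8


def joules_to_wh(joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0


def average_power_w(joules: float, seconds: float) -> float:
    """Average power is energy divided by elapsed time."""
    return joules / seconds


energy_wh = joules_to_wh(total_j)
avg_w = average_power_w(total_j, duration_s)
per_gpu_w = avg_w / num_gpus

print(f"{energy_wh:.2f} Wh")                      # 216.16 Wh
print(f"{avg_w:.0f} W average across {num_gpus} GPUs")
print(f"{per_gpu_w:.0f} W average per GPU")
```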
Three ways to use it
matcha exposes one measurement engine through three surfaces. All three read the same NVML hardware counter and emit the same StepResult / SessionResult shape.
CLI — zero-code, wraps any training command.
matcha run python train.py # total energy
matcha wrap python train.py # per-step energy
matcha monitor # live dashboard
See docs/playbooks/cli for diff, JSONL output, and multi-run comparison.
Python API — opt-in, for framework integrations and notebook work.
import matcha

# num_steps and train_step() come from your own training code
with matcha.session() as s:
    for i in range(num_steps):
        with s.step(i):
            train_step()

print(s.result.total_energy_j, s.result.energy_wh)
See docs/playbooks/python-api for explicit lifecycle, custom metrics, and multi-GPU details.
HuggingFace Trainer callback — drop-in for the Trainer loop.
from transformers import Trainer
from matcha.callbacks import StepEnergyCallback

# model and args are your existing model and TrainingArguments
trainer = Trainer(model=model, args=args, callbacks=[StepEnergyCallback()])
trainer.train()
Per-step energy flows into the Trainer's log dict — visible in stdout, TensorBoard, and WandB automatically. Install with pip install 'usematcha[hf]'.
See docs/playbooks/huggingface for DDP, failure modes, and config.
Observability
Structured output plugs into the stack you already have.
- JSONL: --output run.jsonl writes session_start / step / session_end records with per-GPU breakdowns. Stream into ClickHouse, DuckDB, or any log pipeline.
- Prometheus: --prometheus :9400 exposes a /metrics endpoint with step-level and GPU-live gauges, plus training metrics auto-extracted from stdout.
- OpenTelemetry: --otlp URL pushes the same metric set to Grafana Cloud, Honeycomb, Datadog, or any OTel collector. Install with pip install 'usematcha[otlp]'.
Metric names match across Prometheus and OTLP so dashboards port between deployments.
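As a rough idea of what consuming the JSONL output could look like, here is a minimal sketch that sums step energy with the standard library. The field names ("type", "energy_j", "gpu_energy_j") are assumptions made for illustration; the actual record schema is documented in docs/playbooks/cli and may differ.

```python
import json
from collections import defaultdict
from typing import Iterable


def summarize_steps(lines: Iterable[str]) -> dict:
    """Sum per-step energy from JSONL records.

    ASSUMED schema (hypothetical, check your own file): every record has a
    "type" field, and step records carry "energy_j" plus a per-GPU list
    "gpu_energy_j".
    """
    total_j = 0.0
    per_gpu = defaultdict(float)
    steps = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") != "step":
            continue  # skip session_start / session_end records
        steps += 1
        total_j += record["energy_j"]
        for gpu_index, joules in enumerate(record.get("gpu_energy_j", [])):
            per_gpu[gpu_index] += joules
    return {"steps": steps, "total_j": total_j, "per_gpu_j": dict(per_gpu)}


# Made-up sample records that follow the assumed schema.
sample = [
    '{"type": "session_start"}',
    '{"type": "step", "step": 0, "energy_j": 300.0, "gpu_energy_j": [160.0, 140.0]}',
    '{"type": "step", "step": 1, "energy_j": 310.0, "gpu_energy_j": [165.0, 145.0]}',
    '{"type": "session_end"}',
]
summary = summarize_steps(sample)
print(summary)
```

With a real file, the same function accepts an open file handle, for example summarize_steps(open("run.jsonl")), once the field names are adjusted to the real schema.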
Multi-GPU
matcha auto-detects every visible GPU and reports summed totals plus a per-GPU breakdown in every record. The per-GPU arrays make straggler detection a one-query affair — one rank consistently drawing ~30% less power usually means a stuck collective, a thermally throttled card, or a PCIe link degraded to Gen3.
matcha run --gpus 0,1,2,3 torchrun ...
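The straggler check described above can be sketched as a comparison of each GPU against the median. This is an illustrative heuristic, not part of matcha: the function name, the 0.7 threshold (matching the ~30% figure), and the sample readings are all assumptions, and a real check would look for the pattern across many steps rather than a single snapshot.

```python
from statistics import median


def find_stragglers(per_gpu_power_w, threshold=0.7):
    """Return indices of GPUs drawing less than threshold * median power."""
    baseline = median(per_gpu_power_w)
    return [
        index
        for index, watts in enumerate(per_gpu_power_w)
        if watts < threshold * baseline
    ]


# Made-up per-GPU average power for one step, in watts; GPU 3 is lagging.
power = [480.0, 475.0, 490.0, 330.0, 485.0, 478.0, 482.0, 479.0]
stragglers = find_stragglers(power)
print(stragglers)  # [3]
```

Using the median rather than the mean keeps one slow card from dragging down the baseline it is compared against.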
How it works
matcha reads energy directly from NVML's hardware accumulator (nvmlDeviceGetTotalEnergyConsumption, Volta+). Per-step and session energy are exact counter deltas — millijoule-precise, no integration error. A background poller plus boundary reads at each step transition track peak power. Pre-Volta GPUs fall back to trapezoidal integration. Training runs natively; matcha never touches your model or training loop.
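To make the difference between the two measurement paths concrete, here is a minimal sketch of each. It is not matcha's implementation: the helper names are hypothetical, and read_energy_mj assumes the pynvml bindings (the nvidia-ml-py package) and an NVIDIA GPU, so it is defined but not called here. NVML reports the accumulator in millijoules.

```python
def counter_delta_j(start_mj: int, end_mj: int) -> float:
    """Energy between two reads of a cumulative millijoule counter, in joules."""
    return (end_mj - start_mj) / 1000.0


def trapezoid_energy_j(samples):
    """Integrate (timestamp_s, power_w) samples with the trapezoidal rule."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


def read_energy_mj(gpu_index: int = 0) -> int:
    """Read the NVML energy accumulator via pynvml (needs a Volta+ GPU)."""
    import pynvml  # imported lazily so the pure functions work without a GPU

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(gpu_index)
        return pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    finally:
        pynvml.nvmlShutdown()


# Counter path: two made-up accumulator reads around a step.
step_j = counter_delta_j(1_000_000, 1_450_000)
print(step_j)  # 450.0

# Fallback path: made-up power samples taken every 0.1 s.
samples = [(0.0, 400.0), (0.1, 500.0), (0.2, 450.0)]
fallback_j = trapezoid_energy_j(samples)
print(fallback_j)
```

The counter delta depends only on the two boundary reads, while the trapezoidal estimate depends on how densely power was sampled in between, which is where integration error comes from.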
Full design in ARCHITECTURE.md.
Documentation · Changelog · Architecture · Contributing · Security
Built by Keeya Labs. Apache 2.0.
File details
Details for the file usematcha-0.3.1.tar.gz.
File metadata
- Download URL: usematcha-0.3.1.tar.gz
- Upload date:
- Size: 32.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbf15b8468b36dd191a5a0cb9216925876e1550ad00ea0e4c5eab3b3a2625f93 |
| MD5 | d5899b93067f2f3e37e56a64b08f4fd0 |
| BLAKE2b-256 | ac1283375cd8ecf47d186f6f33b0d29aeeb3e59f9bb7dd766bb7ca1425dad195 |
File details
Details for the file usematcha-0.3.1-py3-none-any.whl.
File metadata
- Download URL: usematcha-0.3.1-py3-none-any.whl
- Upload date:
- Size: 36.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f3cc298e7cc6376ffee01541d679e758b848c83486047961a8000db695bfa61 |
| MD5 | 31e509601d8fb1da90081721fe05e576 |
| BLAKE2b-256 | b549670c9175b27ab60df25920ac192f27b5d322101349338939a856ad0bf5a3 |