matcha

Energy observability for AI workloads

Measure GPU energy consumption of any training run. Zero overhead. Zero code changes.


Install

pip install usematcha

Requires an NVIDIA GPU with drivers installed.

Quick Start

Prefix your training command with matcha run:

matcha run torchrun --standalone --nproc_per_node=1 train_gpt.py

Your training runs at full speed. Matcha appends one line at the end:

matcha_energy gpus:NVIDIA H100 80GB HBM3 total:364722J (101.31Wh) duration:746.0s avg_power:489W peak_power:700W samples:7449

No code changes. No config files. Works with any training script.
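
If you want to log the summary somewhere else, the line can be parsed with a regular expression. The following is a minimal sketch: the field layout is inferred only from the sample line above, not from a documented format, and `parse_summary` is a hypothetical helper, not part of Matcha.

```python
import re

# Assumption: the layout below mirrors the single sample line shown above.
SUMMARY_RE = re.compile(
    r"^matcha_energy gpus:(?P<gpus>.+?) "
    r"total:(?P<joules>[\d.]+)J \((?P<wh>[\d.]+)Wh\) "
    r"duration:(?P<duration>[\d.]+)s "
    r"avg_power:(?P<avg>[\d.]+)W "
    r"peak_power:(?P<peak>[\d.]+)W "
    r"samples:(?P<samples>\d+)$"
)


def parse_summary(line):
    """Return the summary fields as a dict, or None if the line does not match."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        return None
    return {
        "gpus": m["gpus"],
        "total_joules": float(m["joules"]),
        "total_wh": float(m["wh"]),
        "duration_s": float(m["duration"]),
        "avg_power_w": float(m["avg"]),
        "peak_power_w": float(m["peak"]),
        "samples": int(m["samples"]),
    }


line = (
    "matcha_energy gpus:NVIDIA H100 80GB HBM3 total:364722J (101.31Wh) "
    "duration:746.0s avg_power:489W peak_power:700W samples:7449"
)
summary = parse_summary(line)

# Cross-check the derived fields against the raw total and duration.
print(round(summary["total_joules"] / 3600, 2))                 # 101.31 (Wh)
print(round(summary["total_joules"] / summary["duration_s"]))   # 489 (W)
```

The cross-checks show that the reported Wh and average power follow directly from the joule total and the duration.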

Commands

matcha run - Total energy, zero overhead

Launches your command, polls GPU power in the background, prints a summary when it finishes. Your training runs natively - no stdout interception, no performance impact.

matcha run python train.py
matcha run torchrun --standalone --nproc_per_node=1 train_gpt.py
matcha run deepspeed --num_gpus=4 train.py --deepspeed ds_config.json
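
The launch-and-poll pattern described above can be sketched roughly as follows. This is an illustration of the general technique, not Matcha's source: `run_with_sampler` and `read_power_w` are hypothetical names, and the power reader is a stand-in so the example runs without a GPU.

```python
import subprocess
import sys
import threading
import time


def run_with_sampler(cmd, read_power_w, interval_s=0.1):
    """Run cmd with inherited stdout/stderr while a thread samples power."""
    samples = []  # list of (monotonic timestamp, watts)
    stop = threading.Event()

    def poll():
        while True:
            samples.append((time.monotonic(), read_power_w()))
            # wait() returns True once stop is set, which ends the loop.
            if stop.wait(interval_s):
                break

    thread = threading.Thread(target=poll, daemon=True)
    thread.start()
    # No stdout/stderr arguments: the child writes straight to the terminal,
    # so its output is never intercepted by the parent.
    returncode = subprocess.call(cmd)
    stop.set()
    thread.join()
    return returncode, samples


# Stand-in workload and a constant fake power reading (assumptions for the demo).
returncode, samples = run_with_sampler(
    [sys.executable, "-c", "print('training...')"],
    read_power_w=lambda: 400.0,
)
print(returncode, len(samples) >= 1)
```

Because the child inherits the parent's file descriptors, the sampler never sits in the output path of the training process.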

matcha wrap - Per-step energy breakdown

Parses stdout for step markers (step 10, iter 10, step:10/1000, [10/1000], etc.) and appends energy data to each step line.
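
To make the marker formats concrete, here is one way such matching could be written. The pattern is an assumption built only from the four examples listed above; Matcha's actual matching rules may be broader or stricter, and `find_step` is a hypothetical helper.

```python
import re

# Assumption: covers only the four marker styles named in the text.
STEP_RE = re.compile(
    r"\b(?:step|iter)[\s:=]*(?P<named>\d+)(?:/\d+)?"  # step 10, iter 10, step:10/1000
    r"|\[(?P<bracket>\d+)/\d+\]",                     # [10/1000]
    re.IGNORECASE,
)


def find_step(line):
    """Return the step number from the first marker in the line, or None."""
    m = STEP_RE.search(line)
    if m is None:
        return None
    return int(m["named"] or m["bracket"])


for text in ["step 10", "iter 10 loss 2.3", "step:10/1000 train_loss:6.9", "[10/1000] loss 2.3"]:
    print(find_step(text))  # 10 each time

print(find_step("loading checkpoint"))  # None
```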

matcha wrap torchrun --standalone --nproc_per_node=1 train_gpt.py

Output:

step:1/20000 train_loss:6.9357 train_time:438ms step_avg:438.01ms energy:106.7J/step avg_power:354W peak_power:427W
step:2/20000 train_loss:16.7414 train_time:833ms step_avg:416.60ms energy:154.0J/step avg_power:508W peak_power:533W
step:3/20000 train_loss:8.7524 train_time:1258ms step_avg:419.23ms energy:221.8J/step avg_power:551W peak_power:565W
...
matcha_energy gpus:NVIDIA H100 80GB HBM3 total:97271J (27.02Wh) duration:202.9s avg_power:479W peak_power:701W samples:2025
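
The annotated lines are plain `key:value` pairs, so they are easy to post-process. The sketch below pulls the per-step energy out of the sample output above; `parse_fields` and `step_energy_joules` are hypothetical helpers, and the `energy:<value>J/step` layout is assumed from that sample alone.

```python
import re

FIELD_RE = re.compile(r"(\w+):(\S+)")


def parse_fields(line):
    """Split a log line into a dict of key:value pairs."""
    return dict(FIELD_RE.findall(line))


def step_energy_joules(line):
    """Return the per-step energy in joules, or None if the line has none."""
    value = parse_fields(line).get("energy")
    if value is None or not value.endswith("J/step"):
        return None
    return float(value[: -len("J/step")])


lines = [
    "step:1/20000 train_loss:6.9357 train_time:438ms step_avg:438.01ms energy:106.7J/step avg_power:354W peak_power:427W",
    "step:2/20000 train_loss:16.7414 train_time:833ms step_avg:416.60ms energy:154.0J/step avg_power:508W peak_power:533W",
    "step:3/20000 train_loss:8.7524 train_time:1258ms step_avg:419.23ms energy:221.8J/step avg_power:551W peak_power:565W",
    "matcha_energy gpus:NVIDIA H100 80GB HBM3 total:97271J (27.02Wh) duration:202.9s avg_power:479W peak_power:701W samples:2025",
]

# The summary line has no energy field, so it is skipped.
energies = [e for e in map(step_energy_joules, lines) if e is not None]
print(energies)                       # [106.7, 154.0, 221.8]
print(f"{sum(energies):.1f} J")       # 482.5 J
```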

matcha monitor - Live GPU power

matcha monitor
matcha monitor --gpus 0 --window 2.0

Multi-GPU

Matcha auto-detects all GPUs and sums power across them. No flags needed.

# 8xH100 - automatically polls all 8 GPUs
matcha run torchrun --standalone --nproc_per_node=8 train_gpt.py

# Specific GPUs only
matcha run --gpus 0,1,2,3 torchrun ...

# Single GPU
matcha run --gpus 0 torchrun ...
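
Summing power across a selected set of GPUs might look like the sketch below. The `pynvml` calls (`nvmlInit`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetPowerUsage`, which reports milliwatts) are real NVML bindings, but whether Matcha uses `pynvml` specifically is an assumption. `parse_gpu_list`, `total_power_watts`, and `nvml_reader` are hypothetical names, and the demo uses fake readings so it runs without a GPU.

```python
def parse_gpu_list(spec, device_count):
    """Turn a string like '0,1,2,3' into indices; None selects every GPU."""
    if spec is None:
        return list(range(device_count))
    indices = [int(part) for part in spec.split(",") if part.strip()]
    for i in indices:
        if not 0 <= i < device_count:
            raise ValueError(f"GPU index {i} out of range (0..{device_count - 1})")
    return indices


def total_power_watts(indices, read_milliwatts):
    """Sum instantaneous power over the selected GPUs, converted to watts."""
    return sum(read_milliwatts(i) for i in indices) / 1000.0


def nvml_reader():
    """Build a per-GPU reader backed by NVML (needs pynvml and an NVIDIA driver)."""
    import pynvml

    pynvml.nvmlInit()
    handles = {}

    def read(i):
        if i not in handles:
            handles[i] = pynvml.nvmlDeviceGetHandleByIndex(i)
        return pynvml.nvmlDeviceGetPowerUsage(handles[i])  # milliwatts

    return read


# Demo with fake readings (assumed values) instead of real hardware.
fake_mw = {0: 350_000, 1: 420_000}
indices = parse_gpu_list("0,1", device_count=2)
total = total_power_watts(indices, lambda i: fake_mw[i])
print(total)  # 770.0
```

On a machine with NVIDIA drivers, passing `nvml_reader()` in place of the lambda would read live values.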

How It Works

Matcha runs a background thread that polls GPU power via NVML at 100 ms intervals and computes energy by trapezoidal integration of the instantaneous power readings. Under matcha run, your training process runs natively: Matcha never touches your stdout, your model, or your training loop. matcha wrap reads stdout only to find step markers and append energy data to those lines.
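
Trapezoidal integration over timestamped power samples can be sketched as follows. This illustrates the general method rather than Matcha's own code; `trapezoid_energy_joules` is a hypothetical name and the sample values are made up.

```python
def trapezoid_energy_joules(samples):
    """Integrate (timestamp_seconds, watts) samples, ordered by time, into joules."""
    energy = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        # Area of one trapezoid: mean power over the interval times its length.
        energy += 0.5 * (p0 + p1) * (t1 - t0)
    return energy


# Four made-up readings taken 100 ms apart.
samples = [(0.0, 400.0), (0.1, 500.0), (0.2, 500.0), (0.3, 300.0)]
energy = trapezoid_energy_joules(samples)
print(f"{energy:.1f} J")            # 135.0 J
print(f"{energy / 3600:.4f} Wh")    # 0.0375 Wh
```

Using the measured timestamps rather than a fixed 100 ms step keeps the result correct even when the polling thread is scheduled late.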

Tested On

  • NVIDIA H100 80GB HBM3 - verified zero overhead across 4 benchmark modes
  • Works with torchrun, deepspeed, accelerate, or plain python
  • Compatible with PyTorch and any framework that runs on NVIDIA GPUs

Why

10-minute H100 training run:
  Energy cost:   $0.012 (101 Wh @ $0.12/kWh)
  Compute cost:  $0.48 (RunPod @ $2.90/hr)

  → Compute is about 40x the energy cost
  → Optimizing energy/step = faster training = less rental time
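
The arithmetic behind this comparison can be reproduced in a few lines. The prices and the 101 Wh figure come from the text above; the function names are hypothetical.

```python
def energy_cost_usd(energy_wh, usd_per_kwh):
    """Electricity cost of a run given its energy in watt-hours."""
    return energy_wh / 1000.0 * usd_per_kwh


def compute_cost_usd(duration_s, usd_per_hour):
    """Rental cost of a run given its duration in seconds."""
    return duration_s / 3600.0 * usd_per_hour


energy = energy_cost_usd(101, 0.12)       # 101 Wh at $0.12/kWh
compute = compute_cost_usd(600, 2.90)     # 10 minutes at $2.90/hr
ratio = compute / energy

print(f"energy:  ${energy:.4f}")          # energy:  $0.0121
print(f"compute: ${compute:.2f}")         # compute: $0.48
print(f"ratio:   {ratio:.0f}x")           # ratio:   40x
```

The ratio is sensitive to rounding: dividing by the energy cost rounded to $0.01 gives 48x, while the unrounded $0.0121 gives about 40x. Either way, rental time dominates the electricity bill.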

Built by

Keeya Labs · Docs

License

Apache 2.0
