aluminatiai

GPU energy monitoring agent — per-job cost attribution and energy-efficient fine-tuning for AI teams

AluminatAI
GPU Energy Monitoring & Energy-Efficient LLM Fine-Tuning

Open-source Python agent that monitors GPU power consumption, attributes energy costs to individual jobs, and optimizes LLM fine-tuning for minimum Joules-per-token.

Works on NVIDIA, AMD (ROCm), Intel Gaudi, Intel Arc, Apple Silicon, and CPU-only (RAPL) machines.

Install

pip install aluminatiai                  # GPU monitoring agent
pip install "aluminatiai[finetune]"      # + QLoRA training with energy tracking
pip install "aluminatiai[greentune]"     # everything (quotes keep shells like zsh from expanding the brackets)

What It Does

Capability	Description
GPU Monitoring	Power, temperature, utilization sampled every 5s, attributed to jobs, streamed to dashboard
Cost Attribution	Per-job energy costs across multi-tenant GPU clusters (Slurm, K8s, Run:ai)
GreenTune	Energy-efficient QLoRA fine-tuning with real AMD MI300X telemetry
Swarm Optimizer	Offline hyperparameter search that minimizes J/token — no API keys needed
Lobster Trap	Energy governance: carbon budget, efficiency floor, cost guard per training run
Prometheus	`/metrics` endpoint with GPU power, energy, attribution, and upload health gauges

GreenTune — Energy-Efficient Fine-Tuning

GreenTune tracks real-time power consumption during LLM fine-tuning and optimizes hyperparameters to minimize energy waste. It is built for the AMD MI300X (192 GB HBM3, 750 W TDP) with ROCm, and it also works on NVIDIA GPUs.

Swarm Optimizer (no API key needed)

aluminatiai swarm --max-samples 500

Runs an exhaustive grid search over batch size, gradient accumulation, and LoRA rank. Projects energy for each config, enforces Lobster Trap policies, and ranks by J/token efficiency.

┏━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━┓
┃ # ┃ Batch Size┃ Grad Accum┃ LoRA Rank┃ J/tok  ┃ CO2 (g) ┃ Cost    ┃ Duration ┃
┡━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━┩
│ 1 │ 32        │ 8         │ 8        │ 0.0265 │ 0.74    │ $0.0002 │ 0.2 min  │
│ 2 │ 32        │ 8         │ 16       │ 0.0271 │ 0.75    │ $0.0002 │ 0.2 min  │
│ 3 │ 32        │ 8         │ 32       │ 0.0284 │ 0.79    │ $0.0002 │ 0.2 min  │
│ 4 │ 16        │ 8         │ 8        │ 0.0291 │ 0.81    │ $0.0002 │ 0.2 min  │
│ 5 │ 16        │ 8         │ 16       │ 0.0304 │ 0.84    │ $0.0003 │ 0.2 min  │
└───┴───────────┴───────────┴──────────┴────────┴─────────┴─────────┴──────────┘
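The ranking step can be sketched in a few lines of plain Python. This is only an illustration of a grid search ranked by projected J/token: the `projected_jpt` cost model, its constants, and the function names are hypothetical stand-ins, not the package's actual projection logic.

```python
from itertools import product

def projected_jpt(batch_size, grad_accum, lora_rank):
    """Hypothetical cost model (an assumption for illustration only)."""
    power_w = 400.0 + 2.0 * lora_rank            # assumed: higher rank draws a bit more power
    tokens_per_s = 2000.0 * batch_size ** 0.5    # assumed: sublinear throughput scaling
    tokens_per_s *= 1.0 + 0.01 * grad_accum      # assumed: small gain from accumulation
    return power_w / tokens_per_s                # watts / (tokens/s) = joules per token

def grid_search(batch_sizes, grad_accums, lora_ranks, max_jpt=0.8):
    """Evaluate every combination, drop configs over the J/token limit, rank the rest."""
    results = []
    for bs, ga, rank in product(batch_sizes, grad_accums, lora_ranks):
        jpt = projected_jpt(bs, ga, rank)
        if jpt > max_jpt:
            continue  # rejected by the efficiency policy
        results.append(
            {"batch_size": bs, "grad_accum": ga, "lora_rank": rank, "projected_jpt": jpt}
        )
    return sorted(results, key=lambda r: r["projected_jpt"])

ranked = grid_search([4, 16, 32], [4, 8], [8, 16, 32])
best = ranked[0]
print(best)
```

Under this toy model the largest batch size with the smallest LoRA rank ranks first, the same shape as the table above, but the numbers themselves are not comparable to real telemetry.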

EnergyCallback — Drop Into Any HuggingFace Trainer

from aluminatiai.finetune import EnergyCallback

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    callbacks=[EnergyCallback(gpu_index=0)],
)
trainer.train()

Tracks per-step power draw, Joules-per-token, cumulative energy, CO2 emissions, and cost. Outputs a full energy report at the end of training.
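For intuition, the arithmetic behind those metrics can be sketched without any GPU. The `EnergyTracker` class below is a hypothetical illustration, not the `EnergyCallback` implementation, and the electricity price and grid carbon intensity are assumed placeholder values.

```python
from dataclasses import dataclass

@dataclass
class EnergyTracker:
    """Hypothetical accumulator; names and defaults are assumptions."""
    price_per_kwh: float = 0.12        # assumed electricity price in USD
    grams_co2_per_kwh: float = 400.0   # assumed grid carbon intensity
    joules: float = 0.0
    tokens: int = 0

    def record_step(self, power_watts, seconds, tokens):
        # Energy for a step is average power times step duration (W * s = J).
        self.joules += power_watts * seconds
        self.tokens += tokens

    def report(self):
        kwh = self.joules / 3.6e6  # 1 kWh = 3.6 million joules
        return {
            "joules": self.joules,
            "joules_per_token": self.joules / self.tokens if self.tokens else 0.0,
            "co2_grams": kwh * self.grams_co2_per_kwh,
            "cost_usd": kwh * self.price_per_kwh,
        }

tracker = EnergyTracker()
tracker.record_step(power_watts=600.0, seconds=2.0, tokens=8192)
tracker.record_step(power_watts=650.0, seconds=2.0, tokens=8192)
summary = tracker.report()
print(summary)
```

A real callback would read power from the GPU driver at each step; here the samples are passed in by hand to keep the sketch self-contained.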

Train with Live Dashboard Upload

aluminatiai train \
  --hermes-only --hermes-max 500 \
  --batch-size 4 --grad-accum 4 \
  --lora-rank 16 --epochs 1 \
  --api-url https://www.aluminatiai.com \
  --api-key alum_your_key_here \
  --run-name "My Training Run"

Lobster Trap — Energy Governance

Every training config is checked against four policies before it runs:

Policy	Limit	What it enforces
`carbon_budget`	50g CO2	Max carbon emissions per run
`energy_cap`	1 kWh	Max total energy per run
`efficiency_floor`	0.8 J/tok	Max joules per token (lower J/tok means higher efficiency)
`cost_guard`	$1.00	Max energy cost per run
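A pre-run check against these limits might look like the sketch below. The policy names and limits come from the table above; the `check_policies` function and the projection field names (`co2_grams`, `energy_kwh`, and so on) are assumptions made for illustration.

```python
# Policy name -> (projection field, upper limit). Field names are hypothetical.
POLICIES = {
    "carbon_budget": ("co2_grams", 50.0),
    "energy_cap": ("energy_kwh", 1.0),
    "efficiency_floor": ("joules_per_token", 0.8),
    "cost_guard": ("cost_usd", 1.00),
}

def check_policies(projection):
    """Return the names of all policies the projected run would violate."""
    return [
        name
        for name, (field, limit) in POLICIES.items()
        if projection[field] > limit
    ]

ok = check_policies(
    {"co2_grams": 0.74, "energy_kwh": 0.002, "joules_per_token": 0.0265, "cost_usd": 0.0002}
)
bad = check_policies(
    {"co2_grams": 75.0, "energy_kwh": 0.2, "joules_per_token": 1.1, "cost_usd": 0.03}
)
print(ok, bad)
```

An empty list means the config may run; any returned name identifies the limit that blocked it.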

Python API

from aluminatiai.finetune import GreenTuneSwarm

swarm = GreenTuneSwarm()
result = swarm.optimize("Minimize J/token for Qwen2.5-7B")

print(result["recommendation"])
# {'batch_size': 32, 'grad_accum': 8, 'lora_rank': 8, 'projected_jpt': 0.0265, ...}

GPU Monitoring Agent

Quick Start

export ALUMINATAI_API_KEY=alum_your_key_here
aluminatiai

Get your API key at aluminatiai.com/dashboard. The agent detects your GPU, starts sampling, and uploads metrics. That's it.

Supported Hardware

Backend	GPUs	Primary SDK	Fallback
NVIDIA	A100, H100, H200, L40S, RTX 4090, T4, V100	`nvidia-ml-py` (NVML)	—
AMD	MI300X, MI300A, MI325X, MI250X, MI210, MI100	`amdsmi`	`rocm-smi`
Intel Gaudi	Gaudi, Gaudi2, Gaudi3	`pyhlml` (SynapseAI)	`hl-smi`
Intel Arc	A770, A750, B580, Flex 170, Max 1550	`xpu-smi` (oneAPI)	hwmon sysfs
Apple Silicon	M1–M5 Pro/Max/Ultra	`powermetrics` (sudo)	`ioreg`
CPU-only	Any x86 (Intel/AMD)	RAPL sysfs	—

The backend is auto-detected at startup; no configuration is needed.

Product Tiers

Tier	Mode	What it does
Monitor	Default	Read-only metrics, cost attribution, Prometheus, carbon tracking
Advisor	Opt-in	Recommendations with approval workflows: "GPU 3 is 40% idle — cap to 200W?"
Swarm	Opt-in	Autonomous fleet-wide optimization: power capping, thermal balancing, carbon-aware scheduling

aluminatiai                                                        # Monitor
AUTO_TUNE_ENABLED=1 COMMAND_POLL_ENABLED=1 aluminatiai             # Advisor
SWARM_ENABLED=1 COMMAND_POLL_ENABLED=1 AUTO_TUNE_ENABLED=1 aluminatiai  # Swarm

CLI Reference

Command	Description
`aluminatiai run`	Main daemon — collect, attribute, upload (default)
`aluminatiai train`	GreenTune QLoRA fine-tuning with energy tracking
`aluminatiai swarm`	Hyperparameter optimizer (offline, no API keys)
`aluminatiai benchmark`	GPU power baseline and efficiency measurement
`aluminatiai optimize`	Real-time efficiency analysis with recommendations
`aluminatiai ab`	A/B test energy efficiency between configs
`aluminatiai carbon-schedule`	Find lowest-carbon window for a job
`aluminatiai report`	Generate chargeback reports (CSV/HTML/JSON)
`aluminatiai query`	Query local SQLite time-series store
`aluminatiai recommend`	GPU recommender — rank GPUs by efficiency and cost

aluminatiai run

aluminatiai                            # run forever (default)
aluminatiai --interval 2               # sample every 2 seconds
aluminatiai --duration 3600            # run for 1 hour then exit
aluminatiai --dry-run                  # collect + attribute, skip uploads
aluminatiai --prometheus-only          # local Prometheus only, no cloud

aluminatiai train

aluminatiai train --hermes-only --hermes-max 500 --batch-size 4
aluminatiai train --model Qwen/Qwen2.5-7B-Instruct --epochs 3
aluminatiai train --lora-rank 8 --batch-size 8       # faster, lower quality
aluminatiai train --eval                              # run eval after training

aluminatiai swarm

aluminatiai swarm                                     # default search space
aluminatiai swarm --max-samples 500 --model Qwen/Qwen2.5-7B
aluminatiai swarm --batch-sizes 1,2,4,8,16,32         # custom search
aluminatiai swarm --lora-ranks 8,16,32,64             # custom LoRA ranks
aluminatiai swarm --json                              # JSON output for automation
aluminatiai swarm --output results.json               # save to file

aluminatiai benchmark

aluminatiai benchmark                              # 60s power baseline
aluminatiai benchmark --gpu 0 --duration 120       # specific GPU, 2 min
aluminatiai benchmark --upload                     # submit to Green AI Index

Job Attribution

The agent attributes GPU power to individual jobs using a 7-step resolution pipeline:

Priority	Method	Confidence	Source
1	`ALUMINATAI_TEAM` env var	1.00	Explicit user tag
2	Scheduler env vars	0.90	`SLURM_JOB_ID`, `RUNAI_JOB_NAME`, K8s pod UID
3	Scheduler poll	0.75	`gpu_to_job()` query
4	Custom rules file	0.60	JSON regex patterns
5	Cmdline heuristics	0.40	Built-in patterns (jupyter, vllm, torchserve, ollama)
6	Memory split	0.20	Power split by GPU memory usage
7	Idle attribution	0.30	`ALUMINATAI_IDLE_TEAM` fallback

# Tag your workload
ALUMINATAI_TEAM=nlp-team ALUMINATAI_MODEL=llama3-finetune python train.py
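One way to picture the pipeline is as an ordered list of resolvers where the first one that returns a result wins. The sketch below covers only three of the seven steps plus the memory split, and assumes first-match-wins semantics; all function names are hypothetical, while the env var names and confidence values are taken from the table above.

```python
def from_explicit_tag(env):
    team = env.get("ALUMINATAI_TEAM")
    return (team, 1.00) if team else None

def from_scheduler_env(env):
    for var in ("SLURM_JOB_ID", "RUNAI_JOB_NAME"):
        if env.get(var):
            return (f"{var}={env[var]}", 0.90)
    return None

def from_idle_fallback(env):
    # Last resort: always resolves, so the chain never comes back empty.
    return (env.get("ALUMINATAI_IDLE_TEAM", "idle"), 0.30)

# Ordered by priority; a subset of the seven documented steps.
RESOLVERS = [from_explicit_tag, from_scheduler_env, from_idle_fallback]

def attribute(env):
    """Return (label, confidence) from the first resolver that matches."""
    for resolver in RESOLVERS:
        result = resolver(env)
        if result is not None:
            return result

def split_power_by_memory(total_watts, memory_mib_by_job):
    """Divide GPU power among jobs in proportion to their GPU memory use."""
    total = sum(memory_mib_by_job.values())
    if total == 0:
        return {job: 0.0 for job in memory_mib_by_job}
    return {job: total_watts * mem / total for job, mem in memory_mib_by_job.items()}

tagged = attribute({"ALUMINATAI_TEAM": "nlp-team", "SLURM_JOB_ID": "4242"})
slurm = attribute({"SLURM_JOB_ID": "4242"})
shares = split_power_by_memory(300.0, {"job-a": 30_000, "job-b": 10_000})
print(tagged, slurm, shares)
```

The explicit tag outranks the scheduler variable even when both are set, which is why tagging a workload gives the highest-confidence attribution.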

ML Framework Integrations

MLflow

from aluminatiai.integrations.mlflow_callback import AluminatiMLflowCallback
trainer.add_callback(AluminatiMLflowCallback())

Weights & Biases

from aluminatiai.integrations.wandb_callback import AluminatiWandbCallback
trainer.add_callback(AluminatiWandbCallback())

OpenTelemetry

from aluminatiai.integrations.otel_exporter import AluminatiOtelExporter
exporter = AluminatiOtelExporter()

Prometheus Metrics

The `/metrics` endpoint listens on port 9100 by default. Key metrics:

Metric	Type	Description
`aluminatai_gpu_power_watts`	Gauge	Current power per GPU
`aluminatai_gpu_energy_joules_total`	Counter	Cumulative energy per GPU
`aluminatai_gpu_utilization_pct`	Gauge	Compute utilization
`aluminatai_gpu_temperature_c`	Gauge	Temperature
`aluminatai_upload_success_total`	Counter	Successful uploads
`aluminatai_attribution_confidence`	Gauge	Attribution confidence (0–1)

scrape_configs:
  - job_name: aluminatiai
    static_configs:
      - targets: ['gpu-host:9100']
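For a quick check without a Prometheus server, the text exposition format can be parsed directly. The sketch below reads a hard-coded sample rather than a live endpoint; the `gpu` label and the sample values are assumptions, since the actual label set is not documented here.

```python
import re

# Sample payload in Prometheus text exposition format (values are made up).
SAMPLE = '''\
# HELP aluminatai_gpu_power_watts Current power per GPU
# TYPE aluminatai_gpu_power_watts gauge
aluminatai_gpu_power_watts{gpu="0"} 412.5
aluminatai_gpu_power_watts{gpu="1"} 388.0
aluminatai_gpu_temperature_c{gpu="0"} 61
'''

# metric_name, optional {labels}, then the sample value
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)

def read_metric(text, metric):
    """Return {label_string: value} for every sample of one metric."""
    values = {}
    for line in text.splitlines():
        if line.startswith("#") or not line.strip():
            continue  # skip HELP/TYPE comments and blank lines
        match = LINE.match(line)
        if match and match.group("name") == metric:
            values[match.group("labels") or ""] = float(match.group("value"))
    return values

power = read_metric(SAMPLE, "aluminatai_gpu_power_watts")
total_watts = sum(power.values())
print(power, total_watts)
```

To try it against a running agent, the sample string could be replaced with the body of an HTTP GET to the host's `/metrics` path.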

Deployment

One-line install (Linux + systemd)

curl -sSL https://get.aluminatiai.com | bash

Docker (NVIDIA)

docker run --rm --runtime=nvidia --pid=host \
  -e ALUMINATAI_API_KEY=alum_your_key_here \
  ghcr.io/agentmulder404/aluminatai-agent:latest

Kubernetes DaemonSet

kubectl apply -f deploy/k8s/daemonset.yaml

Configuration

Settings are read in priority order: env vars > config file > defaults.

aluminatiai --config /etc/aluminatai.json
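The precedence rule can be illustrated with `collections.ChainMap`, which looks keys up in the first mapping that contains them. This is a sketch of the lookup order only: it assumes the config file uses the same key names as the env vars, which is not confirmed here, and `resolve_settings` is a hypothetical helper.

```python
from collections import ChainMap

# Defaults taken from the configuration tables in this README.
DEFAULTS = {"SAMPLE_INTERVAL": "5.0", "UPLOAD_INTERVAL": "60", "METRICS_PORT": "9100"}

def resolve_settings(file_values, environ):
    """Merge settings so that env vars beat the config file, which beats defaults."""
    # Ignore unrelated environment variables such as HOME or PATH.
    env_values = {k: v for k, v in environ.items() if k in DEFAULTS}
    return dict(ChainMap(env_values, file_values, DEFAULTS))

settings = resolve_settings(
    file_values={"SAMPLE_INTERVAL": "2.0", "UPLOAD_INTERVAL": "30"},
    environ={"SAMPLE_INTERVAL": "1.0", "HOME": "/root"},
)
print(settings)
```

Here `SAMPLE_INTERVAL` comes from the environment, `UPLOAD_INTERVAL` from the file, and `METRICS_PORT` falls through to its default.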

Full configuration reference

API & Upload

Env var	Default	Description
`ALUMINATAI_API_KEY`	(required)	Your API key
`ALUMINATAI_API_ENDPOINT`	`https://…/v1/metrics/ingest`	Ingest endpoint
`UPLOAD_INTERVAL`	`60`	Seconds between flushes
`UPLOAD_BATCH_SIZE`	`100`	Metrics per request

Sampling

Env var	Default	Description
`SAMPLE_INTERVAL`	`5.0`	Seconds between GPU samples

Advisor Tier

Env var	Default	Description
`AUTO_TUNE_ENABLED`	`false`	Enable optimization recommendations
`COMMAND_POLL_ENABLED`	`false`	Enable polling for approved commands

Swarm Tier

Env var	Default	Description
`SWARM_ENABLED`	`false`	Enable fleet-wide optimization
`SWARM_EVAL_INTERVAL`	`300`	Seconds between fleet evaluations

Built-in fleet policies: `idle_gpu_power_cap`, `thermal_balancing`, `carbon_aware_fleet_cap`, `fleet_gpu_rightsizing`.

Safety mechanisms: a maximum fleet blast radius of 25%, canary ramp-up, leader election, and adaptive polling.
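As a rough illustration of the first two safeguards, the sketch below caps an action at 25% of the fleet and then rolls it out in growing canary stages. The function names, the stage sizes, and the doubling schedule are all assumptions; the README does not describe how the agent implements either mechanism.

```python
import math

def cap_blast_radius(candidates, fleet_size, max_fraction=0.25):
    """Keep at most max_fraction of the whole fleet as targets for one action."""
    limit = math.floor(fleet_size * max_fraction)
    return candidates[:limit]

def canary_stages(targets, first=1, growth=2):
    """Split targets into stages of growing size (assumed doubling schedule)."""
    stages, start, size = [], 0, first
    while start < len(targets):
        stages.append(targets[start:start + size])
        start += size
        size *= growth
    return stages

fleet = [f"gpu-{n}" for n in range(16)]
idle = fleet[:10]                              # pretend 10 of 16 GPUs are idle
targets = cap_blast_radius(idle, len(fleet))   # at most 4 of 16 may be touched
stages = canary_stages(targets)
print(stages)
```

Each stage would be applied only after the previous one is confirmed healthy, so a bad power cap affects one GPU before it can reach the rest.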

Prometheus

Env var	Default	Description
`METRICS_PORT`	`9100`	Scrape port (0 = disabled)
`METRICS_BASIC_AUTH`	(none)	`user:pass` for HTTP Basic Auth

Security

Env var	Default	Description
`OFFLINE_MODE`	`false`	WAL only, no HTTP uploads
`ALUMINATAI_CA_BUNDLE`	(none)	Custom CA PEM path
`ALUMINATAI_CLIENT_CERT`	(none)	mTLS client cert

Package Structure

aluminatiai/
├── agent.py               # Main daemon
├── cli.py                 # CLI router (run, train, swarm, benchmark, ...)
├── collector.py           # NVIDIA GPU collector (NVML)
├── amd_collector.py       # AMD GPU collector (amdsmi / rocm-smi)
├── gaudi_collector.py     # Intel Gaudi collector
├── intel_arc_collector.py # Intel Arc collector
├── apple_collector.py     # Apple Silicon collector
├── rapl_collector.py      # CPU-only RAPL collector
├── uploader.py            # HTTPS upload + WAL + backoff
├── metrics_server.py      # Prometheus /metrics endpoint
├── attribution/           # 7-step job attribution engine
├── schedulers/            # Slurm, K8s, Run:ai adapters
├── integrations/          # MLflow, W&B, OpenTelemetry callbacks
├── efficiency/            # Energy analysis, carbon scheduling, roofline
├── swarm/                 # Fleet-wide optimization (leader election, policies)
├── finetune/              # GreenTune: energy-efficient fine-tuning
│   ├── greentune.py       # QLoRA training with energy tracking
│   ├── greentune_swarm.py # Offline hyperparameter optimizer
│   ├── energy_callback.py # HuggingFace TrainerCallback for energy metrics
│   ├── rocm_power.py      # AMD GPU power monitoring (amdsmi / rocm-smi)
│   └── dataset_builder.py # Synthetic dataset generation via Claude
└── tests/

Development

git clone https://github.com/AgentMulder404/aluminatiai.git
cd aluminatiai
pip install -e ".[all]"
python -m pytest tests/ -v

License

Apache 2.0 — see LICENSE.

Release history

0.3.1 (May 20, 2026, this version)
0.3.0 (May 20, 2026)
0.2.1 (Mar 8, 2026)
0.2.0 (Mar 8, 2026)

Download files

Source Distribution

aluminatiai-0.3.1.tar.gz (249.9 kB view details)

Uploaded May 20, 2026 Source

Built Distribution

aluminatiai-0.3.1-py3-none-any.whl (265.4 kB view details)

Uploaded May 20, 2026 Python 3

File details

Details for the file aluminatiai-0.3.1.tar.gz.

File metadata

Download URL: aluminatiai-0.3.1.tar.gz
Upload date: May 20, 2026
Size: 249.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for aluminatiai-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`b82612dec9b0f4e76196bf462c9f4edcb54c051fe20b676bcbbca6c4cd13498e`
MD5	`70b657f22395d1f957a0691d9a386573`
BLAKE2b-256	`28d398aa33d22da0fa9f961b6c40b427141bf017348c26aa3687826725df8492`

File details

Details for the file aluminatiai-0.3.1-py3-none-any.whl.

File metadata

Download URL: aluminatiai-0.3.1-py3-none-any.whl
Upload date: May 20, 2026
Size: 265.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for aluminatiai-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1ec334d220fdc54b9e249a9eb000d38a49e594e0bc7011f407c16b71c3b4b9da`
MD5	`8e1615777cf786f064aeb0f455ef8297`
BLAKE2b-256	`209bedb9641ede4dbeb7707556d5883c06556224e7249cf72548a0692bad4a7a`
