
Azure OpenAI PTU cost advisor plugin for az-scout — compare PAYGO, PTU, and hybrid deployment strategies


az-scout-plugin-aoai-ptu-advisor

Azure OpenAI PTU Cost Advisor is an az-scout plugin that compares pay-as-you-go (PAYGO), Provisioned Throughput Unit (PTU), and hybrid deployment cost strategies for Azure OpenAI models.

What it does

Given your workload characteristics (average tokens per minute (TPM), P95/P99 burst percentiles, usage hours), the plugin:

  • Estimates monthly costs for PAYGO, PTU on-demand, PTU reserved, and hybrid (PTU base + PAYGO spillover)
  • Sizes PTU capacity with correct minimum and increment rounding per model and deployment type
  • Detects burst patterns and recommends hybrid when steady base + bursty overflow is cheaper
  • Computes break-even TPM thresholds between PAYGO and PTU
  • Fetches live pricing from the Azure Retail Prices API with fallback to catalog defaults
  • Supports price overrides for enterprise/negotiated rates
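
The sizing and break-even steps above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function names, the throughput-per-PTU figure, the minimum/increment values, and the hourly PTU price are all hypothetical placeholders, and the real values depend on the model and deployment type.

```python
import math


def size_ptu(target_tpm: float, tpm_per_ptu: float, minimum: int, increment: int) -> int:
    """Round a raw PTU requirement up to the minimum, then to the next increment.

    All four inputs are illustrative; real values come from the model catalog.
    """
    raw = math.ceil(target_tpm / tpm_per_ptu)
    if raw <= minimum:
        return minimum
    steps = math.ceil((raw - minimum) / increment)
    return minimum + steps * increment


def break_even_tpm(ptu_monthly_cost: float, blended_price_per_1m: float,
                   minutes_per_month: float) -> float:
    """Average TPM at which PAYGO spend equals a fixed monthly PTU bill."""
    paygo_cost_per_tpm = blended_price_per_1m / 1_000_000 * minutes_per_month
    return ptu_monthly_cost / paygo_cost_per_tpm


# Hypothetical numbers: 2,500 TPM per PTU, minimum 15 PTU, increments of 5.
ptus = size_ptu(80_000, tpm_per_ptu=2_500, minimum=15, increment=5)
print(ptus)  # 35

# Hypothetical $1.00 per PTU-hour over 720 hours, blended PAYGO price $6.25 per 1M tokens.
monthly_ptu_cost = ptus * 1.00 * 720
print(round(break_even_tpm(monthly_ptu_cost, 6.25, 24 * 60 * 30)))
```

Below the break-even TPM, PAYGO is the cheaper option under these assumptions; above it, the fixed PTU bill wins.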

Inspiration

This plugin is inspired by the public project azureptucalc by Ricardo Martins. The functional goals are the same — compare Azure OpenAI deployment cost strategies — but the implementation is completely native to az-scout:

  • No React — vanilla JavaScript frontend
  • No Vercel proxy — direct Azure Retail Prices API calls from the Python backend
  • No external services for the core calculator: no database and no separate cache layer; KQL is used only by the optional Azure auto-fill
  • MCP tools — AI agents can call the calculator directly
  • Plugin architecture — auto-discovered by az-scout at startup

Installation

# Install the plugin (az-scout must be installed first)
uv pip install az-scout-plugin-aoai-ptu-advisor

# Or for development
cd az-scout-plugin-aoai-ptu-advisor
uv sync --group dev
uv pip install -e .

# Start az-scout — the plugin is auto-discovered
az-scout web

API Routes

All routes are under /plugins/aoai-ptu-advisor/.

GET /models

Returns supported Azure OpenAI models with PTU specifications.

GET /pricing?region=eastus&model=gpt-4o&deployment_type=Global

Returns resolved PAYGO and PTU pricing for a model/region/deployment type.
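
One way to read "resolved" pricing is as a precedence chain over the three price sources the plugin describes. The sketch below assumes overrides win over live prices, which win over catalog defaults; that ordering and the `resolve_price` helper are assumptions made for illustration, not the plugin's documented behavior.

```python
from __future__ import annotations


def resolve_price(override: float | None, live: float | None,
                  catalog_default: float) -> tuple[float, str]:
    """Pick a price and report which source it came from (hypothetical helper)."""
    if override is not None:
        return override, "override"   # negotiated/enterprise rate supplied by the caller
    if live is not None:
        return live, "live"           # value fetched from the Azure Retail Prices API
    return catalog_default, "catalog"  # static fallback when the live lookup failed


print(resolve_price(None, 2.50, 3.00))   # (2.5, 'live')
print(resolve_price(2.10, 2.50, 3.00))   # (2.1, 'override')
print(resolve_price(None, None, 3.00))   # (3.0, 'catalog')
```

Returning the source alongside the price makes it easy to surface a warning when an estimate rests on catalog defaults.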

POST /analyze

Main analysis endpoint. Request body:

{
  "region": "eastus",
  "model": "gpt-4o",
  "deployment_type": "Global",
  "average_tpm": 50000,
  "p95_tpm": 80000,
  "p99_tpm": 120000,
  "input_ratio": 0.5,
  "hours_per_day": 24,
  "days_per_month": 30,
  "reserved_ptu": false,
  "spillover_enabled": true,
  "paygo_overrides": {
    "input_per_1m": 2.50,
    "output_per_1m": 10.00
  }
}

Response includes: cost estimates for each strategy, PTU sizing, break-even analysis, recommendation with rationale, warnings, and assumptions.
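
As a rough guide to the PAYGO figure, the arithmetic can be reproduced from the request fields alone. This sketch assumes cost is simply tokens per month split by `input_ratio` and priced per million tokens; the `paygo_monthly_cost` function is hypothetical and the plugin's own estimate may include details not shown here.

```python
def paygo_monthly_cost(request: dict) -> float:
    """Estimate monthly PAYGO spend from an /analyze-style request body (illustrative)."""
    minutes = 60 * request["hours_per_day"] * request["days_per_month"]
    total_tokens = request["average_tpm"] * minutes
    input_tokens = total_tokens * request["input_ratio"]
    output_tokens = total_tokens - input_tokens
    prices = request["paygo_overrides"]
    return (input_tokens / 1_000_000 * prices["input_per_1m"]
            + output_tokens / 1_000_000 * prices["output_per_1m"])


request = {
    "average_tpm": 50000,
    "input_ratio": 0.5,
    "hours_per_day": 24,
    "days_per_month": 30,
    "paygo_overrides": {"input_per_1m": 2.50, "output_per_1m": 10.00},
}
print(paygo_monthly_cost(request))  # 13500.0
```

With the example body, 50,000 TPM over 43,200 minutes is 2.16 billion tokens, or $2,700 of input plus $10,800 of output.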

POST /analyze-from-timeseries

Accepts raw TPM time-series data (a datapoints array of {timestamp, tpm} objects), computes the average and percentiles automatically, and runs the analysis.

{
  "region": "eastus",
  "model": "gpt-4o",
  "datapoints": [
    {"timestamp": "2025-01-01T00:00:00Z", "tpm": 50000},
    {"timestamp": "2025-01-01T01:00:00Z", "tpm": 120000},
    {"timestamp": "2025-01-01T02:00:00Z", "tpm": 30000}
  ]
}
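
The percentile step can be approximated with a nearest-rank calculation over the `tpm` values. The plugin's actual percentile method is not documented here, so treat the `percentile` and `summarize` helpers below as an assumed sketch; other interpolation methods give slightly different results on small samples.

```python
import math


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(datapoints: list) -> dict:
    """Derive the workload inputs used by /analyze from raw datapoints (hypothetical helper)."""
    tpms = [point["tpm"] for point in datapoints]
    return {
        "average_tpm": sum(tpms) / len(tpms),
        "p95_tpm": percentile(tpms, 95),
        "p99_tpm": percentile(tpms, 99),
    }


datapoints = [
    {"timestamp": "2025-01-01T00:00:00Z", "tpm": 50000},
    {"timestamp": "2025-01-01T01:00:00Z", "tpm": 120000},
    {"timestamp": "2025-01-01T02:00:00Z", "tpm": 30000},
]
print(summarize(datapoints))
```

With only three samples, P95 and P99 both collapse to the maximum (120000), so longer series give more meaningful burst estimates.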

MCP Tools

Available in the az-scout AI chat:

| Tool | Description |
| --- | --- |
| list_aoai_models | List supported models with PTU specs |
| get_aoai_pricing | Get resolved pricing for a model/region |
| analyze_aoai_ptu_cost | Full cost comparison analysis |

Supported Models

The model catalog is fetched dynamically from the Microsoft PTU onboarding documentation (cached for 24 hours, with a static fallback if the fetch fails). It currently includes GPT-5.x, GPT-4.1, GPT-4o, o-series, DeepSeek, and Llama models.
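
The 24-hour cache with a static fallback could look roughly like the sketch below. The `CachedCatalog` class is hypothetical, and serving the last good value when a refresh fails is an assumption; the plugin may instead drop straight to the static catalog.

```python
import time
from typing import Callable


class CachedCatalog:
    """Time-based cache around a fetch function, with a static fallback (illustrative)."""

    def __init__(self, fetch: Callable[[], list], fallback: list,
                 ttl_seconds: float = 24 * 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._fallback = fallback
        self._ttl = ttl_seconds
        self._clock = clock
        self._value = None
        self._fetched_at = 0.0

    def get(self) -> list:
        now = self._clock()
        if self._value is not None and now - self._fetched_at < self._ttl:
            return self._value  # still fresh, skip the network call
        try:
            self._value = self._fetch()
            self._fetched_at = now
        except Exception:
            if self._value is None:
                return self._fallback  # nothing cached yet: use the static catalog
        return self._value  # fresh value, or the last good one if the refresh failed


def failing_fetch() -> list:
    raise RuntimeError("documentation page unreachable")


catalog = CachedCatalog(failing_fetch, fallback=["gpt-4o (static)"])
print(catalog.get())  # ['gpt-4o (static)']
```

Injecting the clock keeps the expiry logic testable without waiting 24 hours.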

Azure Auto-fill

The plugin can populate workload inputs directly from your Azure environment:

  1. Select your subscription and Azure OpenAI resource
  2. Click "Auto-fill from Azure"
  3. The plugin queries Azure Monitor Logs (KQL) and the ARM model capacity API
  4. Form fields are populated with live metrics (avg TPM, P99 TPM, recommended PTU)

Azure routes

| Route | Method | Description |
| --- | --- | --- |
| /azure/openai-resources | GET | List Azure OpenAI resources in a subscription |
| /azure/kql-autofill | POST | Query Azure Monitor Logs for TPM/PTU metrics |
| /azure/model-capacity | POST | Check available PTU capacity (preview API) |
| /azure/autofill-all | POST | Orchestrate KQL + capacity in one call |

Required Azure permissions

  • Reader on the subscription (for resource discovery)
  • Log Analytics Reader on the Log Analytics workspace (for KQL queries)
  • Reader on the Cognitive Services account (for model capacity API)

Prerequisites for KQL autofill

Azure Monitor metrics must be exported to a Log Analytics workspace via diagnostic settings on the Azure OpenAI resource. If diagnostic settings are not configured, the plugin returns a warning.

Model capacity API

The capacity API (Microsoft.CognitiveServices/locations/{location}/modelCapacities) is a preview API (2024-04-01-preview). It returns location-based capacity data. The deployment type (Global/DataZone/Regional) is used as a UI filter over SKU names.
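
Filtering location-level results by deployment type might be done with a simple prefix check on the SKU name. The matching rule below is an assumption based on SKU names such as GlobalProvisionedManaged, DataZoneProvisionedManaged, and ProvisionedManaged; the `sku_matches` helper is hypothetical and the plugin's real filter may differ.

```python
def sku_matches(sku_name: str, deployment_type: str) -> bool:
    """Return True if a SKU name belongs to the selected deployment type (assumed rule)."""
    if deployment_type == "Global":
        return sku_name.startswith("Global")
    if deployment_type == "DataZone":
        return sku_name.startswith("DataZone")
    if deployment_type == "Regional":
        # Regional SKUs are assumed to carry no Global/DataZone prefix.
        return not sku_name.startswith(("Global", "DataZone"))
    raise ValueError(f"unknown deployment type: {deployment_type}")


skus = ["GlobalProvisionedManaged", "DataZoneProvisionedManaged", "ProvisionedManaged"]
print([s for s in skus if sku_matches(s, "Regional")])  # ['ProvisionedManaged']
```

Because the filter runs client-side over SKU names, a new SKU naming scheme would need a matching rule update.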

Limitations

  • Pricing accuracy: Live prices from the Azure Retail Prices API may not reflect negotiated enterprise rates. Use price overrides for accurate estimates.
  • No persistent storage: All calculations are stateless and nothing is written to a database; the only cached data is the model catalog (24 hours).
  • Throughput-per-PTU values: Fetched dynamically from Microsoft documentation; may change as models evolve.
  • KQL autofill: Requires diagnostic settings configured on the Azure OpenAI resource to export metrics to Log Analytics.
  • Capacity API: Preview API — results may be incomplete or change without notice.

Quality Checks

uv run ruff check src/ tests/
uv run ruff format --check src/ tests/
uv run mypy src/
uv run pytest

License

MIT

Disclaimer

This tool is not affiliated with Microsoft. Pricing, throughput, and capacity information are indicative and not a guarantee. PTU throughput-per-unit values are based on published documentation and may change. Always validate with actual Azure pricing and your account team for enterprise rates.
