Optimization SDK, CLI, and API for Agnitra's pay-per-optimization platform.


agnitraai

Agnitra is an end-to-end optimization platform that wraps model tuning, telemetry, and metered billing into a single developer flow. The SDK and CLI reduce optimization to a single agnitra.optimize(model) call, while the control plane meters GPU hours for a usage-based SaaS model.

Highlights

Unified CLI (agnitra) and Python SDK for profiling, tuning, and exporting optimized TorchScript artifacts.
Runtime optimization agent couples dynamic kernel injection with usage metering (RuntimeOptimizationAgent + UsageMeter).
Telemetry collectors, LLM-guided kernel suggestions, and RL-backed refinements.
Usage-based SaaS pipeline that links telemetry → usage logs → Stripe metered billing with pay-per-optimization pricing.

Installation

Python (PyPI)

From wheel (recommended)

```
pip install agnitra
```

To use different pricing, inject a custom UsageMeter. The result returned by agnitra.optimize exposes result.baseline and result.optimized snapshots with telemetry, GPU usage, and billing metadata that you can forward to your control plane.

Rebuild the wheel from source when iterating locally:

```
python -m build --wheel
python -m pip install --force-reinstall --no-index --find-links dist agnitra
```

From source

```
pip install -e ".[openai,rl]"
```

Quote the extras spec so shells such as zsh do not try to expand the square brackets.

Optional extras:

agnitra[openai] → OpenAI Responses API client.
agnitra[rl] → PPO tuning via Stable Baselines3 + Gymnasium.
agnitra[nvml] → GPU telemetry using NVML.
agnitra[marketplace] → Cloud marketplace adapters (boto3, httpx, google-auth).

JavaScript / TypeScript (npm)

Install the JavaScript SDK to call the Agentic Optimization API or submit usage events from Node.js services:

```
npm install agnitra
```

See js/README.md for a TypeScript quick start, async queue helpers, and usage reporting examples.

Quick Start

1. Watch the walkthrough

launch_demo.mp4 – short narrated slides covering the milestone demo.

2. Run the milestone script

```
python demo.py --sample-shape 1,16,64
```

The script performs three sequential demos:

| Segment | What it shows |
| --- | --- |
| Baseline vs Optimized | Runs `RuntimeOptimizationAgent` (`agnitra.optimize`) on the TinyLlama fixture and reports latency + billing uplift. |
| CLI Optimization | Executes `agnitra optimize --model tinyllama.pt` and shows the pay-per-optimization summary before saving the artifact. |
| Kernel Injection | Generates a Triton kernel and swaps an FX node via `RuntimePatcher`. |

Each segment emits a structured usage event. The CLI mirrors the SDK output, printing tokens/sec uplift, GPU hours saved, and the metered charge so teams can verify billing before rollout.
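The README does not spell out how uplift and GPU hours saved are computed. The sketch below shows one plausible reading, assuming uplift is the relative change between the baseline and optimized snapshots, and that GPU time for a fixed workload is tokens_processed / tokens_per_sec on a single GPU. The Snapshot class and helper functions are hypothetical; the numbers come from the /usage example payload in section 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical telemetry snapshot; field names follow the /usage example payload."""
    latency_ms: float
    tokens_per_sec: float


def throughput_uplift(baseline: Snapshot, optimized: Snapshot) -> float:
    """Relative tokens/sec gain, e.g. 0.5 means 50% more tokens per second."""
    return (optimized.tokens_per_sec - baseline.tokens_per_sec) / baseline.tokens_per_sec


def latency_reduction(baseline: Snapshot, optimized: Snapshot) -> float:
    """Fraction of baseline latency removed by the optimization."""
    return (baseline.latency_ms - optimized.latency_ms) / baseline.latency_ms


def gpu_hours_saved(baseline: Snapshot, optimized: Snapshot,
                    tokens_processed: int, num_gpus: int = 1) -> float:
    """ASSUMPTION: GPU time = tokens / throughput, scaled by the number of GPUs."""
    baseline_s = tokens_processed / baseline.tokens_per_sec
    optimized_s = tokens_processed / optimized.tokens_per_sec
    return (baseline_s - optimized_s) * num_gpus / 3600.0


if __name__ == "__main__":
    base = Snapshot(latency_ms=120, tokens_per_sec=90)
    opt = Snapshot(latency_ms=80, tokens_per_sec=140)
    print(f"tokens/sec uplift: {throughput_uplift(base, opt):.1%}")      # 55.6%
    print(f"latency reduction: {latency_reduction(base, opt):.1%}")      # 33.3%
    print(f"GPU hours saved: {gpu_hours_saved(base, opt, 2048):.6f}")    # 0.002257
```

Billing on GPU hours saved rather than raw uplift keeps the charge proportional to the amount of work actually processed.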

3. CLI cheatsheet

```
agnitra --help
agnitra optimize --model tinyllama.pt --input-shape 1,16,64
```

3.5 Agentic Optimization API

Launch the Starlette service:
```
agnitra-api --host 127.0.0.1 --port 8080
```
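(equivalent to uvicorn --factory agnitra.api.app:create_app --host 127.0.0.1 --port 8080; the --factory flag tells uvicorn to call create_app to build the application).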
Call the endpoint with graph + telemetry artifacts:
```
curl -X POST http://127.0.0.1:8080/optimize \
  -F model_graph=@graph_ir.json \
  -F telemetry=@telemetry.json \
  -F target=A100
```
The JSON response includes an optimized IR graph, generated Triton kernel source, and FX patch instructions.
For JSON payloads, send {"target": "...", "model_graph": [...], "telemetry": {...}} with Content-Type: application/json.
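For Python callers, a stdlib-only sketch of the JSON variant using urllib.request follows. The helper names are hypothetical, and the node fields inside model_graph and the telemetry keys shown in the usage note are placeholders; the real IR schema is whatever graph_ir.json contains.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_optimize_request(
    base_url: str,
    target: str,
    model_graph: List[Dict[str, Any]],
    telemetry: Dict[str, Any],
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST /optimize request."""
    payload = {"target": target, "model_graph": model_graph, "telemetry": telemetry}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/optimize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_optimize(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (requires a running agnitra-api)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: with the server from above running, call_optimize(build_optimize_request("http://127.0.0.1:8080", "A100", graph, telemetry)) where graph and telemetry are loaded from graph_ir.json and telemetry.json. Keeping request construction separate from sending makes the payload easy to inspect or test offline.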

4. Marketplace Usage Endpoint

The API now exposes POST /usage, a marketplace-compatible billing hook that accepts baseline/optimized telemetry or a precomputed UsageEvent. The endpoint returns the normalised usage payload alongside dispatch results for AWS, GCP, and Azure marketplace adapters.

```
curl -X POST http://127.0.0.1:8080/usage \
  -H "Content-Type: application/json" \
  -d '{
        "project_id": "demo-project",
        "model_name": "tinyllama",
        "baseline": {"latency_ms": 120, "tokens_per_sec": 90, "tokens_processed": 2048},
        "optimized": {"latency_ms": 80, "tokens_per_sec": 140, "tokens_processed": 2048},
        "tokens_processed": 2048,
        "providers": ["aws", "gcp"]
      }'
```

When marketplace credentials or SDKs are not present, the adapters respond with status: "skipped" or status: "deferred" so that the control plane can retry.
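A minimal Python sketch of the same call, plus a helper that picks out the providers the control plane should retry. The README does not document the response schema, so the {"dispatch": {provider: {"status": ...}}} layout assumed in providers_to_retry is hypothetical and should be adjusted to the real response.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Statuses that, per the README, mean a marketplace adapter did not bill yet.
RETRYABLE_STATUSES = frozenset({"skipped", "deferred"})


def build_usage_request(base_url: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST /usage request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/usage",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def providers_to_retry(response: Dict[str, Any]) -> List[str]:
    """Return providers whose dispatch result should be retried later.

    ASSUMPTION: dispatch results arrive as {"dispatch": {"aws": {"status": ...}, ...}}.
    """
    dispatch = response.get("dispatch", {})
    return sorted(
        provider
        for provider, result in dispatch.items()
        if result.get("status") in RETRYABLE_STATUSES
    )
```

Send the request with urllib.request.urlopen, decode the JSON body, and queue the providers returned by providers_to_retry for a later attempt.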

5. SDK in your code

```python
import torch
from agnitra import optimize

# Load a TorchScript model and build a sample input of the expected shape.
model = torch.jit.load("tinyllama.pt")
sample = torch.randn(1, 16, 64)

result = optimize(
    model,
    input_tensor=sample,
    enable_rl=False,  # skip the RL-backed (PPO) tuning pass
    project_id="demo",
)
optimized = result.optimized_model
usage_event = result.usage_event
print(
    f"GPU hours saved: {usage_event.gpu_hours_saved:.6f}, "
    f"billable: {usage_event.total_billable:.4f} {usage_event.currency}"
)
```

Usage-Based SaaS Architecture

The repository tracks the implementation plan for a pay-per-optimization product. Key building blocks:

Runtime Agent – agnitra.core.runtime.agent.RuntimeOptimizationAgent intercepts CUDA/ROCm/Triton workloads, applies runtime patches, and records tokens/sec, latency, and GPU utilisation before/after optimization.
Telemetry + Metering – UsageMeter converts those snapshots into GPU-hour, cost-savings, and billable records that the CLI/SDK emit as structured usage events.
Control Plane – FastAPI (REST) + gRPC fronting an Optimize() endpoint. Async workers aggregate usage, enrich with cost data, and call Stripe Metered Billing. Webhooks reconcile invoices with project owners.
Billing Loop – Stripe metered usage records keyed by project ID + region + tag. Usage snapshots are bundled into invoices. Saved reports highlight cost savings vs baseline compute spend.
Developer Surface – function wrapper (agnitra.optimize), context manager (optimize_ctx), and decorator (agnitra_step) so optimisation happens automatically.
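The three developer surfaces are standard Python wrapping patterns. The sketch below illustrates them with hypothetical stand-ins (sketch_optimize, sketch_optimize_ctx, sketch_step); it does not import agnitra and is not its implementation. The stand-in optimizer just records the call and returns the model unchanged, and the context manager records where usage events might be flushed on exit.

```python
import contextlib
import functools
from typing import Any, Callable, Iterator, List

# Hypothetical bookkeeping so the example is observable; not part of agnitra.
OPTIMIZE_CALLS: List[str] = []
FLUSHED_PROJECTS: List[str] = []


def sketch_optimize(model: Any, project_id: str = "default") -> Any:
    """Stand-in for a function wrapper: records the call, returns the model unchanged."""
    OPTIMIZE_CALLS.append(project_id)
    return model


@contextlib.contextmanager
def sketch_optimize_ctx(model: Any, project_id: str = "default") -> Iterator[Any]:
    """Context-manager surface: optimize on entry, flush usage on exit."""
    optimized = sketch_optimize(model, project_id=project_id)
    try:
        yield optimized
    finally:
        # A real implementation might flush buffered usage events here.
        FLUSHED_PROJECTS.append(project_id)


def sketch_step(project_id: str = "default") -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Decorator surface: optimize the model passed as the first argument of each step."""
    def decorator(step_fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(step_fn)
        def wrapper(model: Any, *args: Any, **kwargs: Any) -> Any:
            return step_fn(sketch_optimize(model, project_id=project_id), *args, **kwargs)
        return wrapper
    return decorator
```

This decorator re-optimizes on every call to keep the sketch short; a production version would likely cache the optimized model per input model.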

Repository Layout (Monorepo blueprint)

```
agnitra/
├─ sdk/python/
│  ├─ agnitra/
│  │  ├─ optimize.py        # optimize(), optimize_ctx, agnitra_step
│  │  ├─ passes/            # pluggable optimization passes
│  │  ├─ backends/          # torch/tf/jax adapters
│  │  ├─ telemetry.py       # usage events buffer + signer
│  │  ├─ auth.py            # session token management
│  │  ├─ config.py          # Config dataclass
│  │  └─ cli.py             # Click CLI wiring
│  └─ pyproject.toml
├─ control-plane/
│  ├─ api/                  # REST/gRPC services
│  ├─ metering/             # aggregation, rating, invoicing
│  ├─ billing/              # Stripe/Paddle adapters + webhooks
│  └─ db/                   # migrations for usage tables
└─ infra/                   # Docker, Helm, Terraform manifests
```
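The layout lists telemetry.py as a "usage events buffer + signer". The README does not say which signing scheme is used; the sketch below assumes HMAC-SHA256 over a canonical JSON encoding, and SignedEventBuffer and its helpers are hypothetical names.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


def canonical_bytes(event: Dict[str, Any]) -> bytes:
    """Serialize deterministically so signer and verifier hash identical bytes."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_event(event: Dict[str, Any], secret: bytes) -> str:
    return hmac.new(secret, canonical_bytes(event), hashlib.sha256).hexdigest()


def verify_event(event: Dict[str, Any], signature: str, secret: bytes) -> bool:
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(sign_event(event, secret), signature)


@dataclass
class SignedEventBuffer:
    """Hypothetical buffer that signs events on arrival and hands out batches on flush."""
    secret: bytes
    max_size: int = 100
    pending: List[Dict[str, Any]] = field(default_factory=list)

    def add(self, event: Dict[str, Any]) -> bool:
        """Queue a signed event; returns True once the buffer is full and should be flushed."""
        self.pending.append({"event": event, "signature": sign_event(event, self.secret)})
        return len(self.pending) >= self.max_size

    def flush(self) -> List[Dict[str, Any]]:
        """Return all queued events and reset the buffer."""
        batch, self.pending = self.pending, []
        return batch
```

Signing at enqueue time means the control plane can reject any event altered between the SDK and the metering service.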

Metering Flow

SDK attaches to a model – emits a usage.attach event with tags (model, env, region).
Runtime agent records baseline + optimized telemetry (latency, tokens/sec, GPU utilisation).
Control plane ingests events, aggregates GPU hours optimised, and calculates uplift.
Stripe metered billing rates GPU hours / tokens and issues invoices.
Dashboard (future) visualises savings and lets teams approve optimisations before rollout.
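The aggregation and rating steps can be pictured as a group-by followed by a price multiplication. A minimal sketch, assuming events are bucketed by project ID + region + tag as described in the Billing Loop above and billed at a flat, hypothetical price per GPU hour; which quantity is actually rated (GPU hours optimised, saved, or tokens) is left open by the README. MeteredEvent, aggregate_gpu_hours, and rate_usage are illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal
from typing import Dict, Iterable, Tuple

# (project_id, region, tag): the keying used for metered usage records.
UsageKey = Tuple[str, str, str]


@dataclass(frozen=True)
class MeteredEvent:
    """Hypothetical usage event; the real UsageEvent fields may differ."""
    project_id: str
    region: str
    tag: str
    gpu_hours: float


def aggregate_gpu_hours(events: Iterable[MeteredEvent]) -> Dict[UsageKey, float]:
    """Sum GPU hours per (project, region, tag) bucket."""
    totals: Dict[UsageKey, float] = defaultdict(float)
    for event in events:
        totals[(event.project_id, event.region, event.tag)] += event.gpu_hours
    return dict(totals)


def rate_usage(totals: Dict[UsageKey, float], price_per_gpu_hour: Decimal) -> Dict[UsageKey, Decimal]:
    """Convert GPU hours into a charge rounded to cents using Decimal arithmetic."""
    cents = Decimal("0.01")
    return {
        key: (Decimal(str(hours)) * price_per_gpu_hour).quantize(cents, rounding=ROUND_HALF_UP)
        for key, hours in totals.items()
    }
```

Decimal is used for the money step so that rounding to cents is explicit rather than subject to binary floating-point error.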

Deployment & Marketplace Integration

Docker – Build a containerised runtime with `docker build -t agnitra-marketplace .` and run it with `docker run -p 8080:8080 agnitra-marketplace`.
Helm – deploy/helm/agnitra-marketplace packages the API for Kubernetes, exposing configuration for marketplace credentials, autoscaling, and ingress.
Terraform – Turn-key modules exist for AWS Fargate (deploy/terraform/aws_marketplace), Google Cloud Run (deploy/terraform/gcp_marketplace), and Azure Container Apps (deploy/terraform/azure_marketplace). Each module outputs a ready-to-register /usage endpoint.
CloudFormation – deploy/cloudformation/aws-marketplace.yaml offers an AWS-native template for rapid provisioning without Terraform.

Register the emitted /usage URL with the respective marketplace listing so that usage events flow into the provider-managed billing pipeline.

Profiling & Visualisation

The classic profiling flow remains available:

Profile a model: python -m agnitra.cli profile tinyllama.pt --input-shape 1,16,64 --output telemetry.json
Load telemetry + extract an FX graph IR via agnitra.core.ir.graph_extractor.
Explore results inside agnitra_enhanced_demo.ipynb (Colab badge included). The notebook now includes an Agentic Optimization API (v1.0) section that exercises run_agentic_optimization end-to-end and previews the patch plan + Triton kernel produced by the server.

Development

```
pytest -q
```

Artifacts generated by tests (profiles, telemetry) live under benchmarks/ and agnitraai/context/; consult .gitignore for the latest ignore rules. Update docs when the CLI or SDK experience changes.

Publishing

Follow docs/publishing.md for the PyPI and npm release checklists, including version bumps, build steps, and publish commands.

Resources

docs/responses_api.md – OpenAI Responses API spec followed by the SDK.
docs/docs-deployment.md – Mintlify documentation structure and deployment guide.
internal-docs/prd.md – Business context and long-term roadmap (internal).
internal-docs/ui_ux_handoff.md – UX flows and SaaS onboarding notes.
internal-docs/non_interactive_codex_usage.txt – Headless Codex automation notes.
AGENTS.md + notes.yaml – roadmap fragments and agent experiments.


Release history

| Version | Release date |
| --- | --- |
| 0.2.4 | May 6, 2026 |
| 0.2.3 | May 6, 2026 |
| 0.2.2 | May 6, 2026 |
| 0.2.1 | May 6, 2026 |
| 0.2.0 | May 6, 2026 |
| 0.1.0 (this version) | Oct 23, 2025 |

Download files


Source Distribution

agnitra-0.1.0.tar.gz (155.6 kB), uploaded Oct 23, 2025 (source)

Built Distribution


agnitra-0.1.0-py3-none-any.whl (154.9 kB), uploaded Oct 23, 2025 (Python 3)

File details

Details for the file agnitra-0.1.0.tar.gz.

File metadata

Download URL: agnitra-0.1.0.tar.gz
Upload date: Oct 23, 2025
Size: 155.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.11

File hashes

Hashes for agnitra-0.1.0.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `810c3b30e006c466c985d3dae9506d1de505e3f6fb1f54ad22a6c95fb7722300` |
| MD5 | `1f3434fde8abca2da20ca6f0276efc35` |
| BLAKE2b-256 | `d0165ea2b9bc7d4145846788911bbc7c1f62529bba342bd191e15a690591d5fe` |

File details

Details for the file agnitra-0.1.0-py3-none-any.whl.

File metadata

Download URL: agnitra-0.1.0-py3-none-any.whl
Upload date: Oct 23, 2025
Size: 154.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.11

File hashes

Hashes for agnitra-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `652598a4a261e9a2c9cf6ce5fb55b4a2d9c8e2f9b1e1d22c49706fa8a9b7c74e` |
| MD5 | `ccd401f126f02420804a6128bb64f6eb` |
| BLAKE2b-256 | `fdee9684f9ce7c9e56c295acd378e1badb26fee558a73c1b678e5fb72d0f795b` |
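To check a downloaded 0.1.0 file against the published SHA256 digests, hash it locally and compare. A minimal sketch using hashlib; the helper names are illustrative, and the digests are copied from the tables above.

```python
import hashlib
import hmac
from pathlib import Path
from typing import Dict

# SHA256 digests published for the 0.1.0 release files.
PUBLISHED_SHA256: Dict[str, str] = {
    "agnitra-0.1.0.tar.gz": "810c3b30e006c466c985d3dae9506d1de505e3f6fb1f54ad22a6c95fb7722300",
    "agnitra-0.1.0-py3-none-any.whl": "652598a4a261e9a2c9cf6ce5fb55b4a2d9c8e2f9b1e1d22c49706fa8a9b7c74e",
}


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(filename: str, hex_digest: str) -> bool:
    """True only if a digest is published for filename and it equals hex_digest."""
    expected = PUBLISHED_SHA256.get(filename)
    return expected is not None and hmac.compare_digest(expected, hex_digest.lower())


def verify_download(path: Path) -> bool:
    return matches_published_hash(path.name, sha256_file(path))
```

pip can enforce the same check during installation through its hash-checking mode (a requirements file with --hash=sha256:... entries, installed with --require-hashes); in that mode every dependency must also be pinned with a hash.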
