Adaptive Utility Agents: a Django-like framework for adaptive multi-model LLM systems.
AUA Framework
A production framework for self-correcting, multi-specialist LLM systems.
Full site: https://praneethtota.github.io/Adaptive-Utility-Agent
Recommended way to explore the project: start with the overview, then read the whitepaper, roadmap, and tutorial for implementation details.
What it does
AUA sits between your application and your language models. It routes prompts to specialist models, scores responses with a utility function, catches contradictions, injects prior verified corrections into future queries, and enforces policies in real-time.
The core idea: a model that gives a wrong answer on Tuesday should not give the same wrong answer on Thursday. AUA closes that loop without waiting for a new model release.
pip install adaptive-utility-agent
aua init my-project --preset coding --tier macbook
cd my-project && aua serve
Sister Project: AUA Veritas
I am currently building a standalone app based on this framework called AUA Veritas.
AUA Framework is intended for developers, MLEs, and AI infrastructure teams who want to build adaptive multi-model LLM systems with routing, utility scoring, arbitration, correction loops, observability, and deployment controls.
AUA Veritas applies those ideas in a consumer-facing desktop app. Instead of exposing framework internals, it gives everyday AI users a simple interface: ask a question, let Veritas compare multiple frontier models, remember prior corrections, and return one answer with a confidence signal.
The sister repo is here:
AUA Veritas
Documentation
| Page | Audience | Link |
|---|---|---|
| Landing page | Everyone | whitepaper.html |
| Tutorial | ML engineers, builders | tutorial.html |
| Production architecture | DevOps, platform engineers | productionizing.html |
| Whitepaper (7 parts) | Researchers, theorists | whitepaper_overview.html |
| Roadmap & validation | Everyone | aua_roadmap.html |
| AI Data Centers | Inference infra, GPU cloud | domain_ai_datacenters.html |
| Self-Driving Vehicles | AV engineers | domain_self_driving.html |
| Autonomous Systems | Robotics, safety engineering | domain_autonomous_systems.html |
| Software Engineering | Coding agents, dev-tools | domain_software_engineering.html |
| Dynamic Pricing | Pricing platforms | domain_dynamic_pricing.html |
| Energy Systems | Grid software, DER | domain_energy_systems.html |
| Creative Systems | Generative media | domain_creative_systems.html |
| Recommendation Engines | RecSys, personalization | domain_recommendation_engines.html |
Quickstart
Install
pip install adaptive-utility-agent
# With GPU serving backend (Linux + CUDA)
pip install "adaptive-utility-agent[vllm]"
# With development tools
pip install "adaptive-utility-agent[dev]"
Scaffold and serve
# Mac / Apple Silicon: uses Ollama (brew install ollama first)
aua init my-project --preset coding --tier macbook
cd my-project
aua doctor # pre-flight check
aua serve # start specialists + router on :8000
Send a query
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "Write binary search in Python. State time complexity."}'
import asyncio
from aua import Router
from aua.config import load_config

async def main():
    config = load_config("aua_config.yaml")
    router = Router.from_config(config)
    # router.query is a coroutine, so it must be awaited inside an event loop
    result = await router.query("Write bubble sort. What is its O complexity?")
    print(result.response)
    print(f"U={result.u_score:.3f} mode={result.routing_mode}")

asyncio.run(main())
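For callers that do not want to import the framework, the curl request above can also be issued with the Python standard library alone. This is a minimal sketch: it assumes the server is reachable at `http://localhost:8000` as in the curl example, and the helper names `build_query_request` and `send_query` are illustrative, not part of AUA. The shape of the JSON response is not assumed; the body is simply decoded and returned.

```python
import json
import urllib.request

# Assumption: `aua serve` is listening on localhost:8000, as in the curl example.
BASE_URL = "http://localhost:8000"


def build_query_request(query: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /query request that the curl example sends."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_query(query: str, base_url: str = BASE_URL, timeout_s: float = 60.0) -> dict:
    """Send the request and decode the JSON body (no response fields are assumed)."""
    request = build_query_request(query, base_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send_query("Write binary search in Python. State time complexity.")` returns the decoded response as a dictionary.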
Chat UI
# Terminal 1
aua serve --tier macbook
# Terminal 2
aua ui # starts on http://localhost:3001 (admin / aua-admin)
Hardware tiers
| Tier | Hardware | Backend | Notes |
|---|---|---|---|
| macbook | Apple M-series / Intel | Ollama | Install: brew install ollama |
| single-4090 | 1× RTX 4090 24 GB | vLLM AWQ | |
| quad-4090 | 4× RTX 4090 | vLLM AWQ | Dedicated GPU per specialist |
| a100-cluster | 1× A100 80 GB | vLLM fp16 | No quantization |
Aliases: rtx4090 → single-4090, a100 → a100-cluster.
What ships in v1.1
| Component | Detail |
|---|---|
| REST API | 50+ endpoints: query, stream, batch, corrections (full CRUD + evidence), config, deploy, status, sessions, metrics, conversations/messages/projects, keyword search, context backups, analytics suite, update management, bug reports, local models, domain ontology |
| CLI | 22 command groups: aua init, aua serve, aua doctor, aua status, aua eval, aua guard, aua policy, aua calibrate, aua logs, aua metrics, and more |
| Plugin interfaces | 8 Protocol interfaces: FieldClassifier, UtilityScorer, ArbiterPolicy, PromotionPolicy, CorrectionStore, ModelBackend, StateStore, HookPlugin |
| Hooks | 11 lifecycle hook points: pre_query, post_route, pre_specialist_call, post_specialist_call, pre_arbiter, post_arbiter, on_correction, pre_response, post_response, on_promotion, on_rollback |
| Middleware | AUAMiddleware: before_query / after_response wrap every request |
| YAML extensions (v1.1) | plugins:, hooks:, middleware:, state:, security: config blocks; strict validation, contract-checked imports, GET /extensions server truth |
| Persistence & search (v1.1) | Conversations, messages, projects; message-level keyword search with async indexing |
| Production ops (v1.1) | Context backups + coverage job, correction lifecycle (explicit/implicit/CRUD), analytics + reliability endpoints, crash + bug reporting, remote model config, dynamic domain ontology |
| Session IDs (v1.1) | session/trace/request IDs on every response, propagated end-to-end (#15) |
| Secrets (v1.1) | secrets: block with env, Vault, AWS SM, and GCP providers; live provider integration tests in CI |
| Assertions + Policy | @assertion decorator, AssertionLevel (BLOCKING/SOFT/INFO), Policy with YAML config, Option B E-bonus, gold-standard DPO session detection |
| Calibration | aua calibrate --layer 1/2/3: eval harness, routing weight analysis, DPO pair export |
| Prometheus metrics | 18 metrics including assertion fail rate, E-bonus histogram, retry counter |
| Observability | Structured JSON logs, Prometheus/Grafana, OpenTelemetry traces, ELK/Splunk-compatible |
| Chat UI | Next.js 14, three-panel layout: sidebar, chat, Framework Debugger |
| Blue-green deployment | Utility-deviation-triggered promotion, aua rollback |
| Test suite | 197 tests, Python 3.10 / 3.11 / 3.12 matrix |
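The plugin interfaces in the table are described as Protocol interfaces. The sketch below shows how structural typing with `typing.Protocol` lets a plain class satisfy such an interface without inheriting from it. The method name `score`, its signature, and the names `UtilityScorerLike` and `validate_plugin` are assumptions made for illustration; the real `UtilityScorer` contract in `aua/plugins/interfaces.py` may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class UtilityScorerLike(Protocol):
    """Hypothetical stand-in for a scorer interface; the real signature may differ."""

    def score(self, response: str, field: str) -> float: ...


class LengthPenaltyScorer:
    """Toy scorer: shorter answers score higher. It does not inherit from the Protocol."""

    def score(self, response: str, field: str) -> float:
        return 1.0 / (1.0 + len(response) / 100.0)


def validate_plugin(candidate: object) -> bool:
    """Structural check: does the object expose the methods the Protocol names?"""
    return isinstance(candidate, UtilityScorerLike)
```

A runtime `isinstance` check against a Protocol only confirms that the named methods exist, not that their signatures match, so a registry would typically pair it with stricter validation.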
The utility function
U = w_e(f) · E + w_c(f) · C + w_k(f) · K
E: Efficacy, how well the response serves the domain objective [0, 1]
C: Confidence, Kalman-filtered internal consistency [0, 1]
K: Curiosity, UCB-style exploration bonus, capped at 50% of E+C
f: field (software_engineering, mathematics, general, ...)
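As a worked example of the formula above, the sketch below computes U for one response. The per-field weights in `FIELD_WEIGHTS` are invented for illustration and are not the framework's defaults, and the cap is read here as K ≤ 0.5 · (E + C), which is one plausible interpretation of "capped at 50% of E+C".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    w_e: float
    w_c: float
    w_k: float


# Hypothetical per-field weights, chosen only for illustration.
FIELD_WEIGHTS = {
    "software_engineering": Weights(w_e=0.6, w_c=0.3, w_k=0.1),
    "general": Weights(w_e=0.5, w_c=0.3, w_k=0.2),
}


def utility(field: str, e: float, c: float, k: float) -> float:
    """U = w_e(f)*E + w_c(f)*C + w_k(f)*K, with K capped at 50% of E + C."""
    if not (0.0 <= e <= 1.0 and 0.0 <= c <= 1.0):
        raise ValueError("E and C must lie in [0, 1]")
    w = FIELD_WEIGHTS[field]
    k_capped = min(k, 0.5 * (e + c))  # assumed reading of the curiosity cap
    return w.w_e * e + w.w_c * c + w.w_k * k_capped


u_normal = utility("software_engineering", e=0.8, c=0.6, k=0.2)  # cap not active
u_capped = utility("software_engineering", e=0.2, c=0.2, k=1.0)  # K limited to 0.2
```

With these weights the first call gives 0.48 + 0.18 + 0.02 ≈ 0.68. In the second call the cap limits K to 0.2, so a weak answer cannot earn a high score through exploration alone.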
The additive weighted structure is not a convenience; it is the unique functional form satisfying five behavioral axioms, proved via Debreu's representation theorem (Theorem B.1, Appendix B).
Policies โ teaching the framework what good looks like
from aua.guard import assertion, AssertionLevel
from aua.policy import Policy

@assertion(name="PythonSyntaxCheck", level=AssertionLevel.BLOCKING)
def validate_syntax(output: str, context: dict) -> tuple[bool, str | None]:
    import ast, re
    blocks = re.findall(r"```python(.*?)```", output, re.DOTALL)
    for block in blocks:
        try:
            ast.parse(block)
        except SyntaxError as e:
            return False, f"Syntax error at line {e.lineno}"
    return True, None

@assertion(name="AnalogyBonus", level=AssertionLevel.INFO, bonus=0.10)
def reward_analogy(output: str, context: dict) -> tuple[bool, str | None]:
    if any(p in output.lower() for p in ["like a", "similar to", "imagine"]):
        return True, "Positive: analogy used"
    return True, None  # neutral: no bonus

policy = Policy(name="SafeCoding", max_total_bonus=0.30)
policy.add(validate_syntax)  # BLOCKING: retries on fail
policy.add(reward_analogy)   # INFO: boosts E score
YAML equivalent in policies/safe_coding.yaml:
name: SafeCoding
version: "1.0"
max_retries: 3
max_total_bonus: 0.30
assertions:
  - import_path: mypackage.policies:validate_syntax
  - import_path: mypackage.policies:reward_analogy
    bonus: 0.10
utility_overrides:
  w_k: 0.30
aua policy validate policies/safe_coding.yaml
aua policy apply policies/safe_coding.yaml
Over time: BLOCKING assertions reduce failures → sessions that pass become gold-standard DPO data → aua calibrate --layer 3 exports them → fine-tune → repeat.
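To make the interaction between BLOCKING checks, INFO bonuses, and `max_total_bonus` concrete, here is a standalone sketch of how one candidate output could be evaluated. It does not use the `aua` package: `Level`, `evaluate`, `non_empty`, and `has_analogy` are hypothetical names, and the rule that an INFO check earns its bonus only when it returns a message is inferred from the `reward_analogy` example, not taken from the framework's source.

```python
from enum import Enum
from typing import Callable, Optional

# A check returns (passed, message), mirroring the assertion functions above.
Check = Callable[[str, dict], tuple[bool, Optional[str]]]


class Level(Enum):  # hypothetical stand-in for AssertionLevel
    BLOCKING = "blocking"
    INFO = "info"


def evaluate(output: str, context: dict,
             checks: list[tuple[Level, float, Check]],
             max_total_bonus: float = 0.30) -> tuple[bool, float, list[str]]:
    """Return (accepted, bonus, messages) for one candidate output."""
    bonus = 0.0
    messages: list[str] = []
    for level, check_bonus, check in checks:
        passed, message = check(output, context)
        if message:
            messages.append(message)
        if level is Level.BLOCKING and not passed:
            return False, 0.0, messages  # a caller would retry generation here
        if level is Level.INFO and passed and message:
            bonus += check_bonus  # assumed: a message signals the positive case
    return True, min(bonus, max_total_bonus), messages


def non_empty(output: str, context: dict) -> tuple[bool, Optional[str]]:
    ok = bool(output.strip())
    return ok, None if ok else "empty output"


def has_analogy(output: str, context: dict) -> tuple[bool, Optional[str]]:
    if "like a" in output.lower():
        return True, "Positive: analogy used"
    return True, None


checks = [(Level.BLOCKING, 0.0, non_empty), (Level.INFO, 0.10, has_analogy)]
accepted = evaluate("A stack is like a pile of plates.", {}, checks)
rejected = evaluate("   ", {}, checks)
```

Capping the summed bonus keeps a response from inflating its E score by stacking many small INFO rewards.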
Hooks
class SlackOnCorrection:
    async def __call__(self, event: dict) -> dict:
        # event["type"] == "on_correction"
        # event keys: subject, domain, claim, confidence, decay_class, source
        await notify_slack(f"New correction: {event['claim']}")
        return event

hooks:
  on_correction:
    - import_path: plugins.hooks:SlackOnCorrection
      fail_closed: false
      timeout_s: 3.0
All 11 hook points, event dict schemas, and examples are in tutorial.html Part 14.
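The `timeout_s` and `fail_closed` settings in the YAML above suggest how a hook runner might treat slow or failing hooks. The sketch below is one reading of those semantics, not the actual `HookRunner`: with `fail_closed=False` a timeout or error leaves the event unchanged and processing continues, and with `fail_closed=True` the error propagates. `run_hook`, `tagging_hook`, and `slow_hook` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable

Hook = Callable[[dict], Awaitable[dict]]


async def run_hook(hook: Hook, event: dict, timeout_s: float = 3.0,
                   fail_closed: bool = False) -> dict:
    """Run one hook under a timeout.

    fail_closed=False: on error or timeout, keep the original event and continue.
    fail_closed=True: re-raise so the caller can abort the request.
    """
    try:
        # Pass a copy so a failing hook cannot leave the event half-modified.
        return await asyncio.wait_for(hook(dict(event)), timeout=timeout_s)
    except Exception:
        if fail_closed:
            raise
        return event


async def tagging_hook(event: dict) -> dict:
    return {**event, "tagged": True}


async def slow_hook(event: dict) -> dict:
    await asyncio.sleep(1.0)  # longer than the timeout used in the demo
    return {**event, "slow": True}


async def demo() -> tuple[dict, dict]:
    event = {"type": "on_correction", "claim": "example claim"}
    tagged = await run_hook(tagging_hook, event)
    timed_out = await run_hook(slow_hook, event, timeout_s=0.05)
    return tagged, timed_out


tagged, timed_out = asyncio.run(demo())
```

Here `tagged` carries the extra key added by the hook, while `timed_out` is the original event because the slow hook exceeded its budget and the runner failed open.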
Project structure
aua/                          # Core framework package
├── router.py                 # Request routing + REST endpoints
├── arbiter.py                # Contradiction detection + 4-check arbitration
├── utility_scorer.py         # U = w_e·E + w_c·C + w_k·K
├── field_classifier.py       # Probabilistic domain routing
├── assertions_store.py       # Cross-session corrections with decay classes A-D
├── correction_loop.py        # DPO pair accumulation
├── blue_green.py             # Utility-deviation-triggered model promotion
├── rollback.py               # Model rollback with event log
├── guard.py                  # @assertion decorator, AssertionLevel, Policy.run()
├── policy.py                 # Policy dataclass + YAML loader
├── hooks.py                  # HookRunner: 11 lifecycle hook points
├── auth.py                   # 15-scope token auth + mTLS
├── metrics.py                # 18 Prometheus metrics
├── otel.py                   # OpenTelemetry tracing
├── eval.py                   # Evaluation harness
├── chat.py                   # Chat session management
├── state.py                  # SQLite state store (sessions, corrections, assertion_events)
├── cli.py                    # aua CLI: 22 command groups
├── config.py                 # AUAConfig + tier loader
└── plugins/
    ├── interfaces.py         # 8 Protocol interfaces
    └── registry.py           # Plugin load + validation
apps/
└── aua_chat/                 # Next.js 14 Chat UI (npm run dev or aua ui)
tests/                        # 197 tests across Python 3.10 / 3.11 / 3.12
docs/
├── v1_validation_report.md   # Full validation record
└── archive/                  # v0.5 pages (preserved)
Validated results (v1.0 baseline, RTX 4090)
| Result | Value | Source |
|---|---|---|
| Repeated error reduction | 69.6% (14 vs 46 over 400 tasks) | agent/simulate_extended.py |
| Routing correctness gain (VCG arbitration) | +43.3pp vs no routing (p = 0.0003, d = 1.02) | agent/routing_experiment.py |
| Mismatched routing harm | −17.5% correctness, Brier 0.292 vs 0.160 | Same |
| U vs. correctness correlation | Pearson r = 0.461, p < 10⁻⁴⁰ | Extended simulation |
| Brier calibration improvement | 14.3% overall, 29.5% by cycle 5 | Extended simulation |
| Contradiction rate reduction | 22% → 6% over 10 cycles (73%) | Extended simulation |
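The percentage reductions in the table follow directly from the raw counts and rates quoted next to them. A short check, using only numbers from the table (the helper name `relative_reduction` is illustrative):

```python
def relative_reduction(before: float, after: float) -> float:
    """Fractional drop from `before` to `after`."""
    return (before - after) / before


repeated_errors = relative_reduction(46, 14)     # 14 vs 46 repeated errors
contradictions = relative_reduction(0.22, 0.06)  # contradiction rate 22% -> 6%

print(f"{repeated_errors:.1%}")  # 69.6%
print(f"{contradictions:.0%}")   # 73%
```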
Full validation record with all 197 test names, CLI reference, Docker Compose, Chat UI, security, and observability validation: docs/v1_validation_report.md.
Roadmap
| Item | Status |
|---|---|
| Per-user correction scoping (multi-tenant isolation) | v1.1 |
| Full chosen+rejected DPO pair generation (auto populated) | v1.1 |
| Physical hardware comparison: 7B specialist graph vs 70B monolithic | Empirical priority |
| Safety-critical deployment validation (shadow-mode, abstention testing) | Planned |
| Regex + LLM-judge eval check types | v1.1 |
| Policy version history and rollback | v1.1 |
| Automatic fine-tuning pipeline (Axolotl/TRL integration) | v2.0 |
The v1.1 roadmap is also tracked in aua_roadmap.html.
The core mechanism: utility as a control law
The utility function governs behavior at every timescale:
- At query time: routes to the right specialist, scores the response, enforces assertions, injects prior corrections
- Session-by-session: specialists that consistently fail assertions don't get promoted via blue-green
- Calibration cycles: aua calibrate --layer 3 exports gold-standard sessions (all INFO assertions fired, no BLOCKING failed) as DPO training pairs
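The gold-standard criterion in the last bullet (all INFO assertions fired, no BLOCKING failed) can be sketched as a simple filter over per-session assertion events. The `AssertionEvent` record and its fields are hypothetical; the actual schema of the `assertion_events` table is not shown in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssertionEvent:
    name: str
    level: str           # "BLOCKING", "SOFT", or "INFO"
    passed: bool
    fired: bool = False  # for INFO checks: was the positive signal present?


def is_gold_standard(events: list[AssertionEvent]) -> bool:
    """True when every INFO assertion fired and no BLOCKING assertion failed."""
    info = [e for e in events if e.level == "INFO"]
    blocking = [e for e in events if e.level == "BLOCKING"]
    return all(e.fired for e in info) and all(e.passed for e in blocking)


sessions = {
    "s1": [AssertionEvent("PythonSyntaxCheck", "BLOCKING", passed=True),
           AssertionEvent("AnalogyBonus", "INFO", passed=True, fired=True)],
    "s2": [AssertionEvent("PythonSyntaxCheck", "BLOCKING", passed=True),
           AssertionEvent("AnalogyBonus", "INFO", passed=True, fired=False)],
    "s3": [AssertionEvent("PythonSyntaxCheck", "BLOCKING", passed=False),
           AssertionEvent("AnalogyBonus", "INFO", passed=True, fired=True)],
}

gold = [sid for sid, events in sessions.items() if is_gold_standard(events)]
```

Only `s1` qualifies here. Note that a session with no recorded events would pass this filter vacuously, so a real exporter would probably also require at least one assertion to have run.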
The additive weighted structure is not a convenience; it is the unique functional form satisfying five behavioral axioms (monotonicity, continuity, separability, field invariance, linear scaling invariance), proved from first principles via Debreu's representation theorem and the Cauchy functional equation (Theorem B.1).
License
Code: GNU General Public License v3.0 (see LICENSE)
Whitepaper: Creative Commons Attribution 4.0 (see LICENSE-CC-BY-4.0)
If you build on this work, please cite:
Tota, P. (2026). AUA Framework v1.0: A Production Framework for Self-Correcting Multi-Specialist AI Systems. GitHub. https://github.com/praneethtota/Adaptive-Utility-Agent
Full documentation, tutorial, and domain deep-dives:
https://praneethtota.github.io/Adaptive-Utility-Agent