Score AI agent architectures against the AWAF open specification

awaf-cli

The reference implementation of the AWAF open specification. Catch agent architecture regressions before they ship.

Scores across 10 architectural pillars defined by the AWAF open specification. Designed to run periodically -- nightly, weekly, or on-demand before releases -- not on every commit. Each run makes 10 LLM calls; run it when architecture decisions change, not when typos are fixed.

No dashboards that need a legend. No compliance jargon. One number per pillar, one finding per issue, one fix per finding.

Install

pip install awaf

Requires Python 3.11+. Bring your own model and API key.

Provider Support

awaf-cli is model-agnostic. Use any supported LLM provider — no vendor lock-in.

Provider	Models	Key Env Var
`anthropic`	claude-haiku-4-5-20251001 (default), claude-sonnet-4-20250514, claude-opus-4-5	`ANTHROPIC_API_KEY`
`openai`	gpt-4o, gpt-4o-mini, o3, o4-mini	`OPENAI_API_KEY`
`azure`	Any Azure OpenAI deployment	`AZURE_OPENAI_API_KEY`
`google`	gemini-2.0-flash, gemini-1.5-pro	`GOOGLE_API_KEY`
`litellm`	Any LiteLLM-compatible model	Provider-specific

Default provider: anthropic with claude-haiku-4-5-20251001. Scores are calibrated on Claude; other providers may yield slight variance.

API Keys from .env

awaf automatically loads a .env file in the current directory at startup. Keys already set in the environment take precedence.

Create a .env file next to your project:

ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# GOOGLE_API_KEY=...

Then run normally — no export needed:

awaf run
awaf run --pillar foundation
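
The precedence rule above can be illustrated with a short Python sketch. This is not awaf's actual loader: `parse_dotenv` and `apply_defaults` are hypothetical names, and the parsing is deliberately minimal (no multi-line values, no variable expansion). It only shows the idea that values already present in the environment win over the file.

```python
import os
from collections.abc import MutableMapping


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    pairs: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key:
            # Drop optional surrounding quotes around the value.
            pairs[key] = value.strip().strip("'\"")
    return pairs


def apply_defaults(
    pairs: dict[str, str],
    environ: MutableMapping[str, str] = os.environ,
) -> list[str]:
    """Copy pairs into environ unless the key is already set; return keys applied."""
    applied = []
    for key, value in pairs.items():
        if key not in environ:  # existing environment values take precedence
            environ[key] = value
            applied.append(key)
    return applied
```

Passing a plain dict as `environ` makes the behavior easy to check without touching the real process environment.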

If you prefer to load .env manually before running:

# bash / zsh
export $(grep -v '^#' .env | xargs) && awaf run

# PowerShell (skips blank lines and comments)
Get-Content .env | Where-Object { $_ -match '^[^#=]+=' } | ForEach-Object { $k,$v = $_ -split '=',2; [System.Environment]::SetEnvironmentVariable($k.Trim(),$v) }; awaf run

Quickstart

# Default: Anthropic (.env or export)
export ANTHROPIC_API_KEY=sk-ant-...
awaf run

# OpenAI
export OPENAI_API_KEY=sk-...
awaf run --provider openai --model gpt-4o

# Azure / GitHub Copilot
export AZURE_OPENAI_API_KEY=...
awaf run --provider azure --model gpt-4o --azure-endpoint https://your-resource.openai.azure.com --azure-deployment gpt-4o

# LiteLLM (Bedrock, Groq, Ollama, etc.)
awaf run --provider litellm --model bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0

   _      _  _  _    _      ___
  /_\    | || || |  /_\    | __|
 / _ \   | \/ \/ | / _ \   | _|
/_/ \_\   \_/\_/  /_/ \_\  |_       Agent Well-Architected Framework

AWAF Assessment: my-agent
AWAF v1.0  |  2026-03-15  |  openai / gpt-4o
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Overall Score    78/100   Near Ready
  Close to production. Address findings before deploying.

  Scale: Production Ready >=90 · Near Ready >=75 · Needs Work >=50
         High Risk >=25 · Not Ready <25
  Foundation <40 = automatic FAIL regardless of overall score.
  Tier 2 pillars (Reasoning, Controllability, Context Integrity) carry 1.5x weight.

┌──────────────────────┬───────┬──────────────┬────────────┬─────────┐
│ Pillar               │ Score │ Progress     │ Confidence │  Status │
╞══════════════════════╪═══════╪══════════════╪════════════╪═════════╡
│ TIER 0 -- FOUNDATION                                               │
├──────────────────────┼───────┼──────────────┼────────────┼─────────┤
│ Foundation           │    85 │ [########  ] │ verified   │    PASS │
╞══════════════════════╪═══════╪══════════════╪════════════╪═════════╡
│ TIER 1 -- CLOUD WAF ADAPTED                                        │
├──────────────────────┼───────┼──────────────┼────────────┼─────────┤
│ Op. Excellence       │    74 │ [#######   ] │ verified   │         │
│ Security             │    82 │ [########  ] │ verified   │         │
│ Reliability          │    71 │ [#######   ] │ verified   │         │
│ Performance          │    80 │ [########  ] │ verified   │         │
│ Cost Optim.          │    65 │ [######    ] │ partial    │         │
│ Sustainability       │    79 │ [########  ] │ verified   │         │
╞══════════════════════╪═══════╪══════════════╪════════════╪═════════╡
│ TIER 2 -- AGENT-NATIVE  (1.5x weight)                              │
├──────────────────────┼───────┼──────────────┼────────────┼─────────┤
│ Reasoning Integ.     │    71 │ [#######   ] │ partial    │    1.5x │
│ Controllability      │    78 │ [########  ] │ verified   │    1.5x │
│ Context Integrity    │    80 │ [########  ] │ verified   │    1.5x │
└──────────────────────┴───────┴──────────────┴────────────┴─────────┘

  FILES ANALYZED     12 files
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  FINDINGS  (ordered by severity)
  [High     ]  Cost Optim.         No session budget cap; runaway token spend possible
  [Medium   ]  Reasoning Integ.    Evals present but hallucination rate not measured
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  RECOMMENDATIONS
  Cost Optim.         Add AWAF_SESSION_BUDGET_USD env var and wire hard stop in
                      agent loop before tool dispatch
  Reasoning Integ.    Instrument LangSmith eval run to capture hallucination rate
                      alongside tool selection accuracy
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

  TO IMPROVE THIS ASSESSMENT
  Share LangSmith or Braintrust eval output to upgrade Reasoning Integ.
  from partial to verified
  Share token usage dashboard or budget alert config to verify Cost Optim.
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
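
The Scale legend in the report maps to a simple lookup. The sketch below only restates the bands and the Foundation rule printed above; the function name `rating` and the "FAIL" return label are illustrative choices, not part of the awaf API.

```python
def rating(overall: int, foundation: int) -> str:
    """Map an overall score to the band shown in the report's Scale legend."""
    # Foundation below 40 is an automatic FAIL regardless of the overall score.
    if foundation < 40:
        return "FAIL"
    bands = [
        (90, "Production Ready"),
        (75, "Near Ready"),
        (50, "Needs Work"),
        (25, "High Risk"),
    ]
    for floor, label in bands:
        if overall >= floor:
            return label
    return "Not Ready"
```

For the sample report, `rating(78, 85)` falls in the Near Ready band.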

Per-Project Config

# awaf.toml
[project]
name = "my-agent"

[provider]
name = "openai"              # anthropic | openai | azure | google | litellm
model = "gpt-4o"
api_key_env = "OPENAI_API_KEY"   # defaults to provider standard env var

# Azure / Copilot specific
# name = "azure"
# model = "gpt-4o"
# api_key_env = "AZURE_OPENAI_API_KEY"
# azure_endpoint = "https://your-resource.openai.azure.com"
# azure_deployment = "gpt-4o"
# azure_api_version = "2025-01-01-preview"

# LiteLLM — any model string LiteLLM supports
# name = "litellm"
# model = "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"

[thresholds]
overall_fail = 60
tier2_fail = 50
regression_limit = 10
warn_only = false

[files]
agent_patterns = ["agents/**/*.py", "tools/**/*.py", "pipelines/**"]
exclude = ["tests/**", "docs/**"]

[ci]
enabled = true
schedule = "0 9 * * 1"        # cron (UTC): only run when this schedule fires
change_detection = true        # skip if no files changed under watch_paths
watch_paths = [
    "src/agents",
    "src/signals",
]

[reporting]
post_pr_comment = true
terminal_format = "compact"    # compact | full | json

CI Config Fields

Field	Default	Description
`ci.enabled`	`true`	Set `false` to disable all CI-mode checks
`ci.schedule`	(none)	Cron expression (UTC). `awaf run --ci` skips if current time is outside ±5 min of a scheduled fire
`ci.change_detection`	`false`	Skip when no files under `watch_paths` changed
`ci.watch_paths`	`[]`	Directory prefixes to watch. Falls back to `[files].agent_patterns` when not set
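
The two gating rules in the table can be sketched in Python. This is a simplified model, not awaf's implementation: the cron matcher only understands `*` and comma-separated integers (no ranges or steps), it requires every field to match, and all function names are hypothetical. `now` is expected to be a UTC datetime.

```python
from datetime import datetime, timedelta, timezone


def _field_matches(field: str, value: int) -> bool:
    """Match one cron field; supports only '*' and comma-separated integers."""
    if field == "*":
        return True
    return value in {int(part) for part in field.split(",")}


def cron_matches(expr: str, moment: datetime) -> bool:
    """True if the 5-field cron expression fires at this exact minute."""
    minute, hour, dom, month, dow = expr.split()
    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(dom, moment.day)
        and _field_matches(month, moment.month)
        # cron uses 0 = Sunday; isoweekday() uses 7 = Sunday.
        and _field_matches(dow, moment.isoweekday() % 7)
    )


def near_scheduled_fire(expr: str, now: datetime, tolerance_min: int = 5) -> bool:
    """True if any minute within +/- tolerance_min of now is a scheduled fire."""
    base = now.replace(second=0, microsecond=0)
    return any(
        cron_matches(expr, base + timedelta(minutes=offset))
        for offset in range(-tolerance_min, tolerance_min + 1)
    )


def touches_watch_paths(changed_files: list[str], watch_paths: list[str]) -> bool:
    """True if any changed file lives under one of the directory prefixes."""
    prefixes = [path.rstrip("/") + "/" for path in watch_paths]
    return any(f.startswith(prefix) for f in changed_files for prefix in prefixes)
```

With the sample config, a run started on a Monday at 09:03 UTC counts as on schedule, and a change to `src/agents/planner.py` counts as a watched change.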

CI Integration

GitHub Actions

name: AWAF Assessment
# Recommended: run on a schedule, not on every commit.
# Each run makes 10 LLM calls. Architecture changes slowly;
# nightly or weekly is usually the right cadence.
on:
  schedule:
    - cron: '0 6 * * 1'   # every Monday at 06:00 UTC
  workflow_dispatch:         # on-demand: run before releases or after major changes

jobs:
  awaf:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: YogirajA/awaf-action@v1
        with:
          # Use whichever provider key you have
          anthropic-api-key: ${{ secrets.ANTHROPIC_API_KEY }}
          # openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          # azure-openai-api-key: ${{ secrets.AZURE_OPENAI_API_KEY }}
          provider: anthropic           # anthropic | openai | azure | google | litellm
          model: claude-haiku-4-5-20251001  # optional; omit to use provider default (Haiku)
          project-name: my-agent
          fail-threshold: 60
          tier2-fail-threshold: 50
          score-regression-limit: 10
          post-pr-comment: true

Recommended cadence: weekly schedule or on-demand before releases. Running on every PR makes sense only for teams actively refactoring agent architecture. For most teams, weekly is sufficient -- architecture changes slowly.

If you do run on PRs, use on: pull_request with a paths: filter so only agent-relevant changes trigger it. AWAF also exits 3 automatically when no agent files changed (configurable via agent_patterns in awaf.toml).

GitLab CI

include:
  - remote: 'https://raw.githubusercontent.com/YogirajA/awaf-cli/main/integrations/gitlab/awaf-gitlab-ci.yml'

awaf:
  variables:
    ANTHROPIC_API_KEY: $ANTHROPIC_API_KEY
    AWAF_PROVIDER: anthropic
    AWAF_PROJECT_NAME: my-agent
    AWAF_FAIL_THRESHOLD: "60"

Exit Codes

Code	Meaning
0	Passed all thresholds
1	Score below threshold or regression exceeded
2	Assessment failed (API error, ingest error)
3	No agent files changed, skipped
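
If you wrap awaf in your own pipeline script, the exit codes can be mapped to a build decision. The Python sketch below names the four codes from the table; the policy in `should_fail_build` (treat a skip as success, fail on anything else non-zero) is one reasonable choice, not something awaf prescribes.

```python
from enum import IntEnum


class AwafExit(IntEnum):
    """Exit codes documented in the table above."""

    PASSED = 0
    THRESHOLD_OR_REGRESSION = 1
    ASSESSMENT_FAILED = 2
    SKIPPED_NO_CHANGES = 3


def should_fail_build(returncode: int) -> bool:
    """Assumed policy: a skip (3) is not a failure; any other non-zero code is."""
    return returncode not in (AwafExit.PASSED, AwafExit.SKIPPED_NO_CHANGES)
```

Feed it the return code of `awaf run --ci`, for example from `subprocess.run([...]).returncode`. Teams that prefer not to block on transient API errors could also exempt code 2.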

CLI Reference

awaf run                                         # assess current directory
awaf run --paths agents/ tools/                  # specific paths
awaf run --ci                                    # CI mode with git context
awaf run --pillar foundation                     # single pillar only
awaf run --provider openai --model gpt-4o        # override provider
awaf run --provider litellm --model ollama/llama3 # local model via LiteLLM
awaf run --parallel                              # concurrent mode (faster, higher cost)
awaf run --delay 10                              # sequential with 10s pause between pillars
awaf run --model claude-opus-4-5                 # override model (default: claude-haiku-4-5-20251001)
awaf history                                     # score history for current project
awaf compare <id1> <id2>                         # diff two assessments
awaf report --format json                        # JSON output for CI artifact upload
awaf report --coverage                           # show files analyzed and skipped
awaf providers                                   # list configured providers and status

Progress is printed as each pillar starts (▸ Evaluating Foundation...). No color codes when stdout is not a TTY. No spinners in CI mode.

Running pillars one at a time

Useful on free-tier API plans or when debugging a specific pillar. Each run saves to awaf.db and contributes to score history.

awaf run --pillar foundation
awaf run --pillar security
awaf run --pillar controllability
# ... pick the pillars you care about

To add a pause between sequential pillar calls (useful on rate-limited API plans):

awaf run --delay 15

What Gets Scored

awaf-cli implements AWAF v1.0 across 10 pillars in 3 tiers. Full pillar definitions and scoring questions are in the specification repo.

Tier 0: Foundation. Can this agent run independently?

Tier 1: Cloud WAF Adapted (1.0x weight). Operational Excellence, Security, Reliability, Performance Efficiency, Cost Optimization, Sustainability

Tier 2: Agent-Native (1.5x weight). Reasoning Integrity, Controllability, Context Integrity

The agent-native pillars are what make AWAF distinct. Cloud infrastructure has no equivalent for them; they exist because agents are not servers. See aradhye.com for the original thinking behind this.

What It Analyzes

awaf-cli reads what is in your repository: Python, TypeScript, Go, YAML, JSON, TOML, Markdown, and PDF files.

It can verify: trust tier enforcement in code, kill switch and cancel implementations, loop detection and budget guards, eval framework presence, sanitization at input boundaries, slice boundary documentation.

It cannot verify (flagged as partial confidence): cloud resource configs not in the repo, whether SLOs are being met in production, runtime hallucination rates, whether circuit breakers are actually firing.

When something cannot be verified, the output says so explicitly. Partial confidence with clear coverage gaps is more useful than a confident score built on assumptions.

Score History

Every assessment is stored locally in awaf.db. Score history is tracked per project, per branch, per commit, and per provider/model.

awaf history

my-agent  last 5 assessments
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  2026-02-27  a3f9c12  PR #47   72  -6   openai/gpt-4o       Controllability regression
  2026-02-24  8bc1a33  main     78  +3   anthropic/claude-opus-4-5  Context Integrity improved
  2026-02-21  4de92f1  main     75  +0   anthropic/claude-opus-4-5
  2026-02-18  2ab77c4  main     75  +8   openai/gpt-4o       Security and Reliability up
  2026-02-12  9ff3e21  main     67  —    anthropic/claude-opus-4-5

Six months of CI runs become your architectural changelog.

How It Works

awaf-cli sends your architecture artifacts to the LLM provider of your choice. Each of the 10 AWAF pillars is evaluated by a separate model call. Calls run sequentially by default, which lets consecutive pillars share the prompt cache (~90% cost reduction on Anthropic); use --parallel for concurrent execution. Results are written to a local SQLite database. There is no central coordinator and no shared state between pillar evaluations.

Artifacts → Ingestor → Event Bus → [10 Pillar Agents sequentially] → SQLite → Terminal
                                          ↑
                              Provider Abstraction Layer
                         (Anthropic | OpenAI | Azure | Google | LiteLLM)

The tool is built to be AWAF-compliant itself: choreography over orchestration, vertical slice per pillar, blast radius bounded. See ARCHITECTURE.md.
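
The difference between the default sequential mode, `--delay`, and `--parallel` can be pictured with a small asyncio sketch. It is a toy model only: `evaluate_pillar` is a hypothetical stand-in for one model call, and awaf's real pipeline (ingestor, event bus, SQLite) is not represented.

```python
import asyncio

PILLARS = [
    "Foundation",
    "Operational Excellence",
    "Security",
    "Reliability",
    "Performance Efficiency",
    "Cost Optimization",
    "Sustainability",
    "Reasoning Integrity",
    "Controllability",
    "Context Integrity",
]


async def evaluate_pillar(name: str) -> dict:
    """Hypothetical stand-in for one LLM call; returns a placeholder result."""
    await asyncio.sleep(0)
    return {"pillar": name}


async def run_assessment(parallel: bool = False, delay_s: float = 0.0) -> list[dict]:
    """Evaluate every pillar, one call each, sequentially or concurrently."""
    if parallel:
        # Concurrent: all calls are in flight at once; gather keeps input order.
        return list(await asyncio.gather(*(evaluate_pillar(p) for p in PILLARS)))
    results = []
    for index, pillar in enumerate(PILLARS):
        if index and delay_s:
            await asyncio.sleep(delay_s)  # pause between pillars, like --delay
        results.append(await evaluate_pillar(pillar))
    return results
```

Running `asyncio.run(run_assessment())` yields ten results in pillar order; the sequential path is the one that gives a provider-side prompt cache a chance to be reused between calls.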

Environment Variables

# Provider selection (can also be set in awaf.toml)
AWAF_PROVIDER=anthropic          # anthropic | openai | azure | google | litellm
AWAF_MODEL=claude-haiku-4-5-20251001  # optional model override

# API keys — use whichever provider you're running
ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-...
AZURE_OPENAI_API_KEY=...
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_DEPLOYMENT=gpt-4o
GOOGLE_API_KEY=...

# Session controls
AWAF_DB_URL=sqlite:///./awaf.db
AWAF_MAX_ARTIFACTS_TOKENS=40000
AWAF_SESSION_BUDGET_USD=1.00     # approximate; pricing varies by provider
AWAF_CONCURRENCY=1               # pillar workers (default 1 = sequential/economical; set higher for --parallel override)
AWAF_LOG_LEVEL=INFO

# Anthropic: prompt caching is enabled automatically on system + user prompts.
# Cached tokens do not count against the input TPM rate limit, reducing pressure
# on Tier 1 plans (50K TPM for Haiku, 30K TPM for Sonnet/Opus).
# Cache TTL is ~5 minutes; repeated runs within that window benefit most.

Deployment Modes

Mode	Setup	Data	Right For
Local	pip install awaf + API key	awaf.db on your machine	Solo developers, OSS projects
Cloud	API key + AWAF_MODE=cloud	awaf.dev (coming)	Teams, dashboards, benchmarks
On-Prem	Docker Compose / Helm	Your PostgreSQL	Enterprise, regulated industries

The local mode is fully functional with no account required. Cloud and on-prem add team dashboards, cross-project score history, and industry benchmarks. On-prem: no artifacts leave your network. All model API calls use your own API key. No telemetry unless opted in.

Score Badge

[![AWAF Score](https://img.shields.io/badge/AWAF%20Score-78%20Near%20Ready-2563EB?style=flat-square)](https://github.com/YogirajA/AWAF)

Live badge (cloud mode):

[![AWAF Score](https://awaf.dev/badge/your-project)](https://awaf.dev/your-project)

Troubleshooting

Pillars score 0 / "unparseable JSON" warning

Symptom

Pillar 'Foundation' returned unparseable JSON: Expecting ',' delimiter: line 111 column 6 (char 16195)

The pillar gets a score of 0 and confidence: self_reported instead of a real evaluation.

Cause

Some models — particularly smaller or quantized variants — produce JSON that violates the spec when the response is long (code snippets, multi-line findings). awaf-cli attempts automatic repair via json-repair, but repair can fail when the output is severely malformed.
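
The failure path described above looks roughly like the following Python sketch. It uses only the standard `json` module; the repair step that awaf-cli performs with json-repair is marked with a comment rather than reproduced, and the result keys (`pillar`, `score`, `confidence`) are assumed field names for illustration.

```python
import json


def parse_pillar_response(pillar: str, raw: str) -> dict:
    """Parse a pillar's JSON reply, falling back to a zero score if it is invalid."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # A repair pass would be attempted here before giving up.
        print(f"Pillar {pillar!r} returned unparseable JSON: {exc}")
        # Fallback mirrors the symptom: score 0 with self_reported confidence.
        return {"pillar": pillar, "score": 0, "confidence": "self_reported"}
```

A reply with a missing comma between two keys, for example, triggers the same "Expecting ',' delimiter" message shown in the symptom.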

Workaround: upgrade to a more capable model

# Anthropic — Sonnet or Opus handles long structured output reliably
awaf run --model claude-sonnet-4-5
awaf run --model claude-opus-4-5

# OpenAI
awaf run --provider openai --model gpt-4o

# Local via LiteLLM — try a larger quant
awaf run --provider litellm --model ollama/llama3:70b

Or set it permanently in awaf.toml:

[provider]
model = "claude-sonnet-4-5"

The default model (claude-haiku-4-5-20251001) is fast and cheap but occasionally produces invalid JSON on codebases with large artifact payloads. If you see this warning on more than one pillar per run, switching to Sonnet usually resolves it.

Isolate the failing pillar

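The default sequential mode already prints each pillar as it completes, and the warning names the pillar that failed. Re-run only that pillar, or add a delay to slow the full run down:

awaf run --pillar foundation
awaf run --delay 5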

Contributing

Bug reports, feature requests, and PRs welcome. Provider adapter contributions especially welcome — see PROVIDER_SPEC.md for the interface contract.

For changes to the AWAF specification itself (pillar definitions, scoring questions, methodology), open an issue in the AWAF specification repo. This repo is for the implementation.

License

Apache 2.0. See LICENSE.

Related

YogirajA/AWAF: The AWAF open specification
PROVIDER_SPEC.md: Provider abstraction layer spec — build your own adapter
Are We Building AI Agents Like We Built Microservices?: The post that introduced AWAF
