
AgentGuard

A quality-assurance engine for LLM-generated code. Python engine + HTTP protocol + MCP server + thin SDKs for any language.


What It Does

AgentGuard sits between your AI coding agent and the LLM, ensuring that every piece of generated code is:

  • Structurally sound — Parses, lints, type-checks before any human sees it
  • Properly scoped — Project archetypes prevent over/under-engineering
  • Built top-down — Skeleton → contracts → wiring → logic (general to particular)
  • Self-verified — The LLM reviews its own output against explicit criteria
  • Cost-tracked — Every token, every dollar, every model comparison — visible

Installation

Requires Python 3.11+.

# Core library (Anthropic + OpenAI providers included)
pip install agentguard

# With HTTP server (FastAPI + Uvicorn)
pip install "agentguard[server]"

# With MCP server (for Claude Desktop, Cursor, Windsurf, Cline)
pip install "agentguard[mcp]"

# With all optional providers and transports
pip install "agentguard[all]"

Optional LLM providers

pip install "agentguard[litellm]"    # LiteLLM router (Ollama, Together, etc.)
pip install "agentguard[google]"     # Google Gemini

Verify installation

agentguard --version
agentguard list          # Show available archetypes
agentguard info api_backend   # Show archetype details

How It Works

AgentGuard uses a top-down generation pipeline that builds code from architecture to implementation, not the other way around:

L1 Skeleton      →  What files exist and what each one does
L2 Contracts     →  Typed function/class stubs (signatures, no bodies)
L3 Wiring        →  Import statements and call-chain connections
L4 Logic         →  Actual function implementations
   Validate      →  Syntax, lint, types, imports — mechanical checks
   Challenge     →  LLM self-reviews against 30+ criteria per archetype

Each level constrains the next. The LLM can't hallucinate imports at L4 because L3 already defined them. It can't invent APIs because L2 already declared the signatures. This is why MCP-generated code has better architecture — it was designed before it was implemented.
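To make the level-by-level constraint concrete, here is a hypothetical Python illustration (the file contents, names, and signatures are invented for this example; this is not actual AgentGuard output):

from dataclasses import dataclass
import hashlib

@dataclass
class User:
    email: str
    password_hash: str

# L2 contract (generated first): a typed stub with no body.
def create_user(email: str, password: str) -> User:
    """Register a new user and return the persisted record."""
    raise NotImplementedError

# L4 logic (generated last): only the body changes. The signature,
# return type, and available imports were already fixed at L2/L3,
# so the model cannot drift to a different API here.
def create_user(email: str, password: str) -> User:  # noqa: F811 (redefined for illustration)
    # Toy hashing, for illustration only; not production password handling.
    return User(email=email, password_hash=hashlib.sha256(password.encode()).hexdigest())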

Archetypes

An archetype is a project blueprint that configures the entire pipeline. It defines:

  • Tech stack — language, framework, test runner, linter
  • Expected file structure — what files should exist and where
  • Validation rules — what checks to run (syntax, lint, types, imports)
  • Challenge criteria — what the self-review evaluates (30+ criteria for react_spa)
  • Maturity level — starter (minimal) or production (full infrastructure)
  • Infrastructure files — mandatory files the pipeline must generate (ErrorBoundary, logger, constants, etc.)
Archetype     Use When                                   Language     Maturity
script        One-off automation, data processing        Python       starter
cli_tool      CLI with subcommands, flags, help text     Python       starter
api_backend   REST API with routes, models, auth        Python       production
web_app       Full-stack app (React + API)               Python + TS  production
library       Reusable package with public API           Python       production
react_spa     Client-side SPA with routing, state, i18n  TypeScript   production

Pick the archetype that matches your project. Production archetypes generate more infrastructure (error boundaries, logging, code-splitting, constants) — this is intentional.

# See what an archetype expects
agentguard info react_spa

Usage

There are four ways to use AgentGuard, depending on your setup:

1. CLI — Generate from the command line

The simplest way. No code needed.

# Generate a project from a spec
agentguard generate "A user auth API with JWT tokens, registration, and login" \
  --archetype api_backend \
  --model anthropic/claude-sonnet-4-20250514 \
  --output ./my-api

# Validate existing code files
agentguard validate src/main.py src/models.py --archetype api_backend

# Self-challenge a file against quality criteria
agentguard challenge src/main.py --criteria "No hardcoded secrets" --criteria "Error handling on all I/O"

CLI Commands Reference

Command                       What It Does
agentguard generate SPEC      Generate a full project from a natural-language spec
agentguard validate FILES...  Run structural checks on code files
agentguard challenge FILE     Self-challenge a file against quality criteria
agentguard serve              Start the HTTP API server (default port 8420)
agentguard mcp-serve          Start the MCP server (stdio or SSE transport)
agentguard list               List available archetypes
agentguard info ARCHETYPE     Show archetype details (tech stack, structure, rules)
agentguard trace TRACE_FILE   Display a trace file summary

All commands support --help for full option details. Use -v for debug logging.
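Because everything is scriptable, the CLI also slots into CI. A minimal Python sketch, assuming agentguard validate follows the usual convention of exiting non-zero when a check fails (not stated above):

import subprocess
import sys

# Gate a CI step on AgentGuard's structural checks.
# Assumes `agentguard validate` exits non-zero on failure.
proc = subprocess.run(
    ["agentguard", "validate", "src/main.py", "src/models.py",
     "--archetype", "api_backend"],
    capture_output=True,
    text=True,
)
print(proc.stdout)
if proc.returncode != 0:
    sys.exit("AgentGuard validation failed; blocking merge.")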

2. Python Library — Direct import

For building custom agents or integrating into existing Python workflows.

from pathlib import Path

from agentguard import Pipeline, Archetype

# Load an archetype and create a pipeline
arch = Archetype.load("api_backend")
pipe = Pipeline(archetype=arch, llm="anthropic/claude-sonnet-4-20250514")

# Generate code (returns files, trace, and cost).
# generate() is async; call it from an async function or via asyncio.run().
result = await pipe.generate(
    spec="A user authentication API with JWT tokens, registration, and login",
)

# Write files to disk
for file_path, content in result.files.items():
    Path(file_path).write_text(content)

# Inspect what happened
print(result.trace.summary())
# → 12 LLM calls | $0.34 total | 3 structural fixes | 1 self-challenge rework

Using individual modules

You don't have to use the full pipeline. Each module works standalone:

from agentguard.validation.validator import Validator
from agentguard.challenge.challenger import SelfChallenger
from agentguard.archetypes.base import Archetype
from agentguard import create_llm_provider  # import path assumed; adjust if it lives elsewhere

# Validate code without generating it
validator = Validator(archetype=Archetype.load("api_backend"))
report = validator.check({"main.py": code_string})
print(report.passed)  # True/False

# Challenge code against custom criteria
challenger = SelfChallenger(llm=create_llm_provider("anthropic/claude-sonnet-4-20250514"))
result = await challenger.challenge(
    output=code_string,
    criteria=["No SQL injection", "All endpoints authenticated"],
)

Supported LLMs

Pipeline(llm="anthropic/claude-sonnet-4-20250514")   # Anthropic (built-in)
Pipeline(llm="openai/gpt-4o")                # OpenAI (built-in)
Pipeline(llm="google/gemini-2.0-flash")       # Google (pip install "agentguard[google]")
Pipeline(llm="litellm/ollama/llama3")         # Any LiteLLM model (pip install "agentguard[litellm]")

3. HTTP Server — For non-Python agents

Run AgentGuard as a service and call it from TypeScript, Go, Rust, or any language with HTTP.

# Start the server
agentguard serve --host 0.0.0.0 --port 8420

# Optional: require an API key
agentguard serve --api-key "my-secret-key"

# Optional: save traces to disk
agentguard serve --trace-store ./traces

Then call from any language:

// TypeScript SDK (thin wrapper over HTTP)
import { AgentGuard } from "@agentguard/sdk";

const ag = new AgentGuard({ url: "http://localhost:8420" });
const result = await ag.generate({
  spec: "A user auth API with JWT tokens",
  archetype: "api_backend",
  llm: "anthropic/claude-sonnet-4-20250514",
});

# Or raw HTTP from any language
curl -X POST http://localhost:8420/generate \
  -H "Content-Type: application/json" \
  -d '{"spec": "A user auth API", "archetype": "api_backend"}'

4. MCP Server — For AI coding tools (recommended)

This is the most powerful integration. Your AI tool (Claude Desktop, Cursor, Windsurf, Cline) gains access to AgentGuard's tools directly. The LLM itself uses the tools during generation — no human in the loop.

Step 1: Install with MCP support

pip install "agentguard[mcp]"

Step 2: Add to your AI tool's config

// Claude Desktop: ~/.claude/claude_desktop_config.json
// Cursor:         .cursor/mcp.json
// Windsurf:       ~/.codeium/windsurf/mcp_config.json
// Cline:          .vscode/cline_mcp_settings.json
{
  "mcpServers": {
    "agentguard": {
      "command": "agentguard",
      "args": ["mcp-serve"]
    }
  }
}

Step 3: Ask your AI tool to build something

The LLM will automatically discover and use AgentGuard's tools. A typical generation flow looks like:

You:  "Build a whitelabel ecommerce SPA with i18n, seller onboarding,
       promo engine, and checkout"

LLM calls: skeleton(spec=..., archetype="react_spa")
       →  Returns file tree with tiers and responsibilities

LLM calls: contracts_and_wiring(spec=..., skeleton_json=...)
       →  Returns typed stubs + import wiring for every file

LLM:   Generates all files following the stubs and wiring

LLM calls: get_challenge_criteria(archetype="react_spa")
       →  Returns 36 quality criteria to self-review against

LLM:   Reviews its own output, reports pass/fail per criterion

No API key is needed for the agent-native tools — the host LLM does all the generation, guided by AgentGuard's structured prompts. This is the key insight: AgentGuard doesn't replace the LLM, it gives the LLM a disciplined process to follow.

MCP Tools Reference

The MCP server exposes 13 tools in two categories:

Agent-native tools (no API key needed — the host LLM does the work):

Tool                    Step    What It Returns
skeleton                L1      File tree with responsibilities, tiers (config/foundation/feature), and infrastructure file requirements
contracts_and_wiring    L2+L3   Typed function stubs + import wiring per file, merged in one pass (saves ~15K tokens vs separate calls)
contracts               L2      Typed stubs only (use contracts_and_wiring instead for most cases)
wiring                  L3      Import connections only (use contracts_and_wiring instead for most cases)
logic                   L4      Instructions for implementing one function body — call once per NotImplementedError stub
get_challenge_criteria  Review  Archetype-specific quality checklist (30+ criteria for react_spa) with review format instructions
digest                  Review  Compact project summary (~200 lines) for efficient self-challenge without re-reading every file
validate                Check   Structural validation: syntax, lint, types, imports — returns pass/fail with details
list_archetypes         Info    Names and descriptions of all available archetypes
get_archetype           Info    Full archetype config: tech stack, validation rules, challenge criteria, infrastructure files
trace_summary           Info    Summary of the last generation: LLM calls, tokens, cost

Full-pipeline tools (require a separate LLM API key configured on the server):

Tool       What It Does
generate   Runs the entire L1→L2→L3→L4→validate→challenge pipeline using AgentGuard's internal LLM
challenge  LLM-based self-review using AgentGuard's internal LLM

When to use which: If your MCP host is already an LLM (Claude Desktop, Cursor, etc.), use the agent-native tools — they're free and the host LLM does better work when it follows the structured prompts itself. Use generate/challenge only if your MCP client is a thin script without its own LLM.
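For that thin-script case, here is a minimal client sketch using the official MCP Python SDK (the mcp package). The tool names come from the table above, but argument schemas are assumptions:

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the AgentGuard MCP server over stdio, as a local AI tool would.
    server = StdioServerParameters(command="agentguard", args=["mcp-serve"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([t.name for t in tools.tools])
            # Argument names are assumptions; check each tool's input schema.
            result = await session.call_tool("list_archetypes", {})
            print(result.content)

asyncio.run(main())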

SSE Transport (for remote MCP clients)

# Default: stdio (for local AI tools)
agentguard mcp-serve

# SSE transport (for network/remote clients)
agentguard mcp-serve --transport sse --port 8421
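A remote client connects the same way over SSE. A sketch with the MCP Python SDK; the /sse endpoint path is an assumption (check the server output for the actual mount point):

import asyncio
from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    # Endpoint path assumed; adjust to what mcp-serve actually exposes.
    async with sse_client("http://localhost:8421/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            print([t.name for t in (await session.list_tools()).tools])

asyncio.run(main())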

Works With Any Agent Framework

AgentGuard integrates with your existing tooling — it's not a framework, it's infrastructure:

Framework                                   Integration
LangGraph                                   Python nodes for each pipeline step
CrewAI                                      Python tools for generation + validation
OpenHands                                   Python micro-agent integration
Raw Python                                  No framework needed — direct library import
TypeScript / Go / Rust / Any                HTTP server + thin SDK
Claude Desktop / Cursor / Windsurf / Cline  MCP server — zero integration code
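Because most frameworks accept plain Python callables as tools, one wrapper can cover them all. A sketch reusing the Validator API shown earlier (only report.passed is documented above; the rest is plain Python):

from agentguard.validation.validator import Validator
from agentguard.archetypes.base import Archetype

# A plain callable that LangGraph nodes, CrewAI tools, or any other
# framework can wrap directly.
def validate_code(files: dict[str, str], archetype: str = "api_backend") -> bool:
    validator = Validator(archetype=Archetype.load(archetype))
    report = validator.check(files)
    return report.passed

# Example: gate an agent step on structural validity.
ok = validate_code({"main.py": "def handler():\n    return {'status': 'ok'}\n"})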

Core Modules

Module                What It Does                                                    Use Standalone?
Top-Down Generator    L1 skeleton → L2 contracts → L3 wiring → L4 logic               ✓
Structural Validator  Syntax, lint, types, imports — zero-cost mechanical checks     ✓
Self-Challenger       LLM reviews its own output against acceptance criteria         ✓
Context Recipes       Right context, right amount, right time — anti-hallucination   ✓
Archetypes            Project blueprints that configure the entire pipeline          ✓
Tracing               Every LLM call tracked with cost, tokens, and quality metrics  ✓

Every module works independently. Use the full pipeline or pick individual pieces.


Benchmarks: MCP vs No-MCP Code Generation

We ran controlled comparisons generating the same project with and without the MCP pipeline, using the same LLM (Claude) in both cases. The pipeline doesn't make the LLM smarter — it makes it more disciplined.

Test Projects

Project               Spec                                                              Complexity
Health Agenda         Patient scheduling + medication tracking + alerts                 Medium (3 domains)
Whitelabel Ecommerce  i18n, seller onboarding, promo engine, pricing, search, checkout  High (8+ domains)

Build Metrics

Metric             Health MCP  Health No-MCP  Ecom MCP  Ecom No-MCP
Files              23          14             38        30
Lines of code      1,907       998            5,548     3,324
TypeScript errors  0           0              0         0
Vite build errors  0           0              0         0
Code-split chunks  —           —              16        1

Self-Challenge Results (Ecommerce — 36 Criteria)

Result  MCP          No-MCP
PASS    24/36 (67%)  23/36 (64%)
FAIL    12/36        13/36

Both versions share 9 common failures (magic numbers, DRY violations, inline styles, etc.). The key difference is in what each version fails at:

  • MCP passed, No-MCP failed: async-compatible data layer, ErrorBoundary exists, loading/error states, fuller i18n coverage
  • No-MCP passed, MCP failed: better context splitting (3 focused contexts vs 1 god-context)

Enterprise Readiness

Criterion        MCP     No-MCP
Type safety      8/10    7/10
Modularity       8/10    5/10
Maintainability  6/10    5/10
Accessibility    5/10    4/10
i18n readiness   6/10    5/10
Performance      8/10    5/10
Observability    4/10    2/10
Testability      5/10    4/10
Overall          6.3/10  4.6/10

Operational Readiness

Dimension              MCP   No-MCP  Details
Debuggability          8/10  4/10    MCP has a structured logger, ErrorBoundary, and a pure reducer (action-traceable). No-MCP has no logging, no error boundary, and opaque useState callbacks.
Feature extensibility  7/10  5/10    MCP's 6-layer architecture (types → utils → contexts → hooks → components → pages) with injectable function signatures. No-MCP has data-layer coupling — validatePromo imports seed data at module scope.
Cloud scalability      8/10  4/10    MCP code-splits into 16 chunks (lazy per page), has a centralized logger for a Sentry/Datadog swap, and a constants file for feature flags. No-MCP ships a 240KB monolithic bundle with zero logging and no error isolation.
API migration cost     6/10  3/10    MCP utils take data as arguments (searchProducts(products, query)) — injectable. No-MCP bakes PRODUCTS.find() into cart-context computed values.
Test surface           8/10  5/10    MCP has 14+ pure functions testable without React rendering, plus an exportable reducer. No-MCP has 9+, but several have module-level seed imports baked in.
Team onboarding        7/10  6/10    MCP's layered DAG lets devs own a layer. No-MCP's flatter structure is simpler but offers fewer parallel work boundaries.

What the MCP Pipeline Generates That No-MCP Skips

Infrastructure             MCP  No-MCP  Why It Matters
ErrorBoundary              ✓    ✗       Without it, one page crash white-screens the whole app
Structured logger          ✓    ✗       Swap one file to connect Sentry/Datadog/CloudWatch
Code-splitting             ✓    ✗       207KB initial load vs 240KB; independent chunk cache invalidation
Async hook (useAsync)      ✓    ✗       Loading/error states handled; ready for real API calls
Toast notification system  ✓    ✗       User feedback for every state mutation
Constants file             ✓    ✗       Natural home for feature flags and env-var extraction
Route constants            ✓    ✗       Change a URL in one place, not grep across files

Key Insight

The MCP pipeline's value isn't in the features it builds — both versions deliver the same checkout, search, and onboarding flows. The value is in the invisible infrastructure it systematically generates: error boundaries, structured logging, code-splitting, pure utility extraction, injectable function signatures, and centralized constants.

These are exactly the things that matter when you go from "it works on my laptop" to "it runs in production at scale." A solo dev building a prototype gets there faster without MCP. But the moment you need a second developer, a staging environment, or a Sentry integration, MCP's infrastructure pays for itself.

The Gap Narrows With Complexity

Metric                Health (MCP / No-MCP)  Ecommerce (MCP / No-MCP)
Line ratio            1.9×                   1.7×
Enterprise score      7.5 / 4.5              6.3 / 4.6
First-compile errors  0 / 0                  0 / 0

As projects grow more complex, the No-MCP agent produces proportionally more code (it can't avoid complexity). But the MCP pipeline's disciplined structure still delivers measurably higher enterprise quality and significantly better operational readiness.

Demo Projects

The benchmarks above were produced from the following projects, generated by AgentGuard's MCP pipeline (with a no-MCP baseline for comparison). You can regenerate them yourself:

# Regenerate the ecommerce SPA (full pipeline)
agentguard generate "Whitelabel ecommerce SPA with i18n, seller onboarding, promo engine, pricing, search, checkout" \
  --archetype react_spa \
  --model anthropic/claude-sonnet-4-20250514 \
  --output ./output

# Then validate and self-challenge the output
agentguard validate ./output --archetype react_spa
agentguard challenge ./output --archetype react_spa

Project                 Description
Chess                   Interactive chess game — MCP pipeline demo
Health Agenda (MCP)     Patient scheduling + medication tracking + alerts — MCP-generated
Health Agenda (No-MCP)  Same spec — direct generation baseline
Ecommerce (MCP)         Whitelabel ecommerce SPA — MCP-generated (38 files, 5,548 lines)
Ecommerce (No-MCP)      Same spec — direct generation baseline (30 files, 3,324 lines)

License

MIT — see LICENSE.
