CLI, MCP server, and JSON schemas for validating and auditing strategic-risk AI agent output
Agenda Intelligence MD
CI / MCP / EVIDENCE-AUDIT LAYER FOR STRATEGIC INTELLIGENCE AGENTS — protocol, JSON schemas, CLI and MCP server for validating, scoring and auditing the structure of strategic-risk agent output. The evidence-discipline surface for markdown-first reasoning skills (Global Think Tank Analyst, Central Asia + Caspian, Gulf + Middle East). Open-source.
Evidence & eval layer for strategic intelligence agents.
A protocol, JSON-schema set, CLI, and MCP-compatible toolkit that helps AI agents move from unsupported summaries to auditable strategic-risk briefs:
- what changed
- why it matters
- what is evidence-backed
- what is uncertain
- who gains or loses leverage
- what scenarios are plausible
- what to watch next
It is built for engineers shipping policy, sanctions, regulation, geopolitical-risk, market-risk, and strategic-intelligence agents — where the output has to survive review by an analyst, not just sound plausible.
Bundled-example baseline (5 cases, reproduced with python3 evals/run_benchmark.py):
| metric | value |
|---|---|
| mean score | 87.0 / 100 |
| cases | 5 (EU AI Act, EU CBAM, Red Sea shipping, sanctions routing, BIS AI Diffusion) |
| schema-valid | 100% |
| with evidence pack | 100% |
| with claim-level audit | 100% |
| orphan evidence refs | 0 |
What this is
- Markdown protocol (`Agenda-Intelligence.md`): a structured reasoning workflow agents can follow.
- JSON schemas: validate brief structure, evidence packs, memory cards, lens manifests.
- CLI checks: `validate-brief`, `validate-evidence`, `score`, `doctor` for CI-style validation of agent output.
- MCP server: a real stdio MCP server (`agenda-intelligence-mcp`) exposing the validation, read, and scoring tools.
- Eval starter kit: rubric, LLM-judge prompt, human checklist, sample cases, benchmark seed.
- Source / evidence policy: explicit rules for claim-level discipline, including per-claim provenance tags (Axis A: `[primary]` `[secondary]` `[user-provided]` `[inference]` `[analyst-judgment]`; Axis B: `[verify]` `[stale-risk: YYYY-MM]`). See `skills/agenda-intelligence/references/evidence-discipline.md`.
- Signal lifecycle tracker: markdown + JSON schema for tracking signals across sessions (detected → developing → escalated → stable → resolved → archived). See `skills/agenda-intelligence/references/signal-lifecycle.md` and `schemas/signal-tracker.schema.json`.
- Source normalization skill (`skills/source-ingest/`): normalize documents (PDF, DOCX, URL) into structured source records for evidence packs.
- Regional & sector lenses: compact reference packs inside the protocol (Central Asia & Caspian, Middle East, EU; sanctions, export controls). For deep regional analysis, use the dedicated vertical specialist skills: Central Asia + Caspian or Gulf + Middle East.
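The per-claim provenance tags listed above can be checked mechanically. The following is a minimal sketch, not the package's own parser: it assumes tags appear as bracketed tokens on the same line as the claim, and the function name `parse_tags` is hypothetical. Only the tag names themselves come from the list above.

```python
import re

# Axis A tags (source type), as named in the evidence policy bullet above.
AXIS_A = {"primary", "secondary", "user-provided", "inference", "analyst-judgment"}
# Axis B: "verify", or "stale-risk: YYYY-MM" with a month between 01 and 12.
STALE_RISK = re.compile(r"^stale-risk: \d{4}-(0[1-9]|1[0-2])$")
# Assumption: a tag is any bracketed token without nested brackets.
TAG = re.compile(r"\[([^\[\]]+)\]")


def parse_tags(claim_line: str) -> dict:
    """Sort the bracketed tags on one claim line into Axis A, Axis B, or unknown."""
    result = {"axis_a": [], "axis_b": [], "unknown": []}
    for raw in TAG.findall(claim_line):
        tag = raw.strip()
        if tag in AXIS_A:
            result["axis_a"].append(tag)
        elif tag == "verify" or STALE_RISK.match(tag):
            result["axis_b"].append(tag)
        else:
            result["unknown"].append(tag)
    return result


# Hypothetical claim line, for illustration only.
print(parse_tags("Example claim text [primary] [stale-risk: 2025-01]"))
```

A check like this could flag claims that carry no Axis A tag at all, or tags that are misspelled, before a brief reaches schema validation.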
Where this sits in the production AI stack
Reasoning skills (markdown-first reasoning contracts for agents):
- Global Think Tank Analyst — horizontal: policy, sanctions, regulatory, geopolitical, trade memos
- Central Asia + Caspian Hybrid Intelligence Skill — vertical: sanctions, AML, banking, corridor risk in Central Asia / Caspian
- Gulf + Middle East Hybrid Intelligence Skill — vertical: Iran sanctions, GCC banking, sovereign wealth, maritime chokepoint risk
Evidence & audit layer (CI / MCP / schemas):
- → Agenda Intelligence MD (this repo) — validate, score and audit strategic-risk agent output structure
The skills define how agents reason. Agenda Intelligence MD defines how the output is audited. Together they let agents produce auditable strategic-intelligence — not just plausible-sounding summaries.
What this is not
- Not a factuality verifier. It does not check whether claims are true. It checks whether they are structurally sound, evidence-labeled, and decision-shaped.
- Not an autonomous news agent. It does not crawl, retrieve, or rank sources by itself.
- Not a source retriever. Live retrieval is not implemented.
- Not a replacement for analyst judgment. Pass/fail signals tell you form, not substance.
- Not a guarantee of correctness. It surfaces missing evidence and missing uncertainty hooks; it does not guarantee that a brief which has them is correct.
- Not a mature benchmark suite yet. The benchmark seed in `evals/benchmark_set.json` is a starting point, not validated results.
60-second quickstart
# From PyPI
pip install agenda-intelligence-md
# Or pinned wheel:
# pip install https://github.com/vassiliylakhonin/agenda-intelligence-md/releases/download/v0.7.4/agenda_intelligence_md-0.7.4-py3-none-any.whl
# 1. Get a source plan for a domain
agenda-intelligence start technology-ai
# 2. Validate an agent-produced brief against the schema
agenda-intelligence validate-brief examples/agenda-brief.json
# 3. Score the brief (heuristic 0-100 structural rubric)
agenda-intelligence score examples/agenda-brief.json
# 4. Score with evidence-linked feedback
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json
# 5. Run the structural bench across all bundled examples
agenda-intelligence bench examples/source-backed --strict --min-score 80
# 6. Diagnose local install + MCP tool surface
agenda-intelligence doctor
# 7. Print local MCP client config
agenda-intelligence mcp-config --client cursor
Expected scoring output:
score: 90/100
note: Heuristic structural/evidence-discipline score; does not verify factual truthfulness.
evidence_support: ... claims supported: 1/1 supported ...
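If the text output needs to be consumed by a script, the `score: N/100` line shown above is easy to extract. This is a small sketch that assumes the line keeps exactly that shape; `parse_score` is a hypothetical helper, not part of the package. The CLI also lists a `--format json` option, which is probably the better route for machine consumption, but its JSON shape is not shown here, so the sketch sticks to the text form.

```python
import re

# Matches a line such as "score: 90/100" anywhere in the captured output.
SCORE_LINE = re.compile(r"^score:\s*(\d+)\s*/\s*(\d+)\s*$", re.MULTILINE)


def parse_score(cli_output: str) -> tuple:
    """Return (score, maximum) from the text output, or raise if no score line exists."""
    match = SCORE_LINE.search(cli_output)
    if match is None:
        raise ValueError("no 'score: N/M' line found")
    return int(match.group(1)), int(match.group(2))


sample = (
    "score: 90/100\n"
    "note: Heuristic structural/evidence-discipline score; "
    "does not verify factual truthfulness.\n"
)
print(parse_score(sample))  # (90, 100)
```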
Flagship example: EU AI Act
A weak baseline summary vs. an Agenda-Intelligence-MD brief, plus the evidence pack used to back each claim.
- Brief: `examples/source-backed/eu-ai-act.md`
- Schema-valid JSON brief: `examples/source-backed/eu-ai-act.brief.json`
- Evidence pack (illustrative: placeholder URLs, not live citations): `examples/source-backed/eu-ai-act.evidence.json`
- Claim-level audit: `examples/source-backed/eu-ai-act.audit.json`
- Before/after pair: `examples/before-after/`
The evidence URLs in flagship examples are illustrative placeholders. The point is the shape of evidence-backed reasoning, not live citations.
Run the full pipeline on this example:
agenda-intelligence validate-brief examples/source-backed/eu-ai-act.brief.json
agenda-intelligence validate-evidence examples/source-backed/eu-ai-act.evidence.json
agenda-intelligence audit-claims examples/source-backed/eu-ai-act.audit.json --strict
agenda-intelligence score examples/source-backed/eu-ai-act.brief.json --evidence examples/source-backed/eu-ai-act.evidence.json --min-score 80
Before / after (sketch)
| | Baseline LLM | Agenda-Intelligence-MD |
|---|---|---|
| Output shape | Free-text summary | Schema-valid brief |
| Claims | Implicit | Explicit, classified |
| Evidence | Mixed in / absent | Separate evidence pack |
| Uncertainty | Often missing | Required field |
| Watch-next | Often missing | Required, ≥1 indicator |
| Schema validation | N/A | validate-brief pass/fail |
| Evidence audit | N/A | validate-evidence pass/fail |
| Heuristic score | N/A | score 0–100 |
CLI
agenda-intelligence start <category> # source plan + brief template
agenda-intelligence validate-brief <brief.json>
agenda-intelligence validate-evidence <pack.json>
agenda-intelligence audit-claims <claims.json> [--format json] [--strict]
agenda-intelligence score <brief.json> [--evidence <pack.json>] [--format json] [--min-score N]
agenda-intelligence score <before-after.md>
agenda-intelligence bench <dir> # validate + audit + score across a case directory
agenda-intelligence verify-quotes <pack.json>
agenda-intelligence source-plan <category>
agenda-intelligence list-lenses [--type ...]
agenda-intelligence get-lens <type> <id>
agenda-intelligence get-protocol <name>
agenda-intelligence validate-manifest
agenda-intelligence memory-search <query>
agenda-intelligence mcp-config [--client cursor|codex|claude-desktop]
agenda-intelligence doctor [--json]
agenda-intelligence --version
MCP
MCP as distribution surface. MCP turns the validation, audit and scoring tools into agent-consumable functions, not just CLI commands. Any MCP-compatible host (Claude Desktop, Cursor, Codex, custom agents) can call them as tools inside the agent loop — no separate CI step, no copy-paste between systems. The markdown-first reasoning skills define how memos are reasoned; this layer is where their output gets validated and audited without leaving the agent.
The package ships a real stdio MCP server, agenda-intelligence-mcp, plus
small Python tool functions in agenda_intelligence.mcp_server. See
MCP.md and docs/integrations/mcp.md.
Implemented MCP tools (all verified by scripts/smoke_mcp.py):
- `validate_brief(brief_json)`: schema check
- `validate_evidence(evidence_json)`: schema check
- `audit_claims(audit_json)`: claim-level evidence audit
- `get_protocol(name)`: return packaged protocol markdown
- `list_lenses(lens_type=None)`: read from manifest
- `get_lens(lens_type, lens_id)`: return packaged lens markdown
- `source_plan(category)`: return source requirements
- `score_output(before_text, after_text)`: heuristic structure / decision-readiness score
MCP verification status: wire-protocol verified — scripts/smoke_mcp.py exercises the full JSON-RPC cycle (initialize → tools/list → tools/call) against the running stdio server. See MCP.md.
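For readers unfamiliar with the wire protocol, the cycle named above (initialize → tools/list → tools/call) can be sketched as plain JSON-RPC 2.0 messages. This is an illustration built from the public MCP specification, not a copy of `scripts/smoke_mcp.py`. The client name, the protocol version string, and the choice to pass `brief_json` as a JSON string are assumptions; the real tool may expect a different argument shape.

```python
import json
from typing import Optional


def request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


def handshake_messages(brief: dict) -> list:
    """Messages a client would send, in order, to validate one brief."""
    return [
        request(1, "initialize", {
            "protocolVersion": "2024-11-05",  # assumption: one published MCP revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},  # hypothetical
        }),
        # Notification (no id): tells the server that initialization is complete.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        request(2, "tools/list"),
        request(3, "tools/call", {
            "name": "validate_brief",
            # Assumption: the tool takes the brief as a JSON string.
            "arguments": {"brief_json": json.dumps(brief)},
        }),
    ]


def to_stdio_lines(messages: list) -> str:
    """The stdio transport sends one JSON message per line."""
    return "".join(json.dumps(message) + "\n" for message in messages)


print(to_stdio_lines(handshake_messages({"title": "placeholder brief"})))
```

In practice a client would write each line to the stdin of the `agenda-intelligence-mcp` process and read one response line per request from its stdout.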
Live source retrieval is not implemented.
Example agent flow
- Agent receives a policy/risk update.
- Agent calls `source_plan` for the relevant category.
- Agent drafts a brief in the protocol shape.
- Agent calls `validate_brief` and `validate_evidence`.
- Agent calls `score_output` for a decision-readiness signal.
- Agent returns the brief, with explicit uncertainty and watch-next.
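The flow above can be sketched as a small orchestration function. The tool names match the MCP tools listed earlier, but everything else is an assumption made for illustration: the tools are passed in as plain callables, the validators are assumed to return a dict with a `valid` flag, and `draft` stands in for whatever model call produces the brief.

```python
def run_agent_flow(update_text: str, category: str, tools: dict, draft) -> dict:
    """Plan sources, draft, validate, then score; reject the draft if validation fails."""
    plan = tools["source_plan"](category)
    # Hypothetical drafting step: returns the brief JSON, the evidence pack JSON,
    # and the brief rendered as text.
    brief_json, evidence_json, brief_text = draft(update_text, plan)

    checks = {
        "brief": tools["validate_brief"](brief_json),
        "evidence": tools["validate_evidence"](evidence_json),
    }
    # Assumption: each validator result carries a boolean "valid" key.
    failed = [name for name, result in checks.items() if not result["valid"]]
    if failed:
        return {"status": "rejected", "failed": failed}

    # score_output(before_text, after_text): the raw update serves as the "before" text.
    score = tools["score_output"](update_text, brief_text)
    return {"status": "ok", "brief": brief_json, "score": score}
```

Passing the tools in as a dict keeps the sketch independent of how they are reached, whether through an MCP client session or direct Python imports.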
CI / checking concept
validate-brief and validate-evidence behave like linters: zero exit on
success, non-zero on failure, errors on stderr. Drop them into any CI
pipeline that produces strategic briefs from agents:
agenda-intelligence validate-brief examples/agenda-brief.json
agenda-intelligence validate-evidence examples/source/evidence-pack.json
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json --min-score 70
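Where the CI step is driven from Python rather than a shell script, the linter-style contract (zero exit on success, non-zero on failure, errors on stderr) might be wrapped as below. This is a sketch under assumptions: `run_checks` is a hypothetical helper, and `score --min-score` is assumed to exit non-zero when the score falls below the threshold, which the text above implies but does not state.

```python
import subprocess
import sys

# The three commands from the CI example above.
CHECKS = [
    ["agenda-intelligence", "validate-brief", "examples/agenda-brief.json"],
    ["agenda-intelligence", "validate-evidence", "examples/source/evidence-pack.json"],
    ["agenda-intelligence", "score", "examples/agenda-brief.json",
     "--evidence", "examples/source/evidence-pack.json", "--min-score", "70"],
]


def run_checks(commands, runner=subprocess.run):
    """Run every command and collect (command, stderr) for each non-zero exit."""
    failures = []
    for command in commands:
        completed = runner(command, capture_output=True, text=True)
        if completed.returncode != 0:
            failures.append((command, completed.stderr.strip()))
    return failures


def main() -> int:
    """Entry point for a CI job: report failures on stderr and return an exit code."""
    failures = run_checks(CHECKS)
    for command, stderr in failures:
        print(f"FAILED: {' '.join(command)}\n{stderr}", file=sys.stderr)
    return 1 if failures else 0
```

Running all checks instead of stopping at the first failure gives one complete report per CI run. The `runner` parameter exists so the function can be exercised without the CLI installed.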
Architecture
flowchart LR
Agent[Strategic-intelligence agent] -->|drafts| Brief[Agenda brief JSON]
Agent -->|cites| Evidence[Evidence pack JSON]
Brief --> Check[validate-brief]
Evidence --> Audit[validate-evidence]
Brief --> Score[score]
Evidence --> Score
P[Agenda-Intelligence.md] -.guides.-> Agent
L[regional/sector lenses] -.guides.-> Agent
S[source requirements] -.guides.-> Agent
Schemas
| Schema | Purpose |
|---|---|
| `agenda-brief.schema.json` | Brief structure |
| `evidence-pack.schema.json` | Evidence pack structure |
| `signal-classification.schema.json` | Signal taxonomy |
| `memory-card.schema.json` | AnalysisBank cards |
| `lens-manifest.schema.json` | Lens manifest |
| `evidence-audit.schema.json` | Claim-level evidence audit |
| `signal-tracker.schema.json` | Signal lifecycle tracker |
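The lifecycle covered by the signal-tracker schema (detected → developing → escalated → stable → resolved → archived) can be modelled as an ordered enum. The sketch below is illustrative only: the stage names come from this README, but the forward-only transition rule is an assumption, and the real schema may well permit a signal to move backwards (for example from stable to escalated).

```python
from enum import IntEnum


class Stage(IntEnum):
    """Lifecycle stages in the order given in the README."""
    DETECTED = 0
    DEVELOPING = 1
    ESCALATED = 2
    STABLE = 3
    RESOLVED = 4
    ARCHIVED = 5


def is_forward(current: str, proposed: str) -> bool:
    """True if `proposed` comes later in the lifecycle than `current`.

    Assumption: transitions only move forward. Unknown stage names raise KeyError.
    """
    return Stage[proposed.upper()] > Stage[current.upper()]


print(is_forward("detected", "escalated"))   # True
print(is_forward("resolved", "developing"))  # False
```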
Evidence audit
Each important claim should be traceable:
{
"claim_id": "c1",
"claim": "EU AI Act tightens obligations on high-risk systems.",
"claim_type": "regulatory_change",
"evidence_ids": ["e1", "e2"],
"support_level": "direct",
"uncertainty": "Enforcement timeline per sector unclear.",
"risk_if_wrong": "Compliance plans miss deadline."
}
support_level is one of direct | partial | weak | unsupported.
This schema is not wired into validate-evidence by default; use audit-claims directly.
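To make the traceability idea concrete, here is a minimal sketch of the kind of checks a claim-level audit implies. It is not the logic of `audit-claims`. The field names come from the record above, while the function name `audit_claim` and the rule that a `direct` or `partial` claim must cite at least one evidence id are assumptions.

```python
# Allowed values, as stated above.
SUPPORT_LEVELS = {"direct", "partial", "weak", "unsupported"}


def audit_claim(claim: dict, known_evidence_ids: set) -> list:
    """Return a list of problems found in one claim record (empty if none)."""
    problems = []
    level = claim.get("support_level")
    if level not in SUPPORT_LEVELS:
        problems.append(f"unknown support_level: {level!r}")

    evidence_ids = claim.get("evidence_ids", [])
    # An orphan ref is an evidence id that the evidence pack does not define.
    orphans = [eid for eid in evidence_ids if eid not in known_evidence_ids]
    if orphans:
        problems.append(f"orphan evidence refs: {orphans}")

    # Assumption: a claim marked direct or partial should cite something.
    if level in {"direct", "partial"} and not evidence_ids:
        problems.append("supported claim cites no evidence")
    return problems


claim = {
    "claim_id": "c1",
    "claim": "EU AI Act tightens obligations on high-risk systems.",
    "claim_type": "regulatory_change",
    "evidence_ids": ["e1", "e2"],
    "support_level": "direct",
}
print(audit_claim(claim, {"e1"}))  # ["orphan evidence refs: ['e2']"]
```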
Evals
See docs/evaluation.md for the full layer breakdown.
Key honesty rule:
Current scoring does not verify factual truth. It evaluates structure, completeness, evidence labeling, and decision-readiness signals.
Bundled-example baseline: mean 87.0/100, 5 cases, 100% schema-valid, 0 orphan refs.
Reproduce with python3 evals/run_benchmark.py. Human-judge benchmarking is not done yet.
Status
| Component | Status |
|---|---|
| Markdown protocol | Stable |
| JSON schemas (brief, evidence, lens, memory, signal) | Stable |
| CLI: validate-*, score, start, source-plan, doctor, mcp-config | Stable |
| Lenses (Central Asia, Middle East, EU; sanctions, export controls) | Stable |
| MCP stdio server (`agenda-intelligence-mcp`) | Stable |
| MCP tool functions (validate / read / score / audit_claims) | Stable |
| Evidence-audit schema (claim-level) | Stable |
| Signal-tracker schema (lifecycle) | Stable |
| Live source retrieval | Not implemented |
| Heuristic benchmark baseline (5 bundled cases) | Produced — mean 87.0/100 |
| Human-judge benchmark results | Not produced yet |
| Factual-truth verification | Not in scope today |
Limitations
- No factual verification. The toolkit checks form, not truth.
- No live source retrieval. Evidence packs are user- or agent-supplied.
- Scoring is heuristic. The rubric is documented and an LLM-judge prompt is provided, but scores have not been benchmarked against human judges yet.
- Lens coverage is intentionally narrow.
Contributing eval cases
The most valuable contribution is a case: a real public event with a
baseline agent output, a target brief, and a human checklist. See
CONTRIBUTING.md and evals/cases/.
Repository layout
agenda-intelligence-md/
├─ src/agenda_intelligence/ # Python package (CLI + MCP server + tools)
├─ schemas/ # JSON schemas
├─ examples/ # briefs, evidence packs, before/after
├─ analysis-bank/ # reusable reasoning patterns (memory cards)
├─ evals/ # rubric, judge prompt, checklist, cases
├─ docs/ # guides, integrations, use-cases
├─ skills/agenda-intelligence/ # OpenClaw skill wrapper
├─ skills/source-ingest/ # Source normalization skill (PDF/DOCX/URL → structured source record)
└─ tests/ # pytest suite
Documentation
| Resource | Link |
|---|---|
| Quickstart | docs/quickstart.md |
| End-to-end tutorial | docs/tutorial.md |
| Evaluation | docs/evaluation.md |
| Evidence audit | docs/evidence-audit.md |
| Agent integration sketch | docs/integrations/agent-loop.md |
| Use-cases | docs/use-cases/ |
| Integrations | docs/integrations/ |
| Roadmap | ROADMAP.md |
| Changelog | CHANGELOG.md |
License
MIT.