CLI, MCP server, and JSON schemas for validating and auditing strategic-risk AI agent output

These details have not been verified by PyPI

Project links

Project description

Agenda Intelligence MD

CI / MCP / EVIDENCE-AUDIT LAYER FOR STRATEGIC INTELLIGENCE AGENTS — protocol, JSON schemas, CLI and MCP server for validating, scoring and auditing the structure of strategic-risk agent output. The evidence-discipline surface for markdown-first reasoning skills (Global Think Tank Analyst, Central Asia + Caspian, Gulf + Middle East). Open-source.

Evidence & eval layer for strategic intelligence agents.

A protocol, JSON-schema set, CLI, and MCP-compatible toolkit that helps AI agents move from unsupported summaries to auditable strategic-risk briefs:

what changed
why it matters
what is evidence-backed
what is uncertain
who gains or loses leverage
what scenarios are plausible
what to watch next

It is built for engineers shipping policy, sanctions, regulation, geopolitical-risk, market-risk, and strategic-intelligence agents — where the output has to survive review by an analyst, not just sound plausible.

Bundled-example baseline (5 cases, reproducible with python3 evals/run_benchmark.py):

metric	value
mean score	87.0 / 100
cases	5 (EU AI Act, EU CBAM, Red Sea shipping, sanctions routing, BIS AI Diffusion)
schema-valid	100%
with evidence pack	100%
with claim-level audit	100%
orphan evidence refs	0

What this is

Markdown protocol (Agenda-Intelligence.md) — a structured reasoning workflow agents can follow.
JSON schemas — validate brief structure, evidence packs, memory cards, lens manifests.
CLI checks — validate-brief, validate-evidence, audit-claims, score and bench for CI-style validation of agent output, plus doctor for diagnosing the local install.
MCP server — a real stdio MCP server (agenda-intelligence-mcp) exposing the validation, read, and scoring tools.
Eval starter kit — rubric, LLM-judge prompt, human checklist, sample cases, benchmark seed.
Source / evidence policy — explicit rules for claim-level discipline, including per-claim provenance tags (Axis A: [primary] [secondary] [user-provided] [inference] [analyst-judgment]; Axis B: [verify] [stale-risk: YYYY-MM]). See skills/agenda-intelligence/references/evidence-discipline.md.
Signal lifecycle tracker — markdown + JSON schema for tracking signals across sessions (detected → developing → escalated → stable → resolved → archived). See skills/agenda-intelligence/references/signal-lifecycle.md and schemas/signal-tracker.schema.json.
Source normalization skill (skills/source-ingest/) — normalize documents (PDF, DOCX, URL) into structured source records for evidence packs.
Regional & sector lenses — compact reference packs inside the protocol (Central Asia & Caspian, Middle East, EU; sanctions, export controls). For deep regional analysis, use the dedicated vertical specialist skills: Central Asia + Caspian or Gulf + Middle East.
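The per-claim provenance tags are regular enough to check mechanically. The Python sketch below is a hypothetical helper, not part of the package. It assumes each claim is a single line with its tags written inline as `[tag]`, sorts the tags into Axis A and Axis B using the names from the evidence policy, and lists claims that carry no Axis A tag. The `stale-risk` pattern only checks the `YYYY-MM` shape, not that the month is valid.

```python
import re

# Axis A tags (provenance), as listed in the evidence-discipline policy.
AXIS_A = {"primary", "secondary", "user-provided", "inference", "analyst-judgment"}
# Axis B stale-risk tag: checks the YYYY-MM shape only, not month validity.
STALE_RISK = re.compile(r"^stale-risk:\s*\d{4}-\d{2}$")
# Any bracketed tag on a line, e.g. "[primary]" or "[stale-risk: 2026-05]".
TAG = re.compile(r"\[([^\[\]]+)\]")


def classify_tags(line: str) -> dict:
    """Sort one claim line's bracketed tags into Axis A, Axis B, or unknown."""
    result = {"axis_a": [], "axis_b": [], "unknown": []}
    for raw in TAG.findall(line):
        tag = raw.strip()
        if tag in AXIS_A:
            result["axis_a"].append(tag)
        elif tag == "verify" or STALE_RISK.match(tag):
            result["axis_b"].append(tag)
        else:
            result["unknown"].append(tag)
    return result


def claims_without_provenance(lines: list) -> list:
    """Return non-empty claim lines that carry no Axis A provenance tag."""
    return [ln for ln in lines if ln.strip() and not classify_tags(ln)["axis_a"]]
```

For example, `classify_tags("Enforcement timeline per sector unclear. [inference] [verify]")` returns `{'axis_a': ['inference'], 'axis_b': ['verify'], 'unknown': []}`. Unknown tags are kept rather than dropped so a reviewer can see typos such as `[primery]`.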

Where this sits in the production AI stack

Reasoning skills (markdown-first reasoning contracts for agents):

Global Think Tank Analyst — horizontal: policy, sanctions, regulatory, geopolitical, trade memos
Central Asia + Caspian Hybrid Intelligence Skill — vertical: sanctions, AML, banking, corridor risk in Central Asia / Caspian
Gulf + Middle East Hybrid Intelligence Skill — vertical: Iran sanctions, GCC banking, sovereign wealth, maritime chokepoint risk

Evidence & audit layer (CI / MCP / schemas):

→ Agenda Intelligence MD (this repo) — validate, score and audit strategic-risk agent output structure

The skills define how agents reason; Agenda Intelligence MD defines how their output is audited. Together they let agents produce auditable strategic intelligence rather than merely plausible-sounding summaries.

What this is not

Not a factuality verifier. It does not check whether claims are true. It checks whether they are structurally sound, evidence-labeled, and decision-shaped.
Not an autonomous news agent. It does not crawl, retrieve, or rank sources by itself.
Not a source retriever. Live retrieval is not implemented.
Not a replacement for analyst judgment. Pass/fail signals tell you form, not substance.
Not a guarantee of correctness. It flags missing evidence and missing uncertainty hooks, but a brief that passes every check is not thereby correct.
Not a mature benchmark suite yet. The benchmark seed in evals/benchmark_set.json is a starting point, not validated results.

60-second quickstart

# From PyPI
pip install agenda-intelligence-md
# Or pinned wheel:
# pip install https://github.com/vassiliylakhonin/agenda-intelligence-md/releases/download/v0.7.4/agenda_intelligence_md-0.7.4-py3-none-any.whl

# 1. Get a source plan for a domain
agenda-intelligence start technology-ai

# 2. Validate an agent-produced brief against the schema
agenda-intelligence validate-brief examples/agenda-brief.json

# 3. Score the brief (heuristic 0-100 structural rubric)
agenda-intelligence score examples/agenda-brief.json

# 4. Score with evidence-linked feedback
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json

# 5. Run the structural bench across the bundled source-backed examples
agenda-intelligence bench examples/source-backed --strict --min-score 80

# 6. Diagnose local install + MCP tool surface
agenda-intelligence doctor

# 7. Print local MCP client config
agenda-intelligence mcp-config --client cursor

Expected scoring output:

score: 90/100
note: Heuristic structural/evidence-discipline score; does not verify factual truthfulness.
evidence_support: ... claims supported: 1/1 supported ...

Flagship example: EU AI Act

A weak baseline summary vs. an Agenda-Intelligence-MD brief, plus the evidence pack used to back each claim.

Brief: examples/source-backed/eu-ai-act.md
Schema-valid JSON brief: examples/source-backed/eu-ai-act.brief.json
Evidence pack (illustrative — placeholder URLs, not live citations): examples/source-backed/eu-ai-act.evidence.json
Claim-level audit: examples/source-backed/eu-ai-act.audit.json
Before/after pair: examples/before-after/

The evidence URLs in flagship examples are illustrative placeholders. The point is the shape of evidence-backed reasoning, not live citations.

Run the full pipeline on this example:

agenda-intelligence validate-brief examples/source-backed/eu-ai-act.brief.json
agenda-intelligence validate-evidence examples/source-backed/eu-ai-act.evidence.json
agenda-intelligence audit-claims examples/source-backed/eu-ai-act.audit.json --strict
agenda-intelligence score examples/source-backed/eu-ai-act.brief.json --evidence examples/source-backed/eu-ai-act.evidence.json --min-score 80

Before / after (sketch)

Aspect	Baseline LLM	Agenda-Intelligence-MD
Output shape	Free-text summary	Schema-valid brief
Claims	Implicit	Explicit, classified
Evidence	Mixed in / absent	Separate evidence pack
Uncertainty	Often missing	Required field
Watch-next	Often missing	Required, ≥1 indicator
Schema validation	N/A	`validate-brief` pass/fail
Evidence audit	N/A	`validate-evidence` pass/fail
Heuristic score	N/A	`score` 0–100
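To make the right-hand column concrete, here is a toy structural check in Python. It is not the package's scorer or schema: the keys `claims`, `uncertainty` and `watch_next` are hypothetical names chosen for illustration, and the real field names live in `agenda-brief.schema.json`.

```python
def structural_gaps(brief: dict) -> list:
    """List the structural gaps from the before/after table that a brief has.

    Toy check only. The keys "claims", "uncertainty" and "watch_next" are
    hypothetical; the real names live in agenda-brief.schema.json.
    """
    gaps = []
    claims = brief.get("claims")
    if not isinstance(claims, list) or not claims:
        gaps.append("no explicit claims")
    uncertainty = brief.get("uncertainty")
    if not isinstance(uncertainty, str) or not uncertainty.strip():
        gaps.append("uncertainty missing")
    watch_next = brief.get("watch_next")
    if not isinstance(watch_next, list) or len(watch_next) < 1:
        gaps.append("watch-next needs at least 1 indicator")
    return gaps
```

A free-text baseline wrapped as `{"summary": "..."}` comes back with all three gaps, while a brief with at least one claim, a non-empty uncertainty string and one watch-next indicator comes back with an empty list.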

CLI

agenda-intelligence start <category>            # source plan + brief template
agenda-intelligence validate-brief <brief.json>
agenda-intelligence validate-evidence <pack.json>
agenda-intelligence audit-claims <claims.json> [--format json] [--strict]
agenda-intelligence score <brief.json> [--evidence <pack.json>] [--format json] [--min-score N]
agenda-intelligence score <before-after.md>
agenda-intelligence bench <dir>                  # validate + audit + score across a case directory
agenda-intelligence verify-quotes <pack.json>
agenda-intelligence source-plan <category>
agenda-intelligence list-lenses [--type ...]
agenda-intelligence get-lens <type> <id>
agenda-intelligence get-protocol <name>
agenda-intelligence validate-manifest
agenda-intelligence memory-search <query>
agenda-intelligence mcp-config [--client cursor|codex|claude-desktop]
agenda-intelligence doctor [--json]
agenda-intelligence --version

MCP

MCP as the distribution surface. MCP exposes the validation, audit and scoring tools as agent-consumable functions, not only as CLI commands. Any MCP-compatible host (Claude Desktop, Cursor, Codex, custom agents) can call them as tools inside the agent loop, with no separate CI step and no copy-paste between systems. The markdown-first reasoning skills define how agents reason through a memo; this layer is where the resulting output is validated and audited without leaving the agent.

The package ships a real stdio MCP server, agenda-intelligence-mcp, plus small Python tool functions in agenda_intelligence.mcp_server. See MCP.md and docs/integrations/mcp.md.

Implemented MCP tools (all verified by scripts/smoke_mcp.py):

validate_brief(brief_json) — schema check
validate_evidence(evidence_json) — schema check
audit_claims(audit_json) — claim-level evidence audit
get_protocol(name) — return packaged protocol markdown
list_lenses(lens_type=None) — read from manifest
get_lens(lens_type, lens_id) — return packaged lens markdown
source_plan(category) — return source requirements
score_output(before_text, after_text) — heuristic structure / decision-readiness score

MCP verification status: wire-protocol verified — scripts/smoke_mcp.py exercises the full JSON-RPC cycle (initialize → tools/list → tools/call) against the running stdio server. See MCP.md.

Live source retrieval is not implemented.
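The JSON-RPC cycle that scripts/smoke_mcp.py exercises can be sketched from the client side. The Python below only builds the newline-delimited JSON-RPC 2.0 messages an MCP stdio client sends (initialize, the initialized notification, tools/list, tools/call); it does not start the server. The protocol revision string, the client name, and passing the brief as a JSON string under the `brief_json` argument (taken from the documented `validate_brief(brief_json)` signature) are assumptions.

```python
import json

# Assumption: the MCP protocol revision to request; the server may negotiate.
PROTOCOL_VERSION = "2024-11-05"


def rpc(msg_id, method, params=None) -> dict:
    """Build one JSON-RPC 2.0 message; msg_id=None makes it a notification."""
    msg = {"jsonrpc": "2.0", "method": method}
    if msg_id is not None:
        msg["id"] = msg_id
    if params is not None:
        msg["params"] = params
    return msg


def smoke_sequence(brief_text: str) -> list:
    """Serialized messages for initialize -> tools/list -> tools/call.

    The stdio transport carries one JSON object per line, so each message
    is dumped without embedded newlines.
    """
    messages = [
        rpc(1, "initialize", {
            "protocolVersion": PROTOCOL_VERSION,
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.1"},
        }),
        rpc(None, "notifications/initialized"),
        rpc(2, "tools/list"),
        # Hypothetical argument payload: the brief passed as a JSON string.
        rpc(3, "tools/call", {"name": "validate_brief",
                              "arguments": {"brief_json": brief_text}}),
    ]
    return [json.dumps(m) for m in messages]
```

A real client writes each line to the stdin of agenda-intelligence-mcp and reads the matching response from stdout before sending the next message; the project's own end-to-end check is scripts/smoke_mcp.py.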

Example agent flow

Agent receives a policy/risk update.
Agent calls source_plan for the relevant category.
Agent drafts a brief in the protocol shape.
Agent calls validate_brief and validate_evidence.
Agent calls score_output for a decision-readiness signal.
Agent returns the brief, with explicit uncertainty and watch-next.
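The six steps above can be sketched as one Python function. This is a hypothetical illustration: the MCP tools are injected as plain callables keyed by the documented tool names, `draft_brief` stands in for the model call, and passing the raw update as `before_text` to `score_output` is an assumption. Tool return values are passed through unchanged because their shape is not specified here.

```python
from typing import Any, Callable, Dict, Tuple


def run_agent_flow(update: str, category: str,
                   tools: Dict[str, Callable[..., Any]],
                   draft_brief: Callable[[str, Any], Tuple[str, str]]) -> dict:
    """Walk the example agent flow with injected, hypothetical callables."""
    # Step 2: ask for the source plan for this category.
    plan = tools["source_plan"](category)
    # Step 3: draft brief + evidence pack in the protocol shape (model call).
    brief_json, evidence_json = draft_brief(update, plan)
    # Step 4: schema checks on both artifacts.
    checks = {
        "brief": tools["validate_brief"](brief_json),
        "evidence": tools["validate_evidence"](evidence_json),
    }
    # Step 5: decision-readiness signal; update-as-before_text is assumed.
    score = tools["score_output"](update, brief_json)
    # Step 6: hand everything back; uncertainty and watch-next live in the brief.
    return {"brief": brief_json, "evidence": evidence_json,
            "checks": checks, "score": score}
```

Injecting the tools keeps the sketch testable with stubs and lets the same function sit behind either the MCP server or the Python functions in agenda_intelligence.mcp_server.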

CI / checking concept

validate-brief and validate-evidence behave like linters: zero exit on success, non-zero on failure, errors on stderr. Drop them into any CI pipeline that produces strategic briefs from agents:

agenda-intelligence validate-brief examples/agenda-brief.json
agenda-intelligence validate-evidence examples/source/evidence-pack.json
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json --min-score 70
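Because the checks follow the linter contract (zero exit on success, non-zero on failure, errors on stderr), a CI step can chain them with nothing but the Python standard library. The runner below is a sketch, not part of the package; returning 127 when the CLI is not on PATH follows the shell convention and is a choice made for this sketch.

```python
import subprocess
import sys

# The three checks shown above, as argv lists.
CHECKS = [
    ["agenda-intelligence", "validate-brief", "examples/agenda-brief.json"],
    ["agenda-intelligence", "validate-evidence", "examples/source/evidence-pack.json"],
    ["agenda-intelligence", "score", "examples/agenda-brief.json",
     "--evidence", "examples/source/evidence-pack.json", "--min-score", "70"],
]


def run_gate(commands) -> int:
    """Run each command in order; return the first non-zero exit code, else 0.

    stderr is not captured, so each tool's error messages reach the CI log.
    """
    for cmd in commands:
        try:
            code = subprocess.run(cmd).returncode
        except FileNotFoundError:
            code = 127  # command not found, following the shell convention
        if code != 0:
            print(f"gate failed (exit {code}): {' '.join(cmd)}", file=sys.stderr)
            return code
    return 0
```

In a CI job, `sys.exit(run_gate(CHECKS))` fails the build on the first failing check and skips the rest.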

Architecture

flowchart LR
  Agent[Strategic-intelligence agent] -->|drafts| Brief[Agenda brief JSON]
  Agent -->|cites| Evidence[Evidence pack JSON]
  Brief --> Check[validate-brief]
  Evidence --> Audit[validate-evidence]
  Brief --> Score[score]
  Evidence --> Score
  P[Agenda-Intelligence.md] -.guides.-> Agent
  L[regional/sector lenses] -.guides.-> Agent
  S[source requirements] -.guides.-> Agent

Schemas

Schema	Purpose
`agenda-brief.schema.json`	Brief structure
`evidence-pack.schema.json`	Evidence pack structure
`signal-classification.schema.json`	Signal taxonomy
`memory-card.schema.json`	AnalysisBank cards
`lens-manifest.schema.json`	Lens manifest
`evidence-audit.schema.json`	Claim-level evidence audit
`signal-tracker.schema.json`	Signal lifecycle tracker
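The lifecycle tracked by `signal-tracker.schema.json` runs detected → developing → escalated → stable → resolved → archived (see skills/agenda-intelligence/references/signal-lifecycle.md). The Python sketch below checks a recorded stage history against that order. Whether the schema permits backward moves, such as a resolved signal re-escalating, is not stated here, so forward-only movement is an assumption; staying at the same stage across sessions and skipping ahead are both allowed.

```python
# Lifecycle stages, in the order the signal-lifecycle reference lists them.
STAGES = ("detected", "developing", "escalated", "stable", "resolved", "archived")


def bad_transitions(history: list) -> list:
    """Return (from, to) pairs that use an unknown stage or move backwards.

    Assumption: signals only move forward through STAGES; repeating a stage
    (e.g. still "developing" next session) or skipping ahead is allowed.
    """
    bad = []
    for cur, nxt in zip(history, history[1:]):
        if cur not in STAGES or nxt not in STAGES:
            bad.append((cur, nxt))
        elif STAGES.index(nxt) < STAGES.index(cur):
            bad.append((cur, nxt))
    return bad
```

If the reference does allow re-escalation, relax the second branch rather than removing the unknown-stage check.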

Evidence audit

Each important claim should be traceable:

{
  "claim_id": "c1",
  "claim": "EU AI Act tightens obligations on high-risk systems.",
  "claim_type": "regulatory_change",
  "evidence_ids": ["e1", "e2"],
  "support_level": "direct",
  "uncertainty": "Enforcement timeline per sector unclear.",
  "risk_if_wrong": "Compliance plans miss deadline."
}

support_level is one of direct | partial | weak | unsupported. The evidence-audit schema is not wired into validate-evidence by default; run audit-claims on the audit file instead.
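The claim shape above can be audited in a few lines. This Python sketch is a hypothetical re-implementation for illustration, not the logic behind audit-claims. It reads an "orphan" reference as an evidence ID cited by a claim but absent from the evidence pack, flags claims that assert support without citing any evidence, and treats `unsupported` claims as errors only in strict mode; both of those readings are assumptions.

```python
SUPPORT_LEVELS = {"direct", "partial", "weak", "unsupported"}


def audit(claims: list, pack_ids: set, strict: bool = False) -> list:
    """Return human-readable problems for a list of claim dicts."""
    problems = []
    for claim in claims:
        cid = claim.get("claim_id", "?")
        level = claim.get("support_level")
        refs = claim.get("evidence_ids", [])
        if level not in SUPPORT_LEVELS:
            problems.append(f"{cid}: invalid support_level {level!r}")
        # Orphan ref (assumed meaning): cited ID missing from the evidence pack.
        for ref in refs:
            if ref not in pack_ids:
                problems.append(f"{cid}: orphan evidence ref {ref}")
        # Claiming some support while citing nothing is inconsistent.
        if level in SUPPORT_LEVELS - {"unsupported"} and not refs:
            problems.append(f"{cid}: support_level {level} but no evidence_ids")
        # Assumed --strict behaviour: unsupported claims fail the audit.
        if strict and level == "unsupported":
            problems.append(f"{cid}: unsupported claim (strict)")
    return problems
```

With `claim` bound to the c1 example above and a pack containing e1 and e2, `audit([claim], {"e1", "e2"})` returns an empty list.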

Evals

See docs/evaluation.md for the full layer breakdown.

Key honesty rule:

Current scoring does not verify factual truth. It evaluates structure, completeness, evidence labeling, and decision-readiness signals.

Bundled-example baseline: mean 87.0/100, 5 cases, 100% schema-valid, 0 orphan refs. Reproduce with python evals/run_benchmark.py. Human-judge benchmarking is not done yet.

Status

Component	Status
Markdown protocol	Stable
JSON schemas (brief, evidence, lens, memory, signal)	Stable
CLI: validate-*, score, start, source-plan, doctor, mcp-config	Stable
Lenses (Central Asia, Middle East, EU; sanctions, export controls)	Stable
MCP stdio server (`agenda-intelligence-mcp`)	Stable
MCP tool functions (validate / read / score / audit_claims)	Stable
Evidence-audit schema (claim-level)	Stable
Signal-tracker schema (lifecycle)	Stable
Live source retrieval	Not implemented
Heuristic benchmark baseline (5 bundled cases)	Produced — mean 87.0/100
Human-judge benchmark results	Not produced yet
Factual-truth verification	Not in scope today

Limitations

No factual verification. The toolkit checks form, not truth.
No live source retrieval. Evidence packs are user- or agent-supplied.
Scoring is heuristic. The rubric is documented and an LLM-judge prompt is provided, but scores have not yet been benchmarked against human judges.
Lens coverage is intentionally narrow.

Contributing eval cases

The most valuable contribution is a case: a real public event with a baseline agent output, a target brief, and a human checklist. See CONTRIBUTING.md and evals/cases/.

Repository layout

agenda-intelligence-md/
├─ src/agenda_intelligence/   # Python package (CLI + MCP server + tools)
├─ schemas/                   # JSON schemas
├─ examples/                  # briefs, evidence packs, before/after
├─ analysis-bank/             # reusable reasoning patterns (memory cards)
├─ evals/                     # rubric, judge prompt, checklist, cases
├─ docs/                      # guides, integrations, use-cases
├─ skills/agenda-intelligence/ # OpenClaw skill wrapper
├─ skills/source-ingest/      # Source normalization skill (PDF/DOCX/URL → structured source record)
└─ tests/                     # pytest suite

Documentation

Resource	Link
Quickstart	`docs/quickstart.md`
End-to-end tutorial	`docs/tutorial.md`
Evaluation	`docs/evaluation.md`
Evidence audit	`docs/evidence-audit.md`
Agent integration sketch	`docs/integrations/agent-loop.md`
Use-cases	`docs/use-cases/`
Integrations	`docs/integrations/`
Roadmap	`ROADMAP.md`
Changelog	`CHANGELOG.md`

License

MIT.

Project details

Release history

1.0.2: May 31, 2026
1.0.1: May 23, 2026
1.0.0: May 23, 2026
1.0.0rc1 (pre-release): May 23, 2026
0.9.3: May 22, 2026
0.9.2: May 21, 2026
0.9.1: May 21, 2026
0.9.0: May 20, 2026
0.8.2: May 20, 2026
0.8.1: May 20, 2026
0.8.0: May 20, 2026
0.7.5: May 20, 2026
0.7.4 (this version): May 17, 2026
0.7.3: May 14, 2026
0.7.2: May 12, 2026
0.7.1: May 6, 2026
0.7.0: May 6, 2026
0.6.1: May 6, 2026
0.6.0: May 6, 2026
0.5.5: May 6, 2026
0.5.4: May 6, 2026
0.5.3: May 6, 2026
0.5.2: May 5, 2026
0.5.1: May 4, 2026
0.5.0: May 4, 2026
0.4.8: May 1, 2026

Download files

Source Distribution

agenda_intelligence_md-0.7.4.tar.gz (130.5 kB)

Uploaded May 17, 2026 Source

Built Distribution

agenda_intelligence_md-0.7.4-py3-none-any.whl (72.3 kB)

Uploaded May 17, 2026 Python 3

File details

Details for the file agenda_intelligence_md-0.7.4.tar.gz.

File metadata

Download URL: agenda_intelligence_md-0.7.4.tar.gz
Upload date: May 17, 2026
Size: 130.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agenda_intelligence_md-0.7.4.tar.gz
Algorithm	Hash digest
SHA256	`c796cd5f76d521adad6b588c9a7a42766f828b541d0f3c4778d45cacce07b07e`
MD5	`e7e641a408cf65fedcf71ffede807c78`
BLAKE2b-256	`6ff839c501238a95f32422172ec6590ed5052eb17a5300e815d49d35a7fe63ca`

File details

Details for the file agenda_intelligence_md-0.7.4-py3-none-any.whl.

File metadata

Download URL: agenda_intelligence_md-0.7.4-py3-none-any.whl
Upload date: May 17, 2026
Size: 72.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agenda_intelligence_md-0.7.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d4347858915f627fdd63877775ce57a0f2893660742c442c8aee9959a465c045`
MD5	`dc1499ab27a76b6cf0da36f125b43bb4`
BLAKE2b-256	`32a9587571974b9ef0e5d90526ccaca61d46f7cd1acab350c91d3421d3b95189`
