A drop-in markdown cognition layer for AI agents that need to analyze the public agenda.

Agenda Intelligence MD

Evidence & eval layer for strategic intelligence agents.

A protocol, JSON-schema set, CLI, and MCP-compatible toolkit that helps AI agents move from unsupported summaries to auditable strategic-risk briefs:

  • what changed
  • why it matters
  • what is evidence-backed
  • what is uncertain
  • who gains or loses leverage
  • what scenarios are plausible
  • what to watch next

It is built for engineers shipping policy, sanctions, regulation, geopolitical-risk, market-risk, and strategic-intelligence agents — where the output has to survive review by an analyst, not just sound plausible.

Bundled-example baseline (4 cases, reproduced with python3 evals/run_benchmark.py):

| metric | value |
| --- | --- |
| mean score | 86.8 / 100 |
| cases | 4 (EU AI Act, Red Sea shipping, sanctions routing, BIS AI Diffusion) |
| schema-valid | 100% |
| with evidence pack | 100% |
| with claim-level audit | 100% |
| orphan evidence refs | 0 |

What this is

  • Markdown protocol (Agenda-Intelligence.md) — a structured reasoning workflow agents can follow.
  • JSON schemas — validate brief structure, evidence packs, memory cards, lens manifests.
  • CLI checks (validate-brief, validate-evidence, score, doctor) for CI-style validation of agent output.
  • MCP server — a real stdio MCP server (agenda-intelligence-mcp) exposing the validation, read, and scoring tools.
  • Eval starter kit — rubric, LLM-judge prompt, human checklist, sample cases, benchmark seed.
  • Source / evidence policy — explicit rules for claim-level discipline, including per-claim provenance tags (Axis A: [primary] [secondary] [user-provided] [inference] [analyst-judgment]; Axis B: [verify] [stale-risk: YYYY-MM]). See skills/agenda-intelligence/references/evidence-discipline.md.
  • Signal lifecycle tracker — markdown + JSON schema for tracking signals across sessions (detected → developing → escalated → stable → resolved → archived). See skills/agenda-intelligence/references/signal-lifecycle.md and schemas/signal-tracker.schema.json.
  • Regional & sector lenses — compact reference packs inside the protocol (Central Asia & Caspian, Middle East, EU; sanctions, export controls). For deep regional analysis, use the dedicated vertical specialist skills: Central Asia + Caspian or Gulf + Middle East.
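
The per-claim provenance tags from the source / evidence policy can be checked mechanically. The sketch below is illustrative only: it assumes tags appear inline as bracketed tokens on a claim line, and the function name `parse_tags` and its return shape are hypothetical. The authoritative rules live in evidence-discipline.md.

```python
import re

# Axis A vocabulary, copied from the evidence policy bullet above.
AXIS_A = {"primary", "secondary", "user-provided", "inference", "analyst-judgment"}

# Matches [tag] and [tag: YYYY-MM]; the date part is optional.
TAG_RE = re.compile(r"\[([a-z-]+)(?::\s*(\d{4}-\d{2}))?\]")


def parse_tags(line: str) -> dict:
    """Split the bracketed tags on one claim line into Axis A and Axis B.

    Hypothetical helper: the inline-tag layout is an assumption.
    """
    found = {"axis_a": [], "verify": False, "stale_risk": None, "unknown": []}
    for name, date in TAG_RE.findall(line):
        if name in AXIS_A and not date:
            found["axis_a"].append(name)
        elif name == "verify" and not date:
            found["verify"] = True
        elif name == "stale-risk" and date:
            found["stale_risk"] = date  # keeps the YYYY-MM string as written
        else:
            found["unknown"].append(name)  # anything outside both axes
    return found
```

A claim line such as `Export rule tightened [primary] [stale-risk: 2025-03]` (an invented example) would yield one Axis A tag and a stale-risk month, while an unrecognized tag lands in `unknown` so a reviewer can see it.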

What this is not

  • Not a factuality verifier. It does not check whether claims are true. It checks whether they are structurally sound, evidence-labeled, and decision-shaped.
  • Not an autonomous news agent. It does not crawl, retrieve, or rank sources by itself.
  • Not a source retriever. Live retrieval is not implemented.
  • Not a replacement for analyst judgment. Pass/fail signals tell you form, not substance.
  • Not a guarantee of correctness. It surfaces missing evidence and missing uncertainty statements; it does not guarantee that the evidence and uncertainty an agent supplies are complete or accurate.
  • Not a mature benchmark suite yet. The benchmark seed in evals/benchmark_set.json is a starting point, not validated results.

60-second quickstart

# From PyPI
pip install agenda-intelligence-md
# Or pinned wheel:
# pip install https://github.com/vassiliylakhonin/agenda-intelligence-md/releases/download/v0.7.2/agenda_intelligence_md-0.7.2-py3-none-any.whl

# 1. Get a source plan for a domain
agenda-intelligence start technology-ai

# 2. Validate an agent-produced brief against the schema
agenda-intelligence validate-brief examples/agenda-brief.json

# 3. Score the brief (heuristic 0-100 structural rubric)
agenda-intelligence score examples/agenda-brief.json

# 4. Score with evidence-linked feedback
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json

# 5. Run the structural bench across all bundled examples
agenda-intelligence bench examples/source-backed --strict --min-score 80

# 6. Diagnose local install + MCP tool surface
agenda-intelligence doctor

# 7. Print local MCP client config
agenda-intelligence mcp-config --client cursor

Expected scoring output:

score: 90/100
note: Heuristic structural/evidence-discipline score; does not verify factual truthfulness.
evidence_support: ... claims supported: 1/1 supported ...

Flagship example: EU AI Act

A weak baseline summary vs. an Agenda-Intelligence-MD brief, plus the evidence pack used to back each claim.

The evidence URLs in flagship examples are illustrative placeholders. The point is the shape of evidence-backed reasoning, not live citations.

Run the full pipeline on this example:

agenda-intelligence validate-brief examples/source-backed/eu-ai-act.brief.json
agenda-intelligence validate-evidence examples/source-backed/eu-ai-act.evidence.json
agenda-intelligence audit-claims examples/source-backed/eu-ai-act.audit.json --strict
agenda-intelligence score examples/source-backed/eu-ai-act.brief.json --evidence examples/source-backed/eu-ai-act.evidence.json --min-score 80

Before / after (sketch)

| | Baseline LLM | Agenda-Intelligence-MD |
| --- | --- | --- |
| Output shape | Free-text summary | Schema-valid brief |
| Claims | Implicit | Explicit, classified |
| Evidence | Mixed in / absent | Separate evidence pack |
| Uncertainty | Often missing | Required field |
| Watch-next | Often missing | Required, ≥1 indicator |
| Schema validation | N/A | validate-brief pass/fail |
| Evidence audit | N/A | validate-evidence pass/fail |
| Heuristic score | N/A | score 0–100 |

CLI

agenda-intelligence start <category>            # source plan + brief template
agenda-intelligence validate-brief <brief.json>
agenda-intelligence validate-evidence <pack.json>
agenda-intelligence audit-claims <claims.json> [--format json] [--strict]
agenda-intelligence score <brief.json> [--evidence <pack.json>] [--format json] [--min-score N]
agenda-intelligence score <before-after.md>
agenda-intelligence bench <dir>                  # validate + audit + score across a case directory
agenda-intelligence verify-quotes <pack.json>
agenda-intelligence source-plan <category>
agenda-intelligence list-lenses [--type ...]
agenda-intelligence get-lens <type> <id>
agenda-intelligence get-protocol <name>
agenda-intelligence validate-manifest
agenda-intelligence memory-search <query>
agenda-intelligence mcp-config [--client cursor|codex|claude-desktop]
agenda-intelligence doctor [--json]
agenda-intelligence --version

MCP

The package ships a real stdio MCP server, agenda-intelligence-mcp, plus small Python tool functions in agenda_intelligence.mcp_server. See MCP.md and docs/integrations/mcp.md.

Implemented MCP tools (all verified by scripts/smoke_mcp.py):

  • validate_brief(brief_json) — schema check
  • validate_evidence(evidence_json) — schema check
  • audit_claims(audit_json) — claim-level evidence audit
  • get_protocol(name) — return packaged protocol markdown
  • list_lenses(lens_type=None) — read from manifest
  • get_lens(lens_type, lens_id) — return packaged lens markdown
  • source_plan(category) — return source requirements
  • score_output(before_text, after_text) — heuristic structure / decision-readiness score

MCP verification status: wire-protocol verified — scripts/smoke_mcp.py exercises the full JSON-RPC cycle (initialize → tools/list → tools/call) against the running stdio server. See MCP.md.
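
For orientation, the initialize → tools/list → tools/call cycle can be written out as JSON-RPC 2.0 messages. This is a sketch based on the public MCP specification, not a copy of scripts/smoke_mcp.py: the protocol version string, the client name, and the helper names are assumptions, and only the `source_plan(category)` tool signature comes from the list above.

```python
import json
from typing import List, Optional


def jsonrpc_request(req_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build one JSON-RPC 2.0 request object."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def smoke_sequence(category: str) -> List[dict]:
    """Messages a client would send, in order, to exercise one tool call."""
    return [
        jsonrpc_request(1, "initialize", {
            "protocolVersion": "2024-11-05",  # assumed; use the version your client supports
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},
        }),
        # Notifications carry no "id" and expect no response.
        {"jsonrpc": "2.0", "method": "notifications/initialized"},
        jsonrpc_request(2, "tools/list"),
        jsonrpc_request(3, "tools/call", {
            "name": "source_plan",
            "arguments": {"category": category},
        }),
    ]


def encode_lines(messages: List[dict]) -> str:
    """Serialize as newline-delimited JSON, the framing used by stdio transports."""
    return "".join(json.dumps(m) + "\n" for m in messages)
```

Writing `encode_lines(smoke_sequence("technology-ai"))` to the stdin of a running agenda-intelligence-mcp process and reading one response line per request is the general shape of such a smoke test; response handling is left out here because the server's exact replies are not shown in this document.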

Live source retrieval is not implemented.

Example agent flow

  1. Agent receives a policy/risk update.
  2. Agent calls source_plan for the relevant category.
  3. Agent drafts a brief in the protocol shape.
  4. Agent calls validate_brief and validate_evidence.
  5. Agent calls score_output for a decision-readiness signal.
  6. Agent returns the brief, with explicit uncertainty and watch-next.
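
The flow above can be sketched as a small orchestration function. Everything here is a hedged illustration: the tools are passed in as plain callables rather than real MCP calls, `run_agent_flow` and the keys returned by the hypothetical `draft` callable are invented for this example, and the tools' return values are passed through untouched because their shapes are not documented here.

```python
from typing import Any, Callable, Dict


def run_agent_flow(
    update: str,
    category: str,
    tools: Dict[str, Callable[..., Any]],
    draft: Callable[[str, Any], Dict[str, str]],
) -> Dict[str, Any]:
    """Steps 2-6 of the example flow, with tools injected as callables.

    `draft` stands in for the agent's own drafting step (step 3). It is assumed
    to return brief_json, evidence_json, before_text, and after_text.
    """
    plan = tools["source_plan"](category)                         # step 2
    drafted = draft(update, plan)                                 # step 3
    brief_check = tools["validate_brief"](drafted["brief_json"])  # step 4
    evidence_check = tools["validate_evidence"](drafted["evidence_json"])
    score = tools["score_output"](                                # step 5
        drafted["before_text"], drafted["after_text"]
    )
    # Step 6: return the brief together with every check result, so the
    # caller can decide whether the brief is fit to hand to an analyst.
    return {
        "brief_json": drafted["brief_json"],
        "source_plan": plan,
        "brief_check": brief_check,
        "evidence_check": evidence_check,
        "score": score,
    }
```

Injecting the tools keeps the sketch independent of any particular MCP client library; in a real agent each callable would wrap a `tools/call` request to the server.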

CI / checking concept

validate-brief and validate-evidence behave like linters: zero exit on success, non-zero on failure, errors on stderr. Drop them into any CI pipeline that produces strategic briefs from agents:

agenda-intelligence validate-brief examples/agenda-brief.json
agenda-intelligence validate-evidence examples/source/evidence-pack.json
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json --min-score 70
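
Where a pipeline is driven from Python rather than shell, the same linter-style contract can be wrapped in a few lines. This is a minimal sketch: `run_checks` and `CHECKS` are hypothetical names, and it assumes that `score` with `--min-score` also reports a miss through a non-zero exit code, which the text above states explicitly only for validate-brief and validate-evidence.

```python
import subprocess
import sys
from typing import List, Sequence

# The three commands from the CI example above, as argv lists.
CHECKS = [
    ["agenda-intelligence", "validate-brief", "examples/agenda-brief.json"],
    ["agenda-intelligence", "validate-evidence", "examples/source/evidence-pack.json"],
    ["agenda-intelligence", "score", "examples/agenda-brief.json",
     "--evidence", "examples/source/evidence-pack.json", "--min-score", "70"],
]


def run_checks(commands: Sequence[Sequence[str]]) -> List[str]:
    """Run each command and return one message per non-zero exit."""
    failures = []
    for argv in commands:
        proc = subprocess.run(list(argv), capture_output=True, text=True)
        if proc.returncode != 0:
            # Errors are expected on stderr, so include it in the report.
            failures.append(
                f"{' '.join(argv)} exited {proc.returncode}: {proc.stderr.strip()}"
            )
    return failures
```

A CI step could then end with `sys.exit(1 if run_checks(CHECKS) else 0)`, printing the returned messages first so the failing command is visible in the job log.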

Architecture

flowchart LR
  Agent[Strategic-intelligence agent] -->|drafts| Brief[Agenda brief JSON]
  Agent -->|cites| Evidence[Evidence pack JSON]
  Brief --> Check[validate-brief]
  Evidence --> Audit[validate-evidence]
  Brief --> Score[score]
  Evidence --> Score
  P[Agenda-Intelligence.md] -.guides.-> Agent
  L[regional/sector lenses] -.guides.-> Agent
  S[source requirements] -.guides.-> Agent

Schemas

| Schema | Purpose |
| --- | --- |
| agenda-brief.schema.json | Brief structure |
| evidence-pack.schema.json | Evidence pack structure |
| signal-classification.schema.json | Signal taxonomy |
| memory-card.schema.json | AnalysisBank cards |
| lens-manifest.schema.json | Lens manifest |
| evidence-audit.schema.json | Claim-level evidence audit |
| signal-tracker.schema.json | Signal lifecycle tracker |
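
As an illustration of the signal lifecycle tracked by signal-tracker.schema.json (detected → developing → escalated → stable → resolved → archived), the stages can be treated as an ordered list. The helper `classify_move` is hypothetical, and the sketch deliberately does not reject backward moves, since this document does not say which transitions the schema permits.

```python
# Stage names in the order given by the signal lifecycle description.
STAGES = ["detected", "developing", "escalated", "stable", "resolved", "archived"]


def classify_move(current: str, new: str) -> str:
    """Label a stage change as 'forward', 'same', or 'backward'.

    Raises ValueError for a stage name outside the documented lifecycle.
    """
    for stage in (current, new):
        if stage not in STAGES:
            raise ValueError(f"unknown lifecycle stage: {stage}")
    delta = STAGES.index(new) - STAGES.index(current)
    if delta > 0:
        return "forward"
    if delta < 0:
        return "backward"
    return "same"
```

Flagging a `backward` move (for example a stable signal that escalates again) for analyst review, rather than blocking it, is one possible design choice; the real tracker may handle it differently.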

Evidence audit

Each important claim should be traceable:

{
  "claim_id": "c1",
  "claim": "EU AI Act tightens obligations on high-risk systems.",
  "claim_type": "regulatory_change",
  "evidence_ids": ["e1", "e2"],
  "support_level": "direct",
  "uncertainty": "Enforcement timeline per sector unclear.",
  "risk_if_wrong": "Compliance plans miss deadline."
}

support_level is one of direct | partial | weak | unsupported. This schema is not wired into validate-evidence by default; use audit-claims directly.
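
A claim record like the one above can be sanity-checked in plain Python before it reaches audit-claims. This sketch is not the packaged validator: the list of required fields is inferred from the example record rather than from evidence-audit.schema.json, `check_claim` is a hypothetical helper, and "orphan" is taken to mean an evidence id that the claim cites but the evidence pack does not contain.

```python
# Allowed values, as listed in the text above.
SUPPORT_LEVELS = {"direct", "partial", "weak", "unsupported"}

# Assumed required fields, inferred from the example claim record.
REQUIRED = (
    "claim_id", "claim", "claim_type", "evidence_ids",
    "support_level", "uncertainty", "risk_if_wrong",
)


def check_claim(claim: dict, known_evidence_ids: set) -> list:
    """Return a list of human-readable problems; an empty list means clean."""
    problems = []
    for key in REQUIRED:
        if key not in claim:
            problems.append(f"missing field: {key}")
    level = claim.get("support_level")
    if level is not None and level not in SUPPORT_LEVELS:
        problems.append(f"invalid support_level: {level}")
    # Evidence ids the claim cites that are absent from the evidence pack.
    orphans = [e for e in claim.get("evidence_ids", []) if e not in known_evidence_ids]
    if orphans:
        problems.append("orphan evidence refs: " + ", ".join(orphans))
    return problems
```

Called with the example claim and the id set `{"e1", "e2"}`, the function returns an empty list; dropping `e2` from the set makes it report that reference as an orphan.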


Evals

See docs/evaluation.md for the full layer breakdown.

Key honesty rule:

Current scoring does not verify factual truth. It evaluates structure, completeness, evidence labeling, and decision-readiness signals.

Bundled-example baseline: mean 86.8/100, 4 cases, 100% schema-valid, 0 orphan refs. Reproduce with python3 evals/run_benchmark.py. Human-judge benchmarking is not done yet.


Status

| Component | Status |
| --- | --- |
| Markdown protocol | Stable |
| JSON schemas (brief, evidence, lens, memory, signal) | Stable |
| CLI: validate-*, score, start, source-plan, doctor, mcp-config | Stable |
| Lenses (Central Asia, Middle East, EU; sanctions, export controls) | Stable |
| MCP stdio server (agenda-intelligence-mcp) | Stable |
| MCP tool functions (validate / read / score / audit_claims) | Stable |
| Evidence-audit schema (claim-level) | Stable |
| Signal-tracker schema (lifecycle) | Stable |
| Live source retrieval | Not implemented |
| Heuristic benchmark baseline (4 bundled cases) | Produced (mean 86.8/100) |
| Human-judge benchmark results | Not produced yet |
| Factual-truth verification | Not in scope today |

Limitations

  • No factual verification. The toolkit checks form, not truth.
  • No live source retrieval. Evidence packs are user- or agent-supplied.
  • Scoring is heuristic. The rubric is documented and an LLM-judge prompt is provided, but the only results so far are the heuristic baseline on the 4 bundled cases; no human-judge benchmark has been run.
  • Lens coverage is intentionally narrow.

Contributing eval cases

The most valuable contribution is a case: a real public event with a baseline agent output, a target brief, and a human checklist. See CONTRIBUTING.md and evals/cases/.


Repository layout

agenda-intelligence-md/
├─ src/agenda_intelligence/   # Python package (CLI + MCP server + tools)
├─ schemas/                   # JSON schemas
├─ examples/                  # briefs, evidence packs, before/after
├─ analysis-bank/             # reusable reasoning patterns (memory cards)
├─ evals/                     # rubric, judge prompt, checklist, cases
├─ docs/                      # guides, integrations, use-cases
├─ skills/agenda-intelligence/  # OpenClaw skill wrapper
└─ tests/                     # pytest suite

Documentation

| Resource | Link |
| --- | --- |
| Quickstart | docs/quickstart.md |
| End-to-end tutorial | docs/tutorial.md |
| Evaluation | docs/evaluation.md |
| Evidence audit | docs/evidence-audit.md |
| Agent integration sketch | docs/integrations/agent-loop.md |
| Use-cases | docs/use-cases/ |
| Integrations | docs/integrations/ |
| Roadmap | ROADMAP.md |
| Changelog | CHANGELOG.md |

License

MIT.
