A drop-in markdown cognition layer for AI agents that need to analyze public agenda

Agenda Intelligence MD

Evidence & eval layer for strategic intelligence agents.

A protocol, JSON-schema set, CLI, and MCP-compatible toolkit that helps AI agents move from unsupported summaries to auditable strategic-risk briefs:

  • what changed
  • why it matters
  • what is evidence-backed
  • what is uncertain
  • who gains or loses leverage
  • what scenarios are plausible
  • what to watch next

It is built for engineers shipping policy, sanctions, regulation, geopolitical-risk, market-risk, and strategic-intelligence agents — where the output has to survive review by an analyst, not just sound plausible.

Bundled-example baseline (3 cases, reproduced with python evals/run_benchmark.py):

| Metric | Value |
| --- | --- |
| Mean score | 87.7 / 100 |
| Schema-valid | 100% |
| With evidence pack | 100% |
| With claim-level audit | 100% |
| Orphan evidence refs | 0 |
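
As a rough illustration of how per-case results could roll up into the metrics above, here is a minimal sketch. The field names and the three sample cases are hypothetical; they are not the output format of evals/run_benchmark.py and not the project's benchmark numbers.

```python
from statistics import mean

# Hypothetical per-case results (illustrative only, NOT the bundled benchmark).
cases = [
    {"score": 80, "schema_valid": True, "has_evidence_pack": True,
     "has_claim_audit": True, "orphan_refs": 0},
    {"score": 90, "schema_valid": True, "has_evidence_pack": True,
     "has_claim_audit": True, "orphan_refs": 0},
    {"score": 100, "schema_valid": True, "has_evidence_pack": True,
     "has_claim_audit": True, "orphan_refs": 0},
]


def summarize(results):
    """Aggregate per-case results into the five headline metrics."""
    n = len(results)

    def pct(key):
        # Share of cases where the boolean flag `key` is true, as a percentage.
        return 100.0 * sum(1 for r in results if r[key]) / n

    return {
        "mean_score": round(mean(r["score"] for r in results), 1),
        "schema_valid_pct": pct("schema_valid"),
        "with_evidence_pack_pct": pct("has_evidence_pack"),
        "with_claim_audit_pct": pct("has_claim_audit"),
        "orphan_evidence_refs": sum(r["orphan_refs"] for r in results),
    }


summary = summarize(cases)
```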

What this is

  • Markdown protocol (Agenda-Intelligence.md) — a structured reasoning workflow agents can follow.
  • JSON schemas — validate brief structure, evidence packs, memory cards, lens manifests.
  • CLI checks (validate-brief, validate-evidence, score, doctor) for CI-style validation of agent output.
  • MCP server — a real stdio MCP server (agenda-intelligence-mcp) exposing the validation, read, and scoring tools.
  • Eval starter kit — rubric, LLM-judge prompt, human checklist, sample cases, benchmark seed.
  • Source / evidence policy — explicit rules for claim-level discipline.
  • Regional & sector lenses — compact reference packs inside the protocol (Central Asia & Caspian, Middle East, EU; sanctions, export controls). For deep regional analysis, see the dedicated vertical specialist skill.

What this is not

  • Not a factuality verifier. It does not check whether claims are true. It checks whether they are structurally sound, evidence-labeled, and decision-shaped.
  • Not an autonomous news agent. It does not crawl, retrieve, or rank sources by itself.
  • Not a source retriever. Live retrieval is not implemented.
  • Not a replacement for analyst judgment. Pass/fail signals tell you form, not substance.
  • Not a guarantee of correctness. It surfaces missing evidence and uncertainty hooks; it does not guarantee that the brief's conclusions are right.
  • Not a mature benchmark suite yet. The benchmark seed in evals/benchmark_set.json is a starting point, not validated results.

60-second quickstart

# From PyPI
pip install agenda-intelligence-md
# Or pinned wheel:
# pip install https://github.com/vassiliylakhonin/agenda-intelligence-md/releases/download/v0.7.2/agenda_intelligence_md-0.7.2-py3-none-any.whl

# 1. Get a source plan for a domain
agenda-intelligence start technology-ai

# 2. Validate an agent-produced brief against the schema
agenda-intelligence validate-brief examples/agenda-brief.json

# 3. Score the brief (heuristic 0-100 structural rubric)
agenda-intelligence score examples/agenda-brief.json

# 4. Score with evidence-linked feedback
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json

# 5. Run the structural bench across all bundled examples
agenda-intelligence bench examples/source-backed --strict --min-score 80

# 6. Diagnose local install + MCP tool surface
agenda-intelligence doctor

# 7. Print local MCP client config
agenda-intelligence mcp-config --client cursor

Expected scoring output:

score: 90/100
note: Heuristic structural/evidence-discipline score; does not verify factual truthfulness.
evidence_support: ... claims supported: 1/1 supported ...

Flagship example: EU AI Act

A weak baseline summary vs. an Agenda-Intelligence-MD brief, plus the evidence pack used to back each claim.

The evidence URLs in flagship examples are illustrative placeholders. The point is the shape of evidence-backed reasoning, not live citations.

Run the full pipeline on this example:

agenda-intelligence validate-brief examples/source-backed/eu-ai-act.brief.json
agenda-intelligence validate-evidence examples/source-backed/eu-ai-act.evidence.json
agenda-intelligence audit-claims examples/source-backed/eu-ai-act.audit.json --strict
agenda-intelligence score examples/source-backed/eu-ai-act.brief.json --evidence examples/source-backed/eu-ai-act.evidence.json --min-score 80

Before / after (sketch)

| | Baseline LLM | Agenda-Intelligence-MD |
| --- | --- | --- |
| Output shape | Free-text summary | Schema-valid brief |
| Claims | Implicit | Explicit, classified |
| Evidence | Mixed in / absent | Separate evidence pack |
| Uncertainty | Often missing | Required field |
| Watch-next | Often missing | Required, ≥1 indicator |
| Schema validation | N/A | validate-brief pass/fail |
| Evidence audit | N/A | validate-evidence pass/fail |
| Heuristic score | N/A | score 0–100 |

CLI

agenda-intelligence start <category>            # source plan + brief template
agenda-intelligence validate-brief <brief.json>
agenda-intelligence validate-evidence <pack.json>
agenda-intelligence audit-claims <claims.json> [--format json] [--strict]   # experimental, evidence-audit schema
agenda-intelligence score <brief.json> [--evidence <pack.json>] [--format json] [--min-score N]
agenda-intelligence score <before-after.md>
agenda-intelligence bench <dir>                  # validate + audit + score across a case directory
agenda-intelligence verify-quotes <pack.json>    # experimental, local-text mode
agenda-intelligence source-plan <category>
agenda-intelligence list-lenses [--type ...]
agenda-intelligence get-lens <type> <id>
agenda-intelligence get-protocol <name>
agenda-intelligence validate-manifest
agenda-intelligence memory-search <query>
agenda-intelligence mcp-config [--client cursor|codex|claude-desktop]
agenda-intelligence doctor [--json]
agenda-intelligence --version

MCP

The package ships a real stdio MCP server, agenda-intelligence-mcp, plus small Python tool functions in agenda_intelligence.mcp_server. See MCP.md and docs/integrations/mcp.md.

Implemented MCP tools (all verified by scripts/smoke_mcp.py):

  • validate_brief(brief_json) — schema check
  • validate_evidence(evidence_json) — schema check
  • audit_claims(audit_json) — claim-level evidence audit (experimental)
  • get_protocol(name) — return packaged protocol markdown
  • list_lenses(lens_type=None) — read from manifest
  • get_lens(lens_type, lens_id) — return packaged lens markdown
  • source_plan(category) — return source requirements
  • score_output(before_text, after_text) — heuristic structure / decision-readiness score

MCP verification status: wire-protocol verified — scripts/smoke_mcp.py exercises the full JSON-RPC cycle (initialize → tools/list → tools/call) against the running stdio server. See MCP.md.
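
To make the JSON-RPC cycle concrete, the sketch below builds the messages a client would write to the server's stdin, one JSON object per line. It is not scripts/smoke_mcp.py. The protocol version string, the client name, and passing `brief_json` as a JSON-encoded string are assumptions; only the method names and the `validate_brief(brief_json)` tool signature come from the text above.

```python
import json


def rpc_request(msg_id, method, params=None):
    """Build one JSON-RPC 2.0 request as a single newline-terminated line."""
    msg = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        msg["params"] = params
    return json.dumps(msg) + "\n"


def rpc_notification(method):
    """Notifications carry no id, so the server sends no response."""
    return json.dumps({"jsonrpc": "2.0", "method": method}) + "\n"


def smoke_sequence(brief):
    """Messages for initialize -> tools/list -> tools/call, in order."""
    return [
        rpc_request(1, "initialize", {
            "protocolVersion": "2024-11-05",  # assumption: a published MCP revision
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.0.0"},  # hypothetical
        }),
        rpc_notification("notifications/initialized"),
        rpc_request(2, "tools/list"),
        rpc_request(3, "tools/call", {
            "name": "validate_brief",
            # assumption: the tool accepts the brief as a JSON string
            "arguments": {"brief_json": json.dumps(brief)},
        }),
    ]


messages = smoke_sequence({"title": "placeholder brief"})
```

A client would write these lines to the stdin of the agenda-intelligence-mcp process and read one JSON response per request from its stdout.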

Live source retrieval is not implemented.

Example agent flow

  1. Agent receives a policy/risk update.
  2. Agent calls source_plan for the relevant category.
  3. Agent drafts a brief in the protocol shape.
  4. Agent calls validate_brief and validate_evidence.
  5. Agent calls score_output for a decision-readiness signal.
  6. Agent returns the brief, with explicit uncertainty and watch-next.
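
The flow above can be sketched as a small orchestration function. The tool names match the MCP tools listed earlier, but the return shapes (`{"errors": [...]}`, `{"score": ...}`), the brief field names, and the stub implementations are all hypothetical stand-ins so the example runs without the package or a live model.

```python
import json


def run_agent_flow(update, tools, draft):
    """Steps 2-6 of the flow; `tools` maps tool names to callables."""
    plan = tools["source_plan"](update["category"])                 # step 2
    brief, evidence = draft(update, plan)                           # step 3
    checks = [
        tools["validate_brief"](json.dumps(brief)),                 # step 4
        tools["validate_evidence"](json.dumps(evidence)),
    ]
    errors = [err for check in checks for err in check["errors"]]
    if errors:
        # Do not return a brief that fails structural validation.
        return {"status": "rejected", "errors": errors}
    score = tools["score_output"](update["text"], json.dumps(brief))  # step 5
    return {"status": "ok", "brief": brief, "score": score}          # step 6


# --- Hypothetical stubs, standing in for real MCP tool calls and an LLM ---
def _stub_validate(payload):
    data = json.loads(payload)
    return {"errors": [] if data else ["empty document"]}


stub_tools = {
    "source_plan": lambda category: {"category": category, "required_sources": []},
    "validate_brief": _stub_validate,
    "validate_evidence": _stub_validate,
    "score_output": lambda before_text, after_text: {"score": 0},
}


def stub_draft(update, plan):
    # Field names are illustrative, not the agenda-brief schema.
    brief = {
        "what_changed": update["text"],
        "uncertainty": "not yet assessed",
        "watch_next": ["placeholder indicator"],
    }
    evidence = {"items": []}
    return brief, evidence


result = run_agent_flow(
    {"category": "technology-ai", "text": "Placeholder policy update."},
    stub_tools,
    stub_draft,
)
```

Passing the tools in as a dictionary keeps the loop independent of transport: the same function works with direct Python calls or with wrappers around MCP `tools/call` requests.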

CI / checking concept

validate-brief and validate-evidence behave like linters: zero exit on success, non-zero on failure, errors on stderr. Drop them into any CI pipeline that produces strategic briefs from agents:

agenda-intelligence validate-brief examples/agenda-brief.json
agenda-intelligence validate-evidence examples/source/evidence-pack.json
agenda-intelligence score examples/agenda-brief.json --evidence examples/source/evidence-pack.json --min-score 70
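
Where a Python entry point is more convenient than shell steps, the same gate could be wrapped as below. This is a sketch that relies only on the exit-code convention described above; it assumes `score --min-score` also exits non-zero when the threshold is missed, which the text implies but does not state.

```python
import subprocess
import sys

# The three commands from the CI example above, as argument lists.
CHECKS = [
    ["agenda-intelligence", "validate-brief", "examples/agenda-brief.json"],
    ["agenda-intelligence", "validate-evidence", "examples/source/evidence-pack.json"],
    ["agenda-intelligence", "score", "examples/agenda-brief.json",
     "--evidence", "examples/source/evidence-pack.json", "--min-score", "70"],
]


def run_checks(commands):
    """Run commands in order; stop at the first non-zero exit.

    Returns (exit_code, failed_command), or (0, None) if every command passed.
    """
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Relay the tool's error output so it shows up in the CI log.
            sys.stderr.write(result.stderr)
            return result.returncode, cmd
    return 0, None


def main():
    code, failed = run_checks(CHECKS)
    if failed is not None:
        print("check failed: " + " ".join(failed), file=sys.stderr)
    return code
```

Calling `sys.exit(main())` from a CI step propagates the first failing exit code to the pipeline.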

Architecture

flowchart LR
  Agent[Strategic-intelligence agent] -->|drafts| Brief[Agenda brief JSON]
  Agent -->|cites| Evidence[Evidence pack JSON]
  Brief --> Check[validate-brief]
  Evidence --> Audit[validate-evidence]
  Brief --> Score[score]
  Evidence --> Score
  P[Agenda-Intelligence.md] -.guides.-> Agent
  L[regional/sector lenses] -.guides.-> Agent
  S[source requirements] -.guides.-> Agent

Schemas

| Schema | Purpose |
| --- | --- |
| agenda-brief.schema.json | Brief structure |
| evidence-pack.schema.json | Evidence pack structure |
| signal-classification.schema.json | Signal taxonomy |
| memory-card.schema.json | AnalysisBank cards |
| lens-manifest.schema.json | Lens manifest |
| evidence-audit.schema.json | Experimental: claim-level evidence audit |

Evidence audit (experimental)

Each important claim should be traceable:

{
  "claim_id": "c1",
  "claim": "EU AI Act tightens obligations on high-risk systems.",
  "claim_type": "regulatory_change",
  "evidence_ids": ["e1", "e2"],
  "support_level": "direct",
  "uncertainty": "Enforcement timeline per sector unclear.",
  "risk_if_wrong": "Compliance plans miss deadline."
}

support_level is one of direct | partial | weak | unsupported. This schema is experimental and is not yet wired into validate-evidence by default.
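
A minimal sketch of what claim-level traceability checks could look like follows. It is not the package's audit-claims implementation: the three rules (allowed support level, no orphan evidence references, evidence required unless the claim is marked unsupported) and the flat set of known evidence IDs are assumptions made for illustration.

```python
ALLOWED_SUPPORT_LEVELS = {"direct", "partial", "weak", "unsupported"}


def audit_claim(claim, known_evidence_ids):
    """Return a list of human-readable problems for one claim record."""
    problems = []
    claim_id = claim.get("claim_id", "<no id>")
    level = claim.get("support_level")
    cited = claim.get("evidence_ids", [])

    if level not in ALLOWED_SUPPORT_LEVELS:
        problems.append(f"{claim_id}: unknown support_level {level!r}")

    # Orphan references: cited IDs that do not exist in the evidence pack.
    orphans = [eid for eid in cited if eid not in known_evidence_ids]
    if orphans:
        problems.append(f"{claim_id}: orphan evidence refs {orphans}")

    # Assumed rule: anything other than 'unsupported' must cite evidence.
    if level != "unsupported" and not cited:
        problems.append(f"{claim_id}: support_level {level!r} but no evidence cited")

    return problems


claim = {
    "claim_id": "c1",
    "claim": "EU AI Act tightens obligations on high-risk systems.",
    "claim_type": "regulatory_change",
    "evidence_ids": ["e1", "e2"],
    "support_level": "direct",
}

clean = audit_claim(claim, known_evidence_ids={"e1", "e2"})
with_orphan = audit_claim(claim, known_evidence_ids={"e1"})
```

With both IDs present in the pack the claim passes; dropping `e2` from the known set produces a single orphan-reference problem.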


Evals

See docs/evaluation.md for the full layer breakdown.

Key honesty rule:

Current scoring does not verify factual truth. It evaluates structure, completeness, evidence labeling, and decision-readiness signals.

Bundled-example baseline: mean 87.7/100, 3 cases, 100% schema-valid, 0 orphan refs. Reproduce with python evals/run_benchmark.py. Human-judge benchmarking is not done yet.


Status

| Component | Status |
| --- | --- |
| Markdown protocol | Stable |
| JSON schemas (brief, evidence, lens, memory, signal) | Stable |
| CLI: validate-*, score, start, source-plan, doctor, mcp-config | Stable |
| Lenses (Central Asia, Middle East, EU; sanctions, export controls) | Stable |
| MCP stdio server (agenda-intelligence-mcp) | Stable |
| MCP tool functions (validate / read / score / audit_claims) | Stable |
| Evidence-audit schema (claim-level) | Experimental |
| Live source retrieval | Not implemented |
| Heuristic benchmark baseline (3 bundled cases) | Produced (mean 87.7/100) |
| Human-judge benchmark results | Not produced yet |
| Factual-truth verification | Not in scope today |

Limitations

  • No factual verification. The toolkit checks form, not truth.
  • No live source retrieval. Evidence packs are user- or agent-supplied.
  • Scoring is heuristic. The rubric is documented and an LLM-judge prompt is provided, but scores have not been benchmarked against human judges yet.
  • Lens coverage is intentionally narrow.

Contributing eval cases

The most valuable contribution is a case: a real public event with a baseline agent output, a target brief, and a human checklist. See CONTRIBUTING.md and evals/cases/.


Repository layout

agenda-intelligence-md/
├─ src/agenda_intelligence/   # Python package (CLI + MCP server + tools)
├─ schemas/                   # JSON schemas
├─ examples/                  # briefs, evidence packs, before/after
├─ analysis-bank/             # reusable reasoning patterns (memory cards)
├─ evals/                     # rubric, judge prompt, checklist, cases
├─ docs/                      # guides, integrations, use-cases
├─ skills/agenda-intelligence/# OpenClaw skill wrapper
└─ tests/                     # pytest suite

Documentation

| Resource | Link |
| --- | --- |
| Quickstart | docs/quickstart.md |
| End-to-end tutorial | docs/tutorial.md |
| Evaluation | docs/evaluation.md |
| Evidence audit (experimental) | docs/evidence-audit.md |
| Agent integration sketch | docs/integrations/agent-loop.md |
| Use-cases | docs/use-cases/ |
| Integrations | docs/integrations/ |
| Roadmap | ROADMAP.md |
| Changelog | CHANGELOG.md |

License

MIT.
