
Academic figure agent harness for multi-step planning, generation, and evaluation through MCP


🎨 Academic Figures MCP

PyPI version VS Code Marketplace CI License Python 3.10+

A multi-step academic figure agent harness for AI agents and non-engineers.

Academic Figures MCP is a workflow harness for multi-step academic reasoning and figure production. PMID ingestion is one structured entry point, but the real product value is helping an agent move through academic planning, concept decomposition, figure-type selection, prompt orchestration, image generation, evaluation, and iteration until it reaches a publication-grade result. Preprints, repositories, and freeform briefs are also first-class planning inputs. MCP exposure and VSX packaging make that workflow usable without requiring engineering-heavy setup.

One-Click Install (VS Code)

Requires uv. The install shape uses uvx --from academic-figures-mcp afm-server, which is shell-neutral across macOS, Linux, and Windows.

Install in VS Code · Install in VS Code Insiders

If you want guided setup instead of raw MCP configuration, install the VS Code extension. It supports SecretStorage, env files, and process environment configuration on macOS, Linux, and Windows.

Introduction Visual

Academic Figures MCP introduction hero

This hero visual is self-generated by the repository's own MCP workflow and is placed here intentionally so new visitors can see the product story immediately.

Why This Exists

Generating academic figures normally requires:

  1. Manual prompt engineering ✏️
  2. Journal standard research 📚
  3. Color code lookup 🎨
  4. Quality self-review ✅
  5. Retry loops 🔄

This harness automates those steps and exposes them through MCP so agents can work through the academic reasoning process in an orderly way. The image API is supporting infrastructure; the product value is the structured workflow that helps an agent plan and produce academic-grade figures.

It now includes a YAML-backed journal registry so the MCP layer can inject figure requirements for targets such as Nature, Science, JAMA, NEJM, and Lancet without forcing the agent to memorize house rules.

MCP Surface

This server targets the modern MCP Python SDK line and is intended to expose:

  • 13 MCP tools for planning, generation, editing, evaluation, review write-back, manifest inspection, replay, retargeting, verification, and multi-step refinement workflows
  • resources for discovery of presets, templates, and Gemini image defaults
  • reusable prompts for figure planning and style transformation

Harness Flow

The system is designed as a multi-step academic workflow:

  1. Start from a structured source such as a PMID, an academic objective, or a figure revision request.
  2. Reason about the scientific concept, communication goal, and target figure type.
  3. Organize the request into a structured plan using academic constraints and journal conventions.
  4. Generate the figure through the provider layer.
  5. Run provider-side automated review and persist the manifest.
  6. Optionally add host-side visual review back into the same manifest.
  7. Iterate until the output is publication-grade.
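Host-side, steps 4 through 7 reduce to a generate/evaluate/iterate loop. A minimal sketch of that loop, where `generate` and `evaluate` stand in for the real MCP tool calls (all names here are hypothetical, not the server's API):

```python
def produce_figure(source, generate, evaluate, max_iters=3):
    """Drive a generate -> evaluate -> iterate loop until the figure passes review."""
    feedback = None
    result = None
    report = None
    for _ in range(max_iters):
        result = generate(source, feedback=feedback)  # step 4: render via provider layer
        report = evaluate(result)                      # step 5: automated review
        if report["passed"]:                           # step 7: stop at publication grade
            break
        feedback = report["suggestions"]               # steps 6-7: feed critique back in
    return result, report
```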

MCP Tools

| Tool | Input | Output |
| --- | --- | --- |
| plan_figure | pmid or source_title, plus source_kind?, source_summary?, source_identifier?, output_format?, figure_type?, style_preset? | Structured plan with route, constraints, and next-step arguments |
| generate_figure | planned_payload or a direct source input (pmid / source_title), plus output_format? | Single high-level draw entrypoint with optional internal planning and raster format conversion |
| edit_figure | image_path, feedback, output_format? | Refined image via Gemini edit API with optional internal raster format conversion |
| evaluate_figure | image_path, figure_type? | 8-domain scorecard with suggestions |
| batch_generate | pmids: list, figure_type? | Batch generation results |
| composite_figure | panels, labels, title, caption?, citation? | Publication-ready multi-panel montage with labels and DPI metadata |
| list_manifests | limit? | Recent manifest metadata for replay or retargeting |
| get_manifest_detail | manifest_id, include_lineage? | Full manifest payload, lineage chain, and flattened review timeline |
| replay_manifest | manifest_id, output_dir? | Re-run a saved manifest using the original prompt and provider |
| record_host_review | manifest_id, passed, summary, critical_issues?, reviewer? | Persist a Copilot or host-side visual verdict back into the review harness |
| retarget_journal | manifest_id, target_journal, output_dir? | Regenerate with a new journal profile plus before/after diff |
| verify_figure | image_path, expected_labels?, figure_type?, language? | Standalone quality-gate verdict with domain scores and exact-label verification |
| multi_turn_edit | image_path, instructions[], max_turns? | Iterative edit session for progressive refinement without restarting from scratch |

generate_figure is now the default high-level entrypoint. You can pass a PMID, a non-PMID source brief such as a preprint or repository summary, or a fully prepared planned_payload. When the request starts from source inputs, the server plans internally first and then renders. plan_figure remains available when a host explicitly wants to inspect or edit the plan before drawing.

If you want a specific delivered file type, pass output_format such as png, gif, jpeg, or webp. MCP now performs raster-format conversion internally after generation or editing. SVG stays pass-through only; it is not rasterized automatically.
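Conceptually, that internal raster conversion is a Pillow re-encode pass. A minimal sketch of the idea (the server's actual implementation may differ; the function name here is illustrative):

```python
from pathlib import Path

from PIL import Image


def convert_raster(src_path: str, output_format: str) -> str:
    """Re-encode a generated raster image into the requested delivery format."""
    fmt = output_format.upper().replace("JPG", "JPEG")
    dst = Path(src_path).with_suffix("." + output_format.lower())
    img = Image.open(src_path)
    if fmt == "JPEG":
        img = img.convert("RGB")  # JPEG has no alpha channel
    img.save(dst, format=fmt)
    return str(dst)
```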

Reproducibility & Retargeting

  • Every successful generation now writes a manifest to .academic-figures/manifests (override with AFM_MANIFEST_DIR).
  • list_manifests remains the summary surface; get_manifest_detail loads the full manifest body plus lineage-aware review history.
  • list_manifests + replay_manifest let you rerun saved prompts without rebuilding the plan.
  • retarget_journal injects a new journal profile, regenerates, and returns a before/after diff of the profile metadata.
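On disk, the summary surface amounts to a newest-first scan of the manifest directory. A host could approximate it like this (assuming one JSON file per manifest, which is an assumption about the storage layout rather than a documented contract):

```python
import json
from pathlib import Path


def list_local_manifests(root=".academic-figures/manifests", limit=10):
    """Return the most recent manifest payloads, newest first."""
    paths = sorted(Path(root).glob("*.json"),
                   key=lambda p: p.stat().st_mtime, reverse=True)
    return [json.loads(p.read_text(encoding="utf-8")) for p in paths[:limit]]
```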

Multi-Panel & Composite Assembly

  • planned_payload now accepts render_route=composite_figure with a panels list to assemble montage figures.
  • The built-in composite_figure tool remains available for direct multi-panel assembly with labels, caption, and DPI metadata.

CJK Text Fidelity & Self-Review

  • plan_figure now accepts expected_labels so exact text strings can be propagated into prompt construction and later verification.
  • Text-heavy CJK requests can be escalated toward higher-fidelity model selection and SVG-oriented routes automatically.
  • generate_figure, replay_manifest, and retarget_journal automatically run provider-side review when a verifier is configured, and they persist quality_gate, review_summary, and review_history into the manifest.
  • verify_figure lets you run the same provider-side quality gate independently against any generated image.
  • record_host_review is the write-back path for Copilot or another host model after it inspects the image externally.
  • The dual-route review contract accepts provider_vision and host_vision, but the persisted policy is now provider_vision_required_host_optional.
  • provider_vision is the baseline gate: it must run and pass for requirement_met=true.
  • host_vision is supplemental: it can add a second pass or extra critique, but it cannot replace a failed or missing provider baseline.
  • get_manifest_detail returns the full review history so a host can inspect the current verdict, all recorded review events, and lineage context without rereading raw JSON files.
  • multi_turn_edit keeps an edit session alive across multiple instructions, which is useful when fixing garbled labels or layout issues iteratively.
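Under the provider_vision_required_host_optional policy above, the gate reduces to one rule: a provider_vision review must exist and pass, while host_vision reviews only add information. A sketch of that decision (the review-record field names here are illustrative, not the manifest schema):

```python
def requirement_met(review_history):
    """provider_vision is the baseline gate; host_vision is supplemental only."""
    provider = [r for r in review_history if r["route"] == "provider_vision"]
    if not provider:
        return False  # a missing provider baseline can never be satisfied by host review
    return all(r["passed"] for r in provider)
```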

Product Positioning

The core differentiator is not simply "connected to an image model API".

  • It is a complete academic-figure agent harness.
  • It helps agents reason through academic concepts before they generate.
  • It is exposed through MCP so multiple AI hosts can drive the same workflow.
  • It is packaged as a VSX experience so non-engineers can adopt it quickly.
  • Provider integrations such as Google Gemini or OpenRouter are replaceable infrastructure behind that harness.

Competitive Landscape

The current GitHub- and web-based benchmark is documented in docs/competitive-landscape.md.

That document separates:

  • direct competitors
  • adjacent reusable wheels
  • strengths worth absorbing
  • core product differences we should not copy away

Project Documents

Key repo-level documents:

Generated Visuals & QA

The following three visuals were generated by this repository's own MCP workflow and then reviewed through the built-in evaluate_figure path.

This section is intentionally self-hosting: each image below was generated from the payload files under .academic-figures/jobs, and each QA report was produced by this same repository through scripts/start_afm_local.py run evaluate against the generated output image. These are not manually drawn marketing assets or hand-written review notes.

Introduction Visual QA

QA summary:

  • Score: 5.0/5.0
  • Strengths: clear story from academic input to MCP workflow hub to publication-grade outputs
  • Critical issues: none identified
  • Full report: repo-intro-hero-eval.json

Architecture Visual

Academic Figures MCP architecture v2

QA summary:

  • Score: 5/5
  • Strengths: explicit DDD layering, clear Presentation -> Application -> Domain <- Infrastructure direction, and repo-specific integration edges
  • Critical issues: none identified
  • Full report: repo-architecture-v2-eval.json

Workflow Visual

Academic Figures MCP workflow flowchart

QA summary:

  • Score: 4.6/5
  • Strengths: one clean main path, strong readability, high visual polish, and the duplicate PAYLOAD error is removed in v2
  • Critical issues: no formal citation or source attribution is shown inside the figure
  • Full report: repo-workflow-flowchart-eval.json

Quick Install

git clone https://github.com/u9401066/academic-figures-mcp.git
cd academic-figures-mcp
uv sync
# then copy env.example to env and fill one provider key,
# or provide GOOGLE_API_KEY / OPENROUTER_API_KEY through your shell or MCP host config

Local Env File

For local runs and smoke tests, copy env.example to env and fill exactly one provider section.

Supported formats:

  • KEY=value
  • export KEY=value
  • set KEY=value
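A minimal parser covering those three line shapes (a sketch of the accepted grammar, not the project's actual loader):

```python
import re

_ENV_LINE = re.compile(r'^(?:export\s+|set\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$')


def parse_env_line(line):
    """Parse KEY=value, export KEY=value, or set KEY=value; return None otherwise."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    m = _ENV_LINE.match(line)
    if not m:
        return None
    return m.group(1), m.group(2).strip().strip('"')
```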

Provider examples:

  • AFM_IMAGE_PROVIDER=google with GOOGLE_API_KEY
  • AFM_IMAGE_PROVIDER=openrouter with OPENROUTER_API_KEY
  • AFM_IMAGE_PROVIDER=ollama with OLLAMA_BASE_URL and OLLAMA_MODEL
  • AFM_MANIFEST_DIR=.academic-figures/manifests to relocate persisted generation manifests

Smoke Test

You can run a sanitized end-to-end smoke test with:

uv run python scripts/env_smoke_test.py env

The script only prints variable presence and a compact result summary. It never prints API key values.
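The presence-only reporting pattern the smoke test follows is easy to replicate in a host (this sketch is not the script itself):

```python
import os


def key_presence(keys, env=None):
    """Report whether each variable is set, without ever exposing its value."""
    env = os.environ if env is None else env
    return {k: ("set" if env.get(k) else "missing") for k in keys}
```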

Usage

VS Code Copilot

Recommended package-mode install for macOS, Linux, and Windows users who do not want a local checkout:

{
  "servers": {
    "academicFigures": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "academic-figures-mcp",
        "afm-server"
      ],
      "env": {
        "AFM_IMAGE_PROVIDER": "google",
        "GOOGLE_API_KEY": "${input:googleApiKey}"
      }
    }
  }
}

For local repository development, add to your Copilot MCP settings (.vscode/mcp.json):

{
  "servers": {
    "academicFigures": {
      "type": "stdio",
      "envFile": "${workspaceFolder}/env",
      "command": "uv",
      "args": [
        "run",
        "--project",
        "${workspaceFolder}",
        "python",
        "-m",
        "src.presentation.server"
      ]
    }
  }
}

This launch shape is shell-neutral and works across Windows, macOS, and Linux as long as uv is installed. It also keeps the project root explicit through --project ${workspaceFolder} while loading secrets from the repo-root env file via envFile.

Manual Local Startup

Cross-platform launcher:

uv run python scripts/start_afm_local.py server

Run the first figure directly through afm-run:

uv run python scripts/start_afm_local.py run generate --pmid 41657234 --language zh-TW --output-size 1024x1536
uv run python scripts/start_afm_local.py run generate --pmid 41657234 --output-format webp

This direct --pmid path is a compatibility bridge. It now performs the planning step internally before rendering.

Inject a journal profile explicitly when you want the planner and renderer to enforce a house style:

uv run python scripts/start_afm_local.py run plan --pmid 41657234 --target-journal Nature
uv run python scripts/start_afm_local.py run generate --pmid 41657234 --target-journal JAMA

Plan directly from a non-PMID source such as a repository or preprint brief:

uv run python scripts/start_afm_local.py run plan --source-title "HyperHierarchicalRAG repository overview" --source-kind repo --source-identifier github.com/zzstoatzz/hyperhierarchicalrag --source-summary "Explain the repository architecture, agent workflow, and retrieval hierarchy."

Run generic asset generation through the same public tool using a JSON payload file:

uv run python scripts/start_afm_local.py run generate --payload-file .academic-figures/jobs/icon-request.json --output-dir .academic-figures/outputs

The same wrapper also supports direct planning and evaluation:

uv run python scripts/start_afm_local.py run plan --pmid 41657234
uv run python scripts/start_afm_local.py run plan --pmid 41657234 --output-format jpeg
uv run python scripts/start_afm_local.py run plan --source-title "Acute stroke treatment preprint" --source-kind preprint --source-identifier arXiv:2504.01234 --source-summary "Summarize the proposed intervention workflow and outcome framing."
uv run python scripts/start_afm_local.py run evaluate --image-path .academic-figures/outputs/your-file.png

For exact-label generation and post-generation QA on text-heavy figures:

uv run python scripts/start_afm_local.py run plan --pmid 41657234 --language zh-TW --expected-label "腦中風" --expected-label "血栓移除術"
uv run python scripts/start_afm_local.py run verify --image-path .academic-figures/outputs/your-file.png --language zh-TW --expected-label "腦中風"

Windows PowerShell shortcut:

powershell -NoProfile -ExecutionPolicy Bypass -File scripts/start_afm_local.ps1 server

Then just ask:

  • "Generate a flowchart for PMID 41657234"
  • "Help me plan the right academic figure structure for PMID 41657234 before generating it"
  • "ๅนซๆˆ‘ๅš PMID 41657234 ็š„ consensus flowchart"
  • "What figure type should I use for PMID 34567890?"
  • "Plan a repository overview figure for https://github.com/zzstoatzz/HyperHierarchicalRAG"
  • "Turn this preprint abstract into a publication-grade mechanism figure plan"
  • "Help me turn this academic concept into a publication-grade figure plan"

The VS Code extension can now run plan, generate, transform, and evaluate commands directly through afm-run instead of copying prompts into chat.

Claude Code / Cursor / Any MCP Host

Any MCP-compatible agent can use these tools directly.

Recommended package-mode shape for Claude Desktop or any MCP host that accepts command plus args:

{
  "mcpServers": {
    "academic-figures": {
      "command": "uvx",
      "args": [
        "--from",
        "academic-figures-mcp",
        "afm-server"
      ],
      "env": {
        "AFM_IMAGE_PROVIDER": "google",
        "GOOGLE_API_KEY": "your_google_api_key"
      }
    }
  }
}

If your MCP host prefers a checked-out repository instead of uvx, keep the repo path absolute and use the existing uv run --project /absolute/path/to/academic-figures-mcp python -m src.presentation.server form.

For local development with the newer MCP SDK transport options, the server defaults to stdio, and can also be started with MCP_TRANSPORT=streamable-http for HTTP-based inspection workflows.

Cross-Platform Notes

  • Package mode is the most portable install path: uvx --from academic-figures-mcp afm-server works without shell-specific quoting on macOS, Linux, and Windows.
  • Local checkout mode is also cross-platform: use scripts/start_afm_local.py on macOS/Linux and scripts/start_afm_local.ps1 on Windows PowerShell.
  • Environment parsing already accepts KEY=value, export KEY=value, and set KEY=value, so the same env profile can be reused across Bash, Zsh, Fish-style exports, and PowerShell/CMD-oriented files.
  • The VS Code extension falls back to package mode through uvx when no local source tree is detected, which is the safest route for non-developer users on all three platforms.

Architecture

┌──────────────────────┐
│  Your AI Agent       │     VS Code Copilot, Claude Code,
│  (Copilot, Claude,   │     OpenClaw, Hermes, etc.
│   any MCP host)      │
└──────────┬───────────┘
           │  MCP stdio / streamable-http
           ▼
┌────────────────────────────┐
│  Academic Figures MCP      │
│  ┌──────────────────────┐  │
│  │ plan_figure          │  │
│  │ generate_figure      │  │
│  │ edit_figure          │  │  core tools (5 of 13 shown)
│  │ evaluate_figure      │  │
│  │ batch_generate       │  │
│  └──────────┬───────────┘  │
│             │              │
│  ┌──────────▼───────────┐  │
│  │ Core Orchestrator    │  │
│  │                      │  │
│  │ 1. fetch_paper()     │  │  → PubMed E-utilities
│  │ 2. classify_type()   │  │  → keyword + structured planning heuristics
│  │ 3. build_payload()   │  │  → reusable render request / prompt pack
│  │ 4. generate_image()  │  │  → single public renderer (Google / OpenRouter / Ollama SVG)
│  │ 5. evaluate()        │  │  → 8-domain vision scoring or local critique
│  │ 6. iterate()         │  │  → harness-guided revision loop
│  └──────────────────────┘  │
└────────────────────────────┘

Figure Types & Auto-Classification

The MCP auto-classifies papers into optimal figure types:

| Type | Best For | Example Papers |
| --- | --- | --- |
| Flowchart | Consensus, guidelines | "SSC 2026 Sepsis Guidelines" |
| Mechanism | Drug mechanisms, pathways | "Sugammadex encapsulation mechanism" |
| Comparison | RCTs, meta-analyses | "Crystalloid vs Colloid fluid resuscitation" |
| Infographic | Reviews, overviews | "Perioperative fasting consensus" |
| Timeline | Historical, longitudinal | "Evolution of general anesthesia" |
| Anatomical | Surgical techniques, blocks | "Regional anesthesia approaches" |
| Data Visual | PK/PD, dose-response | "Propofol PK modeling" |
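Per the architecture notes, classification is keyword-driven. A toy version of the idea, with an illustrative keyword table that is not the shipped heuristic:

```python
# Illustrative keyword table; the real planner uses richer structured heuristics.
FIGURE_KEYWORDS = {
    "flowchart": ["consensus", "guideline", "algorithm"],
    "mechanism": ["mechanism", "pathway", "receptor"],
    "comparison": ["randomized", "meta-analysis", "versus"],
    "timeline": ["history", "evolution", "decade"],
    "data_visual": ["pharmacokinetic", "dose-response", "pk/pd"],
}


def classify_figure_type(text, default="infographic"):
    """Pick the first figure type whose keywords appear in the title/abstract."""
    text = text.lower()
    for figure_type, words in FIGURE_KEYWORDS.items():
        if any(w in text for w in words):
            return figure_type
    return default
```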

Knowledge Base (Included)

This repo ships with 9 curated reference assets:

| File | Content |
| --- | --- |
| prompt-templates.md | 7-block prompt templates for 9 figure types |
| anatomy-color-standards.md | Medical illustration color-coding reference |
| journal-figure-standards.md | Nature/Lancet formatting requirements |
| journal-profiles.yaml | Machine-readable journal registry for automatic prompt injection |
| gemini-tips.md | Gemini 3.1 Flash prompt-engineering best practices |
| model-benchmark.md | NB2 vs GPT Image 1.5 comparison data |
| code-rendering.md | matplotlib/Python figure generation reference |
| scientific-figures-guide.md | Scientific figure design principles |
| ai-medical-illustration-evaluation.md | 8-domain evaluation rubric |

Planned Rendering Ecosystem

This project is no longer framed as a single-route Gemini prompt server. The current design direction is a multi-route figure system:

  • Matplotlib + SciencePlots for deterministic, publication-style charts
  • D2 + Mermaid for structured diagrams and editable text-first figure specs
  • FigureFirst + CairoSVG for precise multi-panel assembly and export
  • Excalidraw or tldraw as future interactive vector-editing layers inside the VS Code extension
  • Kroki as an optional self-hosted render gateway for compatibility with multiple DSL engines

Development

uv sync
uv run python -m src.presentation.server

The planned Gemini image integration follows the current Google Gen AI SDK pattern:

from google import genai
from google.genai import types

License

Apache License 2.0. See LICENSE.

Composite Engine (Multi-Panel Layout)

The composite module solves Gemini's weakness with multi-panel figures. Instead of generating a single image with all panels (which often fails on spatial layout, numbering, and mixed styles), it:

  1. Generates each panel independently with focused prompts
  2. Composites them using Pillow with precise pixel-level layout
  3. Overlays text programmatically, so labels are 100% accurate with no misspellings

Composite Usage

from src.infrastructure.composite import CompositeFigure, PanelSpec
from src.server import generate_figure

# Step 1: Generate panels separately
left = generate_figure(pmid="41657234", figure_type="anatomy")
right = generate_figure(pmid="41657234", figure_type="ultrasound")

# Step 2: Composite
comp = CompositeFigure()
comp.add_panel(
    PanelSpec(prompt="...", label="A", panel_type="anatomy"),
    left["image_path"]
)
comp.add_panel(
    PanelSpec(prompt="...", label="B", panel_type="ultrasound"),
    right["image_path"]
)
comp.set_title("Interscalene Brachial Plexus Block")
comp.set_citation("PMID 41657234 · Regional Anesthesia")
comp.compose("interscalene_block.pdf")

MCP Tool: composite_figure

composite_figure(
    panels=[["left.png", "anatomy"], ["right.png", "ultrasound"]],
    labels=["A", "B"],
    title="..."
)

Layout Specs

| Property | Value |
| --- | --- |
| Canvas | 2400 × 1600 px (8" × 5.33" @ 300 DPI) |
| Format | Double column (~183 mm width, Nature standard) |
| Labels | A/B/C with pill-shaped background |
| Footer | Caption + PMIDs + citation |
| Divider | Vertical line between panels |
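The canvas numbers above follow directly from the 300 DPI target (pixels = inches × DPI):

```python
def inches_to_px(inches, dpi=300):
    """Convert physical size to pixels at a given print resolution."""
    return round(inches * dpi)


def px_to_inches(px, dpi=300):
    """Convert a pixel dimension back to physical inches."""
    return px / dpi


# 2400 x 1600 px at 300 DPI is an 8" x 5.33" canvas
```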
