Academic figure agent harness for multi-step planning, generation, and evaluation through MCP
Academic Figures MCP
A multi-step academic figure agent harness for AI agents and non-engineers.
Academic Figures MCP is a workflow harness for multi-step academic reasoning and figure production. PMID ingestion is one structured entry point, but the real product value is helping an agent move through academic planning, concept decomposition, figure-type selection, prompt orchestration, image generation, evaluation, and iteration until it reaches a publication-grade result. Preprints, repositories, and freeform briefs are also first-class planning inputs. MCP exposure and VSX packaging make that workflow usable without requiring engineering-heavy setup.
One-Click Install (VS Code)
Requires uv. The install shape uses uvx --from academic-figures-mcp afm-server, which is shell-neutral across macOS, Linux, and Windows.
If you want guided setup instead of raw MCP configuration, install the VS Code extension. It supports SecretStorage, env files, and process environment configuration on macOS, Linux, and Windows.
Introduction Visual
This hero visual is self-generated by the repository's own MCP workflow and is placed here intentionally so new visitors can see the product story immediately.
Why This Exists
Generating academic figures normally requires:
- Manual prompt engineering
- Journal standard research
- Color code lookup
- Quality self-review
- Retry loops
This harness automates those steps and exposes them through MCP so agents can work through the academic reasoning process in an orderly way. The image API is supporting infrastructure; the product value is the structured workflow that helps an agent plan and produce academic-grade figures.
It now includes a YAML-backed journal registry so the MCP layer can inject figure requirements for targets such as Nature, Science, JAMA, NEJM, and Lancet without forcing the agent to memorize house rules.
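As a rough illustration of what "injecting figure requirements" could look like, the sketch below appends a journal profile to a planning prompt. The dictionary stands in for an already-parsed registry; its field names are hypothetical, and the two example requirements are borrowed from the Layout Specs table later in this document rather than from the real journal-profiles.yaml. Returning the prompt unchanged for an unknown journal is also an assumption.

```python
# Hypothetical parsed registry: the real journal-profiles.yaml schema may differ.
JOURNAL_PROFILES = {
    "nature": {
        "display_name": "Nature",
        "requirements": [
            "double-column width of about 183 mm",
            "raster export at 300 DPI",
        ],
    },
}


def inject_journal_requirements(prompt, target_journal, profiles=JOURNAL_PROFILES):
    """Append the journal's figure requirements to a prompt, if the journal is known."""
    profile = profiles.get(target_journal.strip().lower())
    if profile is None:
        # Assumption: unknown journals leave the prompt untouched.
        return prompt
    bullet_lines = "\n".join(f"- {item}" for item in profile["requirements"])
    return f"{prompt}\n\nTarget journal: {profile['display_name']}\n{bullet_lines}"


print(inject_journal_requirements("Draw a consensus flowchart.", "Nature"))
```

The point of the design is that the agent only names the target journal; the house rules travel with the registry entry.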
MCP Surface
This server targets the modern MCP Python SDK line and is intended to expose:
- 13 MCP tools for planning, generation, editing, evaluation, review write-back, manifest inspection, replay, retargeting, verification, and multi-step refinement workflows
- resources for discovery of presets, templates, and Gemini image defaults
- reusable prompts for figure planning and style transformation
Harness Flow
The system is designed as a multi-step academic workflow:
- Start from a structured source such as a PMID, an academic objective, or a figure revision request.
- Reason about the scientific concept, communication goal, and target figure type.
- Organize the request into a structured plan using academic constraints and journal conventions.
- Generate the figure through the provider layer.
- Run provider-side automated review and persist the manifest.
- Optionally add host-side visual review back into the same manifest.
- Iterate until the output is publication-grade.
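The steps above can be sketched as a small control loop. This is a minimal illustration, not the server's actual orchestrator: the Review and Manifest classes, the run_harness function, and the four callables are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Review:
    passed: bool
    summary: str


@dataclass
class Manifest:
    prompt: str
    image_path: str = ""
    review_history: list = field(default_factory=list)


def run_harness(
    plan: Callable[[str], str],
    render: Callable[[str], str],
    review: Callable[[str], Review],
    revise: Callable[[str, Review], str],
    source: str,
    max_rounds: int = 3,
) -> Manifest:
    """Plan once, then render, review, and revise until a review passes."""
    manifest = Manifest(prompt=plan(source))
    for _ in range(max_rounds):
        manifest.image_path = render(manifest.prompt)
        verdict = review(manifest.image_path)
        manifest.review_history.append(verdict)  # every verdict is persisted
        if verdict.passed:
            break
        manifest.prompt = revise(manifest.prompt, verdict)
    return manifest
```

Keeping every verdict in review_history mirrors the idea that the manifest, not the chat transcript, is the record of how a figure reached its final state.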
MCP Tools
| Tool | Input | Output |
|---|---|---|
| plan_figure | pmid or source_title, plus source_kind?, source_summary?, source_identifier?, output_format?, figure_type?, style_preset? | Structured plan with route, constraints, and next-step arguments |
| generate_figure | planned_payload or a direct source input (pmid / source_title), plus output_format? | Single high-level draw entrypoint with optional internal planning and raster format conversion |
| edit_figure | image_path, feedback, output_format? | Refined image via Gemini edit API with optional internal raster format conversion |
| evaluate_figure | image_path, figure_type? | 8-domain scorecard with suggestions |
| batch_generate | pmids: list, figure_type? | Batch generation results |
| composite_figure | panels, labels, title, caption?, citation? | Publication-ready multi-panel montage with labels and DPI metadata |
| list_manifests | limit? | Recent manifest metadata for replay or retargeting |
| get_manifest_detail | manifest_id, include_lineage? | Full manifest payload, lineage chain, and flattened review timeline |
| replay_manifest | manifest_id, output_dir? | Re-run a saved manifest using the original prompt and provider |
| record_host_review | manifest_id, passed, summary, critical_issues?, reviewer? | Persist a Copilot or host-side visual verdict back into the review harness |
| retarget_journal | manifest_id, target_journal, output_dir? | Regenerate with a new journal profile plus before/after diff |
| verify_figure | image_path, expected_labels?, figure_type?, language? | Standalone quality-gate verdict with domain scores and exact-label verification |
| multi_turn_edit | image_path, instructions[], max_turns? | Iterative edit session for progressive refinement without restarting from scratch |
generate_figure is now the default high-level entrypoint. You can pass a PMID, a non-PMID source brief such as a preprint or repository summary, or a fully prepared planned_payload. When the request starts from source inputs, the server plans internally first and then renders. plan_figure remains available when a host explicitly wants to inspect or edit the plan before drawing.
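For hosts that speak MCP directly, a tool invocation is a JSON-RPC 2.0 request with the method tools/call. The sketch below only builds that request body; the tool_call helper is a hypothetical name, and passing pmid as a string is an assumption based on the composite usage example elsewhere in this document.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing JSON-RPC ids


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


# Source-first call: the server is described as planning internally before rendering.
print(tool_call("generate_figure", {"pmid": "41657234", "output_format": "webp"}))
```

In practice an MCP client library sends this for you; the raw shape is shown only to make the tool table concrete.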
If you want a specific delivered file type, pass output_format such as png, gif, jpeg, or webp. MCP now performs raster-format conversion internally after generation or editing. SVG stays pass-through only; it is not rasterized automatically.
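The conversion rule can be summarized as a small decision function. This is a sketch of the rule as described here, not the server's code: resolve_delivery is a hypothetical name, and both the jpg alias and the ValueError for unsupported formats are assumptions.

```python
from typing import Optional, Tuple

RASTER_FORMATS = {"png", "gif", "jpeg", "webp"}


def _normalize(fmt: str) -> str:
    fmt = fmt.lower().lstrip(".")
    return "jpeg" if fmt == "jpg" else fmt  # assumption: treat jpg as jpeg


def resolve_delivery(source_ext: str, output_format: Optional[str]) -> Tuple[str, bool]:
    """Return (delivered_format, needs_conversion) for a generated file."""
    source = _normalize(source_ext)
    if source == "svg":
        return "svg", False  # SVG is pass-through only, never rasterized
    if output_format is None:
        return source, False  # no explicit request: deliver as generated
    requested = _normalize(output_format)
    if requested not in RASTER_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format!r}")
    return requested, requested != source


print(resolve_delivery(".png", "webp"))
print(resolve_delivery("svg", "png"))
```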
Reproducibility & Retargeting
- Every successful generation now writes a manifest to .academic-figures/manifests (override with AFM_MANIFEST_DIR).
- list_manifests remains the summary surface; get_manifest_detail loads the full manifest body plus lineage-aware review history.
- list_manifests + replay_manifest let you rerun saved prompts without rebuilding the plan.
- retarget_journal injects a new journal profile, regenerates, and returns a before/after diff of the profile metadata.
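To make the manifest idea concrete, here is a minimal sketch of writing and listing JSON manifests with the AFM_MANIFEST_DIR override. The field names (manifest_id, created_at, and so on) and the one-file-per-manifest layout are assumptions for illustration; the real manifest schema is not documented in this section.

```python
import json
import os
import time
import uuid
from pathlib import Path

DEFAULT_MANIFEST_DIR = ".academic-figures/manifests"


def manifest_dir() -> Path:
    """Resolve the manifest directory, honoring the AFM_MANIFEST_DIR override."""
    return Path(os.environ.get("AFM_MANIFEST_DIR", DEFAULT_MANIFEST_DIR))


def write_manifest(prompt: str, provider: str, image_path: str, directory=None) -> str:
    """Persist one generation record and return its id (hypothetical schema)."""
    directory = Path(directory) if directory else manifest_dir()
    directory.mkdir(parents=True, exist_ok=True)
    manifest_id = uuid.uuid4().hex
    payload = {
        "manifest_id": manifest_id,
        "prompt": prompt,
        "provider": provider,
        "image_path": image_path,
        "created_at": time.time(),
    }
    path = directory / f"{manifest_id}.json"
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return manifest_id


def list_recent(limit: int = 10, directory=None) -> list:
    """Return summary metadata for the most recent manifests, newest first."""
    directory = Path(directory) if directory else manifest_dir()
    items = [json.loads(p.read_text(encoding="utf-8")) for p in directory.glob("*.json")]
    items.sort(key=lambda m: m["created_at"], reverse=True)
    return [{"manifest_id": m["manifest_id"], "provider": m["provider"]} for m in items[:limit]]
```

Because the original prompt and provider are stored, a replay only needs the manifest id, which is the property the replay_manifest tool relies on.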
Multi-Panel & Composite Assembly
- planned_payload now accepts render_route=composite_figure with a panels list to assemble montage figures.
- The built-in composite_figure tool remains available for direct multi-panel assembly with labels, caption, and DPI metadata.
CJK Text Fidelity & Self-Review
- plan_figure now accepts expected_labels so exact text strings can be propagated into prompt construction and later verification.
- Text-heavy CJK requests can be escalated toward higher-fidelity model selection and SVG-oriented routes automatically.
- generate_figure, replay_manifest, and retarget_journal automatically run provider-side review when a verifier is configured, and they persist quality_gate, review_summary, and review_history into the manifest.
- verify_figure lets you run the same provider-side quality gate independently against any generated image.
- record_host_review is the write-back path for Copilot or another host model after it inspects the image externally.
- The dual-route review contract accepts provider_vision and host_vision, but the persisted policy is now provider_vision_required_host_optional.
- provider_vision is the baseline gate: it must run and pass for requirement_met=true.
- host_vision is supplemental: it can add a second pass or extra critique, but it cannot replace a failed or missing provider baseline.
- get_manifest_detail returns the full review history so a host can inspect the current verdict, all recorded review events, and lineage context without rereading raw JSON files.
- multi_turn_edit keeps an edit session alive across multiple instructions, which is useful when fixing garbled labels or layout issues iteratively.
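The provider_vision_required_host_optional policy reduces to a short rule, sketched below. RouteResult and evaluate_gate are hypothetical names, and the output keys other than requirement_met are illustrative. One further assumption: a failing host review is surfaced as a flag but does not flip requirement_met, since the text above only says the host route cannot replace the provider baseline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RouteResult:
    passed: bool
    summary: str = ""


def evaluate_gate(
    provider_vision: Optional[RouteResult],
    host_vision: Optional[RouteResult] = None,
) -> dict:
    """Apply the dual-route policy: provider review is required, host review is optional."""
    # The baseline must both run (not None) and pass.
    baseline_ok = provider_vision is not None and provider_vision.passed
    return {
        "policy": "provider_vision_required_host_optional",
        "requirement_met": baseline_ok,
        "host_review_recorded": host_vision is not None,
        # Assumption: host critique is reported alongside, not folded into the gate.
        "host_flagged_issues": host_vision is not None and not host_vision.passed,
    }


# A passing host review cannot rescue a missing provider baseline.
print(evaluate_gate(None, RouteResult(passed=True))["requirement_met"])
```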
Product Positioning
The core differentiator is not simply "connected to an image model API".
- It is a complete academic-figure agent harness.
- It helps agents reason through academic concepts before they generate.
- It is exposed through MCP so multiple AI hosts can drive the same workflow.
- It is packaged as a VSX experience so non-engineers can adopt it quickly.
- Provider integrations such as Google Gemini or OpenRouter are replaceable infrastructure behind that harness.
Competitive Landscape
The current GitHub- and web-based benchmark is documented in docs/competitive-landscape.md.
That document separates:
- direct competitors
- adjacent reusable wheels
- strengths worth absorbing
- core product differences we should not copy away
Project Documents
Key repo-level documents:
- ROADMAP.md for planned capabilities and sequencing
- CHANGELOG.md for notable project changes
- docs/competitive-landscape.md for market and positioning context
Generated Visuals & QA
The following three visuals were generated by this repository's own MCP workflow and then reviewed through the built-in evaluate_figure path.
This section is intentionally self-hosting: each image below was generated from the payload files under .academic-figures/jobs, and each QA report was produced by this same repository through scripts/start_afm_local.py run evaluate against the generated output image. These are not manually drawn marketing assets or hand-written review notes.
Introduction Visual QA
QA summary:
- Score: 5.0/5.0
- Strengths: clear story from academic input to MCP workflow hub to publication-grade outputs
- Critical issues: none identified
- Full report: repo-intro-hero-eval.json
Architecture Visual
QA summary:
- Score: 5/5
- Strengths: explicit DDD layering, clear Presentation -> Application -> Domain <- Infrastructure direction, and repo-specific integration edges
- Critical issues: none identified
- Full report: repo-architecture-v2-eval.json
Workflow Visual
QA summary:
- Score: 4.6/5
- Strengths: one clean main path, strong readability, high visual polish, and the duplicate PAYLOAD error is removed in v2
- Critical issues: no formal citation or source attribution is shown inside the figure
- Full report: repo-workflow-flowchart-eval.json
Quick Install
git clone https://github.com/u9401066/academic-figures-mcp.git
cd academic-figures-mcp
uv sync
# then copy env.example to env and fill one provider key,
# or provide GOOGLE_API_KEY / OPENROUTER_API_KEY through your shell or MCP host config
Local Env File
For local runs and smoke tests, copy env.example to env and fill exactly one provider section.
Supported formats:
- KEY=value
- export KEY=value
- set KEY=value
Provider examples:
- AFM_IMAGE_PROVIDER=google with GOOGLE_API_KEY
- AFM_IMAGE_PROVIDER=openrouter with OPENROUTER_API_KEY
- AFM_IMAGE_PROVIDER=ollama with OLLAMA_BASE_URL and OLLAMA_MODEL
- AFM_MANIFEST_DIR=.academic-figures/manifests to relocate persisted generation manifests
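A parser that tolerates the three accepted line formats can be quite small. The sketch below is an illustration rather than the project's own loader: parse_env_line and parse_env_text are hypothetical names, and the handling of comments, blank lines, and surrounding quotes is an assumption.

```python
from typing import Optional, Tuple


def parse_env_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse KEY=value, export KEY=value, or set KEY=value into a (key, value) pair."""
    text = line.strip()
    if not text or text.startswith("#"):
        return None  # assumption: blank lines and # comments are ignored
    for prefix in ("export ", "set "):
        if text.startswith(prefix):
            text = text[len(prefix):].lstrip()
            break
    key, separator, value = text.partition("=")
    key, value = key.strip(), value.strip()
    if not separator or not key:
        return None
    # Assumption: matching single or double quotes around the value are stripped.
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]
    return key, value


def parse_env_text(text: str) -> dict:
    """Parse a whole env file body into a dictionary."""
    pairs = (parse_env_line(line) for line in text.splitlines())
    return dict(pair for pair in pairs if pair is not None)
```

For example, a file containing `export AFM_IMAGE_PROVIDER=google` on one line and `set GOOGLE_API_KEY=placeholder` on the next would yield both keys in one dictionary, which is what lets a single profile serve several shells.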
Smoke Test
You can run a sanitized end-to-end smoke test with:
uv run python scripts/env_smoke_test.py env
The script only prints variable presence and a compact result summary. It never prints API key values.
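The "presence only" behavior can be pictured with the sketch below, which reports whether each provider variable is set without ever echoing its value. It is not the contents of scripts/env_smoke_test.py; presence_report and the PROVIDER_KEYS mapping are hypothetical, with the variable names taken from the provider examples above.

```python
# Which variables each provider needs, per the provider examples in this README.
PROVIDER_KEYS = {
    "google": ["GOOGLE_API_KEY"],
    "openrouter": ["OPENROUTER_API_KEY"],
    "ollama": ["OLLAMA_BASE_URL", "OLLAMA_MODEL"],
}


def presence_report(env: dict) -> dict:
    """Report 'set' or 'missing' per required variable; secret values are never included."""
    provider = env.get("AFM_IMAGE_PROVIDER", "")
    report = {"AFM_IMAGE_PROVIDER": provider or "missing"}
    for name in PROVIDER_KEYS.get(provider, []):
        report[name] = "set" if env.get(name) else "missing"
    return report


print(presence_report({"AFM_IMAGE_PROVIDER": "google", "GOOGLE_API_KEY": "do-not-print"}))
```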
Usage
VS Code Copilot
Recommended package-mode install for macOS, Linux, and Windows users who do not want a local checkout:
{
"servers": {
"academicFigures": {
"type": "stdio",
"command": "uvx",
"args": [
"--from",
"academic-figures-mcp",
"afm-server"
],
"env": {
"AFM_IMAGE_PROVIDER": "google",
"GOOGLE_API_KEY": "${input:googleApiKey}"
}
}
}
}
For local repository development, add to your Copilot MCP settings (.vscode/mcp.json):
{
"servers": {
"academicFigures": {
"type": "stdio",
"envFile": "${workspaceFolder}/env",
"command": "uv",
"args": [
"run",
"--project",
"${workspaceFolder}",
"python",
"-m",
"src.presentation.server"
]
}
}
}
This launch shape is shell-neutral and works across Windows, macOS, and Linux as long as uv is installed. It also keeps the project root explicit through --project ${workspaceFolder} while loading secrets from the repo-root env file via envFile.
Manual Local Startup
Cross-platform launcher:
uv run python scripts/start_afm_local.py server
Run the first figure directly through afm-run:
uv run python scripts/start_afm_local.py run generate --pmid 41657234 --language zh-TW --output-size 1024x1536
uv run python scripts/start_afm_local.py run generate --pmid 41657234 --output-format webp
This direct --pmid path is a compatibility bridge. It now performs the planning step internally before rendering.
Inject a journal profile explicitly when you want the planner and renderer to enforce a house style:
uv run python scripts/start_afm_local.py run plan --pmid 41657234 --target-journal Nature
uv run python scripts/start_afm_local.py run generate --pmid 41657234 --target-journal JAMA
Plan directly from a non-PMID source such as a repository or preprint brief:
uv run python scripts/start_afm_local.py run plan --source-title "HyperHierarchicalRAG repository overview" --source-kind repo --source-identifier github.com/zzstoatzz/hyperhierarchicalrag --source-summary "Explain the repository architecture, agent workflow, and retrieval hierarchy."
Run generic asset generation through the same public tool using a JSON payload file:
uv run python scripts/start_afm_local.py run generate --payload-file .academic-figures/jobs/icon-request.json --output-dir .academic-figures/outputs
The same wrapper also supports direct planning and evaluation:
uv run python scripts/start_afm_local.py run plan --pmid 41657234
uv run python scripts/start_afm_local.py run plan --pmid 41657234 --output-format jpeg
uv run python scripts/start_afm_local.py run plan --source-title "Acute stroke treatment preprint" --source-kind preprint --source-identifier arXiv:2504.01234 --source-summary "Summarize the proposed intervention workflow and outcome framing."
uv run python scripts/start_afm_local.py run evaluate --image-path .academic-figures/outputs/your-file.png
For exact-label generation and post-generation QA on text-heavy figures:
uv run python scripts/start_afm_local.py run plan --pmid 41657234 --language zh-TW --expected-label "腦中風" --expected-label "血栓移除術"
uv run python scripts/start_afm_local.py run verify --image-path .academic-figures/outputs/your-file.png --language zh-TW --expected-label "腦中風"
Windows PowerShell shortcut:
powershell -NoProfile -ExecutionPolicy Bypass -File scripts/start_afm_local.ps1 server
Then just ask:
- "Generate a flowchart for PMID 41657234"
- "Help me plan the right academic figure structure for PMID 41657234 before generating it"
- "Help me make a consensus flowchart for PMID 41657234" (written in Traditional Chinese in the original)
- "What figure type should I use for PMID 34567890?"
- "Plan a repository overview figure for https://github.com/zzstoatzz/HyperHierarchicalRAG"
- "Turn this preprint abstract into a publication-grade mechanism figure plan"
- "Help me turn this academic concept into a publication-grade figure plan"
The VS Code extension can now run plan, generate, transform, and evaluate commands directly through afm-run instead of copying prompts into chat.
Claude Code / Cursor / Any MCP Host
Any MCP-compatible agent can use these tools directly.
Recommended package-mode shape for Claude Desktop or any MCP host that accepts command plus args:
{
"mcpServers": {
"academic-figures": {
"command": "uvx",
"args": [
"--from",
"academic-figures-mcp",
"afm-server"
],
"env": {
"AFM_IMAGE_PROVIDER": "google",
"GOOGLE_API_KEY": "your_google_api_key"
}
}
}
}
If your MCP host prefers a checked-out repository instead of uvx, keep the repo path absolute and use the existing uv run --project /absolute/path/to/academic-figures-mcp python -m src.presentation.server form.
For local development with the newer MCP SDK transport options, the server defaults to stdio, and can also be started with MCP_TRANSPORT=streamable-http for HTTP-based inspection workflows.
Cross-Platform Notes
- Package mode is the most portable install path:
  uvx --from academic-figures-mcp afm-server works without shell-specific quoting on macOS, Linux, and Windows.
- Local checkout mode is also cross-platform: use scripts/start_afm_local.py on macOS/Linux and scripts/start_afm_local.ps1 on Windows PowerShell.
- Environment parsing already accepts KEY=value, export KEY=value, and set KEY=value, so the same env profile can be reused across Bash, Zsh, Fish-style exports, and PowerShell/CMD-oriented files.
- The VS Code extension falls back to package mode through uvx when no local source tree is detected, which is the safest route for non-developer users on all three platforms.
Architecture
┌──────────────────────┐
│   Your AI Agent      │    VS Code Copilot, Claude Code,
│  (Copilot, Claude,   │    OpenClaw, Hermes, etc.
│   any MCP host)      │
└───────────┬──────────┘
            │ MCP stdio / streamable-http
            ▼
┌────────────────────────────┐
│  Academic Figures MCP      │
│  ┌────────────────────┐    │
│  │ plan_figure        │    │
│  │ generate_figure    │    │
│  │ edit_figure        │    │  ← 5 of the 13 tools shown
│  │ evaluate_figure    │    │
│  │ batch_generate     │    │
│  └────────┬───────────┘    │
│           │                │
│  ┌────────▼─────────────┐  │
│  │ Core Orchestrator    │  │
│  │                      │  │
│  │ 1. fetch_paper()     │  │  ← PubMed E-utilities
│  │ 2. classify_type()   │  │  ← Keyword + structured planning heuristics
│  │ 3. build_payload()   │  │  ← reusable render request / prompt pack
│  │ 4. generate_image()  │  │  ← single public renderer (Google / OpenRouter / Ollama SVG)
│  │ 5. evaluate()        │  │  ← 8-domain vision scoring or local critique
│  │ 6. iterate()         │  │  ← harness-guided revision loop
│  └──────────────────────┘  │
└────────────────────────────┘
Figure Types & Auto-Classification
The MCP auto-classifies papers into optimal figure types:
| Type | Best For | Example Papers |
|---|---|---|
| Flowchart | Consensus, guidelines | "SSC 2026 Sepsis Guidelines" |
| Mechanism | Drug mechanisms, pathways | "Sugammadex encapsulation mechanism" |
| Comparison | RCTs, meta-analyses | "Crystalloid vs Colloid fluid resuscitation" |
| Infographic | Reviews, overviews | "Perioperative fasting consensus" |
| Timeline | Historical, longitudinal | "Evolution of general anesthesia" |
| Anatomical | Surgical techniques, blocks | "Regional anesthesia approaches" |
| Data Visual | PK/PD, dose-response | "Propofol PK modeling" |
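The keyword half of the classification can be sketched as a simple scoring pass over the paper title. This is a toy heuristic for illustration, not the server's classifier: the keyword lists, the classify_figure_type name, and the infographic fallback are all assumptions, and real planning also uses structured heuristics that are not modeled here.

```python
# Hypothetical keyword lists, loosely derived from the "Best For" column above.
FIGURE_KEYWORDS = {
    "flowchart": ("consensus", "guideline"),
    "mechanism": ("mechanism", "pathway"),
    "comparison": ("randomized", "meta-analysis", " vs "),
    "infographic": ("review", "overview"),
    "timeline": ("evolution", "history", "longitudinal"),
    "anatomical": ("anatomy", "surgical", "nerve block"),
    "data_visual": ("pharmacokinetic", "dose-response", " pk "),
}


def classify_figure_type(title: str, default: str = "infographic") -> str:
    """Pick the figure type whose keywords match the title most often."""
    text = f" {title.lower()} "  # pad so space-delimited keywords match at the edges
    scores = {
        figure_type: sum(keyword in text for keyword in keywords)
        for figure_type, keywords in FIGURE_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default


for example in ("SSC 2026 Sepsis Guidelines", "Propofol PK modeling"):
    print(example, "->", classify_figure_type(example))
```

A pure keyword pass is easy to fool: the table lists "Perioperative fasting consensus" as an infographic, while this sketch would call it a flowchart, which is one reason to pair keywords with structured planning or an explicit figure_type.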
Knowledge Base (Included)
This repo ships with 9 curated reference assets:
| File | Content |
|---|---|
| prompt-templates.md | 7-block prompt templates for 9 figure types |
| anatomy-color-standards.md | Medical illustration color coding reference |
| journal-figure-standards.md | Nature/Lancet formatting requirements |
| journal-profiles.yaml | Machine-readable journal registry for automatic prompt injection |
| gemini-tips.md | Gemini 3.1 Flash prompt engineering best practices |
| model-benchmark.md | NB2 vs GPT Image 1.5 comparison data |
| code-rendering.md | matplotlib/Python figure generation reference |
| scientific-figures-guide.md | Scientific figure design principles |
| ai-medical-illustration-evaluation.md | 8-domain evaluation rubric |
Planned Rendering Ecosystem
This project is no longer framed as a single-route Gemini prompt server. The current design direction is a multi-route figure system:
- Matplotlib + SciencePlots for deterministic, publication-style charts
- D2 + Mermaid for structured diagrams and editable text-first figure specs
- FigureFirst + CairoSVG for precise multi-panel assembly and export
- Excalidraw or tldraw as future interactive vector-editing layers inside the VS Code extension
- Kroki as an optional self-hosted render gateway for compatibility with multiple DSL engines
Development
uv sync
uv run python -m src.presentation.server
The planned Gemini image integration follows the current Google Gen AI SDK pattern:
from google import genai
from google.genai import types
License
Apache License 2.0. See LICENSE.
Composite Engine (Multi-Panel Layout)
The composite module solves Gemini's weakness with multi-panel figures.
Instead of generating a single image with all panels (which often fails on
spatial layout, numbering, and mixed styles), it:
- Generates each panel independently with focused prompts
- Composites them using Pillow with precise pixel-level layout
- Overlays text programmatically, so labels match the specified strings exactly and the image model cannot introduce misspellings
Composite Usage
from src.infrastructure.composite import CompositeFigure, PanelSpec
from src.server import generate_figure
# Step 1: Generate panels separately
left = generate_figure(pmid="41657234", figure_type="anatomy")
right = generate_figure(pmid="41657234", figure_type="ultrasound")
# Step 2: Composite
comp = CompositeFigure()
comp.add_panel(
PanelSpec(prompt="...", label="A", panel_type="anatomy"),
left["image_path"]
)
comp.add_panel(
PanelSpec(prompt="...", label="B", panel_type="ultrasound"),
right["image_path"]
)
comp.set_title("Interscalene Brachial Plexus Block")
comp.set_citation("PMID 41657234 · Regional Anesthesia")
comp.compose("interscalene_block.pdf")
MCP Tool: composite_figure
composite_figure(
panels=[["left.png", "anatomy"], ["right.png", "ultrasound"]],
labels=["A", "B"],
title="..."
)
Layout Specs
| Property | Value |
|---|---|
| Canvas | 2400 × 1600 px (8" × 5.33" @ 300 DPI) |
| Format | Double column (~183mm width, Nature standard) |
| Labels | A/B/C with pill-shaped background |
| Footer | Caption + PMIDs + citation |
| Divider | Vertical line between panels |
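The canvas numbers follow from plain unit conversion, which the sketch below makes explicit alongside a simple side-by-side panel layout. The helper names, the 40 px margin, and the 160 px footer band are hypothetical values chosen for illustration; only the 2400 × 1600 px canvas, the 300 DPI figure, and the 183 mm width come from the table above.

```python
MM_PER_INCH = 25.4


def px_to_mm(pixels: int, dpi: int = 300) -> float:
    """Physical size in millimetres of a pixel length at the given DPI."""
    return pixels / dpi * MM_PER_INCH


def mm_to_px(mm: float, dpi: int = 300) -> int:
    """Pixel length needed to cover a physical size at the given DPI."""
    return round(mm / MM_PER_INCH * dpi)


def panel_boxes(canvas_w=2400, canvas_h=1600, panels=2, margin=40, footer_h=160):
    """Return (left, top, right, bottom) boxes for equally wide side-by-side panels."""
    panel_w = (canvas_w - margin * (panels + 1)) // panels
    panel_h = canvas_h - 2 * margin - footer_h  # leave a band for caption and citation
    boxes = []
    for index in range(panels):
        left = margin + index * (panel_w + margin)
        boxes.append((left, margin, left + panel_w, margin + panel_h))
    return boxes


print(px_to_mm(2400))   # canvas width in mm at 300 DPI
print(mm_to_px(183))    # pixels needed for a 183 mm column at 300 DPI
print(panel_boxes())
```

At 300 DPI the 2400 px canvas is 8 in (203.2 mm) wide, while a 183 mm column corresponds to about 2161 px, so a downstream resize or a narrower canvas would be needed to hit 183 mm exactly.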