🧑‍🔬 Agentic_Paper

A multi-agent LLM orchestrator for academic peer-review.


Built for students, PhDs, and researchers who want a transparent, reproducible second opinion on a manuscript, not another opaque chatbot.


Why Agentic_Paper?

This is not another ChatGPT wrapper.

A single LLM, given a paper and the prompt "please review this", gives you the average of the internet. Agentic_Paper does something genuinely different:

  • 🧠 12 specialised reviewer agents run in parallel, each with its own role, prompt, and base complexity: Methodology, Results, Literature, Structure, Impact, Contradiction, Ethics, AI-Origin, Hallucination, Citation Validator, Statcheck Validator, Revision Assessor.
  • 🧑‍⚖️ A Coordinator synthesises their structured verdicts, names disagreements, and orders revision priorities.
  • ✉️ An Editor + an Author/Editor Summary agent produce a journal-style decision letter and the confidential note to the editor, separately.
  • 📜 Every single LLM call is audited (token counts, latency, cost estimate, prompt hash, thinking-mode flag, seed) and written to audit.jsonl so you can prove what was asked and answered. No hallucination hides in the dark.
  • 🔎 Citations are validated against OpenAlex (~250M open scholarly records, no API key needed). Fabricated references get flagged automatically.
  • 🧮 Reported p-values are recomputed via the R statcheck package: if a paper says t(28) = 2.3, p = .01 and the math says p ≈ 0.029, you'll see it.
  • 🔌 Multi-provider, pluggable: OpenAI, Anthropic Claude, Google Gemini, and any OpenAI-compatible local endpoint (see § Local & Free Models).
  • 🎛️ Typed everything: reviewers don't return free-form prose, they return validated pydantic models. Downstream agents consume structure, not substrings.
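
To make the citation check above concrete, here is a minimal sketch of resolving one cited title against the public OpenAlex works endpoint. It is not the project's actual validator: the function names, the 0.85 similarity threshold, and the title-only matching are assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher

OPENALEX_WORKS = "https://api.openalex.org/works"  # public endpoint, no API key


def build_query_url(title: str) -> str:
    """Build a search URL that asks OpenAlex for the single best match."""
    params = {"search": title, "per-page": "1"}
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"


def looks_like_match(cited_title: str, payload: dict, threshold: float = 0.85) -> bool:
    """True if the top hit's title is close to the cited one (threshold is an assumption)."""
    results = payload.get("results", [])
    if not results:
        return False  # nothing found: candidate for a "fabricated reference" flag
    found = (results[0].get("title") or "").lower()
    return SequenceMatcher(None, cited_title.lower(), found).ratio() >= threshold


def check_citation(title: str, timeout: float = 10.0) -> bool:
    """Query OpenAlex over the network and report whether the title resolves."""
    with urllib.request.urlopen(build_query_url(title), timeout=timeout) as resp:
        return looks_like_match(title, json.load(resp))
```

Only check_citation touches the network; the other two helpers can be exercised offline with a canned payload.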

Outputs: a Markdown report, a stand-alone HTML dashboard, a structured JSON, and a run_id-scoped folder you can hand off when a journal asks "how was this assessment produced?".


Installation

pip install agentic-paper

That's it. Pure-Python; works on macOS, Linux, and Windows with Python 3.10+.

For the optional web UI (FastAPI + HTMX live demo):

pip install "agentic-paper[web]"

For statistical sanity checking (recommended for empirical papers), also install R and the statcheck + jsonlite packages:

install.packages(c("statcheck", "jsonlite"))

If R isn't available, the rest of the pipeline still runs; the Statcheck Validator simply reports "not available" in the final report.
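
For intuition about what the statistical check recomputes, here is a rough pure-Python sketch of a two-tailed p-value for a t statistic. It is not how statcheck works internally (that is an R package); the function names and the Simpson's-rule integration are choices made for this example only.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-tailed p-value via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * (h / 3) * total  # P(-|t| < T < |t|)
    return 1.0 - central_mass


# The README's example: a paper reports t(28) = 2.3, p = .01
recomputed = two_tailed_p(2.3, 28)
is_inconsistent = abs(recomputed - 0.01) > 0.005  # tolerance is an assumption
```

Under these assumptions the recomputed value lands near 0.029 rather than the reported .01, which is the kind of mismatch that gets flagged.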


Quickstart

1. Set a provider key

export OPENAI_API_KEY="sk-..."
# Optional, for multi-provider routing:
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="..."

💡 No budget? Skip this step and jump to Local & Free Models.

2. Review a paper from the terminal

agentic-paper paper.pdf --seed 42

Outputs land under output_paper_review/<run_id>/. Open dashboard_*.html for a styled report, or read review_report_*.md directly.

3. Or use the web UI

agentic-paper-web --port 8000
# → http://127.0.0.1:8000/

A clean drop-zone page: drag a PDF in, watch the 12 agents think live (real thinking_delta stream when the provider supports it), then read the report inline. An optional Bring-Your-Own-Key form lets you share the demo with colleagues without exposing your account: keys are held in the worker stack frame, never logged, never written to disk.

┌─────────────────────────────────────────────┐
│  drop a PDF here  →  watch the agents work  │
│  ⠋ methodology   reading…                   │
│  ✓ results       done (4.2 s, $0.018)       │
│  ⠴ literature    thinking…                  │
│  …                                          │
└─────────────────────────────────────────────┘

⚡ Auto-Mode: never fail because of a missing key

The web UI's routing profiles (max / std / quick) deliberately spread agents across multiple vendors to play to each model's strengths. For example, std sends High-tier reasoning to Claude, Standard tier to GPT, and Basic tier to Gemini. If you paste only one API key into the BYOK form, naïve routing would fail on the other two providers and tank the run.

Auto-Mode fixes this transparently. When the BYOK form is submitted:

  1. Each tier is checked against the keys you actually provided.
  2. Tiers pointing to an unavailable provider are remapped to an equivalent model on a provider you do have (e.g. tier_high: anthropic/claude-opus-4-7 → google/gemini-3-pro).
  3. thinking_budget and the tier's role intensity are preserved: Auto-Mode picks the flagship reasoning model of the fallback vendor for tier_high, the mid-tier for tier_standard, and the cheapest for tier_basic.
  4. A yellow banner at the top of the run page lists every remap with the original vs. new (provider, model) so you know exactly what changed.

The run proceeds end-to-end with a single key, with no manual config edits. Auto-Mode only kicks in when at least one BYOK key is supplied; runs that use the server-side config are left alone.
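
The remapping steps can be sketched roughly as follows. This is an illustration, not the project's code: the FALLBACKS table, the auto_remap name, and the alphabetical choice of fallback vendor are all assumptions, and the angle-bracket model names are placeholders rather than real model identifiers.

```python
from typing import Dict, List, Set, Tuple

Route = Tuple[str, str]  # (provider, model)

# Hypothetical per-vendor table: one model per tier.
# Only names quoted in this README are real; "<...>" entries are placeholders.
FALLBACKS: Dict[str, Dict[str, str]] = {
    "anthropic": {"tier_high": "claude-opus-4-7", "tier_standard": "<anthropic-mid>", "tier_basic": "<anthropic-cheap>"},
    "google": {"tier_high": "gemini-3-pro", "tier_standard": "<google-mid>", "tier_basic": "<google-cheap>"},
    "openai": {"tier_high": "<openai-flagship>", "tier_standard": "gpt-5.4-mini", "tier_basic": "<openai-cheap>"},
}


def auto_remap(
    routing: Dict[str, Route], available: Set[str]
) -> Tuple[Dict[str, Route], List[Tuple[str, Route, Route]]]:
    """Return the remapped routing plus a (tier, old, new) list for the banner."""
    if not available:  # no BYOK key supplied: leave the server-side config alone
        return dict(routing), []
    fallback = sorted(available)[0]  # deterministic pick; an assumption
    remapped: Dict[str, Route] = {}
    changes: List[Tuple[str, Route, Route]] = []
    for tier, old in routing.items():
        if old[0] in available:
            remapped[tier] = old
            continue
        new = (fallback, FALLBACKS[fallback][tier])  # same tier, different vendor
        remapped[tier] = new
        changes.append((tier, old, new))
    return remapped, changes
```

Returning the list of changes alongside the new routing is what would let a UI show the original vs. new (provider, model) for every remap.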


🦙 Local & Free Models (with Ollama)

You don't need a credit card to use Agentic_Paper. The ProviderRegistry accepts any OpenAI-compatible endpoint, which means you can run the entire pipeline against Ollama, LM Studio, vLLM, or any local server you control. Free peer-review, fully private, all on your laptop.

Step-by-step: Ollama + Llama 3

# 1. Install Ollama from https://ollama.com (one-line installer)
# 2. Pull a model: Llama 3.1 8B fits on a laptop with 16 GB RAM
ollama pull llama3.1

# 3. Start Ollama in the background (it auto-serves an OpenAI-compatible API on :11434)
ollama serve &

# 4. Point Agentic_Paper at it; two env vars are all it takes
export OPENAI_API_KEY="ollama"                          # any non-empty string
export OPENAI_API_BASE="http://localhost:11434/v1"      # Ollama's OpenAI-compat endpoint

# 5. Run the review using your local model
agentic-paper paper.pdf --config config.local.yaml

Minimal config.local.yaml to wire every tier to the local model:

output_dir: output_paper_review
routing:
  tier_high:     { provider: openai, model: llama3.1 }
  tier_standard: { provider: openai, model: llama3.1 }
  tier_basic:    { provider: openai, model: llama3.1 }
providers:
  openai:
    api_key_env: OPENAI_API_KEY
    base_url: http://localhost:11434/v1

Recommended local model tiers

Hardware                 Suggested model               Notes
Laptop, 16 GB RAM        llama3.1 (8B)                 Solid baseline. Reviews are slower but coherent.
Workstation, 32 GB+      llama3.1:70b or qwen2.5:32b   Closer to GPT-4o quality on reasoning.
GPU box, 24 GB+ VRAM     deepseek-r1 via vLLM          Excellent for the Methodology / Contradiction reviewers.
Mac Studio (M2 Ultra+)   llama3.1:70b MLX              Apple-silicon native; faster than CUDA at comparable mem.

Caveats with local models

  • Structured outputs: small open-weight models occasionally violate the JSON schema. Agentic_Paper retries with tenacity and falls back to response_format: json_object. Larger models (≥ 30B) are noticeably more reliable.
  • Quality: a 7-8B local model will not match Claude Opus 4.7, but for a first pass on a draft (catching contradictions, missing citations, structural issues), it's more than enough.
  • Privacy: nothing leaves your machine. Perfect for unpublished manuscripts under embargo or NDA.
  • Cost: literally zero (modulo electricity).
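
To illustrate the structured-output caveat, here is a small sketch of a validate-and-retry loop. It swaps in a plain for-loop where the project uses tenacity and a hand-written field check where it uses pydantic, and it does not show the json_object fallback; REQUIRED_FIELDS and both function names are hypothetical.

```python
import json
from typing import Callable, Optional

REQUIRED_FIELDS = {"verdict": str, "score": int}  # hypothetical schema


def validate(raw: str) -> dict:
    """Parse a model reply and check it against the expected fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    for name, kind in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), kind):
            raise ValueError(f"field {name!r} missing or not {kind.__name__}")
    return data


def call_with_retries(ask: Callable[[int], str], attempts: int = 3) -> dict:
    """Call ask(attempt_index) until a reply validates or attempts run out."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return validate(ask(attempt))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
    raise RuntimeError(f"no valid reply after {attempts} attempts") from last_error
```

Passing the attempt index to ask leaves room for a caller to change strategy on later attempts, for example by tightening the prompt.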

Mixed routing: free local + paid top-tier

You can also keep the cheap agents local and route only the heavy reasoning to a paid provider:

routing:
  tier_high:     { provider: anthropic, model: claude-opus-4-7, thinking_budget: auto }
  tier_standard: { provider: openai,    model: gpt-5.4-mini }
  tier_basic:    { provider: ollama_local, model: llama3.1 }
providers:
  ollama_local:
    api_key_env: OPENAI_API_KEY
    base_url: http://localhost:11434/v1

The framework treats any custom provider name with a base_url as OpenAI-compatible.


Architecture (in 30 seconds)

        PDF ──▶ PaperExtractor ──▶ paper.txt + complexity score
                                          │
                                          ▼
                              ┌────────────────────────┐
                              │ ConcurrentAgentRunner  │
                              │   (asyncio.gather)     │
                              └──────────┬─────────────┘
                                         │ 12 reviewers in parallel
                                         ▼
                              Coordinator ─▶ Author/Editor Summary
                                         │
                                         ▼
                                      Editor
                                         │
                                         ▼
                          Markdown · JSON · HTML · audit.jsonl
                          (all under output/<run_id>/)
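
The fan-out step in the diagram can be pictured with a few lines of asyncio. This is a sketch of the pattern only, not the real ConcurrentAgentRunner: run_reviewers, fake_reviewer, and broken_reviewer are hypothetical names, and the sleep stands in for an LLM call.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Reviewer = Callable[[str], Awaitable[str]]


async def run_reviewers(paper_text: str, reviewers: Dict[str, Reviewer]) -> Dict[str, object]:
    """Run every reviewer concurrently; one failure does not cancel the rest."""
    keys = list(reviewers)
    results = await asyncio.gather(
        *(reviewers[key](paper_text) for key in keys),
        return_exceptions=True,  # exceptions come back as values, per reviewer
    )
    return dict(zip(keys, results))


async def fake_reviewer(text: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a provider round-trip
    return f"reviewed {len(text)} chars"


async def broken_reviewer(text: str) -> str:
    raise RuntimeError("provider timeout")
```

With return_exceptions=True, a failed reviewer shows up as an exception object in the result map, so only that agent needs a retry.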

The codebase is deliberately small and modular:

  • orchestrator.py: coordinates the pipeline; doesn't know about concurrency.
  • agent_runner.py: ConcurrentAgentRunner owns the asyncio machinery. Swappable for Celery / Ray / Dask without touching the orchestrator.
  • storage.py: StorageProvider ABC + LocalFileStorage. Implement S3Storage or PostgresStorage once; everything else keeps working.
  • providers/: one module per vendor (OpenAI, Anthropic, Google, OpenAI-compat). Each implements a uniform LLMProvider interface.
  • agents/: one file per role. Each defines KEY, NAME, INSTRUCTIONS, SCHEMA, base_complexity. Adding a 13th reviewer is a 30-line file.
  • schemas.py: pydantic models. Every LLM call returns a validated instance, not a parsed string.
  • external/: OpenAlex (citations), statcheck (R subprocess).
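
As a rough picture of the storage seam, here is what a backend plug-in could look like. The method names write_text and read_text and the InMemoryStorage class are assumptions made for this sketch; check storage.py for the actual abstract methods before implementing S3Storage or PostgresStorage.

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class StorageProvider(ABC):
    """Hypothetical shape of the storage interface (method names are assumed)."""

    @abstractmethod
    def write_text(self, run_id: str, name: str, content: str) -> None:
        """Persist one artefact of a run."""

    @abstractmethod
    def read_text(self, run_id: str, name: str) -> str:
        """Load one artefact of a run."""


class InMemoryStorage(StorageProvider):
    """Toy backend for tests; a real one would talk to disk, S3, or Postgres."""

    def __init__(self) -> None:
        self._blobs: Dict[Tuple[str, str], str] = {}

    def write_text(self, run_id: str, name: str, content: str) -> None:
        self._blobs[(run_id, name)] = content

    def read_text(self, run_id: str, name: str) -> str:
        return self._blobs[(run_id, name)]
```

Because callers depend only on the abstract class, swapping the backend is a one-line change where the storage object is constructed.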

If you read one file to understand the project, read agentic_paper/orchestrator.py. It's ~570 lines and reads like the table of contents of this README.


What's in the run directory

After agentic-paper paper.pdf finishes, output_paper_review/<run_id>/ contains:

audit.jsonl              ← one JSON row per LLM call (12 fields)
paper.txt                ← extracted text (kept for retry-failed-agents)
paper_info.json          ← title / authors / abstract / detected sections
review_<agent>.txt       ← every reviewer's validated, structured verdict
review_report_*.md       ← the human-readable report
review_results_*.json    ← machine-readable bundle (incl. routing + audit summary)
executive_summary_*.md   ← one-page TL;DR
dashboard_*.html         ← stand-alone styled report (no server needed)
prompts/<agent>.txt      ← exact prompt sent (full prompt + context dump)
responses/<agent>.json   ← raw response payload from the provider
paper_review_system.log  ← debug log of the whole run

This is the reproducibility bundle. Hand it off when a journal asks "how was this assessment produced?" and the answer is one tarball.
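
Packing a run directory into that tarball takes only the standard library. A minimal sketch, with bundle_run as a hypothetical helper name (it is not part of the package):

```python
import tarfile
from pathlib import Path


def bundle_run(run_dir: Path, dest_dir: Path) -> Path:
    """Pack one run directory into <run_id>.tar.gz and return the archive path."""
    run_dir = Path(run_dir)
    archive = Path(dest_dir) / f"{run_dir.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps paths relative, so the archive unpacks to <run_id>/...
        tar.add(run_dir, arcname=run_dir.name)
    return archive
```

Write the archive outside the run directory itself, otherwise the tarball would try to include its own partial contents.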


Reproducibility & determinism

agentic-paper paper.pdf --seed 42

The seed is forwarded to every provider that supports it:

  • OpenAI: seed=N on Responses + Chat Completions.
  • Google Gemini: GenerateContentConfig.seed=N.
  • Anthropic: recorded in audit but not propagated (the Messages API doesn't expose a seed yet); pair with temperature: 0 for maximal stability.

Cost, latency, and token counts for every call are queryable from audit.jsonl with one jq command; no separate observability stack required.
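
If you would rather stay in Python than reach for jq, the same kind of query is a few lines. The field names agent and cost_usd below are assumptions for illustration; inspect a row of your own audit.jsonl for the real keys.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable


def cost_by_agent(lines: Iterable[str]) -> Dict[str, float]:
    """Sum the estimated cost per agent from JSON Lines audit rows."""
    totals: Dict[str, float] = defaultdict(float)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        # "agent" and "cost_usd" are hypothetical field names
        totals[row["agent"]] += float(row.get("cost_usd", 0.0))
    return dict(totals)
```

An open file handle works as the lines argument, since iterating a text file yields one row per line.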


Limitations (honest)

Things Agentic_Paper does not do:

  • Substitute for human peer review. It surfaces mechanical issues (internal inconsistencies, citation gaps, statistical misreporting) faster than a tired human reviewer. It does not have taste, domain depth in your niche, or knowledge of journal-specific norms.
  • Inspect figures, tables, or equations rendered as images. Only text is parsed (pdfplumber + heuristics).
  • Fact-check beyond citations. No PubMed / arXiv / Semantic Scholar grounding; only OpenAlex resolution of explicit references.
  • Multi-paper synthesis. One paper per run; use a shell loop for batch.
  • Translate. Non-English papers technically work but the reviewer prompts assume an English peer-review register.
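
For the batch case, a shell loop is the simplest route; if you prefer Python, a sketch along these lines does the same job by invoking the agentic-paper CLI once per PDF. The helper names review_commands and review_all are hypothetical.

```python
import subprocess
from pathlib import Path
from typing import List


def review_commands(folder: Path, seed: int = 42) -> List[List[str]]:
    """One CLI invocation per PDF in the folder, in sorted order."""
    return [
        ["agentic-paper", str(pdf), "--seed", str(seed)]
        for pdf in sorted(Path(folder).glob("*.pdf"))
    ]


def review_all(folder: Path, seed: int = 42) -> None:
    """Run the reviews one after another; a failed paper does not stop the batch."""
    for cmd in review_commands(folder, seed):
        subprocess.run(cmd, check=False)
```

Each paper still gets its own run_id folder, since every invocation is an independent run.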

Development

git clone https://github.com/albertogerli/Agentic_Paper.git
cd Agentic_Paper
pip install -e ".[dev,web]"
pytest -q --cov=agentic_paper --cov-fail-under=60

224 tests, ~74 % line coverage, CI on Python 3.10 / 3.11 / 3.12.

PRs welcome, especially: new local-model recipes, new reviewer roles, S3/Postgres StorageProvider implementations, non-English prompt packs.


Citing

If Agentic_Paper contributes to research output, please cite:

@software{gerli_agentic_paper_2026,
  author    = {Gerli, Alberto G.},
  title     = {Agentic\_Paper: A multi-agent, multi-provider, structured-output
               peer-review pipeline for scientific manuscripts},
  year      = {2026},
  url       = {https://github.com/albertogerli/Agentic_Paper},
  version   = {2.0.0}
}

License

MIT. Use it, fork it, ship it.
