AI paper review system: multiple LLM reviewer personas, parallel review, clustering and ranking, human-feedback calibration.

These details have not been verified by PyPI

Project links

Project description

AI Paper Review

Get multiple expert perspectives on your research paper in a few minutes. Upload a PDF, pick how many reviewers from a pool of AI personas should examine it (default 10, recommended 5–10 for a good balance of speed and accuracy; hard range 1–20), each selected reviewer produces 5–10 structured review comments in parallel, and the results are clustered and ranked so the issues multiple reviewers raise float to the top.

Two reviewer databases are bundled by default: Computer Architecture and Machine Learning & AI (200 reviewers each: 10 sub-domains × 20 field-specific personas). The reviewer database is a swappable input: you can build one for any research field and upload it through the web UI — see Bring your own reviewer database below and Database Format for the format spec.

⚠️ Intended use — please read

Intended use. This tool is a draft-polishing aid for papers you are writing. It is not a peer-review generator. Most venues have strict policies against using LLMs in assigned reviews, due to concerns about bias, hallucination, and the potential for compromising the integrity of the peer-review process. Please use it at your own discretion, and indicate when you have used it.

Scope. The system takes in the PDF directly. Depending on the LLM provider, it either analyzes the full PDF directly or focuses on the text and tables only (extracted by pypdf and MarkItDown). Expect the reviews to focus on methodology description, claims, experimental design, evaluation setup, and writing quality.

Quality. Every comment this system produces is a suggestion to evaluate, not a finding to accept. AI reviewers hallucinate, miss context, and over-confidently flag non-issues. Expect to reject roughly half of what you see. Alignment verdicts in the validation flow are heuristic: LLMs can match surface wording while missing intent, and miss real matches that are phrased differently. Treat all output as a signal, not ground truth, and use it at your own discretion.

Citation

If you use this tool in your paper drafting, please cite:

@article{ai-paper-review,
  author = {Di Wu},
  title = {Can AI Review Improve Paper Drafting? An Empirical Study on 20 Computer Architecture Submissions},
  journal = {arXiv preprint},
  year = {2026}
}

Quick start

# 1. Install
git clone <this-repo> ai_paper_review
cd ai_paper_review
conda env create -f environment.yml            # installs Python deps + LLM SDKs + gh CLI + ai-paper-review in developer mode
conda activate ai-paper-review

# 2. Configure your LLM provider
cp config.example.yaml config.yaml            # always required — then edit provider + credentials

# Option A: Claude Agent SDK (Claude Code / Claude Pro/Max/Team — no API key needed)
claude /login                                  # one-time login via the Claude Code CLI
# set `provider: claude_sdk` in config.yaml

# Option B: GitHub Copilot (no API key needed)
gh auth login                                  # one-time GitHub auth
# set `provider: copilot_sdk` in config.yaml

# Option C: API-key providers (Anthropic / OpenAI / Google / xAI / GitHub Models)
# set provider + paste your API key in config.yaml

# 3. Launch the web UI to make the life easy
ai-paper-review-web

Open http://127.0.0.1:8000. The home page shows a provider picker (green = ready to use, red = missing credentials or SDK) and an upload box. Drop a PDF, wait 1–5 minutes, and you'll get a ranked list of issues with links to drill into each cluster.

Prefer the command line? Jump to Using the CLI.

Install

The supported install is conda — environment.yml asks for Python 3.11 or newer plus the gh GitHub CLI for Copilot SDK auth. ai-paper-review is installed directly in developer mode. A developer install is included during creating the conda env.

conda env create -f environment.yml         # one time — installs Python, LLM SDKs, and gh
conda activate ai-paper-review

You can also instal from PyPI, which does not render valid Docs page on the web UI.

pip install ai-paper-review                 # PyPI install

After install, five console scripts are on your $PATH:

Command	Purpose
`ai-paper-review-web`	Launch the Flask web UI (button-driven flow)
`ai-paper-review-review`	Review a PDF from the CLI
`ai-paper-review-validate`	Compare AI review vs human review and emit per-paper calibration delta
`ai-paper-review-aggregate`	Roll up N calibration deltas into cross-paper tuning recommendations
`ai-paper-review-generate-db`	Generate a reviewer-database markdown from a YAML config

Configure your LLM

Copy the template and edit two things:

cp config.example.yaml config.yaml

llm_review:
  provider: anthropic_api        # or: openai_api | google_api | xai_api | github_api |
                                 #     claude_sdk | copilot_sdk | openai_compatible_api
  model: claude-sonnet-4-6

# llm_validation:                # optional — inherits llm_review when absent
#   provider: openai_api
#   model: gpt-4o-mini

api_keys:
  anthropic_api: sk-ant-...      # fill in the one that matches your provider

Supported providers

Each provider has a different setup flow — API key, PAT, SDK install, or local base_url. Canonical provider names use a suffix so the kind is visible at a glance: *_api for HTTP-based providers that take an API key or PAT, *_sdk for locally-installed SDKs that inherit a CLI's login. The config column is what you paste into provider:; the setup column is what you do once to unlock it. The PDF input column shows whether the paper PDF reaches the model as-is or is converted to text first.

Provider	Config value	PDF input	Setup flow
Anthropic Claude	`anthropic_api`	Direct	Create an API key at https://console.anthropic.com/ → set `api_keys.anthropic_api` in `config.yaml` or export `ANTHROPIC_API_KEY`.
OpenAI GPT	`openai_api`	Direct (OpenAI endpoint only)	Create an API key at https://platform.openai.com/api-keys → `api_keys.openai_api` or `OPENAI_API_KEY`. Azure OpenAI: also set `base_url: https://<resource>.openai.azure.com/openai/deployments/<deployment>` under `llm_review`.
Google Gemini	`google_api`	Direct	Create an API key at https://aistudio.google.com/apikey → `api_keys.google_api` or `GEMINI_API_KEY` (falls back to `GOOGLE_API_KEY`).
xAI Grok	`xai_api`	Direct (grok-4-class models)	Create an API key at https://console.x.ai/ → `api_keys.xai_api` or `XAI_API_KEY`. Base URL is hardcoded to `https://api.x.ai/v1`.
GitHub Models	`github_api`	Text	Create a fine-grained GitHub Personal Access Token at https://github.com/settings/tokens (no repo scope needed) → `api_keys.github_api` or `GITHUB_TOKEN` (falls back to `GITHUB_PAT`). Browse the catalog at https://github.com/marketplace/models.
Claude Agent SDK	`claude_sdk`	Direct	`pip install claude-agent-sdk` (already in `environment.yml`), then `claude /login` once via the Claude Code CLI. No API key needed — the SDK inherits the CLI's login (shared with VSCode/JetBrains Claude extensions). Routes through your Claude Pro/Max/Team subscription.
GitHub Copilot SDK	`copilot_sdk`	Text	`pip install github-copilot-sdk` (already in `environment.yml`), then `gh auth login` once. No API key needed — the SDK inherits the Copilot CLI's local auth. Works alongside VSCode Copilot.
OpenAI-compatible	`openai_compatible_api`	Text	Point at any OpenAI-protocol endpoint via `base_url` under `llm_review` (e.g. Ollama `http://localhost:11434/v1`, vLLM / llama.cpp, Together, Groq, DeepSeek, Fireworks, Azure-style proxies). API key is optional when the base_url looks local; otherwise use `api_keys.openai_compatible_api` or `OPENAI_API_KEY`.

Full setup details, env-var precedence, rate-limiting presets, and per-stage provider split: LLM providers.

Using the web UI

Launch with ai-paper-review-web and open http://127.0.0.1:8000. The server writes uploads and run outputs to ./ai-paper-review-data/ in the directory you launched it from (override with PAPER_REVIEW_WORKDIR=/path/to/data). The top nav exposes the seven pages below.

Model — set your LLM provider

Open Model first. The page shows all eight providers as cards (green = ready; red = missing credentials or SDK). Below the grid, the Review model and Validation model sections let you pick the active provider, model, and optional base URL per stage — applied immediately for this session (env-var overrides) and cleared on server restart. For permanent defaults, edit config.yaml directly.

Review — review a paper

Pick a reviewer database (bundled default, or a .md you uploaded on the Database page).
Pick the number of reviewers to run (default 10; the input is auto-bounded to the smaller of the per-run hard cap and the selected database's size, with an inline error if you exceed it).
Upload the PDF.
The status page polls until the review finishes (1–5 min), then redirects to the result page, which shows:
- Selected reviewers + their topic-relevance scores.
- A Writing clarity review section — always-on G001 reviewer, writing-quality only, never clustered or compared to human reviews.
- Ranked issues (major / moderate / minor) grouped by cross-reviewer clustering, each expandable to show every reviewer who raised it.
- Downloads: review_report.md, review_data.md, writing_clarity_review.md, and the two similarity-matrix artifacts (selection_similarities.md, clustering_similarities.md).

Validation — compare AI vs human reviews

Upload the human review. Raw text (HotCRP / OpenReview / generic) or markdown both work — an LLM reshapes it into the AI-review schema automatically. Files already in that schema are passed through untouched.
Pick the AI side: either a prior review from the dropdown (auto-populated from past runs on this server) or upload a review_data.md.
Click Run validation. The status page polls until the single batch-similarity LLM call and alignment finish (~30–90 s), then redirects to the result page.
The result page shows summary metrics (recall / precision / F1 / severity-weighted recall), per-persona performance, hits / misses / false alarms, and per-paper calibration suggestions.

Aggregation — cross-paper tuning recommendations

After several validations accumulate in the workdir, open Aggregation. It globs every completed validation run's calibration_delta.json, groups the suggestions by (type, target), and renders the ones that repeat across ≥ min_support papers (default 2) as actionable tuning recommendations for the reviewer database. A small form lets you tune min_support live. Reporter only — nothing is written to disk from this page.

Database — browse / upload reviewer databases

Filter by domain or persona, search by keyword, and click into any reviewer to see the full system prompt. The same page has the upload form for dropping in a custom .md for a different research field; the Build a new database walkthrough spells out the YAML template + LLM-expansion recipe, including the list of 20 canonical persona names Validation's calibration attribution looks for.

Using the CLI

Three console scripts, all flat (no subcommand layer). They read provider/model defaults from config.yaml unless overridden. Only ai-paper-review-review exposes --provider / --model flags; ai-paper-review-validate picks up PAPER_REVIEW_VALIDATION_*_OVERRIDE env vars (set by the web UI's Model page or by hand); ai-paper-review-aggregate makes no LLM calls at all.

Review a paper — `ai-paper-review-review`

ai-paper-review-review --pdf paper_draft.pdf

Writes five files next to the PDF:

File	Content
`paper_draft_review.md`	Ranked review report (human-readable).
`paper_draft_review_data.md`	Per-reviewer structured comments — the canonical input to Validation.
`paper_draft_writing_clarity_review.md`	Always-on `G001` writing-clarity reviewer's output. Never enters Validation.
`paper_draft_selection_similarities.md`	Full reviewer-vs-paper similarity landscape; top-N are marked.
`paper_draft_clustering_similarities.md`	Pairwise comment similarity + clustering decisions (near-threshold pair list + full matrix).

Flags (full list via --help):

ai-paper-review-review \
    --pdf paper_draft.pdf \
    --db comparch_reviewer_db.md \        # defaults to the bundled computer_architecture DB
    --reviewers 7 \                    # N (default 10; hard range 1–20)
    --provider openai_api --model gpt-4o \ # per-run overrides, else config.yaml
    --out review_report.md \                    # default: <pdf_stem>_review.md
    --data-out review_data.md \                 # default: <pdf_stem>_review_data.md
    --clarity-out clarity.md \                  # default: <pdf_stem>_writing_clarity_review.md
    --similarities-out selection_sims.md \      # default: <pdf_stem>_selection_similarities.md
    --clustering-similarities-out clustering_sims.md  # default: <pdf_stem>_clustering_similarities.md

Validate AI vs human review — `ai-paper-review-validate`

The CLI validator expects the human review to already be in AI-review-format markdown. The easiest way is the web UI's Validation page — it accepts raw text and runs conversion → alignment → calibration in one click.

ai-paper-review-validate \
    --actual my_paper_actual.md \
    --ai-review paper_draft_review_data.md \
    --out my_validation.md \            # default: <actual>_validation.md
    --calibration-out my_calibration.json   # default: <actual>_calibration.json

Writes five files into the same directory as --out:

File	Content
`<actual>_validation.md`	Validation report — miss analysis, metrics, calibration suggestions (human-readable).
`<actual>_calibration.json`	Per-paper calibration delta JSON — input to `ai-paper-review-aggregate`.
`alignment_llm_analysis.md`	Verbatim LLM prompt + response for the alignment step — full audit trail.
`alignment_similarities.md`	N × M human-vs-AI comment similarity matrix; best match per human comment bolded.
`alignment_ranking.md`	Human comments ranked by best-match similarity score, highest first.

Full schema: Validation Output Format. No --provider / --model flags — set the validation-stage LLM in config.yaml or via PAPER_REVIEW_VALIDATION_PROVIDER_OVERRIDE / PAPER_REVIEW_VALIDATION_MODEL_OVERRIDE.

Cross-paper aggregation — `ai-paper-review-aggregate`

After several validation runs accumulate, roll up their calibration deltas into reviewer-database tuning recommendations:

ai-paper-review-aggregate \
    'ai-paper-review-data/runs/validation_*/calibration_delta.json' \
    --min-support 2 \
    --out recommendations.md    # default: stdout if --out omitted

Reporter only — it doesn't modify any config or database file; it prints suggestions that repeat across ≥ min_support papers. See Aggregation for the full design notes.

How it works

Three stages, each a separate surface. The review pipeline produces structured critique of one paper; the validation pipeline compares that critique to a real human review and records a calibration delta; aggregation — a post-pipeline reporter — rolls up many deltas into tuning recommendations for the reviewer database.

  INPUTS                      STAGE                             OUTPUTS
  ────────────────────        ──────────────────────            ─────────────────────────────
  paper.pdf               ──▶ [1] Review pipeline          ──▶  review_report.md
  comparch_reviewer_db.md        (ingest → select N        ──▶  review_data.md
  N (1–20, default 10)         reviewers → clarity         ──▶  writing_clarity_review.md
  provider / model             reviewer → dispatch in      ──▶  selection_similarities.md
                               parallel → cluster → rank)  ──▶  clustering_similarities.md
                                       │
                                       ▼
  human_review.txt/md     ──▶ [2] Validation pipeline      ──▶  validation_report.md
  review_data.md              (convert → align → metrics   ──▶  calibration_delta.json
   (from stage 1)              → calibration → report)
                                       │
                                       ▼
  N × calibration_delta.json  ──▶ [3] Aggregation (reporter) ──▶ cross-paper recommendations
  (from many runs of stage 2)     (group by type/target,         (markdown; hand-applied to
                                   filter by min_support)         the reviewer-database YAML)

Each box maps to a dedicated doc with the stage-by-stage breakdown, diagram, and I/O schema:

For format specs, provider handling, and reviewer-database details:

LLM Providers — LLM provider support and configuration
Database Format — reviewer-database YAML and markdown formats
Review Output Format — per-review markdown format
Validation Output Format — validation run artifacts, alignment semantics, calibration_delta.json schema

Customization

The project is designed so the four most-likely-to-tune surfaces — rate limits, the reviewer database, LLM providers, and prompts — can each be changed without touching Python, or with a minimal drop-in.

Tuning knobs

Runtime behavior is tuned through a small set of knobs. The first group lives in config.yaml under llm_review:; the second group is set per-run via env vars or CLI flags.

Knob	Where	Default	What it does
`max_concurrent`	`config.yaml`	`10`	Max parallel LLM requests during reviewer dispatch. Lower on strict free tiers.
`request_delay`	`config.yaml`	`0.0`	Seconds between dispatching consecutive requests. Set to ~1 s on free tiers hitting RPM limits.
`max_retries`	`config.yaml`	`2`	Retries on HTTP 429 / 5xx before a reviewer is logged as failed.
`retry_base_delay`	`config.yaml`	`5.0`	Base seconds for exponential backoff on retries (attempt 1 waits base, attempt 2 waits `2×`, etc.).
`CLUSTER_THRESHOLD`	env var	`0.55`	Cosine-similarity threshold for merging two review comments into one cluster. `0.65` = stricter.
`domain_bleed`	`select_reviewers()` arg	`0.15`	How far outside the top domain the selector may reach to pick a persona-diverse Nth reviewer.
`n_reviewers`	per-run form / CLI flag	`10`	Top-N reviewers to dispatch; recommended 5–10, hard range 1–20. Auto-capped at the database's size.

Suggested presets for paid-plan / free-tier / local-model configs live in the LLM providers doc.

Bring your own reviewer database

Two databases are bundled — Computer Architecture and Machine Learning & AI — each as a YAML config and a generated 200-reviewer markdown. For any other field:

Generate a config YAML — use the prompt at src/ai_paper_review/prompts/database_generation.md: replace [FIELD NAME], paste into any capable LLM, and get a complete YAML in one shot. Or copy one of the bundled *_reviewer_cfg.yaml files and edit it manually.
Generate the database — run ai-paper-review-generate-db --config my_field_cfg.yaml --out my_field_db.md.
Upload it — drop the .md on the Database page; the server parses it on upload and rejects malformed files with a clear error.

See Database Format for the full YAML and markdown spec.

Tune LLM prompts

Every prompt the system sends is a standalone .md file in src/ai_paper_review/prompts/. Edit the file; no Python change required. Placeholders use {name} syntax (Python str.format) and are documented in each file.

Prompt file	Used by
`writing_clarity_system.md`	Always-on `G001` writing-clarity reviewer.
`human_review_extraction_system.md`	Validation Stage 1 — reshape raw human-review text into AI-review markdown.
`markdown_repair_system.md` + `markdown_repair_user.md`	Repair retry when a reviewer's (or the clarity reviewer's) first LLM output fails to parse.
`batch_alignment_system.md` + `batch_alignment_user.md`	Validation Stage 3 — the single batch-similarity LLM call that produces the N × M matrix.
`database_generation.md`	LLM prompt for generating a new reviewer-database YAML config for any field. Replace `[FIELD NAME]` and paste into any LLM.

The persona reviewers' system prompts live inside the reviewer-database .md (one per #### R### block), not in prompts/ — that way a new reviewer database can ship an entirely different set of persona voices.

Swap or add an LLM provider

All supported providers share a one-method protocol — complete(system, user, max_tokens) → str. The contract is in llm/clients/base.py; each existing provider is one file in llm/clients/ with lazy SDK import.

To add a provider: drop a new llm/clients/<name>.py implementing the protocol, register it in the _PROVIDER_CLASS dict in llm/factory.py, and (optionally) add env-var fallback entries in llm/config.py's _ENV_FALLBACK / _DEFAULT_BASE_URLS. Add the provider's name to SUPPORTED_PROVIDERS in the same file. Once registered, it's selectable from config.yaml like any other provider — the rest of the pipeline is provider-agnostic.

Troubleshooting

"No API key found for provider ..." — Either add it to config.yaml under api_keys.<provider>, or export the matching env var (ANTHROPIC_API_KEY, OPENAI_API_KEY, GEMINI_API_KEY, XAI_API_KEY). The provider shown on the home page is the active one — switch providers in the picker before uploading.

Web UI home page shows all providers red — config.yaml has no keys and no matching env vars are exported. Fix one, restart the server.

Review takes >10 minutes — Reviewers dispatch with no delay by default between each (free-tier-safe default). If you have a paid plan, set request_delay: 0 in config.yaml for faster runs. If you're still hitting rate limits on a free tier, raise retry_base_delay to 90–120 seconds.

Clustering merges issues that should stay separate — Raise CLUSTER_THRESHOLD (default 0.55) with the env var. 0.65 is a reasonable stricter setting.

Selector keeps missing a persona you need — Raise domain_bleed above 0.15, or edit that persona's keywords in your reviewer-database config (see Database Format) and re-upload the rebuilt .md via the web UI.

sentence-transformers download fails in a sandbox — The code auto-falls back to TF-IDF and logs a warning. Quality is slightly lower but functional.

Repo layout

ai-paper-review/
├── README.md
├── pyproject.toml                       # declares CLI entry points + deps
├── environment.yml                      # conda env (conda for python+gh, pip -e . for the rest)
├── config.example.yaml                  # copy to config.yaml
│
├── docs/
│   ├── llm_providers.md                 # LLM setup detail
│   ├── database_format.md               # reviewer-database YAML/markdown formats
│   ├── review_pipeline.md               # review pipeline — stages, inputs, outputs, diagram
│   ├── review_output_format.md          # per-review markdown schema
│   ├── validation_pipeline.md           # validation pipeline — stages, inputs, outputs, diagram
│   ├── validation_output_format.md      # validation stage output & calibration_delta schema
│   └── aggregation.md                   # cross-paper aggregation of calibration deltas (post-pipeline reporter)
│
├── src/ai_paper_review/
│   ├── __init__.py                      # ``default_db_path``; package __init__s expose nothing else
│   ├── provenance.py                    # run-ID generation + provenance banner writer
│   │
│   ├── llm/                             # provider-agnostic LLM wrapper
│   │   ├── __init__.py
│   │   ├── __main__.py                  # `python -m ai_paper_review.llm` → resolved-config dump
│   │   ├── config.py                    # ``LLMConfig`` + ``load_config`` (YAML + env overrides)
│   │   ├── factory.py                   # ``make_client`` (config → ready LLMClient)
│   │   ├── retrying.py                  # ``RetryClient`` (rate-limit backoff)
│   │   ├── probing.py                   # ``probe_providers``, ``describe_config`` (UI helpers)
│   │   ├── utils.py                     # ``env_vars_for``, ``is_local_provider``
│   │   └── clients/                     # one file per provider, lazy SDK import
│   │       ├── base.py                  # ``LLMClient`` Protocol
│   │       ├── anthropic.py             # anthropic_api
│   │       ├── openai.py                # openai_api, also serves github_api / openai_compatible_api
│   │       ├── google.py                # google_api
│   │       ├── xai.py                   # xai_api (Responses API + /v1/files for PDFs)
│   │       ├── claude.py                # claude_sdk (Claude Code CLI)
│   │       └── copilot.py               # copilot_sdk (local async session)
│   │
│   ├── review/                          # review pipeline (`ai-paper-review-review`)
│   │   ├── __init__.py
│   │   ├── review.py                    # ``ReviewState``, LangGraph wiring + CLI ``main()``
│   │   ├── reviewer_db.py               # ``Reviewer`` dataclass + DB parser
│   │   ├── pdf_ingestion.py             # PDF text extraction (pypdf / MarkItDown)
│   │   ├── selection.py                 # Embedder + persona-diversified top-N picker
│   │   ├── reviewer_dispatching.py      # parallel LLM dispatch + retries
│   │   ├── clarity.py                   # always-on writing-clarity reviewer (G001)
│   │   ├── parsing.py                   # markdown ↔ dict round-trippers
│   │   ├── clustering.py                # cross-reviewer comment clustering
│   │   ├── ranking.py                   # cluster ranking + report formatter
│   │   └── constants.py                 # N range, severity weights, retry caps
│   │
│   ├── validation/                      # validation pipeline (`ai-paper-review-validate`)
│   │   ├── __init__.py
│   │   ├── validation.py                # CLI ``main()`` — orchestrates all stages below
│   │   ├── conversion.py                # reshape raw human reviews into AI-review markdown
│   │   ├── loading.py                   # flatten human + AI markdown files into comment lists
│   │   ├── alignment.py                 # batch LLM similarity matrix + diagnostic artifact writer
│   │   ├── metrics.py                   # precision / recall / F1
│   │   ├── calibration.py               # per-paper calibration delta builder
│   │   ├── reporting.py                 # markdown validation report
│   │   ├── routing.py                   # category / sub-rating → persona (from DB attribution tables)
│   │   └── constants.py                 # recommendation / severity vocabularies + batch-similarity thresholds
│   │
│   ├── aggregation/                     # cross-paper aggregation (`ai-paper-review-aggregate`)
│   │   ├── __init__.py
│   │   └── aggregation.py               # aggregate N calibration_delta.json files into tuning recommendations
│   │
│   ├── prompts/                         # externalized LLM prompts, one .md per prompt
│   │   ├── __init__.py                  # ``prompts.load(name, **kwargs)`` helper
│   │   ├── shared_reviewer_system.md    # LLM ``system`` arg shared across all N persona reviewers + clarity
│   │   │                                  (identical across calls → provider prompt cache reuses the
│   │   │                                   (system + PDF) prefix across all parallel reviewer calls)
│   │   ├── writing_clarity_system.md    # clarity reviewer's role/scope, loaded into the user message
│   │   ├── human_review_extraction_system.md  # convert raw human review text → AI-review markdown
│   │   ├── markdown_repair_system.md    # fix malformed AI-review markdown
│   │   ├── markdown_repair_user.md
│   │   ├── batch_alignment_system.md    # batch similarity matrix prompt (validation stage)
│   │   ├── batch_alignment_user.md
│   │   └── database_generation.md       # LLM prompt for generating a new reviewer-database cfg YAML
│   │
│   ├── database/                        # bundled databases + generation CLI
│   │   ├── generation.py                # ``ai-paper-review-generate-db`` — YAML config → reviewer DB markdown
│   │   ├── comparch_reviewer_cfg.yaml   # YAML source — Computer Architecture (bundled default)
│   │   ├── comparch_reviewer_db.md      # 200 reviewer prompts — Computer Architecture (bundled default)
│   │   ├── mlai_reviewer_cfg.yaml       # YAML source — Machine Learning & AI (bundled default)
│   │   └── mlai_reviewer_db.md          # 200 reviewer prompts — Machine Learning & AI (bundled default)
│   │
│   └── web/                             # Flask UI (`ai-paper-review-web`), one module per route group
│       ├── __init__.py
│       ├── app.py                       # Flask ``app`` instance, paths, context processor, ``main()``
│       ├── jobs.py                      # in-memory JOBS / VALIDATE_JOBS state, rehydrate, run-id helpers
│       ├── review.py                    # /review routes + review-pipeline worker thread
│       ├── validation.py                # /validation routes + validation-pipeline worker thread
│       ├── aggregation.py               # /aggregation page (cross-paper aggregation surface)
│       ├── databases.py                 # /database routes (list / upload / view / delete)
│       ├── model.py                     # /model page (provider availability + session overrides)
│       ├── docs.py                      # /docs browser (markdown rendering of docs/)
│       ├── run_files.py                 # enumerate artifacts in a run directory for the result page
│       ├── templates/                   # Jinja2 HTML templates, one per page
│       └── static/style.css
│
└── tests/
    ├── conftest.py                      # shared fixtures (mock LLM client, tmp paths)
    ├── fixtures/                        # sample actual.md + ai.md for validation tests
    ├── test_llm.py                      # LLM config loading + provider probing
    ├── test_convert.py                  # human-review extraction + markdown repair
    ├── test_validate.py                 # alignment, metrics, calibration, reporting
    ├── test_provenance.py               # provenance banner generation
    └── test_web.py                      # Flask route smoke tests

Each pipeline package's __init__.py is intentionally empty — every name is reached via its explicit submodule path (e.g. from ai_paper_review.review.reviewer_db import Reviewer). LLM prompts live in prompts/ so editing them is a single .md change with no Python touched.

Runtime dirs (auto-created, git-ignored): ai-paper-review-data/{uploads,runs,databases}/.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.5.0

May 3, 2026

0.4.0

Apr 27, 2026

0.3.2

Apr 27, 2026

0.3.1

Apr 24, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ai_paper_review-0.5.0.tar.gz (373.1 kB view details)

Uploaded May 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

ai_paper_review-0.5.0-py3-none-any.whl (348.8 kB view details)

Uploaded May 3, 2026 Python 3

File details

Details for the file ai_paper_review-0.5.0.tar.gz.

File metadata

Download URL: ai_paper_review-0.5.0.tar.gz
Upload date: May 3, 2026
Size: 373.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_paper_review-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`e12421d1aa8b88b7c2eca806d86d2f2177427fddf22a5469afae96b1ec063629`
MD5	`57375a3b38a76117b9150a05541ed409`
BLAKE2b-256	`b0897780e0434ab2f097986336224d42f8136101a1688245bcf6883df29a2217`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_paper_review-0.5.0.tar.gz:

Publisher: publish.yml on UnaryLab/ai-paper-review

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_paper_review-0.5.0.tar.gz
- Subject digest: e12421d1aa8b88b7c2eca806d86d2f2177427fddf22a5469afae96b1ec063629
- Sigstore transparency entry: 1429958960
- Sigstore integration time: May 3, 2026
Source repository:
- Permalink: UnaryLab/ai-paper-review@0de5472a971a812c3ab8b2de717b452c0b18f05e
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/UnaryLab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0de5472a971a812c3ab8b2de717b452c0b18f05e
- Trigger Event: push

File details

Details for the file ai_paper_review-0.5.0-py3-none-any.whl.

File metadata

Download URL: ai_paper_review-0.5.0-py3-none-any.whl
Upload date: May 3, 2026
Size: 348.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for ai_paper_review-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`128935968ea3be22ca0da26a7539f669335083e5b6bf3f2e98ae095338311882`
MD5	`5cae88803c6837fdb35d7e58e89a21b6`
BLAKE2b-256	`cbeabbe66e4b1af26b4a7e0a1575947edf4da2fd19f8101ba9b3fca3f39684fb`

See more details on using hashes here.

Provenance

The following attestation bundles were made for ai_paper_review-0.5.0-py3-none-any.whl:

Publisher: publish.yml on UnaryLab/ai-paper-review

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_paper_review-0.5.0-py3-none-any.whl
- Subject digest: 128935968ea3be22ca0da26a7539f669335083e5b6bf3f2e98ae095338311882
- Sigstore transparency entry: 1429958962
- Sigstore integration time: May 3, 2026
Source repository:
- Permalink: UnaryLab/ai-paper-review@0de5472a971a812c3ab8b2de717b452c0b18f05e
- Branch / Tag: refs/tags/v0.5.0
- Owner: https://github.com/UnaryLab
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@0de5472a971a812c3ab8b2de717b452c0b18f05e
- Trigger Event: push

ai-paper-review 0.5.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AI Paper Review

⚠️ Intended use — please read

Citation

Quick start

Install

Configure your LLM

Supported providers

Using the web UI

Model — set your LLM provider

Review — review a paper

Validation — compare AI vs human reviews

Aggregation — cross-paper tuning recommendations

Database — browse / upload reviewer databases

Using the CLI

Review a paper — ai-paper-review-review

Validate AI vs human review — ai-paper-review-validate

Cross-paper aggregation — ai-paper-review-aggregate

How it works

Customization

Tuning knobs

Bring your own reviewer database

Tune LLM prompts

Swap or add an LLM provider

Troubleshooting

Repo layout

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance

Review a paper — `ai-paper-review-review`

Validate AI vs human review — `ai-paper-review-validate`

Cross-paper aggregation — `ai-paper-review-aggregate`