A CLI research agent for AI-related paper search, code discovery, PDF collection, and bilingual reports.

PaperPilot

PaperPilot is a CLI research agent for AI-related literature review.
It turns one user request into a traceable, evidence-based research workflow and generates bilingual reports (zh/en) in Markdown, HTML, and PDF.

✨ What PaperPilot does

PaperPilot is not a chatbot. It is an interactive scientific workflow:

  • Parse natural-language research requests
  • Build an explicit search protocol with inclusion/exclusion rules
  • Query multi-source literature APIs
  • Normalize, deduplicate, and screen papers
  • Verify URLs/PDF/code availability
  • Synthesize evidence and generate review reports
  • Output structured artifacts for reproducibility

Each run creates a dedicated folder under runs/ with full state, logs, and intermediate files.

🚀 Highlights

Core experience

  • Natural-language intake with LLM-assisted interpretation
  • Interactive shell with:
    • /model to manage LLM profiles
    • /sources to inspect search source/API status
    • /doctor for quick self-checks
  • Multi-source retrieval with source registry and diagnostics
  • Resume/inspect modes for reproducible research sessions

Retrieval and screening

  • Protocol-aware search using plan + diversified keywords
  • Canonicalized Paper schema and robust deduplication
  • Core/adjacent/excluded paper classification
  • PDF + code-link verification (no paywall bypass)
  • Optional full-text extraction from downloadable PDFs
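As an illustration of the normalize-and-deduplicate step, here is a minimal sketch. The `Paper` record, its fields, and the DOI-then-title keying rule are assumptions made for this example; PaperPilot's actual schema and matching logic may differ.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    """Hypothetical minimal record, not PaperPilot's real Paper schema."""
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None
    source: str = ""


def dedup_key(paper: Paper) -> str:
    """Prefer the DOI as identity; fall back to a normalized title."""
    if paper.doi:
        doi = paper.doi.strip().lower()
        # Drop a resolver prefix so URL-form and bare DOIs compare equal.
        doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
        return "doi:" + doi
    title = unicodedata.normalize("NFKD", paper.title)
    # Remove combining marks, lowercase, and collapse punctuation/whitespace.
    title = "".join(ch for ch in title if not unicodedata.combining(ch)).lower()
    title = re.sub(r"[^a-z0-9]+", " ", title).strip()
    return "title:" + title


def deduplicate(papers: list[Paper]) -> list[Paper]:
    """Keep the first record seen for each key, preserving input order."""
    seen: dict[str, Paper] = {}
    for paper in papers:
        seen.setdefault(dedup_key(paper), paper)
    return list(seen.values())
```

Note that this sketch will not merge a record that has a DOI with a copy of the same paper that lacks one; a production deduplicator would need a second pass on titles for that case.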

Reporting

  • Canonical bilingual report model
  • Consistent [1][2][3] citation mapping
  • Method taxonomy and evidence matrix
  • Markdown + HTML + PDF outputs with aligned content
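One way to keep [1][2][3] numbering consistent across the zh and en reports is to number papers once and reuse that map for every rendering. The sketch below assumes a hypothetical `{cite:<paper-id>}` placeholder syntax, which is invented for this example and is not taken from PaperPilot.

```python
import re

# Hypothetical placeholder syntax used only in this sketch.
CITE_PATTERN = re.compile(r"\{cite:([^}]+)\}")


def build_citation_map(text: str) -> dict[str, int]:
    """Number paper ids by first appearance, starting at 1."""
    mapping: dict[str, int] = {}
    for paper_id in CITE_PATTERN.findall(text):
        mapping.setdefault(paper_id, len(mapping) + 1)
    return mapping


def render_citations(text: str, mapping: dict[str, int]) -> str:
    """Replace each placeholder with its bracketed number."""
    return CITE_PATTERN.sub(lambda m: f"[{mapping[m.group(1)]}]", text)
```

Building the map from one canonical text and applying it to both language versions means a given paper carries the same number everywhere, even if the sentences cite it in a different order.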

Quality controls

  • Quality gates and reflection workflow
  • Evidence ledger linking claims to corpus evidence
  • Review checks for citation compliance and source reliability
  • Event stream logs for auditability
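The idea of an evidence ledger can be illustrated with a small check that flags claims lacking support in the corpus. The entry layout (`claim_id`, `evidence`) is an assumption for this sketch; the real `evidence_ledger.json` structure is not documented here.

```python
def unsupported_claims(ledger: list[dict], corpus_ids: set[str]) -> list[str]:
    """Return ids of claims with no cited evidence present in the corpus.

    Each ledger entry is assumed (hypothetically) to look like
    {"claim_id": "c1", "evidence": ["paper-a", "paper-b"]}.
    """
    flagged = []
    for entry in ledger:
        evidence = entry.get("evidence", [])
        if not any(paper_id in corpus_ids for paper_id in evidence):
            flagged.append(entry["claim_id"])
    return flagged
```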

🗂 Source stack

Default free sources:

  • arXiv
  • Semantic Scholar
  • OpenAlex
  • Crossref
  • OpenReview
  • PubMed / NCBI E-utilities
  • Europe PMC
  • bioRxiv / medRxiv
  • DBLP
  • ACL Anthology
  • Papers.cool

Optional API-key sources:

  • DeepXiv / Agentic Data
  • CORE
  • Lens.org Scholarly API
  • IEEE Xplore
  • Springer Nature
  • Elsevier / Scopus
  • Dimensions

🛠 Installation

python -m pip install paperpilot -i https://pypi.org/simple

Local development:

git clone https://github.com/CHB-learner/PaperPilot.git
cd PaperPilot
python -m pip install -e .

⚙️ LLM + Source Configuration

PaperPilot requires an OpenAI-compatible LLM endpoint (API key, base URL, and model name) for query understanding, planning, synthesis, and report generation.

On first run, it creates an editable configuration template at:

~/.paperpilot/config.json

Minimal default template:

{
  "active": "default",
  "profiles": {
    "default": {
      "api_key": "",
      "base_url": "",
      "model": "gpt-5.2"
    }
  },
  "sources": {
    "core": {"enabled": null, "api_key": "", "base_url": ""},
    "lens": {"enabled": null, "api_key": "", "base_url": ""},
    "ieee": {"enabled": null, "api_key": "", "base_url": ""},
    "springer": {"enabled": null, "api_key": "", "base_url": ""},
    "elsevier": {"enabled": null, "api_key": "", "base_url": ""},
    "dimensions": {"enabled": null, "api_key": "", "base_url": ""},
    "deepxiv": {"enabled": null, "api_key": "", "base_url": ""}
  }
}

Notes:

  • Leave optional source API keys empty if unavailable.
  • enabled: null means the source is auto-enabled once a valid key is provided.
  • ~/.paperpilot/config.json lives in your home directory, outside the repository, so it is not committed; edit it directly or use the CLI commands.
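To make the template and the enabled: null rule concrete, the following sketch reads a config of the shape shown above and works out which optional sources would be on. It is an approximation: it treats any non-empty api_key as "valid", whereas PaperPilot may actually test the key, and the function names are invented for this example.

```python
import json
from pathlib import Path

CONFIG_PATH = Path.home() / ".paperpilot" / "config.json"


def load_config(path: Path = CONFIG_PATH) -> dict:
    """Load the JSON config file from disk."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def active_profile(config: dict) -> dict:
    """Return the LLM profile named by the top-level "active" key."""
    return config["profiles"][config["active"]]


def effective_sources(config: dict) -> dict[str, bool]:
    """Assumed rule: explicit true/false wins; null follows key presence."""
    result = {}
    for name, entry in config.get("sources", {}).items():
        enabled = entry.get("enabled")
        if enabled is None:
            # Non-empty key stands in for "valid key" in this sketch.
            enabled = bool((entry.get("api_key") or "").strip())
        result[name] = bool(enabled)
    return result
```

For real status information, `PaperPilot sources list` and `PaperPilot sources test <source>` remain the authoritative checks.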

CLI config commands

PaperPilot config set --base-url https://api.deepseek.com --model deepseek-chat
PaperPilot config import ./api.json
PaperPilot config list
PaperPilot config use deepseek
PaperPilot config show
PaperPilot --doctor
PaperPilot sources list
PaperPilot sources config core
PaperPilot sources config deepxiv
PaperPilot sources enable core
PaperPilot sources test core

Inside interactive mode, use /sources and /doctor.

🔑 API key references

Source                    Access page
CORE                      https://core.ac.uk/services/api
Lens.org                  https://docs.api.lens.org/
IEEE Xplore               https://developer.ieee.org/getting_started
Springer Nature           https://dev.springernature.com/
Elsevier / Scopus         https://dev.elsevier.com/
Dimensions                https://docs.dimensions.ai/dsl/api.html
DeepXiv / Agentic Data    https://data.rag.ac.cn/api/docs
Papers.cool               https://papers.cool

🧪 Quick Start

Interactive usage:

PaperPilot

Command mode example:

PaperPilot "RNA inverse folding sequence design" \
  --auto-confirm \
  --max-papers 50 \
  --since-year 2021 \
  --github-filter required \
  --sources auto \
  --mode apa \
  --quality balanced

Import local corpus and skip download:

PaperPilot "RNA inverse folding sequence design" \
  --auto-confirm \
  --user-corpus ./papers \
  --user-corpus references.bib \
  --no-download

Inspect/resume workflow:

PaperPilot inspect runs/<task-id>
PaperPilot resume runs/<task-id>

🧭 Workflow

PaperPilot follows this state-machine pipeline:

Intake -> Protocol -> Search -> Corpus -> Screening -> Verification -> Synthesis -> Review -> Report

flowchart LR
  U[User request] --> C[Run context]
  C --> QA[Query understanding]
  QA --> PL[Planning + Protocol]
  PL --> ST[Source Registry search]
  ST --> NB[Corpus normalization]
  NB --> SC[Core/adjacent screening]
  SC --> VF[Verification + PDF + code checks]
  VF --> SY[Literature matrix]
  SY --> QG[Quality gate + reflection]
  QG --> EL[Evidence ledger]
  EL --> RP["Report render (ZH/EN)"]
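
The stage order above can be sketched as a small linear state machine. This is an illustrative model only: the `Stage` enum, the handler signature, and the `last_done` resume parameter are assumptions for this example, not PaperPilot's internal API.

```python
from enum import Enum
from typing import Callable, Optional


class Stage(Enum):
    INTAKE = "Intake"
    PROTOCOL = "Protocol"
    SEARCH = "Search"
    CORPUS = "Corpus"
    SCREENING = "Screening"
    VERIFICATION = "Verification"
    SYNTHESIS = "Synthesis"
    REVIEW = "Review"
    REPORT = "Report"


ORDER = list(Stage)  # Enum iteration preserves definition order.


def next_stage(current: Optional[Stage]) -> Optional[Stage]:
    """Return the stage after `current`, the first stage for None, or None at the end."""
    if current is None:
        return ORDER[0]
    idx = ORDER.index(current)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def run_pipeline(
    handlers: dict[Stage, Callable[[dict], None]],
    context: dict,
    last_done: Optional[Stage] = None,
) -> list[str]:
    """Run every stage after `last_done` in order; return the names of stages run."""
    completed = []
    stage = next_stage(last_done)
    while stage is not None:
        handler = handlers.get(stage)
        if handler is not None:
            handler(context)  # Handlers mutate the shared run context.
        completed.append(stage.value)
        stage = next_stage(stage)
    return completed
```

Passing the last completed stage as `last_done` shows how a resume could pick up where a saved run stopped without repeating earlier stages.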

📁 Run artifacts

Each runs/<task-id>/ folder contains:

  • task.json / state.json / events.jsonl / manifest.json
  • query_understanding.md / plan.json / protocol.json
  • metadata.json / corpus.json / core_papers.json
  • adjacent_papers.json / excluded_papers.json / ranked_papers.json
  • verification.json / download_log.json / fulltext/ / paper_notes.json
  • literature_matrix.json / synthesis.json / quality_gate.json
  • evidence_ledger.json / review_agent_findings.json
  • report.canonical.json / report.zh.md / report.en.md
  • report.zh.html / report.en.html / report.zh.pdf / report.en.pdf
  • pdfs/ / source_diagnostics.json / registries.json / prompt_manifest.json
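A run folder can be examined from Python with a few lines. The sketch below checks for a subset of the files listed above and parses events.jsonl on the assumption that it is in JSON Lines format (one JSON object per line, as the extension suggests); the fields inside each event are not assumed.

```python
import json
from pathlib import Path

# A subset of the artifact names listed above; extend as needed.
EXPECTED_ARTIFACTS = (
    "task.json", "state.json", "events.jsonl", "manifest.json",
    "plan.json", "protocol.json", "corpus.json",
    "report.canonical.json", "report.zh.md", "report.en.md",
)


def missing_artifacts(run_dir, expected=EXPECTED_ARTIFACTS) -> list[str]:
    """List expected artifact names that do not exist in the run folder."""
    root = Path(run_dir)
    return [name for name in expected if not (root / name).exists()]


def read_events(run_dir) -> list[dict]:
    """Parse events.jsonl as one JSON object per line, skipping blank lines."""
    path = Path(run_dir) / "events.jsonl"
    events = []
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                events.append(json.loads(line))
    return events
```

For everyday use, `PaperPilot inspect runs/<task-id>` is the supported way to look at a run; a script like this is only useful for custom checks.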

🧩 Code filter modes

  • any: keep all papers and annotate code availability
  • required: keep only papers with detected code repositories in the final view
  • none: keep only papers without detected public code links
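
The three modes can be expressed as a small filter function. This is a sketch of the semantics described above, assuming each paper is a dict with a hypothetical `code_url` key that is empty or missing when no repository was detected.

```python
def apply_code_filter(papers: list[dict], mode: str = "any") -> list[dict]:
    """Annotate papers with `has_code`, then filter according to `mode`."""
    if mode not in {"any", "required", "none"}:
        raise ValueError(f"unknown code filter mode: {mode!r}")
    # Copy each record so the caller's input is left untouched.
    annotated = [{**p, "has_code": bool(p.get("code_url"))} for p in papers]
    if mode == "required":
        return [p for p in annotated if p["has_code"]]
    if mode == "none":
        return [p for p in annotated if not p["has_code"]]
    return annotated  # "any": keep everything, annotation only
```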

🧪 CLI options (important ones)

--max-papers INT                 maximum papers in final report view
--since-year INT                 preferred lower year bound
--github-filter any|required|none
--github-search-limit INT
--no-download                    skip PDF downloads
--pdf-limit INT                  maximum PDFs to download
--user-corpus PATH               repeatable local corpus path
--mode quick|apa|systematic
--interaction auto|gated
--quality fast|balanced|strict
--include-adjacent               include adjacent papers in appendices
--sources auto|all|core|biomed|cs|configured
--enable-source SOURCE           enable one source (repeatable)
--disable-source SOURCE          disable one source (repeatable)

See PaperPilot --help for the full list of options and for Chinese/English output.
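
When driving the CLI from a script, it can help to assemble the argument list in one place. The helper below is a hypothetical convenience wrapper, not part of PaperPilot; it only uses flags and choice values that appear in this README and assumes the executable is invoked as `PaperPilot`.

```python
# Allowed values copied from the option list above.
CHOICES = {
    "github_filter": {"any", "required", "none"},
    "mode": {"quick", "apa", "systematic"},
    "quality": {"fast", "balanced", "strict"},
}


def build_command(query, *, auto_confirm=False, no_download=False,
                  max_papers=None, since_year=None, github_filter=None,
                  mode=None, quality=None, sources=None, user_corpus=()):
    """Return an argv list for one PaperPilot run (hypothetical helper)."""
    argv = ["PaperPilot", query]
    if auto_confirm:
        argv.append("--auto-confirm")
    for flag, value in (("--max-papers", max_papers), ("--since-year", since_year)):
        if value is not None:
            argv += [flag, str(int(value))]
    for name, value in (("github_filter", github_filter),
                        ("mode", mode), ("quality", quality)):
        if value is None:
            continue
        if value not in CHOICES[name]:
            raise ValueError(f"invalid value for {name}: {value!r}")
        argv += ["--" + name.replace("_", "-"), value]
    if sources is not None:
        argv += ["--sources", sources]
    for path in user_corpus:  # --user-corpus is repeatable
        argv += ["--user-corpus", str(path)]
    if no_download:
        argv.append("--no-download")
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`; using a list rather than a shell string avoids quoting problems in the query text.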

🧱 Development notes

  • Keep run outputs and generated artifacts out of source control.
  • Keep API keys out of git history.
  • Prefer .gitignore over manual cleanup.
  • Use semantic tags for releases and keep README + docs aligned.
  • Keep .github/workflows/*, RELEASING.md, CHANGELOG.md in sync when publishing.

🧭 Open source checklist

  • Ensure ~/.paperpilot/config.json, api.json, and .env with credentials are never committed.
  • Add/keep LICENSE and .gitignore.
  • Add source code and tags before publishing release assets.
  • Publish GitHub Pages from docs/.
  • Keep versions in pyproject.toml, literature_agent/__init__.py, and generated manifests aligned.
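
The version-alignment item can be checked mechanically before a release. The sketch below assumes that pyproject.toml has a line of the form `version = "X.Y.Z"` and that literature_agent/__init__.py defines `__version__`; both are common conventions but are not confirmed by this README, and generated manifests are not covered.

```python
import re
from pathlib import Path
from typing import Optional


def _first_match(pattern: str, text: str) -> Optional[str]:
    """Return the first capture group of `pattern` in `text`, or None."""
    match = re.search(pattern, text, re.MULTILINE)
    return match.group(1) if match else None


def collect_versions(repo_root) -> dict[str, Optional[str]]:
    """Read the version string from each file (assumed layouts, see above)."""
    root = Path(repo_root)
    pyproject = (root / "pyproject.toml").read_text(encoding="utf-8")
    init = (root / "literature_agent" / "__init__.py").read_text(encoding="utf-8")
    return {
        "pyproject.toml": _first_match(r'^version\s*=\s*"([^"]+)"', pyproject),
        "literature_agent/__init__.py": _first_match(
            r'^__version__\s*=\s*["\']([^"\']+)["\']', init
        ),
    }


def versions_aligned(versions: dict[str, Optional[str]]) -> bool:
    """True only if every file yielded a version and all of them agree."""
    values = set(versions.values())
    return None not in values and len(values) == 1
```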

One-command release

# dry-run checks only
./scripts/release_everywhere.sh --dry-run

# normal release (pushed commit + tag + GH release + PyPI)
export PYPI_TOKEN='pypi-...'
./scripts/release_everywhere.sh

# release without publishing to PyPI
./scripts/release_everywhere.sh --no-pypi

Suggested publish flow (full):

python -m unittest discover -s tests
python -m compileall literature_agent
./publish_pypi.sh --dry-run --version <VERSION>
git add -A
git commit -m "chore: release v<VERSION>"
git tag -a v<VERSION> -m "v<VERSION>"
git push origin main --tags
./publish_pypi.sh --version <VERSION>

For GitHub Pages: enable Pages to deploy from main + /docs, or rely on .github/workflows/gh-pages.yml.

📚 Citation note

If you use PaperPilot in your work, include the repository URL and version used so results are reproducible.
