A CLI research agent for scholarly paper search, evidence synthesis, code discovery, PDF collection, bilingual reports, and Obsidian Wiki.
PaperPilot
PaperPilot is a CLI research agent for AI-related literature review.
It turns one user request into a traceable, evidence-based research workflow and generates bilingual reports (zh/en) in Markdown, HTML, and PDF.
✨ What PaperPilot does
PaperPilot is not a chatbot. It is an interactive scientific workflow:
- Parse natural-language research requests
- Build an explicit search protocol with inclusion/exclusion rules
- Query multi-source literature APIs
- Normalize, deduplicate, and screen papers
- Verify URLs/PDF/code availability
- Synthesize evidence and generate review reports
- Output structured artifacts for reproducibility
Each run creates a dedicated folder under runs/ with full state, logs, and intermediate files.
🚀 Highlights
Core experience
- Natural-language intake with LLM-assisted interpretation
- Interactive shell with:
  - `/model` to manage LLM profiles
  - `/sources` to inspect search source/API status
  - `/doctor` for quick self-checks
- Multi-source retrieval with source registry and diagnostics
- Resume/inspect modes for reproducible research sessions
Retrieval and screening
- Protocol-aware search using plan + diversified keywords
- Canonicalized `Paper` schema and robust deduplication
- Core/adjacent/excluded paper classification
- PDF + code-link verification (no paywall bypass)
- Optional full-text extraction from downloadable PDFs
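As an illustration of the normalize-and-deduplicate step, here is a minimal sketch. The `Paper` dataclass, its fields, and the key rules (DOI first, normalized title as fallback) are assumptions made for this example; PaperPilot's actual schema and matching logic are not shown in this README.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Paper:
    # Hypothetical minimal record; the real Paper schema is not shown here.
    title: str
    doi: Optional[str] = None
    source: str = ""


def dedup_key(paper: Paper) -> str:
    """Prefer a lowercased DOI; fall back to a normalized title."""
    if paper.doi:
        return "doi:" + paper.doi.strip().lower()
    # Collapse case, punctuation, and whitespace so near-identical titles match.
    normalized = re.sub(r"[^a-z0-9]+", " ", paper.title.lower()).strip()
    return "title:" + normalized


def deduplicate(papers: list[Paper]) -> list[Paper]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[str] = set()
    unique: list[Paper] = []
    for paper in papers:
        key = dedup_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique


papers = [
    Paper("RNA Inverse Folding: A Survey", doi="10.1000/ABC", source="arxiv"),
    Paper("RNA inverse folding - a survey", doi="10.1000/abc", source="crossref"),
    Paper("Protein design with diffusion", source="openalex"),
    Paper("Protein Design with Diffusion.", source="dblp"),
]
print([p.source for p in deduplicate(papers)])  # ['arxiv', 'openalex']
```

Keeping the first record per key is the simplest policy; a real pipeline would more likely merge metadata from the duplicates instead of discarding it.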
Reporting
- Canonical bilingual report model
- Consistent `[1][2][3]` citation mapping
- Method taxonomy and evidence matrix
- Markdown + HTML + PDF outputs with aligned content
- Formal reports contain at least 30 papers when enough screened evidence exists
- Obsidian Wiki export with paper, method, topic, and claim notes
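One way to keep `[1][2][3]` labels consistent across the zh/en and Markdown/HTML/PDF outputs is to assign them once and reuse the same map everywhere. The sketch below assumes labels are given in first-appearance order; the function name and the paper ID format are hypothetical.

```python
def assign_citation_labels(paper_ids: list[str]) -> dict[str, str]:
    """Give each distinct paper ID a numeric label in first-appearance order."""
    labels: dict[str, str] = {}
    for paper_id in paper_ids:
        if paper_id not in labels:
            labels[paper_id] = f"[{len(labels) + 1}]"
    return labels


# The same paper cited twice keeps a single label.
cited = ["arxiv:2101.00001", "doi:10.1000/abc", "arxiv:2101.00001"]
labels = assign_citation_labels(cited)
print(labels)  # {'arxiv:2101.00001': '[1]', 'doi:10.1000/abc': '[2]'}
```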
Quality controls
- Quality gates and reflection workflow
- Evidence ledger linking claims to corpus evidence
- Review checks for citation compliance and source reliability
- Event stream logs for auditability
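An event stream log of this kind is commonly stored as JSON Lines, one record per line, appended as the run progresses. The sketch below shows that pattern only; the field names (`ts`, `stage`, `message`) are assumptions and may not match what PaperPilot writes to `events.jsonl`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_event(log_path: Path, stage: str, message: str) -> None:
    """Append one JSON record per line so earlier entries are never rewritten."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),  # hypothetical field names
        "stage": stage,
        "message": message,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_events(log_path: Path) -> list[dict]:
    """Load every non-empty line back as a dict, in write order."""
    with log_path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "events.jsonl"
    append_event(log_path, "search", "queried 3 sources")
    append_event(log_path, "screening", "kept 42 core papers")
    events = read_events(log_path)

print([event["stage"] for event in events])  # ['search', 'screening']
```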
🗂 Source stack
Default free sources:
- arXiv
- Semantic Scholar
- OpenAlex
- Crossref
- OpenReview
- PubMed / NCBI E-utilities
- Europe PMC
- bioRxiv / medRxiv
- DBLP
- ACL Anthology
- Papers.cool
Optional API-key sources:
- DeepXiv / Agentic Data
- CORE
- Lens.org Scholarly API
- IEEE Xplore
- Springer Nature
- Elsevier / Scopus
- Dimensions
🛠 Installation
python -m pip install paperpilot -i https://pypi.org/simple
Local development:
git clone https://github.com/CHB-learner/PaperPilot.git
cd PaperPilot
python -m pip install -e .
⚙️ LLM + Source Configuration
PaperPilot requires OpenAI-compatible LLM settings for query understanding, planning, synthesis, and report generation.
On first run, it creates an editable configuration template at:
~/.paperpilot/config.json
Minimal default template:
{
  "active": "default",
  "profiles": {
    "default": {
      "api_key": "",
      "base_url": "",
      "model": "gpt-5.2"
    }
  },
  "sources": {
    "core": {"enabled": null, "api_key": "", "base_url": ""},
    "lens": {"enabled": null, "api_key": "", "base_url": ""},
    "ieee": {"enabled": null, "api_key": "", "base_url": ""},
    "springer": {"enabled": null, "api_key": "", "base_url": ""},
    "elsevier": {"enabled": null, "api_key": "", "base_url": ""},
    "dimensions": {"enabled": null, "api_key": "", "base_url": ""},
    "deepxiv": {"enabled": null, "api_key": "", "base_url": ""}
  }
}
Notes:
- Leave optional source API keys empty if unavailable.
- `enabled: null` means auto-enable once a valid key is provided.
- `~/.paperpilot/config.json` is not committed; edit it directly or use CLI commands.
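The tri-state `enabled` field can be read as follows. This is a sketch of one plausible interpretation, not PaperPilot's actual resolution logic: it assumes an explicit `true`/`false` overrides auto-detection, and it treats any non-empty key as "valid" because real key validation needs a network call.

```python
import json

RAW_CONFIG = """
{
  "sources": {
    "core": {"enabled": null, "api_key": "example-key", "base_url": ""},
    "lens": {"enabled": null, "api_key": "", "base_url": ""},
    "ieee": {"enabled": false, "api_key": "example-key", "base_url": ""}
  }
}
"""


def source_is_active(entry: dict) -> bool:
    """Resolve the tri-state flag: null follows the key, true/false is explicit."""
    enabled = entry.get("enabled")
    has_key = bool((entry.get("api_key") or "").strip())
    if enabled is None:
        # Auto mode: active only once a key has been filled in.
        return has_key
    # Assumption: an explicit boolean wins over auto-detection.
    return bool(enabled)


config = json.loads(RAW_CONFIG)
active = [name for name, entry in config["sources"].items() if source_is_active(entry)]
print(active)  # ['core']
```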
CLI config commands
PaperPilot config set --base-url https://api.deepseek.com --model deepseek-chat
PaperPilot config import ./api.json
PaperPilot config list
PaperPilot config use deepseek
PaperPilot config show
PaperPilot --doctor
PaperPilot sources list
PaperPilot sources config core
PaperPilot sources config deepxiv
PaperPilot sources enable core
PaperPilot sources test core
Inside interactive mode, use /sources and /doctor.
🔑 API source key references
| Source | Access page |
|---|---|
| CORE | https://core.ac.uk/services/api |
| Lens.org | https://docs.api.lens.org/ |
| IEEE Xplore | https://developer.ieee.org/getting_started |
| Springer Nature | https://dev.springernature.com/ |
| Elsevier / Scopus | https://dev.elsevier.com/ |
| Dimensions | https://docs.dimensions.ai/dsl/api.html |
| DeepXiv / Agentic Data | https://data.rag.ac.cn/api/docs |
| Papers.cool | https://papers.cool |
🧪 Quick Start
Interactive usage:
PaperPilot
Command mode example:
PaperPilot "RNA inverse folding sequence design" \
--auto-confirm \
--max-papers 50 \
--since-year 2021 \
--github-filter required \
--sources auto \
--mode apa \
--quality balanced
Import local corpus and skip download:
PaperPilot "RNA inverse folding sequence design" \
--auto-confirm \
--user-corpus ./papers \
--user-corpus references.bib \
--no-download
Inspect/resume workflow:
PaperPilot inspect runs/<task-id>
PaperPilot resume runs/<task-id>
🧭 Workflow
PaperPilot follows this state-machine pipeline:
Intake -> Protocol -> Search -> Corpus -> Screening -> Verification -> Synthesis -> Review -> Report
flowchart LR
U[User request] --> C[Run context]
C --> QA[Query understanding]
QA --> PL[Planning + Protocol]
PL --> ST[Source Registry search]
ST --> NB[Corpus normalization]
NB --> SC[Core/adjacent screening]
SC --> VF[Verification + PDF + code checks]
VF --> SY[Literature matrix]
SY --> QG[Quality gate + reflection]
QG --> EL[Evidence ledger]
EL --> RP["Report render (ZH/EN)"]
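The linear pipeline above can be modelled as an ordered enumeration, which also makes a resume step easy to express. The stage names come from the pipeline line; the `Stage` enum and `remaining_stages` helper are hypothetical and only illustrate the idea.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    INTAKE = "Intake"
    PROTOCOL = "Protocol"
    SEARCH = "Search"
    CORPUS = "Corpus"
    SCREENING = "Screening"
    VERIFICATION = "Verification"
    SYNTHESIS = "Synthesis"
    REVIEW = "Review"
    REPORT = "Report"


# Enum members iterate in definition order, which is the pipeline order.
ORDER = list(Stage)


def remaining_stages(last_completed: Optional[Stage]) -> list[Stage]:
    """Return the stages still to run after the last completed one."""
    if last_completed is None:
        return list(ORDER)
    return ORDER[ORDER.index(last_completed) + 1:]


# Resuming a run that stopped after verification:
todo = remaining_stages(Stage.VERIFICATION)
print([stage.value for stage in todo])  # ['Synthesis', 'Review', 'Report']
```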
📁 Run artifacts
runs/<task-id>/ will contain:
- `task.json` / `state.json` / `events.jsonl` / `manifest.json`
- `query_understanding.md` / `plan.json` / `protocol.json`
- `metadata.json` / `corpus.json` / `core_papers.json`
- `adjacent_papers.json` / `excluded_papers.json` / `ranked_papers.json`
- `verification.json` / `download_log.json` / `fulltext/` / `paper_notes.json`
- `literature_matrix.json` / `synthesis.json` / `quality_gate.json`
- `evidence_ledger.json` / `review_agent_findings.json`
- `report.canonical.json` / `report.zh.md` / `report.en.md`
- `report.zh.html` / `report.en.html` / `report.zh.pdf` / `report.en.pdf`
- `report_selection.json` / `shortfall.json` when the 30-paper minimum cannot be met
- `obsidian_wiki/` with `index.md`, paper notes, method notes, topic notes, claim notes, and wiki lint metadata
- `pdfs/` / `source_diagnostics.json` / `registries.json` / `prompt_manifest.json`
🧠 Obsidian Wiki
Each successful run generates runs/<task-id>/obsidian_wiki/ by default. Open that folder as an Obsidian vault to browse:
- `index.md`: research entry point and the 30-paper overview
- `papers/`: one note per reported paper with citation label, PDF/code links, method family, and evidence basis
- `methods/`: method-family notes linked to representative papers
- `topics/`: query/subtopic notes
- `claims/`: evidence-map claim notes
- `_meta/manifest.json` and `_meta/wiki_lint.json`: provenance, hashes, broken-link checks
Use --no-obsidian-wiki to skip Wiki generation.
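A broken-link check over a vault can be sketched as below. It assumes notes link to each other with Obsidian-style `[[target]]` wikilinks, optionally with `|alias` or `#heading` suffixes; whether PaperPilot's lint works this way is not stated here, and the note names are made up.

```python
import re

# Capture the link target up to a closing bracket, alias bar, or heading anchor.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def broken_links(notes: dict[str, str]) -> dict[str, list[str]]:
    """Map each note name to the link targets that match no known note."""
    known = set(notes)
    report: dict[str, list[str]] = {}
    for name, body in notes.items():
        targets = {target.strip() for target in WIKILINK.findall(body)}
        missing = sorted(targets - known)
        if missing:
            report[name] = missing
    return report


# Keys are note paths without the .md extension (hypothetical vault contents).
vault = {
    "index": "See [[papers/p1|Paper 1]] and [[methods/diffusion]].",
    "papers/p1": "Method family: [[methods/diffusion#Overview]]. Claim: [[claims/c9]].",
    "methods/diffusion": "Representative paper: [[papers/p1]].",
}
print(broken_links(vault))  # {'papers/p1': ['claims/c9']}
```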
🧩 Code filter modes
- `any`: keep all papers and annotate code availability
- `required`: keep only papers with detected code repositories in final view
- `none`: keep only papers without detected public code links
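The three modes amount to a simple predicate on each paper. In this sketch a paper is a plain dict and `code_url` is a hypothetical field standing in for whatever PaperPilot records when it detects a repository.

```python
def apply_code_filter(papers: list[dict], mode: str) -> list[dict]:
    """Filter papers by code availability according to the chosen mode."""
    if mode == "any":
        return list(papers)  # keep everything; availability is only annotated
    if mode == "required":
        return [paper for paper in papers if paper.get("code_url")]
    if mode == "none":
        return [paper for paper in papers if not paper.get("code_url")]
    raise ValueError(f"unknown code filter mode: {mode!r}")


papers = [
    {"title": "A", "code_url": "https://github.com/example/a"},
    {"title": "B", "code_url": None},
    {"title": "C"},
]
print([p["title"] for p in apply_code_filter(papers, "required")])  # ['A']
print([p["title"] for p in apply_code_filter(papers, "none")])      # ['B', 'C']
```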
🧪 CLI options (important ones)
--max-papers INT maximum papers in final report view; must be >= 30
--min-report-papers INT minimum papers required in formal reports; default/minimum: 30
--since-year INT preferred lower year bound
--github-filter any|required|none
--github-search-limit INT
--no-download skip PDF downloads
--pdf-limit INT maximum PDFs to download
--user-corpus PATH repeatable local corpus path
--mode quick|apa|systematic
--interaction auto|gated
--quality fast|balanced|strict
--include-adjacent include adjacent papers in appendices
--sources auto|all|core|biomed|cs|configured
--enable-source SOURCE enable one source (repeatable)
--disable-source SOURCE disable one source (repeatable)
--no-obsidian-wiki skip Obsidian Wiki export
See paperpilot --help for full options and Chinese/English output.
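To show how a few of these options and their constraints fit together, here is a small `argparse` sketch. It is not PaperPilot's real parser: the program name and the default values are assumptions, and only a handful of the flags listed above are modelled.

```python
import argparse

MIN_REPORT_PAPERS = 30  # documented minimum for --max-papers


def at_least_min(value: str) -> int:
    """argparse type callback that enforces the documented lower bound."""
    number = int(value)
    if number < MIN_REPORT_PAPERS:
        raise argparse.ArgumentTypeError(
            f"must be >= {MIN_REPORT_PAPERS}, got {number}"
        )
    return number


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="paperpilot-sketch")
    parser.add_argument("query")
    # Default values below are illustrative, not PaperPilot's real defaults.
    parser.add_argument("--max-papers", type=at_least_min, default=50)
    parser.add_argument("--github-filter", choices=["any", "required", "none"], default="any")
    # action="append" makes the flag repeatable, as described for --user-corpus.
    parser.add_argument("--user-corpus", action="append", default=[])
    parser.add_argument("--no-download", action="store_true")
    return parser


args = build_parser().parse_args([
    "RNA inverse folding sequence design",
    "--user-corpus", "./papers",
    "--user-corpus", "references.bib",
    "--no-download",
])
print(args.max_papers, args.user_corpus, args.no_download)
# 50 ['./papers', 'references.bib'] True
```

Passing `--max-papers 10` to this parser exits with a usage error, because the type callback rejects values below 30.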
🧱 Development notes
- Keep run outputs and generated artifacts out of source control.
- Keep API keys out of git history.
- Prefer `.gitignore` over manual cleanup.
- Use semantic tags for releases and keep `README` + docs aligned.
- Keep `.github/workflows/*`, `RELEASING.md`, `CHANGELOG.md` in sync when publishing.
🧭 Open source checklist
- Ensure `~/.paperpilot/config.json`, `api.json`, and `.env` with credentials are never committed.
- Add/keep `LICENSE` and `.gitignore`.
- Add source code and tags before publishing release assets.
- Publish GitHub Pages from `docs/`.
- Keep versions in `pyproject.toml`, `literature_agent/__init__.py`, and generated manifests aligned.
One-command release
# dry-run checks only
./scripts/release_everywhere.sh --dry-run
# normal release (pushed commit + tag + GH release + PyPI)
export PYPI_TOKEN='pypi-...'
./scripts/release_everywhere.sh
# release without publishing to PyPI
./scripts/release_everywhere.sh --no-pypi
Suggested publish flow (full):
python -m unittest discover -s tests
python -m compileall literature_agent
./publish_pypi.sh --dry-run --version <VERSION>
git add -A
git commit -m "chore: release v<VERSION>"
git tag -a v<VERSION> -m "v<VERSION>"
git push origin main --tags
./publish_pypi.sh --version <VERSION>
For GitHub Pages: enable Pages to deploy from main + /docs, or rely on .github/workflows/gh-pages.yml.
📚 Citation note
If you use PaperPilot in your work, include the repository URL and version used so results are reproducible.