Git-based AI research assistant with literature analysis tools
Researchbot
A Git-based AI research assistant that turns your local file system into an active research lab.
You brain-dump ideas, paper references, and notes into a workspace file. Then you tell the agent what to do (survey the landscape, check novelty, deep-read specific papers), and it orchestrates the work: searching APIs, analyzing papers in parallel, and writing structured output to your project files.
Built on Claude Code with custom MCP servers, using VS Code/Cursor as the interface and Markdown as the document format.
Installation
From PyPI
pip install researchbot
From source
git clone https://github.com/juankost/researchbot.git
cd researchbot
uv sync
Configuration
Researchbot looks for API keys in this order:
- Shell environment variables (highest priority)
- ~/.config/researchbot/.env (user-level config)
- <project_root>/.env (local dev fallback)
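The lookup order above amounts to a first-match search over three layers. The sketch below illustrates that idea and is not researchbot's config.py: the helper names (parse_env_text, load_layers, resolve_key) are hypothetical, the parser only understands plain KEY=VALUE lines, and it assumes an empty value (like the blank SEMANTIC_SCHOLAR_API_KEY= line in the template below) counts as unset.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_layers(project_root: Path) -> list[Mapping[str, str]]:
    """Return config sources ordered from highest to lowest priority."""
    layers: list[Mapping[str, str]] = [os.environ]  # shell environment first
    for path in (
        Path.home() / ".config" / "researchbot" / ".env",  # user-level config
        project_root / ".env",  # local dev fallback
    ):
        layers.append(parse_env_text(path.read_text()) if path.is_file() else {})
    return layers


def resolve_key(name: str, layers: list[Mapping[str, str]]) -> Optional[str]:
    """Return the first non-empty value for name, scanning layers in order."""
    for layer in layers:
        value = layer.get(name)
        if value:  # assumption: empty strings are treated as unset
            return value
    return None
```

For example, resolve_key("MISTRAL_API_KEY", load_layers(Path.cwd())) returns the shell value if one is exported and otherwise falls through to the two .env files.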
Set up your API keys:
mkdir -p ~/.config/researchbot
cat > ~/.config/researchbot/.env << 'EOF'
# Semantic Scholar (optional — increases rate limits)
SEMANTIC_SCHOLAR_API_KEY=
# Mistral OCR (required if using mistral as OCR provider)
MISTRAL_API_KEY=
# Deepseek OCR (required if using deepseek as OCR provider)
DEEPSEEK_API_KEY=
# OCR provider: "mistral" (default) or "deepseek"
RESEARCHBOT_OCR_PROVIDER=mistral
EOF
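Which OCR key is actually required depends on RESEARCHBOT_OCR_PROVIDER, so a startup check along these lines can surface a missing key early. This is a hypothetical sketch (check_ocr_settings and PROVIDER_KEYS are not part of the documented package); it assumes provider names are compared case-insensitively and that an unset or blank provider falls back to the mistral default.

```python
from typing import Mapping

# Which API key each OCR provider needs, per the .env template above.
PROVIDER_KEYS = {"mistral": "MISTRAL_API_KEY", "deepseek": "DEEPSEEK_API_KEY"}


def check_ocr_settings(env: Mapping[str, str]) -> tuple[str, str]:
    """Return (provider, api_key) or raise ValueError with a readable message."""
    # An unset or blank provider falls back to the documented default.
    provider = (env.get("RESEARCHBOT_OCR_PROVIDER") or "").strip().lower() or "mistral"
    if provider not in PROVIDER_KEYS:
        raise ValueError(
            f"Unknown OCR provider {provider!r}; expected one of {sorted(PROVIDER_KEYS)}"
        )
    key_name = PROVIDER_KEYS[provider]
    api_key = (env.get(key_name) or "").strip()
    if not api_key:
        raise ValueError(f"{key_name} must be set when using the {provider} OCR provider")
    return provider, api_key
```

Failing fast here means a misconfigured provider is reported before any PDF is downloaded.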
Claude Code Integration
To use researchbot as an MCP server in Claude Code globally, see claude-config for the install script that sets up skills and MCP tools.
Current Status
v1 — In Development
The project is in early development. The Semantic Scholar API client, PDF download + OCR pipeline, CLI commands, MCP server, and the v1 agent skills listed below are implemented.
What's implemented
- Python package structure (researchbot/) with module stubs
- Dependencies: httpx, click, mistralai, openai, mcp, python-dotenv
- Configuration with OCR provider toggle (Mistral default, Deepseek configurable) and two-tier .env loading
- Semantic Scholar API client (researchbot/scholar.py): Paper dataclass and SemanticScholarClient class with search, get_paper, citations, references, and similar-paper methods. Includes resolve_paper() for flexible input: accepts IDs, URLs, or paper names.
- PDF download + OCR pipeline (researchbot/pdf.py, researchbot/ocr.py): resolve PDF URLs via Semantic Scholar, download with caching, Mistral OCR with image extraction, per-paper cache (text.md + images/)
- CLI commands: search, paper, citations, references, similar, and read, with JSON output, a --pretty flag, and --include-images for the read command
- MCP server (researchbot/mcp_server.py): FastMCP server exposing 7 tools (search_papers, get_paper, get_citations, get_references, search_similar, read_paper, ocr_local_pdf) over stdio transport, auto-started by Claude Code via .mcp.json
- Skills:
  - /analyze: in-depth structured analysis of a single paper (contribution, methodology, results, limitations, future work)
  - /compare: compare two papers on problem formulation and methodology
  - /expand: find papers solving the same problem as a seed paper, with parallel subagent analysis
  - /gaps: identify research gaps and open questions from a related-works analysis
  - /verify_gaps: verify which gaps are genuinely open by searching for papers that address them
  - /paper_review: constructive conference-style review with literature verification via parallel subagents
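The client in researchbot/scholar.py wraps the Semantic Scholar Graph API, but its internals are not documented here. As an independent illustration, this stdlib sketch builds search and single-paper requests against the public Graph API; the helper names and the default field list are assumptions, and the package itself lists httpx as a dependency, whereas this sketch uses only urllib.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

# Public Semantic Scholar Graph API base URL.
API_BASE = "https://api.semanticscholar.org/graph/v1"
# Assumed default field list; the real client may request different fields.
DEFAULT_FIELDS = "title,year,externalIds"


def _headers(api_key: Optional[str]) -> dict[str, str]:
    # The key is optional; when present it is sent in the x-api-key header.
    return {"x-api-key": api_key} if api_key else {}


def search_request(query: str, limit: int = 10, api_key: Optional[str] = None,
                   fields: str = DEFAULT_FIELDS) -> Request:
    """Build a keyword search request (GET /paper/search)."""
    params = urlencode({"query": query, "limit": limit, "fields": fields})
    return Request(f"{API_BASE}/paper/search?{params}", headers=_headers(api_key))


def paper_request(paper_id: str, api_key: Optional[str] = None,
                  fields: str = DEFAULT_FIELDS) -> Request:
    """Build a single-paper lookup (GET /paper/{id}), e.g. for ARXIV:1706.03762."""
    # Keep ':' and '/' unescaped so prefixed IDs stay readable in the path.
    path = quote(paper_id, safe=":/")
    params = urlencode({"fields": fields})
    return Request(f"{API_BASE}/paper/{path}?{params}", headers=_headers(api_key))
```

Sending a request is then json.load(urlopen(search_request("state space models", limit=5))) with urlopen from urllib.request; the search endpoint returns matching papers under the data key.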
What's next
- v1 skills complete. See docs/Vision.md for future directions.
See docs/plan.md for the full implementation plan.
Usage
CLI
# Search for papers
researchbot search "state space models" --limit 5 --pretty
# Get details for a specific paper
researchbot paper "ARXIV:1706.03762" --pretty
# Get papers that cite a paper
researchbot citations "ARXIV:1706.03762" --limit 5 --pretty
# Get papers referenced by a paper
researchbot references "ARXIV:1706.03762" --limit 5 --pretty
# Find similar papers
researchbot similar "ARXIV:1706.03762" --limit 5 --pretty
# Download + OCR a paper (markdown text output)
researchbot read "ARXIV:2106.15928"
# Download + OCR with image paths (JSON output)
researchbot read "ARXIV:2106.15928" --include-images --pretty
All commands output JSON by default, except read, which prints the OCR'd markdown text unless --include-images is given. Add --pretty for indented output and --limit N to control the number of results.
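The read command sits on top of the per-paper cache (text.md plus images/) described under What's implemented. A read-through cache of that shape might look like the sketch below; the cache root, the directory-naming rule, and the run_ocr callback are hypothetical stand-ins for whatever config.py and ocr.py actually do.

```python
import re
from pathlib import Path
from typing import Callable


def cache_dir_for(root: Path, paper_id: str) -> Path:
    """Map a paper ID such as ARXIV:1706.03762 to a filesystem-safe directory."""
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", paper_id)
    return root / safe


def read_with_cache(root: Path, paper_id: str, run_ocr: Callable[[Path], str]) -> str:
    """Return cached markdown, running OCR only on a cache miss.

    run_ocr (hypothetical) receives the images/ directory it may write
    extracted figures into and returns the paper's markdown text.
    """
    paper_dir = cache_dir_for(root, paper_id)
    text_path = paper_dir / "text.md"
    if text_path.is_file():
        return text_path.read_text(encoding="utf-8")
    images_dir = paper_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    text = run_ocr(images_dir)
    text_path.write_text(text, encoding="utf-8")
    return text
```

Replacing punctuation with underscores keeps directory names readable, but two IDs that differ only in punctuation would share a directory; hashing the ID is a safer choice if that matters.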
Supported paper ID formats: ARXIV:xxx, DOI:xxx, CorpusId:xxx, Semantic Scholar hash, or URL:xxx.
The read command (and the MCP read_paper tool) also accepts paper names — they are resolved via Semantic Scholar search:
researchbot read "Attention is All You Need"
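The ID formats and the name fallback above suggest a simple dispatch: explicit prefixes pass through, bare URLs and 40-character Semantic Scholar hashes are recognized, and anything else is treated as a title to search for. This sketch only illustrates that idea; classify_paper_input is a hypothetical helper, not the signature of resolve_paper(), and wrapping bare URLs in the URL:xxx form is an assumption.

```python
import re

# Prefixes from the supported ID formats (matched case-insensitively here).
ID_PREFIXES = ("ARXIV:", "DOI:", "CORPUSID:", "URL:")
# Semantic Scholar paper IDs are 40 hex characters.
S2_HASH = re.compile(r"[0-9a-f]{40}")


def classify_paper_input(text: str) -> tuple[str, str]:
    """Return ("id", paper_id) for input usable as an ID directly,
    or ("title", query) for free text that needs a search first."""
    value = text.strip()
    if value.upper().startswith(ID_PREFIXES):
        return "id", value
    if S2_HASH.fullmatch(value.lower()):
        return "id", value.lower()
    if value.startswith(("http://", "https://")):
        # Assumption: bare URLs are passed on in the URL:xxx form.
        return "id", f"URL:{value}"
    return "title", value
```

A "title" result would then go through a search (for example the paper search endpoint), taking the top hit as the paper to read.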
Skills (Claude Code slash commands)
Skills are invoked as slash commands inside Claude Code.
# Deep-read a paper by arXiv ID
/analyze ARXIV:1706.03762
# Deep-read by paper name
/analyze Attention is All You Need
# Deep-read by URL
/analyze https://arxiv.org/abs/2106.15928
# Compare two papers
/compare Mamba S4
# Compare papers by arXiv ID
/compare ARXIV:2312.00752 ARXIV:2111.00396
# Find related works for a seed paper
/expand Mamba
# Expand from a specific paper
/expand ARXIV:2312.00752
# Identify gaps from a related works analysis
/gaps workspace/efficient-sequence-modeling/
# Verify which gaps are genuinely open
/verify_gaps workspace/efficient-sequence-modeling/
# Review a paper (published)
/paper_review ARXIV:2312.00752
# Review a local PDF draft
/paper_review ~/drafts/my-paper.pdf
# Review with specific focus
/paper_review ARXIV:2312.00752 Focus on the theoretical claims
Documentation
| File | Purpose |
|---|---|
| docs/plan.md | Implementation plan — remaining tasks and their specs |
| docs/next_task.md | Detailed spec for the next task to implement |
| docs/wip.md | Work-in-progress notes for the current task |
| docs/Vision.md | Long-term vision and design for the full research lab |
| docs/project_thoughts.md | Brain dumps on vision and project direction |
Project Structure
researchbot/
    __init__.py
    config.py        # API keys, OCR provider toggle, cache paths
    scholar.py       # Semantic Scholar API client
    pdf.py           # PDF download utilities
    ocr.py           # OCR pipeline (Mistral / Deepseek)
    cli.py           # CLI entry point (Click)
    mcp_server.py    # MCP server for Claude Code (FastMCP, stdio)
.claude/
    commands/
        analyze.md        # /analyze skill
        compare.md        # /compare skill
        expand.md         # /expand skill
        gaps.md           # /gaps skill
        verify_gaps.md    # /verify_gaps skill
        paper_review.md   # /paper_review skill
.mcp.json            # MCP server config (auto-starts researchbot server)
docs/
    plan.md          # Implementation plan
    next_task.md     # Next task spec
    wip.md           # Work in progress
License
MIT
Download files
File details
Details for the file researchbot-0.1.1.tar.gz.
File metadata
- Download URL: researchbot-0.1.1.tar.gz
- Upload date:
- Size: 65.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ef24beb19c599a17cf91b781f63d995c48c3443da4d5a269008e2c397d4eeca1 |
| MD5 | 3e65817969a2209875d694eb86ea8ac4 |
| BLAKE2b-256 | f7cba9fbbd6b660589984130d27838b2808efb5b4af3739385f04fae652f713e |
File details
Details for the file researchbot-0.1.1-py3-none-any.whl.
File metadata
- Download URL: researchbot-0.1.1-py3-none-any.whl
- Upload date:
- Size: 16.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6843becc14e554c04cb9484a96e1b408004a37a8bbff7876f615470f8721fd6d |
| MD5 | f67193ea5ececab468b1ddb9cccdc6c8 |
| BLAKE2b-256 | 867ce255c952697603c278e397c23abc31773975e223a241fb5d1f79c5d9e291 |