
Git-based AI research assistant with literature analysis tools


Researchbot

A Git-based AI research assistant that turns your local file system into an active research lab.

You brain dump ideas, paper references, and notes into a workspace file. Then you tell the agent what to do — survey the landscape, check novelty, deep-read specific papers — and it orchestrates the work: searching APIs, analyzing papers in parallel, and writing structured output to your project files.

Built on Claude Code with custom MCP servers, using VS Code/Cursor as the interface and Markdown as the document format.

Installation

From PyPI

pip install researchbot

From source

git clone https://github.com/juankost/researchbot.git
cd researchbot
uv sync

Configuration

Researchbot looks for API keys in this order:

  1. Shell environment variables (highest priority)
  2. ~/.config/researchbot/.env (user-level config)
  3. <project_root>/.env (local dev fallback)
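The lookup order above can be sketched in a few lines of Python. This is a minimal illustration, not Researchbot's actual loader: the function names `parse_env_file` and `resolve_key` are hypothetical, the parser handles only plain KEY=VALUE lines, and treating an empty value as "not set" is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skip blanks, comments, and missing files."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_key(
    name: str,
    project_root: Path,
    environ: Optional[Mapping[str, str]] = None,
    user_env: Optional[Path] = None,
) -> Optional[str]:
    """Return the first non-empty value found, following the documented order."""
    environ = os.environ if environ is None else environ
    if user_env is None:
        user_env = Path.home() / ".config" / "researchbot" / ".env"
    sources = [
        environ,                                      # 1. shell environment
        parse_env_file(user_env),                     # 2. user-level config
        parse_env_file(Path(project_root) / ".env"),  # 3. local dev fallback
    ]
    for source in sources:
        value = source.get(name)
        if value:  # assumption: an empty string counts as unset
            return value
    return None
```

Under these assumptions, `resolve_key("MISTRAL_API_KEY", Path.cwd())` returns the shell value if one is exported, and falls back to the two .env files otherwise.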

Set up your API keys:

mkdir -p ~/.config/researchbot
cat > ~/.config/researchbot/.env << 'EOF'
# Semantic Scholar (optional — increases rate limits)
SEMANTIC_SCHOLAR_API_KEY=

# Mistral OCR (required if using mistral as OCR provider)
MISTRAL_API_KEY=

# Deepseek OCR (required if using deepseek as OCR provider)
DEEPSEEK_API_KEY=

# OCR provider: "mistral" (default) or "deepseek"
RESEARCHBOT_OCR_PROVIDER=mistral
EOF

Claude Code Integration

To use researchbot as an MCP server in Claude Code globally, see claude-config for the install script that sets up skills and MCP tools.

Current Status

v1 — In Development

The project is in early development. The Semantic Scholar API client, PDF download + OCR pipeline, CLI commands, MCP server, and a first set of agent skills are implemented.

What's implemented

  • Python package structure (researchbot/) with module stubs
  • Dependencies: httpx, click, mistralai, openai, mcp, python-dotenv
  • Configuration with OCR provider toggle (Mistral default, Deepseek configurable), two-tier .env loading
  • Semantic Scholar API client (researchbot/scholar.py) — Paper dataclass, SemanticScholarClient class with search, get_paper, citations, references, and similar paper methods. Includes resolve_paper() for flexible input: accepts IDs, URLs, or paper names.
  • PDF download + OCR pipeline (researchbot/pdf.py, researchbot/ocr.py) — resolve PDF URLs via Semantic Scholar, download with caching, Mistral OCR with image extraction, per-paper cache (text.md + images/)
  • CLI commands (search, paper, citations, references, similar, read) with JSON output, --pretty flag, and --include-images for the read command
  • MCP server (researchbot/mcp_server.py) — FastMCP server exposing 7 tools (search_papers, get_paper, get_citations, get_references, search_similar, read_paper, ocr_local_pdf) over stdio transport, auto-started by Claude Code via .mcp.json
  • Skills:
    • /analyze — In-depth structured analysis of a single paper (contribution, methodology, results, limitations, future work)
    • /compare — Compare two papers on problem formulation and methodology
    • /expand — Find papers solving the same problem as a seed paper, with parallel subagent analysis
    • /gaps — Identify research gaps and open questions from a related works analysis
    • /verify_gaps — Verify which gaps are genuinely open by searching for papers that address them
    • /paper_review — Constructive conference-style review with literature verification via parallel subagents
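The per-paper cache mentioned in the PDF download + OCR item above (text.md plus an images/ directory) could be laid out roughly as follows. This is a sketch under stated assumptions, not the code in researchbot/pdf.py or researchbot/ocr.py: the function names, the cache root argument, and the rule for turning a paper ID into a directory name are all hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional


def paper_cache_dir(cache_root: Path, paper_id: str) -> Path:
    """Map a paper ID to a directory name that is safe on common file systems."""
    # Assumption: any run of characters outside [A-Za-z0-9._-] becomes "_".
    safe_name = re.sub(r"[^A-Za-z0-9._-]+", "_", paper_id)
    return cache_root / safe_name


def load_cached_text(cache_root: Path, paper_id: str) -> Optional[str]:
    """Return the cached OCR text, or None on a cache miss."""
    text_file = paper_cache_dir(cache_root, paper_id) / "text.md"
    if text_file.is_file():
        return text_file.read_text(encoding="utf-8")
    return None


def store_ocr_result(
    cache_root: Path, paper_id: str, text: str, images: Dict[str, bytes]
) -> Path:
    """Write text.md and images/ for one paper and return its cache directory."""
    paper_dir = paper_cache_dir(cache_root, paper_id)
    images_dir = paper_dir / "images"
    images_dir.mkdir(parents=True, exist_ok=True)
    (paper_dir / "text.md").write_text(text, encoding="utf-8")
    for filename, data in images.items():
        (images_dir / filename).write_bytes(data)
    return paper_dir
```

With a layout like this, a second read of the same paper can check `load_cached_text` first and skip both the download and the OCR call on a hit.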

What's next

See docs/plan.md for the full implementation plan.

Usage

CLI

# Search for papers
researchbot search "state space models" --limit 5 --pretty

# Get details for a specific paper
researchbot paper "ARXIV:1706.03762" --pretty

# Get papers that cite a paper
researchbot citations "ARXIV:1706.03762" --limit 5 --pretty

# Get papers referenced by a paper
researchbot references "ARXIV:1706.03762" --limit 5 --pretty

# Find similar papers
researchbot similar "ARXIV:1706.03762" --limit 5 --pretty

# Download + OCR a paper (markdown text output)
researchbot read "ARXIV:2106.15928"

# Download + OCR with image paths (JSON output)
researchbot read "ARXIV:2106.15928" --include-images --pretty

All commands output JSON by default, except read, which prints Markdown text unless --include-images is given. Add --pretty for indented output and --limit N to control result count.
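Because the lookup commands print JSON to stdout, they can be driven from a script. The sketch below assumes the researchbot executable is on PATH and that stdout holds a single JSON document. The helper names `build_command` and `run_researchbot` are hypothetical, and the shape of the returned JSON is not specified here, so the result is left untyped.

```python
import json
import subprocess
from typing import Any, List, Optional

# Subcommands shown above that emit JSON; read is excluded on purpose.
JSON_COMMANDS = {"search", "paper", "citations", "references", "similar"}


def build_command(subcommand: str, target: str, limit: Optional[int] = None) -> List[str]:
    """Build the argv list for one researchbot call."""
    if subcommand not in JSON_COMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    argv = ["researchbot", subcommand, target]
    if limit is not None:
        # Only pass --limit to commands that list results (not shown for paper).
        argv += ["--limit", str(limit)]
    return argv


def run_researchbot(subcommand: str, target: str, limit: Optional[int] = None) -> Any:
    """Run the CLI and parse its stdout as JSON; raises if the command fails."""
    completed = subprocess.run(
        build_command(subcommand, target, limit),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)
```

For example, `run_researchbot("search", "state space models", limit=5)` runs the same command as the first CLI example and returns the parsed result.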

Supported paper ID formats: ARXIV:xxx, DOI:xxx, CorpusId:xxx, Semantic Scholar hash, or URL:xxx.

The read command (and the MCP read_paper tool) also accepts paper names — they are resolved via Semantic Scholar search:

researchbot read "Attention is All You Need"
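To make the accepted input forms concrete, here is a rough classifier for a paper reference string. It is an illustration only and does not reproduce resolve_paper(): the function name `classify_reference` and its return labels are hypothetical, prefix matching is assumed to be case-insensitive, a Semantic Scholar hash is assumed to be a 40-character hexadecimal string, and a bare http(s) address is treated as a URL.

```python
import re

_PREFIXES = ("ARXIV:", "DOI:", "CorpusId:", "URL:")
_S2_HASH = re.compile(r"[0-9a-f]{40}")


def classify_reference(ref: str) -> str:
    """Return one of: arxiv, doi, corpusid, url, s2_hash, name."""
    ref = ref.strip()
    lowered = ref.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix.lower()):
            return prefix.rstrip(":").lower()
    if _S2_HASH.fullmatch(lowered):
        return "s2_hash"
    if lowered.startswith(("http://", "https://")):
        return "url"
    # Anything else is treated as a title to look up via search.
    return "name"
```

Only the last case needs a search round trip; the others can be passed to the API as identifiers.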

Skills (Claude Code slash commands)

Skills are invoked as slash commands inside Claude Code.

# Deep-read a paper by arXiv ID
/analyze ARXIV:1706.03762

# Deep-read by paper name
/analyze Attention is All You Need

# Deep-read by URL
/analyze https://arxiv.org/abs/2106.15928

# Compare two papers
/compare Mamba S4

# Compare papers by arXiv ID
/compare ARXIV:2312.00752 ARXIV:2111.00396

# Find related works for a seed paper
/expand Mamba

# Expand from a specific paper
/expand ARXIV:2312.00752

# Identify gaps from a related works analysis
/gaps workspace/efficient-sequence-modeling/

# Verify which gaps are genuinely open
/verify_gaps workspace/efficient-sequence-modeling/

# Review a paper (published)
/paper_review ARXIV:2312.00752

# Review a local PDF draft
/paper_review ~/drafts/my-paper.pdf

# Review with specific focus
/paper_review ARXIV:2312.00752 Focus on the theoretical claims

Documentation

File                      Purpose
docs/plan.md              Implementation plan: remaining tasks and their specs
docs/next_task.md         Detailed spec for the next task to implement
docs/wip.md               Work-in-progress notes for the current task
docs/Vision.md            Long-term vision and design for the full research lab
docs/project_thoughts.md  Brain dumps on vision and project direction

Project Structure

researchbot/
  __init__.py
  config.py       # API keys, OCR provider toggle, cache paths
  scholar.py      # Semantic Scholar API client
  pdf.py          # PDF download utilities
  ocr.py          # OCR pipeline (Mistral / Deepseek)
  cli.py          # CLI entry point (Click)
  mcp_server.py   # MCP server for Claude Code (FastMCP, stdio)
.claude/
  commands/
    analyze.md      # /analyze skill
    compare.md      # /compare skill
    expand.md       # /expand skill
    gaps.md         # /gaps skill
    verify_gaps.md  # /verify_gaps skill
    paper_review.md # /paper_review skill
.mcp.json         # MCP server config (auto-starts researchbot server)
docs/
  plan.md         # Implementation plan
  next_task.md    # Next task spec
  wip.md          # Work in progress

License

MIT

