Git-based AI research assistant with literature analysis tools

Researchbot

A Git-based AI research assistant that turns your local file system into an active research lab.

You brain-dump ideas, paper references, and notes into a workspace file. Then you tell the agent what to do (survey the landscape, check novelty, deep-read specific papers), and it orchestrates the work: searching APIs, analyzing papers in parallel, and writing structured output to your project files.

Built on Claude Code with custom MCP servers, using VS Code/Cursor as the interface and Markdown as the document format.

Installation

From PyPI

pip install researchbot

From source

git clone https://github.com/juankost/researchbot.git
cd researchbot
uv sync

Configuration

Researchbot looks for API keys in this order:

  1. Shell environment variables (highest priority)
  2. ~/.config/researchbot/.env (user-level config)
  3. <project_root>/.env (local dev fallback)
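The lookup order above can be sketched in a few lines of Python. This is an illustration only, not researchbot's actual loader: `parse_env_file` and `resolve_key` are hypothetical names, the parser handles only plain `KEY=VALUE` lines, and treating an empty value as "unset" is an assumption.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def parse_env_file(path: Path) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_key(
    name: str,
    project_root: Path,
    environ: Optional[Mapping[str, str]] = None,
    home: Optional[Path] = None,
) -> Optional[str]:
    """Return the first non-empty value found, in priority order."""
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)
    sources = [
        environ,                                                   # 1. shell
        parse_env_file(home / ".config" / "researchbot" / ".env"),  # 2. user
        parse_env_file(Path(project_root) / ".env"),               # 3. project
    ]
    for source in sources:
        value = source.get(name)
        if value:  # assumption: an empty string counts as "not set"
            return value
    return None
```

Calling `resolve_key("MISTRAL_API_KEY", Path.cwd())` would then return the shell value if one is exported, and fall back to the two `.env` files otherwise.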

Set up your API keys:

mkdir -p ~/.config/researchbot
cat > ~/.config/researchbot/.env << 'EOF'
# Semantic Scholar (optional — increases rate limits)
SEMANTIC_SCHOLAR_API_KEY=

# Mistral OCR (required if using mistral as OCR provider)
MISTRAL_API_KEY=

# Deepseek OCR (required if using deepseek as OCR provider)
DEEPSEEK_API_KEY=

# OCR provider: "mistral" (default) or "deepseek"
RESEARCHBOT_OCR_PROVIDER=mistral
EOF
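Which key is required depends on the selected OCR provider. A small check like the one below can catch a missing key before the first OCR call. It is a sketch based only on the comments in the `.env` template above; `missing_ocr_settings` is a hypothetical helper, not part of the package.

```python
from typing import List, Mapping

# Provider name -> API key it needs, as described in the .env template.
REQUIRED_KEY = {
    "mistral": "MISTRAL_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def missing_ocr_settings(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the OCR settings (empty if fine)."""
    # The template documents "mistral" as the default provider.
    provider = (env.get("RESEARCHBOT_OCR_PROVIDER") or "mistral").strip().lower()
    if provider not in REQUIRED_KEY:
        return [f"RESEARCHBOT_OCR_PROVIDER: unknown provider {provider!r}"]
    key = REQUIRED_KEY[provider]
    if not env.get(key):
        return [f"{key}: required for provider {provider!r}"]
    return []
```

For example, `missing_ocr_settings(os.environ)` returns an empty list when the configured provider has its key set.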

Claude Code Integration

To use researchbot as an MCP server in Claude Code globally, see claude-config for the install script that sets up skills and MCP tools.

Current Status

v1 — In Development

The project is in early development. The Semantic Scholar API client, PDF download + OCR pipeline, CLI commands, and MCP server are implemented. The agent skills are being built next.

What's implemented

  • Python package structure (researchbot/) with module stubs
  • Dependencies: httpx, click, mistralai, openai, mcp, python-dotenv
  • Configuration with OCR provider toggle (Mistral default, Deepseek configurable), two-tier .env loading
  • Semantic Scholar API client (researchbot/scholar.py) — Paper dataclass, SemanticScholarClient class with search, get_paper, citations, references, and similar paper methods. Includes resolve_paper() for flexible input: accepts IDs, URLs, or paper names.
  • PDF download + OCR pipeline (researchbot/pdf.py, researchbot/ocr.py) — resolve PDF URLs via Semantic Scholar, download with caching, Mistral OCR with image extraction, per-paper cache (text.md + images/)
  • CLI commands: search, paper, citations, references, similar, and read, with JSON output, a --pretty flag, and --include-images for the read command
  • MCP server (researchbot/mcp_server.py) — FastMCP server exposing 7 tools (search_papers, get_paper, get_citations, get_references, search_similar, read_paper, ocr_local_pdf) over stdio transport, auto-started by Claude Code via .mcp.json
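To make the per-paper cache (text.md + images/) concrete, here is one way such a layout could be modelled. The directory naming scheme, the `PaperCache` class, and the `cache_for` helper are assumptions for illustration; the real layout in researchbot/pdf.py and researchbot/ocr.py may differ.

```python
import re
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PaperCache:
    """Paths for one cached paper: OCR text plus extracted images."""

    root: Path

    @property
    def text_path(self) -> Path:
        return self.root / "text.md"

    @property
    def images_dir(self) -> Path:
        return self.root / "images"

    def is_cached(self) -> bool:
        # A paper counts as cached once its OCR text has been written.
        return self.text_path.is_file()


def cache_for(cache_root: Path, paper_id: str) -> PaperCache:
    """Map a paper ID to a cache directory (hypothetical naming scheme)."""
    # Replace characters that are awkward in file names, such as ':' and '/'.
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", paper_id).strip("_")
    return PaperCache(Path(cache_root) / safe)
```

With this scheme, `cache_for(Path("cache"), "ARXIV:2106.15928").text_path` points at `cache/ARXIV_2106.15928/text.md`, so a second read of the same paper can skip the download and OCR steps.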

Usage

CLI

# Search for papers
researchbot search "state space models" --limit 5 --pretty

# Get details for a specific paper
researchbot paper "ARXIV:1706.03762" --pretty

# Get papers that cite a paper
researchbot citations "ARXIV:1706.03762" --limit 5 --pretty

# Get papers referenced by a paper
researchbot references "ARXIV:1706.03762" --limit 5 --pretty

# Find similar papers
researchbot similar "ARXIV:1706.03762" --limit 5 --pretty

# Download + OCR a paper (markdown text output)
researchbot read "ARXIV:2106.15928"

# Download + OCR with image paths (JSON output)
researchbot read "ARXIV:2106.15928" --include-images --pretty

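Commands output JSON by default. The exception is read, which prints the paper's Markdown text unless --include-images is given, in which case it outputs JSON as well. Add --pretty for indented JSON, and --limit N to control the result count on the commands that return lists of papers.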

Supported paper ID formats: ARXIV:xxx, DOI:xxx, CorpusId:xxx, Semantic Scholar hash, or URL:xxx.

The read command (and the MCP read_paper tool) also accepts paper names — they are resolved via Semantic Scholar search:

researchbot read "Attention is All You Need"
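The distinction between IDs, URLs, and paper names can be illustrated with a small classifier. This is a sketch of the idea only and is not the package's resolve_paper(): `classify_paper_ref` is a hypothetical function, and it assumes a bare Semantic Scholar hash is a 40-character hexadecimal string.

```python
import re

# Prefixes from the supported ID formats listed above (matched case-insensitively).
ID_PREFIXES = ("ARXIV:", "DOI:", "CORPUSID:", "URL:")
# Assumption: a bare Semantic Scholar paper ID is a 40-character hex digest.
S2_HASH = re.compile(r"[0-9a-fA-F]{40}")


def classify_paper_ref(ref: str) -> str:
    """Return "id", "url", or "name" for a user-supplied paper reference."""
    text = ref.strip()
    if text.upper().startswith(ID_PREFIXES) or S2_HASH.fullmatch(text):
        return "id"
    if text.lower().startswith(("http://", "https://")):
        return "url"
    # Anything else is treated as a title to look up via search.
    return "name"
```

A caller could pass "id" inputs straight to a paper lookup, and run a search first for "name" inputs such as "Attention is All You Need".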

Project Structure

researchbot/
  __init__.py
  config.py       # API keys, OCR provider toggle, cache paths
  scholar.py      # Semantic Scholar API client
  pdf.py          # PDF download utilities
  ocr.py          # OCR pipeline (Mistral / Deepseek)
  cli.py          # CLI entry point (Click)
  mcp_server.py   # MCP server for Claude Code (FastMCP, stdio)

License

MIT


