Scan any codebase and generate onboarding docs automatically
RepoGuide
A CLI tool that scans any local codebase and generates a detailed onboarding document — the kind of walkthrough a senior dev would write for a new team member, produced automatically.
Works with Anthropic (Claude), OpenAI (GPT), or Ollama (local models, no API key needed).
What This Demonstrates
Business problem solved: Onboarding to a new codebase is slow. Architecture lives in someone's head, entry points aren't obvious, and setup instructions are outdated or missing. RepoGuide reads the repo and produces a structured markdown guide covering architecture, tech stack, entry points, key patterns, setup instructions, and areas of complexity.
Why this matters for AI engineering: This is AI applied to a real developer workflow — not a chatbot demo. It demonstrates MCP tool orchestration, structured LLM output generation, a provider abstraction that makes the LLM swappable, and a two-pass analysis strategy that keeps token usage efficient by scanning first and reading selectively.
How It Works
RepoGuide uses a two-pass architecture:
Pass 1 — Structural scan (no LLM, no tokens spent): An MCP server scans the directory tree, detects the tech stack from config files (package.json, requirements.txt, Cargo.toml, etc.), counts file types, and identifies likely entry points based on framework-specific knowledge.
Pass 2 — Selective deep read (targeted, token-efficient): Based on what Pass 1 found, the agent reads only the files that matter — entry points, config files, READMEs. The LLM receives the full tree structure plus contents of key files, not the entire codebase.
Generation: Everything gathered is sent to the configured LLM provider with a structured system prompt. The output is a comprehensive ONBOARDING.md written in a direct, specific tone with actual file names, function references, and code patterns.
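The two passes can be sketched with the standard library alone. This is an illustration under stated assumptions, not RepoGuide's actual code: the function names `structural_scan` and `select_reading_list`, the `CONFIG_FILES` and `DEFAULT_IGNORE` sets, and the candidate entry-point list are all hypothetical.

```python
import os
from collections import Counter

# Hypothetical values for illustration; the real tool's lists may differ.
CONFIG_FILES = frozenset({"package.json", "requirements.txt", "Cargo.toml", "pyproject.toml"})
DEFAULT_IGNORE = frozenset({".git", "node_modules", "__pycache__", ".venv"})


def structural_scan(root, ignore=DEFAULT_IGNORE):
    """Pass 1: walk the tree and collect names only; no file bodies are read."""
    tree, configs, ext_counts = [], [], Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune ignored folders in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if d not in ignore)
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
            tree.append(rel)
            ext_counts[os.path.splitext(name)[1] or "(none)"] += 1
            if name in CONFIG_FILES:
                configs.append(rel)
    return {"tree": tree, "configs": configs, "ext_counts": dict(ext_counts)}


def select_reading_list(scan, candidates=("README.md", "main.py", "app.py", "src/index.ts")):
    """Pass 2 input: config files plus any candidate entry points that exist."""
    present = set(scan["tree"])
    picked = list(scan["configs"])
    picked += [c for c in candidates if c in present and c not in picked]
    return picked
```

Only the paths returned by `select_reading_list` would then be opened and sent to the model, which is what keeps the second pass small compared with reading the whole repository.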
Quick Start
1. Install
From PyPI:
pip install repoguide-ai
From source:
git clone https://github.com/M33p5t3r/codescout-py.git
cd codescout-py
pip install .
2. Run RepoGuide
With Anthropic (default):
Windows (Command Prompt):
set ANTHROPIC_API_KEY=sk-ant-...
Mac/Linux:
export ANTHROPIC_API_KEY=sk-ant-...
Then run:
repoguide
With OpenAI:
Windows (Command Prompt):
set OPENAI_API_KEY=sk-...
Mac/Linux:
export OPENAI_API_KEY=sk-...
Then run:
repoguide --provider openai
With Ollama (local, no API key):
ollama pull llama3.1 # Download a model first
repoguide --provider ollama
repoguide --provider ollama --model mistral # Use a different model
3. Follow the Prompts
You'll be asked for:
- Repository path — absolute path to any local repo
- Ignore patterns — optional comma-separated folder names to skip
RepoGuide scans the repo, shows what it detected, asks you to confirm, then generates and saves ONBOARDING.md in the target repo's root.
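As a rough sketch of how the two prompt answers might be checked before scanning starts, the helpers below validate the path and split the ignore list. Both function names and the error messages are hypothetical; the actual CLI may handle input differently.

```python
import os
from typing import Optional, Set


def validate_repo_path(raw: str) -> Optional[str]:
    """Return an error message, or None when the path looks usable."""
    path = raw.strip()
    if not os.path.isabs(path):
        return "Please enter an absolute path."
    if not os.path.isdir(path):
        return "That path is not an existing directory."
    return None


def parse_ignore_patterns(raw: str) -> Set[str]:
    """Split a comma-separated answer into a set of bare folder names."""
    # Strip whitespace and stray slashes, then drop empty entries.
    names = (part.strip().strip("/\\") for part in raw.split(","))
    return {name for name in names if name}
```

An empty answer to the ignore prompt yields an empty set, so the scan falls back to whatever defaults the tool applies.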
Architecture
┌─────────────────────────────────────────────────────┐
│                    CLI (cli.py)                     │
│                                                     │
│  1. Get repo path from user                         │
│  2. Connect to MCP server                           │
│  3. Call scan_repo → display detected stack         │
│  4. Confirm with user                               │
│  5. Call detect_entry_points → get reading list     │
│  6. Call read_file on each entry point              │
│  7. Send everything to LLM → generate markdown      │
│  8. Save ONBOARDING.md to the target repo           │
└────────────┬──────────────────────┬─────────────────┘
             │ MCP Protocol         │ LLM Call
             ▼                      ▼
┌────────────────────────┐  ┌──────────────────────────┐
│  MCP Server            │  │  Provider (providers.py) │
│  (repo_server.py)      │  │                          │
│                        │  │  AnthropicProvider       │
│  scan_repo             │  │  OpenAIProvider          │
│  read_file             │  │  OllamaProvider          │
│  detect_entry_points   │  │                          │
└────────────────────────┘  └──────────────────────────┘
MCP Server Tools
| Tool | Purpose | Token Cost |
|---|---|---|
| scan_repo | Walk directory tree, detect stack from config files, count file types | Zero (pure Python) |
| read_file | Read a specific file with max line guard | Proportional to file size |
| detect_entry_points | Suggest key files based on detected frameworks | Zero (pure Python) |
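The "max line guard" on read_file can be pictured as a cap on how many lines are returned, so one very large file cannot dominate the prompt. The sketch below is an assumption about how such a guard might look: the name `read_file_guarded`, the default of 200 lines, and the truncation notice are hypothetical, and the MCP tool registration is left out so the snippet runs with the standard library only.

```python
from itertools import islice


def read_file_guarded(path, max_lines=200):
    """Return at most max_lines lines, with a notice when the file was cut."""
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        # Read one extra line so truncation can be detected without
        # loading the whole file into memory.
        lines = list(islice(fh, max_lines + 1))
    text = "".join(lines[:max_lines])
    if len(lines) > max_lines:
        text += f"... [truncated after {max_lines} lines]\n"
    return text
```

Because token cost is proportional to what the tool returns, a cap like this puts an upper bound on the cost of any single read.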
LLM Providers
| Provider | Command | API Key Required | Best For |
|---|---|---|---|
| Anthropic | --provider anthropic | Yes (ANTHROPIC_API_KEY) | Best output quality (default) |
| OpenAI | --provider openai | Yes (OPENAI_API_KEY) | Alternative cloud provider |
| Ollama | --provider ollama | No | Offline use, privacy, free |
Override the default model with --model:
repoguide --provider anthropic --model claude-opus-4-6
repoguide --provider openai --model gpt-4o-mini
repoguide --provider ollama --model codellama
Supported Frameworks
The stack detection recognizes: Next.js, React, Vue, Nuxt, SvelteKit, Express, NestJS, FastAPI, Flask, Django, Streamlit, Astro, Remix, Gatsby, and generic Node/Python projects. Adding a new framework means adding entries to the detection maps — no logic changes needed.
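A data-driven detector along these lines might look like the sketch below. The map names, their contents, and the two helper functions are hypothetical and cover only a few of the frameworks listed; they are meant to show why adding a framework can be a one-line change to a map rather than a change to the logic.

```python
import json
import re

# Hypothetical detection maps: dependency name -> framework label.
NODE_FRAMEWORKS = {
    "next": "Next.js",
    "react": "React",
    "vue": "Vue",
    "express": "Express",
    "@nestjs/core": "NestJS",
}
PYTHON_FRAMEWORKS = {
    "fastapi": "FastAPI",
    "flask": "Flask",
    "django": "Django",
    "streamlit": "Streamlit",
}


def detect_node(package_json_text):
    """Match package.json dependencies against the Node map."""
    data = json.loads(package_json_text)
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    found = [label for dep, label in NODE_FRAMEWORKS.items() if dep in deps]
    return found or ["Node"]  # fall back to a generic Node project


def detect_python(requirements_text):
    """Match requirements.txt package names against the Python map."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep only the package name: cut at the first version, extras,
        # or marker character.
        names.add(re.split(r"[<>=!~\[;\s]", line, maxsplit=1)[0].lower())
    found = [label for dep, label in PYTHON_FRAMEWORKS.items() if dep in names]
    return found or ["Python"]  # fall back to a generic Python project
```

Supporting another framework here would mean adding one key to the relevant map, which matches the "no logic changes needed" claim above.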
Example Output
Run RepoGuide against any local repo to generate a full ONBOARDING.md. The output covers architecture, tech stack, entry points, key patterns, setup instructions, and areas of complexity — written in a direct, specific tone with actual file names and function references.
Adding a New Provider
- Create a class in providers.py that inherits from LLMProvider
- Implement generate(system_prompt, user_message, max_tokens) -> str
- Implement validate_config() -> str | None
- Add it to the PROVIDERS dict
The MCP server, scanning logic, and output format are all provider-agnostic.
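The steps above can be sketched as follows. The method signatures come from the list above; everything else is an assumption made for illustration: the base class is rebuilt here as an abstract class, `ExampleProvider` and its `EXAMPLE_API_KEY` variable are hypothetical, `validate_config` is assumed to return an error message or `None`, and `PROVIDERS` is assumed to map a provider name to a class.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class LLMProvider(ABC):
    """Assumed shape of the base class, inferred from the steps above."""

    @abstractmethod
    def generate(self, system_prompt: str, user_message: str, max_tokens: int) -> str:
        ...

    @abstractmethod
    def validate_config(self) -> Optional[str]:
        ...


class ExampleProvider(LLMProvider):
    """Hypothetical provider; a real one would call its SDK or HTTP API."""

    ENV_VAR = "EXAMPLE_API_KEY"  # hypothetical environment variable

    def __init__(self, model: str = "example-model"):
        self.model = model

    def validate_config(self) -> Optional[str]:
        # Assumption: an error string means "not usable", None means "ready".
        if not os.environ.get(self.ENV_VAR):
            return f"{self.ENV_VAR} is not set"
        return None

    def generate(self, system_prompt: str, user_message: str, max_tokens: int) -> str:
        # Stub response so the sketch runs offline: echoes a slice of the input.
        return f"[{self.model}] {user_message[:max_tokens]}"


# Assumed registry shape: provider name -> provider class.
PROVIDERS = {"example": ExampleProvider}
```

With a registry like this, the CLI could look up the class by the `--provider` value and call `validate_config()` before spending any tokens.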
Design Principles
RepoGuide uses the MCP client/server pattern (FastMCP framework) applied to a developer tooling problem. The provider abstraction demonstrates clean separation between orchestration logic and model calls — the LLM is a swappable component, not the product.