
Scan any codebase and generate onboarding docs automatically

Project description

RepoGuide

A CLI tool that scans any local codebase and generates a detailed onboarding document — the kind of walkthrough a senior dev would write for a new team member, produced automatically.

Works with Anthropic (Claude), OpenAI (GPT), or Ollama (local models, no API key needed).


What This Demonstrates

Business problem solved: Onboarding to a new codebase is slow. Architecture lives in someone's head, entry points aren't obvious, and setup instructions are outdated or missing. RepoGuide reads the repo and produces a structured markdown guide covering architecture, tech stack, entry points, key patterns, setup instructions, and areas of complexity.

Why this matters for AI engineering: This is AI applied to a real developer workflow, not a chatbot demo. It demonstrates MCP tool orchestration, structured LLM output generation, a provider abstraction that makes the LLM swappable, and a two-pass analysis strategy that keeps token usage low by scanning first and reading selectively.


How It Works

RepoGuide uses a two-pass architecture:

Pass 1 — Structural scan (no LLM, no tokens spent): An MCP server scans the directory tree, detects the tech stack from config files (package.json, requirements.txt, Cargo.toml, etc.), counts file types, and identifies likely entry points based on framework-specific knowledge.

Pass 2 — Selective deep read (targeted, token-efficient): Based on what Pass 1 found, the agent reads only the files that matter — entry points, config files, READMEs. The LLM receives the full tree structure plus contents of key files, not the entire codebase.
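One way to express the selective read, assuming a reading list of relative paths like the one `detect_entry_points` produces. `KEY_FILE_NAMES`, `select_key_files`, and `build_user_message` are hypothetical helpers. The tree listing goes in whole because it is cheap, while file contents are included only for the chosen paths.

```python
from pathlib import Path

# Hypothetical set of file names worth reading whenever they exist.
KEY_FILE_NAMES = {"README.md", "package.json", "pyproject.toml", "requirements.txt"}


def select_key_files(files: list[str], entry_points: list[str]) -> list[str]:
    """Keep suggested entry points that actually exist, then well-known config/README files."""
    present = set(files)
    chosen = [f for f in entry_points if f in present]
    chosen += [f for f in files if Path(f).name in KEY_FILE_NAMES and f not in chosen]
    return chosen


def build_user_message(tree: list[str], contents: dict[str, str]) -> str:
    """Full tree listing plus the contents of the selected files only."""
    parts = ["## Directory tree", *tree, ""]
    for path, text in contents.items():
        parts += [f"## File: {path}", text, ""]
    return "\n".join(parts)
```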

Generation: Everything gathered is sent to the configured LLM provider with a structured system prompt. The output is a comprehensive ONBOARDING.md written in a direct, specific tone with actual file names, function references, and code patterns.


Quick Start

1. Install

From PyPI:

pip install repoguide-ai

From source:

git clone https://github.com/M33p5t3r/codescout-py.git
cd codescout-py
pip install .

2. Run RepoGuide

With Anthropic (default):

set ANTHROPIC_API_KEY=sk-ant-...          # Windows
export ANTHROPIC_API_KEY=sk-ant-...       # Mac/Linux
repoguide

With OpenAI:

set OPENAI_API_KEY=sk-...                 # Windows
export OPENAI_API_KEY=sk-...              # Mac/Linux
repoguide --provider openai

With Ollama (local, no API key):

ollama pull llama3.1                      # Download a model first
repoguide --provider ollama
repoguide --provider ollama --model mistral   # Use a different model

3. Follow the Prompts

You'll be asked for:

  • Repository path — absolute path to any local repo
  • Ignore patterns — optional comma-separated folder names to skip

RepoGuide scans the repo, shows what it detected, asks you to confirm, then generates and saves ONBOARDING.md in the target repo's root.
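The two prompts map naturally onto small helpers. This sketch assumes the ignore answer is split on commas with surrounding whitespace trimmed, and that the output path is `ONBOARDING.md` joined onto the resolved repo root; both function names are hypothetical.

```python
from pathlib import Path


def parse_ignore_patterns(raw: str) -> set[str]:
    """Turn 'node_modules, dist,,.venv ' into {'node_modules', 'dist', '.venv'}."""
    return {part.strip() for part in raw.split(",") if part.strip()}


def output_path(repo_path: str) -> Path:
    """Resolve the repo root and place ONBOARDING.md directly inside it."""
    root = Path(repo_path).expanduser().resolve()
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    return root / "ONBOARDING.md"
```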


Architecture

┌─────────────────────────────────────────────────────┐
│                    CLI (cli.py)                     │
│                                                     │
│  1. Get repo path from user                         │
│  2. Connect to MCP server                           │
│  3. Call scan_repo → display detected stack         │
│  4. Confirm with user                               │
│  5. Call detect_entry_points → get reading list     │
│  6. Call read_file on each entry point              │
│  7. Send everything to LLM → generate markdown      │
│  8. Save ONBOARDING.md to the target repo           │
└────────────┬──────────────────────┬─────────────────┘
             │ MCP Protocol         │ LLM Call
             ▼                      ▼
┌────────────────────────┐  ┌──────────────────────────┐
│  MCP Server            │  │  Provider (providers.py) │
│  (repo_server.py)      │  │                          │
│                        │  │  AnthropicProvider       │
│  scan_repo             │  │  OpenAIProvider          │
│  read_file             │  │  OllamaProvider          │
│  detect_entry_points   │  │                          │
└────────────────────────┘  └──────────────────────────┘

MCP Server Tools

| Tool | Purpose | Token cost |
|---|---|---|
| scan_repo | Walk the directory tree, detect the stack from config files, count file types | Zero (pure Python) |
| read_file | Read a specific file, with a maximum-line guard | Proportional to file size |
| detect_entry_points | Suggest key files based on the detected frameworks | Zero (pure Python) |
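A sketch of a `read_file` with a max line guard, assuming the guard truncates rather than rejects long files and appends a marker so the model knows content was cut. The default of 300 lines is an arbitrary placeholder. Reading costs nothing by itself; tokens are spent when the returned text is sent to the LLM, which is why the cost scales with file size.

```python
from itertools import islice


def read_file(path: str, max_lines: int = 300) -> str:
    """Read up to max_lines lines; flag truncation so the LLM knows the file continues."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        head = list(islice(fh, max_lines))
        truncated = next(fh, None) is not None  # is there at least one more line?
    text = "".join(head)
    if truncated:
        text += f"\n[... truncated after {max_lines} lines]\n"
    return text
```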

LLM Providers

| Provider | Command | API key required | Best for |
|---|---|---|---|
| Anthropic | --provider anthropic | Yes (ANTHROPIC_API_KEY) | Best output quality (default) |
| OpenAI | --provider openai | Yes (OPENAI_API_KEY) | Alternative cloud provider |
| Ollama | --provider ollama | No | Offline use, privacy, free |

Override the default model with --model:

repoguide --provider anthropic --model claude-opus-4-6
repoguide --provider openai --model gpt-4o-mini
repoguide --provider ollama --model codellama

Supported Frameworks

The stack detection recognizes: Next.js, React, Vue, Nuxt, SvelteKit, Express, NestJS, FastAPI, Flask, Django, Streamlit, Astro, Remix, Gatsby, and generic Node/Python projects. Adding a new framework means adding entries to the detection maps — no logic changes needed.
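A sketch of data-driven detection maps for two of the config files named earlier. The dictionary keys are real npm and PyPI package names, but the maps, the fallback labels, and the parsing are assumptions about how such detection might work. Adding a framework here is one new dictionary entry.

```python
import json
import re

# Hypothetical detection maps: dependency name -> framework label.
NODE_FRAMEWORKS = {
    "next": "Next.js", "react": "React", "vue": "Vue", "nuxt": "Nuxt",
    "@sveltejs/kit": "SvelteKit", "express": "Express", "@nestjs/core": "NestJS",
    "astro": "Astro", "@remix-run/react": "Remix", "gatsby": "Gatsby",
}
PYTHON_FRAMEWORKS = {
    "fastapi": "FastAPI", "flask": "Flask", "django": "Django", "streamlit": "Streamlit",
}


def detect_node(package_json: str) -> list[str]:
    data = json.loads(package_json)
    deps = {**data.get("dependencies", {}), **data.get("devDependencies", {})}
    found = [label for dep, label in NODE_FRAMEWORKS.items() if dep in deps]
    return found or ["Node.js"]  # generic fallback


def detect_python(requirements_txt: str) -> list[str]:
    names = set()
    for line in requirements_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line:
            # Keep only the distribution name: cut at extras, version specifiers, markers.
            names.add(re.split(r"[\[<>=!~;\s]", line, maxsplit=1)[0].lower())
    found = [label for dep, label in PYTHON_FRAMEWORKS.items() if dep in names]
    return found or ["Python"]  # generic fallback
```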


Example Output

Run RepoGuide against any local repo to generate a full ONBOARDING.md. The output covers architecture, tech stack, entry points, key patterns, setup instructions, and areas of complexity — written in a direct, specific tone with actual file names and function references.


Adding a New Provider

  1. Create a class in providers.py that inherits from LLMProvider
  2. Implement generate(system_prompt, user_message, max_tokens) -> str
  3. Implement validate_config() -> str | None
  4. Add it to the PROVIDERS dict

The MCP server, scanning logic, and output format are all provider-agnostic.


Design Principles

RepoGuide applies the MCP client/server pattern (via the FastMCP framework) to a developer tooling problem. The provider abstraction demonstrates clean separation between orchestration logic and model calls: the LLM is a swappable component, not the product.



Download files

Download the file for your platform.

Source Distribution

repoguide_ai-0.1.2.tar.gz (16.3 kB)

Uploaded Source

Built Distribution


repoguide_ai-0.1.2-py3-none-any.whl (20.4 kB)

Uploaded Python 3

File details

Details for the file repoguide_ai-0.1.2.tar.gz.

File metadata

  • Download URL: repoguide_ai-0.1.2.tar.gz
  • Upload date:
  • Size: 16.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for repoguide_ai-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73ae3a8560cc07a4d60c2c8137ad14a99bcca9aaaa101aa7dd2bf75c2d5631a5 |
| MD5 | 0d6d3c7e47495dab1da7aac48f4c228f |
| BLAKE2b-256 | 5fdb529b8d5c1d543f0ce08fe9f1661acff12498051711cdea2fcc2a1fa142e1 |
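To check a downloaded archive against the SHA256 digest listed for the source distribution, hashing it in chunks with `hashlib` is enough. The helper names are illustrative; pip can also enforce published digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib

# SHA256 digest published for repoguide_ai-0.1.2.tar.gz.
SDIST_SHA256 = "73ae3a8560cc07a4d60c2c8137ad14a99bcca9aaaa101aa7dd2bf75c2d5631a5"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = SDIST_SHA256) -> bool:
    return sha256_of(path) == expected.strip().lower()
```

For example, `matches("repoguide_ai-0.1.2.tar.gz")` should return `True` for an untampered download.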


File details

Details for the file repoguide_ai-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: repoguide_ai-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 20.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.14

File hashes

Hashes for repoguide_ai-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | dbdaeb861f5691b94c362cf22832bdd9d6a2649b0d217b3e64d06c80ce6ec038 |
| MD5 | fd78e0d1490c68f16b5b319ec90228ae |
| BLAKE2b-256 | 8ec3cea311a3484df64703328bafeba226d58c6b666964c824a5e2396f3d28ba |

