codescout

Semantic code search + LLM answers for your TypeScript/React codebase — runs entirely on your machine.

Quickstart • MCP Server • CLI • Embedding models • How it works • Configuration

codescout indexes TypeScript and React codebases with tree-sitter AST parsing, generates local embeddings via sentence-transformers, and answers natural-language questions with an LLM. Indexing and search need no external infrastructure: no Docker, no database server, no GPU, and no API key for embeddings. Only codescout ask calls a hosted LLM, through OpenRouter.

Run it as an MCP server and any agent (GitHub Copilot, Claude Code, Cursor) can search your codebase for relevant chunks before writing a single line of code.

Quickstart

pip install codescout
export OPENROUTER_API_KEY=sk-or-...   # free tier at openrouter.ai — only needed for ask
codescout ask "how does authentication work?"

On first run, codescout automatically initializes .codescout/ in your project, downloads the embedding model (~90 MB, cached), indexes every file, then answers your question. No other setup needed.

Main Features

AST-aware chunking: parses TypeScript and TSX with tree-sitter; every function, component, hook, type, and interface becomes its own chunk with a semantic label
Enriched embeddings: each chunk is embedded together with a natural-language description, its imports, and its source, so queries match on meaning, not just keywords
Incremental indexing: files are SHA-256-hashed; only changed files are re-processed on subsequent runs
Fully local: embeddings are generated by sentence-transformers on CPU and stored in FAISS + SQLite inside .codescout/; indexing and search send nothing off your machine
MCP server: exposes search_codebase and index_status as MCP tools that Copilot, Claude Code, and Cursor can call
Direct LLM answers: codescout ask retrieves relevant chunks and returns a cited, plain-English answer via OpenRouter (not just a list of snippets)
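
The incremental indexing feature can be pictured with a short sketch. This is an assumption about how hash-based change detection might work, not codescout's actual code: file_sha256 and changed_files are hypothetical names, and the previous hashes are held in a plain dict here, whereas the tool keeps its file hashes in metadata.db.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(paths: list[Path], known_hashes: dict[str, str]) -> list[Path]:
    """Return the paths whose current hash differs from the stored one.

    known_hashes maps str(path) to the digest recorded on the previous run
    (hypothetical storage). A new file has no entry, so it counts as changed.
    """
    return [p for p in paths if known_hashes.get(str(p)) != file_sha256(p)]
```

Only the files returned by changed_files would need to be parsed and embedded again; everything else can keep its existing chunks.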

MCP Server

Run codescout as an MCP server so your agent searches your codebase directly — before writing or modifying code.

Setup

pip install 'codescout[mcp]'
codescout index          # index first
codescout mcp-init       # generates config + agent instructions (run once per project)

mcp-init creates:

File	Purpose
`.vscode/mcp.json`	Tells VS Code how to launch the MCP server
`.github/copilot-instructions.md`	Instructs GitHub Copilot to call `search_codebase` before tasks
`CLAUDE.md`	Same instructions for Claude Code

Reload VS Code after running mcp-init. The generated instruction files then direct the agent to call search_codebase before each task.

Manual config

{
  "servers": {
    "codescout": {
      "type": "stdio",
      "command": "codescout",
      "args": ["mcp-serve"]
    }
  }
}

Available tools

Tool	Description
`search_codebase(query, top_k?)`	Return the most relevant code chunks for a natural-language query
`index_status()`	Report how many files and chunks are indexed and when the last run was
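
For readers curious what an agent sends over stdio, MCP clients invoke tools with a JSON-RPC 2.0 tools/call request. The sketch below builds such a request for search_codebase. build_search_request is a hypothetical helper, and the shape of the server's response is not documented here, so it is left out.

```python
import json
from typing import Optional


def build_search_request(query: str, top_k: Optional[int] = None, request_id: int = 1) -> str:
    """Serialize an MCP tools/call request for the search_codebase tool."""
    arguments: dict = {"query": query}
    if top_k is not None:  # top_k is optional in the tool signature
        arguments["top_k"] = top_k
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search_codebase", "arguments": arguments},
    }
    return json.dumps(message)


print(build_search_request("how does authentication work?", top_k=3))
```

In practice the agent builds this message for you; the sketch is only meant to show which arguments the tool takes.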

Commands

codescout init                          # Create .codescout/ config in current project
codescout index                         # Scan and embed (incremental by default)
codescout index --full                  # Force complete re-index from scratch
codescout index --verbose               # Show per-file chunk counts while indexing
codescout ask "your question"           # Semantic search + LLM answer
codescout ask "..." --show-context      # Also print retrieved code chunks
codescout ask "..." --top-k 10          # Retrieve more chunks (default: 5)
codescout ask "..." --model openai/gpt-4o   # Override LLM model
codescout status                        # Show index stats (files, chunks, size)
codescout config                        # View all config values
codescout config top_k 10               # Set a config value
codescout mcp-init                      # Generate MCP config + agent instruction files
codescout mcp-serve                     # Start MCP server (stdio)
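
If you want to call the CLI from a script, one option is to assemble the argument list in Python and hand it to subprocess. This is a minimal sketch using only the flags listed above; build_ask_command and ask are hypothetical helper names, and the format of the command's output is not assumed.

```python
import subprocess
from typing import Optional


def build_ask_command(
    question: str,
    top_k: Optional[int] = None,
    model: Optional[str] = None,
    show_context: bool = False,
) -> list[str]:
    """Build the argv list for `codescout ask` from the documented flags."""
    command = ["codescout", "ask", question]
    if top_k is not None:
        command += ["--top-k", str(top_k)]
    if model is not None:
        command += ["--model", model]
    if show_context:
        command.append("--show-context")
    return command


def ask(question: str, **options) -> str:
    """Run `codescout ask` and return its stdout (codescout must be on PATH)."""
    result = subprocess.run(
        build_ask_command(question, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when the question contains quotes or spaces.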

Embedding Models

Embeddings are generated entirely locally with sentence-transformers — no API key, no internet after the first download. Models are cached at ~/.cache/huggingface/hub/.

Model	Config value	Dims	Size	Code quality
all-MiniLM-L6-v2 (default)	`all-MiniLM-L6-v2`	384	~90 MB	⭐⭐⭐ Good starting point
nomic CodeRankEmbed (recommended)	`nomic-ai/CodeRankEmbed`	768	~520 MB	⭐⭐⭐⭐⭐ Best open-source for code
Jina v2 Code	`jinaai/jina-embeddings-v2-base-code`	768	~610 MB	⭐⭐⭐⭐ Strong, 8k context
Salesforce CodeT5+	`Salesforce/codet5p-110m-embedding`	256	~420 MB	⭐⭐⭐⭐ Compact index size

All models run on CPU. No GPU required.

Switching models

In .codescout/config.json, set the model and its output dimension:

{
  "model": "nomic-ai/CodeRankEmbed",
  "embedding_dim": 768
}

Then rebuild the index from scratch; this is required after every model change:

codescout index --full

embedding_dim must match the model's output dimension. Mixing embeddings from two different models in the same index produces wrong results.
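
A quick way to catch a mismatch before re-indexing is to compare the config against the dimensions in the model table. The check below is a sketch and not part of codescout; MODEL_DIMS and check_embedding_dim are hypothetical names, and the values are copied from the table above.

```python
# Output dimensions copied from the model table above.
MODEL_DIMS = {
    "all-MiniLM-L6-v2": 384,
    "nomic-ai/CodeRankEmbed": 768,
    "jinaai/jina-embeddings-v2-base-code": 768,
    "Salesforce/codet5p-110m-embedding": 256,
}


def check_embedding_dim(config: dict) -> None:
    """Raise ValueError if embedding_dim does not match the configured model."""
    model = config["model"]
    expected = MODEL_DIMS.get(model)
    if expected is None:
        return  # unknown model: nothing to compare against
    if config["embedding_dim"] != expected:
        raise ValueError(
            f"{model} outputs {expected} dimensions, "
            f"but embedding_dim is {config['embedding_dim']}"
        )


check_embedding_dim({"model": "nomic-ai/CodeRankEmbed", "embedding_dim": 768})  # passes silently
```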

How It Works

1. Scan: walks the project tree, respects .gitignore, and filters by the configured extensions
2. Parse: tree-sitter extracts top-level declarations (functions, components, hooks, types, interfaces, classes)
3. Chunk: each declaration becomes one chunk; a natural-language description is generated and prepended before embedding
4. Embed: all chunks are encoded in one batched call, processed in memory-safe mini-batches of 32
5. Store: vectors go into FAISS (index.faiss), metadata and source into SQLite (metadata.db), both inside .codescout/
6. Query: the question is embedded, FAISS finds the nearest vectors, the matching source is fetched from SQLite, and the result is sent to the LLM
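
The nearest-vector lookup in the query step can be illustrated without any dependencies. The sketch below is a brute-force stand-in for the FAISS search, using squared L2 distance over toy 3-dimensional vectors. It is an illustration under stated assumptions, not codescout's code: the distance metric, the names nearest_chunks and squared_l2, and the dict-based index are all hypothetical.

```python
import heapq
from typing import Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_chunks(
    query_vec: Sequence[float],
    index: dict[str, Sequence[float]],
    top_k: int = 5,
) -> list[tuple[str, float]]:
    """Return up to top_k (chunk_id, distance) pairs, closest first."""
    scored = ((squared_l2(query_vec, vec), chunk_id) for chunk_id, vec in index.items())
    return [(chunk_id, dist) for dist, chunk_id in heapq.nsmallest(top_k, scored)]


# Toy index: real embeddings have 256 to 768 dimensions, depending on the model.
toy_index = {
    "useAuthToken": [0.9, 0.1, 0.0],
    "Button": [0.0, 1.0, 0.0],
    "formatDate": [0.0, 0.0, 1.0],
}
print(nearest_chunks([1.0, 0.0, 0.0], toy_index, top_k=2))
```

The chunk ids returned here play the role of the keys used to fetch source and metadata from SQLite before the prompt is assembled.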

Chunk classification

Type	Detection
`component`	function whose body contains JSX and whose name starts with an uppercase letter
`hook`	name starts with `use` followed by an uppercase letter
`function`	any other named function or arrow function
`type` / `interface`	TypeScript type alias or interface
`class`	class or abstract class declaration
`constant`	exported non-function declaration
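
The rules in this table can be read as a small decision function. The sketch below is one possible encoding, not the tool's implementation: classify_chunk, the coarse kind values, and the has_jsx flag are hypothetical, the caller is assumed to pass only exported declarations for the constant case, and checking component before hook is an assumption about precedence.

```python
def classify_chunk(name: str, kind: str, has_jsx: bool = False) -> str:
    """Map a top-level declaration to a chunk type.

    kind is a hypothetical coarse node kind: "function" (including arrow
    functions), "type", "interface", "class", or "variable".
    """
    if kind in ("type", "interface", "class"):
        return kind
    if kind == "function":
        if has_jsx and name[:1].isupper():
            return "component"
        # "use" must be followed by an uppercase letter: useAuth yes, user no.
        if name.startswith("use") and name[3:4].isupper():
            return "hook"
        return "function"
    return "constant"  # exported non-function declaration


print(classify_chunk("LoginForm", "function", has_jsx=True))
print(classify_chunk("useAuthToken", "function"))
```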

Why descriptions improve recall

A chunk for useAuthToken() gets the description "hook useAuthToken — Uses: token, setToken, userId". A query for "authentication token handling" matches this description even if the words "authentication" or "token" never appear in the function body. Interfaces get their field names extracted; hooks and components get their destructured state variables listed. Tools that embed raw source alone miss this signal.
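
To make the enrichment concrete, here is a sketch that reproduces the description format quoted above and joins it with imports and source into the text that would be embedded. The function names and the newline separator are assumptions; only the description string itself comes from the example above.

```python
def build_description(chunk_type: str, name: str, uses: list[str]) -> str:
    """Format a chunk description in the style of the useAuthToken example."""
    text = f"{chunk_type} {name}"
    if uses:
        # "\u2014" is the dash used in the quoted description format.
        text += " \u2014 Uses: " + ", ".join(uses)
    return text


def build_embedding_text(description: str, imports: list[str], source: str) -> str:
    """Join description, imports, and source into one string to embed.

    The newline separator is an assumption made for this sketch.
    """
    parts = [description, *imports, source]
    return "\n".join(part for part in parts if part)


description = build_description("hook", "useAuthToken", ["token", "setToken", "userId"])
print(build_embedding_text(
    description,
    ['import { useState } from "react";'],
    "export function useAuthToken() { /* ... */ }",
))
```

Because the description sits at the top of the embedded text, its vocabulary contributes to the vector even when the source below it uses different wording.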

Configuration

.codescout/config.json (created by codescout init):

{
  "model": "all-MiniLM-L6-v2",
  "embedding_dim": 384,
  "top_k": 5,
  "extensions": [".ts", ".tsx", ".js", ".jsx"],
  "exclude": ["node_modules", "dist", ".next", "build", "*.test.ts", "*.spec.ts"],
  "llm_model": "anthropic/claude-sonnet-4",
  "max_chunk_lines": 80,
  "min_chunk_lines": 3,
  "max_context_chars": 12000
}
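
If you read this file from your own tooling, a loader that falls back to the defaults shown above keeps scripts working when a key is missing. This is a sketch, not codescout's loader: load_config and DEFAULTS are hypothetical names, and whether the tool itself merges defaults this way is an assumption.

```python
import json
from pathlib import Path

# Defaults copied from the config.json shown above.
DEFAULTS = {
    "model": "all-MiniLM-L6-v2",
    "embedding_dim": 384,
    "top_k": 5,
    "extensions": [".ts", ".tsx", ".js", ".jsx"],
    "exclude": ["node_modules", "dist", ".next", "build", "*.test.ts", "*.spec.ts"],
    "llm_model": "anthropic/claude-sonnet-4",
    "max_chunk_lines": 80,
    "min_chunk_lines": 3,
    "max_context_chars": 12000,
}


def load_config(project_root: Path) -> dict:
    """Return the project config, with file values overriding the defaults."""
    config = dict(DEFAULTS)
    path = project_root / ".codescout" / "config.json"
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config
```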

Read or write any value:

codescout config                   # show all
codescout config top_k             # read one
codescout config top_k 10          # write one

Storage

Everything lives in .codescout/ inside the project root:

.codescout/
├── config.json       # project config
├── index.faiss       # FAISS vector index (~1.5 MB per 1,000 chunks at 384 dims)
├── metadata.db       # SQLite: chunk source, file hashes, line numbers
└── .gitignore        # auto-generated; prevents committing the index to git
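
The size note on index.faiss is consistent with storing one 4-byte float per dimension per chunk. The arithmetic below assumes an uncompressed float32 index, which is an assumption on my part and not something stated above; estimate_index_bytes is a hypothetical helper.

```python
def estimate_index_bytes(num_chunks: int, dims: int, bytes_per_value: int = 4) -> int:
    """Estimate raw vector storage: chunks x dimensions x bytes per value."""
    return num_chunks * dims * bytes_per_value


# 1,000 chunks at 384 dims: 1,000 * 384 * 4 = 1,536,000 bytes, about 1.5 MB.
print(estimate_index_bytes(1_000, 384))

# The same 1,000 chunks at 768 dims take twice as much: 3,072,000 bytes.
print(estimate_index_bytes(1_000, 768))
```

This is why switching to a 768-dimension model roughly doubles the size of the vector index for the same codebase.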

Inspect directly:

sqlite3 .codescout/metadata.db \
  "SELECT name, chunk_type, file_path, start_line FROM chunks LIMIT 20;"

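The same inspection can be done from Python with the standard sqlite3 module. The sketch below relies only on the table and column names that appear in the query above; list_chunks is a hypothetical helper, and the rest of the schema is not assumed.

```python
import sqlite3


def list_chunks(db_path: str, limit: int = 20) -> list[tuple]:
    """Return (name, chunk_type, file_path, start_line) rows from the chunks table."""
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.execute(
            "SELECT name, chunk_type, file_path, start_line FROM chunks LIMIT ?",
            (limit,),
        )
        return cursor.fetchall()
    finally:
        connection.close()
```

Call it as list_chunks(".codescout/metadata.db") from the project root after indexing.
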
Installation

pip install codescout            # core
pip install 'codescout[mcp]'     # + MCP server

Requires Python 3.10+. FAISS, sentence-transformers, and tree-sitter are installed automatically as dependencies.

License

MIT
