Codebase Insights
LSP-powered code intelligence with AI semantic search, exposed as an MCP server.
An intelligent code analysis platform that combines Language Server Protocol (LSP) technology with AI-powered semantic search to provide comprehensive code intelligence. Exposes capabilities via an MCP (Model Context Protocol) server, making it usable by any MCP-compatible AI client such as Claude Desktop or GitHub Copilot.
Features
- Multi-language support — Python, JavaScript/TypeScript, C++, Rust, powered by standard LSP servers
- Symbol indexing — Full workspace scan with incremental re-indexing driven by filesystem watching
- Semantic search — AI-generated summaries + vector embeddings for natural-language code queries
- Hybrid ranking — Blends keyword matching with vector similarity, boosted by reference counts
- Flexible LLM backends — Ollama (local) or OpenAI-compatible APIs for both chat and embeddings
- MCP server — Exposes all capabilities over HTTP for use by any MCP client
Architecture
```
src/codebase_insights/
├── main.py                 CLI entry point & startup orchestration
├── language_analysis.py    Detects languages; parses .gitignore
├── LSP.py                  Async LSP client (hover, definition, references, symbols, …)
├── workspace_indexer.py    Indexes symbols into SQLite; watches for file changes
├── semantic_indexer.py     LLM summarization + ChromaDB vector indexing & search
├── semantic_config.py      TOML config loader with interactive first-time setup wizard
└── mcp_server.py           MCP server exposing all tools over HTTP
```
Artifacts created at the project root (all added to .gitignore automatically):
| File/Directory | Purpose |
|---|---|
| .codebase-index.db | SQLite symbol database |
| .codebase-semantic/ | ChromaDB vector store |
| .codebase-insights.toml | Configuration file |
Prerequisites
- Python 3.11+
- At least one of the LSP servers below, matching the language(s) in the target codebase
- Ollama running locally or an OpenAI-compatible API key
LSP Servers
| Language | Server | Install |
|---|---|---|
| Python | pylsp | pip install python-lsp-server |
| JavaScript / TypeScript | typescript-language-server | npm install -g typescript-language-server |
| C++ | clangd | clangd.llvm.org |
| Rust | rust-analyzer | rustup component add rust-analyzer |
Optional Python LSP plugins: python-lsp-ruff, python-lsp-black, pylsp-mypy.
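Before first run, it can help to confirm the servers for your languages are discoverable on PATH, since startup checks for them. A minimal sketch (the binary names below are the standard executables from the table above; trim the mapping to the languages you actually use):

```python
import shutil

# Standard executable names for the supported LSP servers.
LSP_BINARIES = {
    "Python": "pylsp",
    "JavaScript/TypeScript": "typescript-language-server",
    "C++": "clangd",
    "Rust": "rust-analyzer",
}

for language, binary in LSP_BINARIES.items():
    path = shutil.which(binary)
    print(f"{language:25s} {binary:30s} {path or 'NOT FOUND on PATH'}")
```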
Installation
```
pip install codebase-insights
```
Or install from source for development:
```
git clone https://github.com/your-org/codebase-insights
cd codebase-insights
pip install -e .
```
Usage
```
codebase-insights <project_root> [options]
```
On first run, an interactive wizard configures the LLM provider, embedding model, and indexing settings, saving the result to .codebase-insights.toml.
Options
| Flag | Description |
|---|---|
| --new-config | Re-run the setup wizard, overwriting the existing config |
| --rebuild-index | Drop and rebuild the SQLite symbol index from scratch |
| --rebuild-semantic | Drop all LLM summaries and ChromaDB vectors, regenerate everything |
| --rebuild-summaries | Regenerate only file/project summaries (keeps symbol summaries) |
| --rebuild-vectors | Re-embed existing summaries with the current embedding model (no LLM calls) |
Quick start with Ollama
```
# Terminal 1 – start Ollama
ollama serve

# Terminal 2 – index and serve
codebase-insights /path/to/your/project
```
Quick start with OpenAI
```
export OPENAI_API_KEY="sk-..."
codebase-insights /path/to/your/project --new-config
# choose "openai" when prompted for chat and embed providers
```
The MCP server starts on http://127.0.0.1:6789/mcp (streamable-HTTP transport).
MCP Tools
Once running, the following tools are available to any connected MCP client:
| Tool | Description |
|---|---|
| languages_in_codebase() | List detected languages in the project |
| lsp_capabilities() | Query active LSP server capabilities |
| lsp_hover(file_uri, line, character) | Type info and docs at a position |
| lsp_definition(file_uri, line, character) | Jump-to-definition |
| lsp_declaration(file_uri, line, character) | Find declarations |
| lsp_implementation(file_uri, line, character) | Find implementations |
| lsp_references(file_uri, line, character) | Find all references to a symbol |
| lsp_document_symbols(file_uri) | List all symbols in a file |
| query_symbols(path, kinds, name_query, limit) | Query the SQLite index by path, kind, or name |
| semantic_search(query, limit, kinds) | Natural-language semantic search |
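Any MCP-aware client can call these tools over the streamable-HTTP endpoint. A minimal sketch using the MCP Python SDK, assuming the default host/port shown above (the query string is just an example):

```python
import asyncio

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def main():
    # Connect to the locally running Codebase Insights MCP server.
    async with streamablehttp_client("http://127.0.0.1:6789/mcp") as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Natural-language semantic search across the indexed codebase.
            result = await session.call_tool(
                "semantic_search",
                {"query": "where are file changes detected and re-indexed?", "limit": 5},
            )
            for item in result.content:
                print(item)

asyncio.run(main())
```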
Configuration
The config file .codebase-insights.toml is created interactively on first run. Key sections:
```toml
[chat]
provider = "ollama"              # "ollama" | "openai"

[chat.ollama]
base_url = "http://localhost:11434"
model = "qwen2.5"

[embed]
provider = "ollama"

[embed.ollama]
model = "bge-m3"

[semantic]
index_kinds = ["Class", "Method", "Function", "Interface", "Enum", "Constructor"]
concurrency = 16                 # parallel LLM requests (set to 1 for Ollama)
batch_size = 16
min_ref_count = 3                # only index symbols referenced at least N times

[ranking]
# noise penalties and re-ranking weights (see semantic_config.py for defaults)
```
Environment variables (OPENAI_API_KEY, OPENAI_BASE_URL) override the corresponding TOML values.
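The override is simple precedence: a value set in the environment wins over the value read from the TOML file. A minimal sketch of that behavior — the load_config helper and the exact keys it writes to are illustrative assumptions, not the package's actual loader:

```python
import os
import tomllib  # stdlib TOML parser, Python 3.11+

def load_config(path: str = ".codebase-insights.toml") -> dict:
    """Read the TOML config, then let environment variables take precedence."""
    with open(path, "rb") as f:
        config = tomllib.load(f)

    # Hypothetical key placement: wherever the OpenAI credentials live in the
    # real config, values from the environment replace them when set.
    if os.environ.get("OPENAI_API_KEY"):
        config.setdefault("openai", {})["api_key"] = os.environ["OPENAI_API_KEY"]
    if os.environ.get("OPENAI_BASE_URL"):
        config.setdefault("openai", {})["base_url"] = os.environ["OPENAI_BASE_URL"]
    return config
```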
How It Works
- Startup — detects languages, verifies LSP servers are on PATH, initialises LSP clients
- Workspace indexing — scans all files via LSP documentSymbol, stores symbols + references in SQLite; a watchdog observer re-indexes files as they change
- Semantic indexing — for each qualifying symbol, extracts up to 50 lines of source context, calls the LLM for a 1–3 sentence summary, then embeds the summary in ChromaDB
- MCP server — clients call tools; semantic_search uses hybrid vector + keyword scoring with reference-count boosting and diversity decay (see the sketch after this list); query_symbols queries SQLite directly
- Incremental updates — SHA-256 file hashes and symbol-content hashes skip unchanged work; only new or modified symbols are re-summarised
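The actual weights live in the [ranking] section of the config; the sketch below only illustrates the shape of that hybrid scoring. The weight values, field names, and the helper itself are hypothetical, not the package's implementation:

```python
import math

def hybrid_score(query_terms: set[str], hit: dict, rank_in_file: int) -> float:
    """Illustrative hybrid score: vector similarity blended with keyword
    overlap, boosted by reference count, decayed for repeated files."""
    # Vector similarity from the embedding store (higher is better).
    vector = hit["vector_similarity"]

    # Fraction of query terms appearing in the symbol name / summary.
    text = (hit["name"] + " " + hit["summary"]).lower()
    keyword = sum(term in text for term in query_terms) / max(len(query_terms), 1)

    # Symbols referenced more often are likelier to be what the user means.
    ref_boost = 1.0 + 0.1 * math.log1p(hit["ref_count"])

    # Diversity decay: successive hits from the same file count for less.
    diversity = 0.8 ** rank_in_file

    return (0.7 * vector + 0.3 * keyword) * ref_boost * diversity
```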
Dependencies
| Package | Purpose |
|---|---|
| mcp[cli] | MCP server framework |
| watchdog | Filesystem monitoring |
| langchain, langchain-ollama, langchain-openai | LLM / embedding integration |
| langchain-chroma, chromadb | Vector store |
| tqdm | Progress bars |
File details
Details for the file codebase_insights-0.1.0.tar.gz.
File metadata
- Download URL: codebase_insights-0.1.0.tar.gz
- Upload date:
- Size: 222.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eab14eb4d43892458662b51ac95dc3852dee0e698ad6f8ee31d45772a1220322 |
| MD5 | f58ef3dceb0516a6dabcf1bad59d25a5 |
| BLAKE2b-256 | 3d1277b64facaf6217a78a44f7854ab156715f78a0d4ffb46b4ce63ad6d062a5 |
Provenance
The following attestation bundle was made for codebase_insights-0.1.0.tar.gz:
- Publisher: publish.yml on JimmyfaQwQ/Codebase-Insights
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: codebase_insights-0.1.0.tar.gz
- Subject digest: eab14eb4d43892458662b51ac95dc3852dee0e698ad6f8ee31d45772a1220322
- Sigstore transparency entry: 1292897619
- Sigstore integration time:
- Permalink: JimmyfaQwQ/Codebase-Insights@5f8801d1005eeb488f2ed7d346399ad150443b85
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/JimmyfaQwQ
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5f8801d1005eeb488f2ed7d346399ad150443b85
- Trigger Event: push
File details
Details for the file codebase_insights-0.1.0-py3-none-any.whl.
File metadata
- Download URL: codebase_insights-0.1.0-py3-none-any.whl
- Upload date:
- Size: 41.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 914ced224bb924f60a7091634a0ae8d7a87e6a2abe82dd116da15709c347825a |
| MD5 | c8c5379f5363ee479683a08fb8a18d28 |
| BLAKE2b-256 | f5834f6e3cbdbe1136c6ace6de93ded3d0080eea7603a49dd2579d66200a6166 |
Provenance
The following attestation bundle was made for codebase_insights-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on JimmyfaQwQ/Codebase-Insights
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: codebase_insights-0.1.0-py3-none-any.whl
- Subject digest: 914ced224bb924f60a7091634a0ae8d7a87e6a2abe82dd116da15709c347825a
- Sigstore transparency entry: 1292897625
- Sigstore integration time:
- Permalink: JimmyfaQwQ/Codebase-Insights@5f8801d1005eeb488f2ed7d346399ad150443b85
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/JimmyfaQwQ
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5f8801d1005eeb488f2ed7d346399ad150443b85
- Trigger Event: push