MCP server for persistent codebase memory with semantic search
Thoth
MCP server providing persistent codebase memory with semantic search for AI assistants.
Overview
Thoth indexes code repositories using AST parsing and provides tools for symbol lookup, cross-repository navigation, and architecture visualization. As of v0.2.0, it also includes semantic search powered by local embeddings, so natural-language queries can find relevant code without exact keyword matches.
The index persists in ~/.thoth/, giving Claude and other MCP-compatible assistants memory across conversations.
Features
- 🔍 Semantic Search: Find code using natural language queries with vLLM and Qwen3 embeddings
- 🧠 Persistent Memory: Code understanding persists between conversations
- 🔗 Cross-Repository: Navigate dependencies across multiple related repositories
- 📊 Visualizations: Generate architecture diagrams and dependency graphs
- ⚡ Fast Indexing: AST-based parsing with incremental updates
- 🎯 Precise Navigation: Jump to exact definitions, find all callers
- 🔧 Local-First: All processing happens locally, no cloud dependencies
Installation
Requirements
- Python 3.10-3.12 (Python 3.13 not yet supported due to vLLM dependencies)
- For semantic search: ~2GB disk space for embedding model
Claude Desktop
Add to your configuration file:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
- Linux: ~/.config/claude/claude_desktop_config.json
Configuration:
{
  "mcpServers": {
    "thoth": {
      "command": "uvx",
      "args": ["--python", "3.12", "mcp-server-thoth"]
    }
  }
}
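If the configuration file already lists other MCP servers, the Thoth entry has to be merged in rather than pasted over them. The following is a minimal sketch of that merge, not part of Thoth itself; the helper name `add_thoth_server` is hypothetical, and it assumes the file, if present, contains valid JSON.

```python
import json
from pathlib import Path


def add_thoth_server(config_path: Path) -> dict:
    """Merge the Thoth entry into a Claude Desktop config, keeping other servers."""
    # Start from the existing config if there is one, otherwise from an empty object.
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    # Same command and arguments as the configuration shown above.
    servers["thoth"] = {
        "command": "uvx",
        "args": ["--python", "3.12", "mcp-server-thoth"],
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

On Linux, for example, this could be called as `add_thoth_server(Path.home() / ".config/claude/claude_desktop_config.json")`. Restart Claude Desktop afterwards so it rereads the file.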
To index repositories, either:
- Use the CLI: thoth-cli index myrepo /path/to/repo
- Use the index_repository tool from within Claude
Command Line
# Install globally
uv tool install --python 3.12 mcp-server-thoth
# Index a repository
thoth-cli index myproject /path/to/repo
# Search symbols
thoth-cli search "database connection"
# Start MCP server
mcp-server-thoth
Tools
Core Tools
- find_definition - Locate symbol definitions
- get_file_structure - Extract functions, classes, imports from a file
- search_symbols - Search symbols by name pattern
- get_callers - Find callers of a function
- list_repositories - List indexed repositories
- index_repository - Index a new repository
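To give a feel for what AST-based extraction involves, here is a rough sketch of the kind of information a tool like get_file_structure returns, built on Python's standard `ast` module. It is an illustration only: the function name `file_structure` and the shape of the result are assumptions, not Thoth's actual implementation or output format.

```python
import ast


def file_structure(source: str) -> dict:
    """Collect top-level functions, classes (with methods), and imports."""
    tree = ast.parse(source)
    result = {"functions": [], "classes": {}, "imports": []}
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Record the name together with the line where it is defined.
            result["functions"].append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            methods = [
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
            result["classes"][node.name] = methods
        elif isinstance(node, ast.Import):
            result["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for relative imports such as "from . import x".
            result["imports"].append(node.module or ".")
    return result


example = '''
import os
from pathlib import Path

class Repo:
    def index(self):
        pass

def main():
    pass
'''
print(file_structure(example))
```

Because the parser works on syntax rather than text, a definition is found exactly once at its real location, which is what makes "jump to definition" precise where a text search is not.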
Semantic Search (v0.2.0+)
- semantic_search - Natural language code search using embeddings
  - Example: "function that handles user authentication"
  - Returns relevant symbols ranked by semantic similarity
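Claude issues tool calls on your behalf, but it can help to see what one looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request for semantic_search; the argument name `query` is an assumption, so check the tool's published input schema for the real parameter names.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# "query" is a hypothetical argument name used for illustration.
request = build_tool_call(
    1, "semantic_search", {"query": "function that handles user authentication"}
)
print(request)
```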
Visualization Tools
- generate_module_diagram - Generate Mermaid dependency diagrams
- generate_system_architecture - Visualize cross-repository relationships
- trace_api_flow - Trace client-server communication paths
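A Mermaid dependency diagram is plain text, so producing one from import relationships is straightforward. The following sketch shows the general idea with made-up module names; it does not reproduce the exact output of generate_module_diagram.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (importer, imported) pairs as a Mermaid flowchart."""

    def node(name: str) -> str:
        # Dots are replaced in the node ID; the original name is kept as the label.
        return f'{name.replace(".", "_")}["{name}"]'

    lines = ["graph TD"]
    # Sort and de-duplicate so the output is stable between runs.
    for src, dst in sorted(set(edges)):
        lines.append(f"    {node(src)} --> {node(dst)}")
    return "\n".join(lines)


# Hypothetical modules and imports.
print(to_mermaid([("app.cli", "app.db"), ("app.server", "app.db"), ("app.cli", "app.db")]))
```

The resulting text can be pasted into any Mermaid renderer.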
Architecture
Storage Backend
Thoth uses a hybrid storage approach:
- SQLite (~/.thoth/index.db): Source of truth for structured data
  - symbols - Functions, classes, methods with location and parent relationships
  - imports - Import statements with cross-repository resolution
  - calls - Function call graph (caller → callee mapping)
  - files - File metadata and content hashes for incremental updates
- ChromaDB (~/.thoth/chroma/): Vector storage for semantic search
  - Stores embeddings for all indexed symbols
  - Enables natural language queries
- NetworkX: In-memory graph for fast relationship traversal
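The sketch below illustrates two ideas from the storage description: content hashes that let unchanged files be skipped on re-index, and a call table that answers "who calls this function?". The schema is a deliberately simplified assumption (two tables, a handful of columns) and the helper names are hypothetical; Thoth's real tables may look different.

```python
import hashlib
import sqlite3

# Hypothetical, simplified schema; Thoth's real tables may differ.
SCHEMA = """
CREATE TABLE files (path TEXT PRIMARY KEY, content_hash TEXT NOT NULL);
CREATE TABLE calls (caller TEXT NOT NULL, callee TEXT NOT NULL);
"""


def needs_reindex(db: sqlite3.Connection, path: str, content: bytes) -> bool:
    """Return True, and record the new hash, if the file changed since last time."""
    digest = hashlib.sha256(content).hexdigest()
    row = db.execute(
        "SELECT content_hash FROM files WHERE path = ?", (path,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged: parsing can be skipped
    db.execute(
        "INSERT OR REPLACE INTO files (path, content_hash) VALUES (?, ?)",
        (path, digest),
    )
    return True


def callers_of(db: sqlite3.Connection, callee: str) -> list[str]:
    """Look up the direct callers of a function in the call table."""
    rows = db.execute(
        "SELECT DISTINCT caller FROM calls WHERE callee = ? ORDER BY caller",
        (callee,),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO calls VALUES (?, ?)", [("main", "connect"), ("retry", "connect")]
)
print(needs_reindex(db, "app.py", b"print('hi')"))  # first time this file is seen
print(needs_reindex(db, "app.py", b"print('hi')"))  # same bytes again
print(callers_of(db, "connect"))
```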
Embedding Model
Semantic search uses Qwen3-Embedding-0.6B via vLLM:
- Lightweight (600M parameters, ~1.2GB on disk)
- Code-aware embeddings with instruction support
- Fast inference with GPU acceleration (optional)
- Falls back to TF-IDF when vLLM is unavailable
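For readers unfamiliar with TF-IDF, here is a small, generic sketch of keyword-weighted ranking with cosine similarity. It uses only the standard library and a smoothed IDF formula chosen for illustration; it is not Thoth's fallback code, and the symbol descriptions are invented.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split on non-alphanumerics, so snake_case names break into their words.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def rank(query: str, docs: dict[str, str]) -> list[tuple[str, float]]:
    """Rank documents by cosine similarity of their TF-IDF vectors to the query."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF keeps weights positive even for terms found in every document.
    idf = {term: math.log((1 + n) / (1 + k)) + 1 for term, k in df.items()}

    def vector(c: Counter) -> dict[str, float]:
        # Terms never seen in the corpus carry no weight and are dropped.
        return {t: tf * idf[t] for t, tf in c.items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
            sum(w * w for w in b.values())
        )
        return dot / norm if norm else 0.0

    q = vector(Counter(tokenize(query)))
    scores = [(name, cosine(q, vector(c))) for name, c in counts.items()]
    return sorted(scores, key=lambda s: s[1], reverse=True)


# Invented symbol descriptions for illustration.
symbols = {
    "authenticate_user": "authenticate_user check user password and create session",
    "render_chart": "render_chart draw dependency graph as svg",
}
print(rank("user authentication password", symbols))
```

Note that the query word "authentication" contributes nothing here because it never appears verbatim in the corpus. That gap between literal term overlap and meaning is what embedding-based search is meant to close.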
Performance
- Indexing: ~10K symbols/minute
- Semantic Search: <100ms for typical queries
- Memory: ~2GB for model + ~100MB per 100K symbols
- Accuracy: 0.7-0.9 relevance scores for code search
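The indexing and memory figures above can be turned into a quick capacity estimate. This is simple arithmetic on the approximate numbers listed, treating 2GB as 2,000MB; actual results will depend on hardware and repository contents. The function name `estimate` is only illustrative.

```python
def estimate(symbols: int) -> dict[str, float]:
    """Back-of-the-envelope figures from the approximate numbers listed above."""
    return {
        # ~10K symbols indexed per minute
        "indexing_minutes": symbols / 10_000,
        # ~2GB for the model plus ~100MB per 100K symbols
        "memory_mb": 2_000 + 100 * symbols / 100_000,
    }


print(estimate(250_000))
```

For a repository with 250,000 symbols this works out to about 25 minutes of indexing and about 2,250MB of memory.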
Advanced Usage
Pre-indexing Large Repositories
For large monorepos, pre-index before adding to Claude:
thoth-cli index myrepo /path/to/large-repo
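When several repositories need pre-indexing, the command above can be scripted. The sketch below wraps `thoth-cli index <name> <path>` with `subprocess`; the helper names and repository paths are hypothetical, and it defaults to a dry run that only returns the commands it would execute.

```python
import subprocess


def index_command(name: str, path: str) -> list[str]:
    """Build the `thoth-cli index` invocation as an argument list."""
    return ["thoth-cli", "index", name, path]


def index_all(repos: dict[str, str], dry_run: bool = True) -> list[list[str]]:
    """Index each repository in turn; with dry_run=True, only return the commands."""
    commands = [index_command(name, path) for name, path in repos.items()]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first repository that fails to index.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical repository names and paths.
repos = {"frontend": "/path/to/frontend", "backend": "/path/to/backend"}
for cmd in index_all(repos):
    print(" ".join(cmd))
```

Pass `dry_run=False` to actually run the commands once the printed list looks right.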
Using Redis Cache (Optional)
For improved performance with multiple users:
# Install with Redis support
uv tool install "mcp-server-thoth[cache]"
# Requires Redis server running locally
Dashboard (Coming Soon)
A separate thoth-dashboard package will provide:
- Web UI for exploring indexed code
- Interactive dependency graphs
- Real-time search interface
Development
git clone https://github.com/braininahat/thoth
cd thoth
uv pip install -e ".[dev]"
# Run tests
pytest
# Type checking
mypy thoth
Token Efficiency
Thoth can substantially reduce the tokens needed for code navigation (approximate figures):
- Without Thoth: Multiple searches + reading entire files = ~50K tokens
- With Thoth: Semantic search + precise results = ~2K tokens (roughly a 25x reduction)
Example:
User: "How does the dashboard update in real-time?"
Without Thoth:
- grep "dashboard" → 50 results
- grep "update" → 200 results
- Read 10+ files to understand
With Thoth semantic search:
- Returns: WebSocketHandler.send_update(), Dashboard.subscribe_to_changes(), etc.
- Ranked by relevance
Troubleshooting
Python Version Issues
If you see errors about xformers or build failures:
# Ensure Python 3.12 is used
uvx --python 3.12 mcp-server-thoth
GPU Memory
For systems with limited GPU memory:
- Embeddings are automatically moved to CPU after computation
- Set CUDA_VISIBLE_DEVICES=-1 to force CPU-only mode
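When Thoth is launched by Claude Desktop rather than from a shell, the variable has to reach the server process some other way. One option, sketched below, is an `env` block in the server entry of claude_desktop_config.json. This assumes your Claude Desktop version passes `env` through to the server it spawns; the snippet only prints the resulting JSON and does not modify any file.

```python
import json

# Same server entry as in the installation section, plus an "env" block.
entry = {
    "command": "uvx",
    "args": ["--python", "3.12", "mcp-server-thoth"],
    # Hides all CUDA devices from the process, forcing CPU-only mode.
    "env": {"CUDA_VISIBLE_DEVICES": "-1"},
}
config_text = json.dumps({"mcpServers": {"thoth": entry}}, indent=2)
print(config_text)
```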
Model Download
First run downloads the embedding model (~1.2GB). Subsequent runs use the cached model.
License
MIT
Contributing
Contributions welcome! Please check the issues page.