
MCP server for persistent codebase memory with semantic search


Thoth

MCP server providing persistent codebase memory with semantic search for AI assistants.


Overview

Thoth indexes code repositories using AST parsing and provides tools for symbol lookup, cross-repository navigation, and architecture visualization. Since v0.2.0, Thoth also includes semantic search powered by local embeddings, so natural-language queries can find relevant code without exact keyword matches.

The index persists in ~/.thoth/, giving Claude and other MCP-compatible assistants memory across conversations.

Features

  • 🔍 Semantic Search: Find code using natural language queries with vLLM and Qwen3 embeddings
  • 🧠 Persistent Memory: Code understanding persists between conversations
  • 🔗 Cross-Repository: Navigate dependencies across multiple related repositories
  • 📊 Visualizations: Generate architecture diagrams and dependency graphs
  • Fast Indexing: AST-based parsing with incremental updates
  • 🎯 Precise Navigation: Jump to exact definitions, find all callers
  • 🔧 Local-First: All processing happens locally; no cloud services are needed after the one-time embedding model download

Installation

Requirements

  • Python 3.10-3.12 (Python 3.13 not yet supported due to vLLM dependencies)
  • For semantic search: ~2GB of disk space (the embedding model itself is ~1.2GB)

Claude Desktop

Add to your configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/claude/claude_desktop_config.json

Configuration:

{
  "mcpServers": {
    "thoth": {
      "command": "uvx",
      "args": ["--python", "3.12", "mcp-server-thoth"]
    }
  }
}
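
If you prefer to script this change rather than edit the JSON by hand, the sketch below merges the thoth entry into an existing config file and leaves any other MCP servers untouched. `add_thoth_server` is a hypothetical helper, not part of Thoth; pass it the path for your platform from the list above.

```python
import json
from pathlib import Path

# The server entry from the configuration above.
THOTH_ENTRY = {
    "command": "uvx",
    "args": ["--python", "3.12", "mcp-server-thoth"],
}


def add_thoth_server(config_path: Path) -> dict:
    """Merge the thoth entry into claude_desktop_config.json (hypothetical helper)."""
    # Start from the existing config if there is one; treat an empty file as {}.
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")
    else:
        config = {}
    # setdefault keeps any servers that are already configured.
    config.setdefault("mcpServers", {})["thoth"] = THOTH_ENTRY
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, on Linux: `add_thoth_server(Path.home() / ".config/claude/claude_desktop_config.json")`.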

To index repositories, either:

  1. Use the CLI: thoth-cli index myrepo /path/to/repo
  2. Use the index_repository tool from within Claude

Command Line

# Install globally
uv tool install --python 3.12 mcp-server-thoth

# Index a repository
thoth-cli index myproject /path/to/repo

# Search symbols
thoth-cli search "database connection"

# Start MCP server
mcp-server-thoth
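
Cross-repository navigation depends on every related repository being indexed. The sketch below loops over the documented `thoth-cli index <name> <path>` command. `index_commands` and `index_all` are hypothetical helpers, and the repository names and paths are placeholders.

```python
import shlex
import subprocess
from pathlib import Path


def index_commands(repos: dict[str, str]) -> list[list[str]]:
    """Build one `thoth-cli index <name> <path>` command per repository."""
    return [
        ["thoth-cli", "index", name, str(Path(path).expanduser())]
        for name, path in repos.items()
    ]


def index_all(repos: dict[str, str], dry_run: bool = True) -> None:
    for cmd in index_commands(repos):
        if dry_run:
            # Only print the shell-quoted command.
            print(shlex.join(cmd))
        else:
            # check=True stops at the first repository that fails to index.
            subprocess.run(cmd, check=True)
```

Calling `index_all({"backend": "~/src/backend", "frontend": "~/src/frontend"}, dry_run=False)` indexes both repositories in sequence.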

Tools

Core Tools

  • find_definition - Locate symbol definitions
  • get_file_structure - Extract functions, classes, imports from a file
  • search_symbols - Search symbols by name pattern
  • get_callers - Find callers of a function
  • list_repositories - List indexed repositories
  • index_repository - Index a new repository
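
The core tools rest on AST parsing. To illustrate the kind of data `get_file_structure` returns, the sketch below runs Python's standard `ast` module over a single Python source string. Thoth's real parser, output shape and language coverage may differ, and `file_structure` is a hypothetical name.

```python
import ast


def file_structure(source: str) -> dict[str, list[str]]:
    """Collect top-level functions, classes, methods and imports from Python source."""
    tree = ast.parse(source)
    structure: dict[str, list[str]] = {
        "functions": [], "classes": [], "methods": [], "imports": [],
    }
    for node in tree.body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            structure["functions"].append(node.name)
        elif isinstance(node, ast.ClassDef):
            structure["classes"].append(node.name)
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    # Qualify each method with its parent class.
                    structure["methods"].append(f"{node.name}.{item.name}")
        elif isinstance(node, ast.Import):
            structure["imports"].extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Keep the leading dots so relative imports stay recognisable.
            structure["imports"].append("." * node.level + (node.module or ""))
    return structure
```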

Semantic Search (v0.2.0+)

  • semantic_search - Natural language code search using embeddings
    • Example: "function that handles user authentication"
    • Returns relevant symbols ranked by semantic similarity
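
Ranking by semantic similarity works in three steps: embed the query, embed every indexed symbol, and sort the symbols by how close their vectors are to the query's. In Thoth the vectors come from the embedding model and are stored in ChromaDB. The sketch below uses tiny hand-written vectors and a hypothetical `rank_symbols` helper to show only the ranking step. It uses cosine similarity, which is a common choice; the metric Thoth actually configures is an assumption here.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_symbols(query_vec: list[float],
                 symbol_vecs: dict[str, list[float]],
                 top_k: int = 5) -> list[tuple[str, float]]:
    """Return the top_k symbols, most similar to the query embedding first."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in symbol_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
symbols = {
    "auth.login_user": [0.9, 0.1, 0.0],
    "db.connect": [0.0, 0.2, 0.9],
    "auth.check_token": [0.8, 0.3, 0.1],
}
```

With a query vector such as `[1.0, 0.2, 0.0]`, the two `auth` symbols rank above `db.connect`.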

Visualization Tools

  • generate_module_diagram - Generate Mermaid dependency diagrams
  • generate_system_architecture - Visualize cross-repository relationships
  • trace_api_flow - Trace client-server communication paths
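
The diagram tools produce Mermaid text. The sketch below turns import edges, such as rows from an imports table, into a Mermaid flowchart. The `(importer, imported)` edge format and both function names are assumptions for illustration and are not Thoth's API.

```python
import re


def mermaid_node(module: str) -> str:
    """Mermaid-safe node id (non-word characters -> '_') with the real name as a quoted label."""
    node_id = re.sub(r"\W", "_", module)
    return f'{node_id}["{module}"]'


def mermaid_module_diagram(edges: list[tuple[str, str]]) -> str:
    """Render deduplicated (importer, imported) pairs as a top-down flowchart."""
    lines = ["graph TD"]
    for importer, imported in sorted(set(edges)):
        lines.append(f"    {mermaid_node(importer)} --> {mermaid_node(imported)}")
    return "\n".join(lines)
```

Pasting the returned string into any Mermaid renderer draws the module dependency graph.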

Architecture

Storage Backend

Thoth uses a hybrid storage approach:

  • SQLite (~/.thoth/index.db): Source of truth for structured data

    • symbols - Functions, classes, methods with location and parent relationships
    • imports - Import statements with cross-repository resolution
    • calls - Function call graph (caller → callee mapping)
    • files - File metadata and content hashes for incremental updates
  • ChromaDB (~/.thoth/chroma/): Vector storage for semantic search

    • Stores embeddings for all indexed symbols
    • Enables natural language queries
  • NetworkX: In-memory graph for fast relationship traversal
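
The relational side can be pictured with a simplified schema. The sketch below uses the standard `sqlite3` module to show how a `get_callers`-style lookup becomes a join over the caller → callee table. The columns are illustrative assumptions and do not reproduce Thoth's actual schema in ~/.thoth/index.db.

```python
import sqlite3

# Hypothetical, simplified schema; the real tables carry more columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS symbols (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,            -- function, class, method
    file_path TEXT NOT NULL,
    line INTEGER NOT NULL,
    parent_id INTEGER REFERENCES symbols(id)
);
CREATE TABLE IF NOT EXISTS calls (
    caller_id INTEGER NOT NULL REFERENCES symbols(id),
    callee_id INTEGER NOT NULL REFERENCES symbols(id)
);
"""


def get_callers(conn: sqlite3.Connection, callee: str) -> list[tuple[str, str, int]]:
    """Return (caller name, file, line) for every recorded call to `callee`."""
    return conn.execute(
        """
        SELECT caller.name, caller.file_path, caller.line
        FROM calls
        JOIN symbols AS caller ON caller.id = calls.caller_id
        JOIN symbols AS callee ON callee.id = calls.callee_id
        WHERE callee.name = ?
        ORDER BY caller.file_path, caller.line
        """,
        (callee,),
    ).fetchall()
```

To try it out, open `sqlite3.connect(":memory:")`, run `executescript(SCHEMA)`, and insert a few rows.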

Embedding Model

Semantic search uses Qwen3-Embedding-0.6B via vLLM:

  • Lightweight (600M parameters, ~1.2GB on disk)
  • Code-aware embeddings with instruction support
  • Fast inference with GPU acceleration (optional)
  • Falls back to TF-IDF when vLLM is unavailable
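
The TF-IDF fallback ranks code by weighted keyword overlap instead of learned embeddings. The self-contained sketch below (smoothed IDF, cosine similarity, identifier-aware tokenization) shows what such a ranker computes. Thoth's own fallback may tokenize and weight terms differently, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Split identifiers such as get_user or getUser into lowercase words.
    return [w.lower() for w in re.findall(r"[A-Za-z][a-z]*|\d+", text)]


def tfidf_vectors(docs: dict[str, str]):
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for c in counts.values() for term in c)
    # Smoothed IDF, so terms present in every document still carry some weight.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    vectors = {name: {t: k * idf[t] for t, k in c.items()} for name, c in counts.items()}
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def tfidf_search(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never appear in the corpus are ignored.
    q = Counter(t for t in tokenize(query) if t in idf)
    q_vec = {t: k * idf[t] for t, k in q.items()}
    ranked = sorted(((name, cosine(q_vec, v)) for name, v in vectors.items()),
                    key=lambda p: p[1], reverse=True)
    return ranked[:top_k]
```

Unlike the embedding model, this approach only matches queries that share words with the code. "database connection" finds a function mentioning those words, but "user authentication" would not find `login_user`.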

Performance

  • Indexing: ~10K symbols/minute
  • Semantic Search: <100ms for typical queries
  • Memory: ~2GB for model + ~100MB per 100K symbols
  • Search quality: relevance scores of 0.7-0.9 for typical code search results
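
Taken at face value, these figures allow a rough estimate for a larger codebase. The sketch treats 2GB as 2,000 MB and assumes indexing throughput stays linear. Actual numbers depend on hardware, language mix and whether a GPU is available, and `estimate` is a hypothetical helper.

```python
def estimate(symbols: int) -> tuple[float, int]:
    """Rough indexing time (minutes) and memory (MB) from the figures above."""
    minutes = symbols / 10_000                        # ~10K symbols per minute
    memory_mb = 2_000 + symbols * 100 // 100_000      # ~2GB model + ~100MB per 100K symbols
    return minutes, memory_mb
```

For a 500K-symbol monorepo this gives roughly 50 minutes of indexing and about 2.5GB of memory.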

Advanced Usage

Pre-indexing Large Repositories

For large monorepos, run the initial indexing from the command line before using the repository from Claude:

thoth-cli index myrepo /path/to/large-repo
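
The `files` table stores content hashes so that later runs can update the index incrementally instead of re-parsing everything. The sketch below shows the general idea with `hashlib`: only files whose SHA-256 digest changed since the last run are returned for re-parsing. The hash algorithm, the `*.py` filter and the in-memory dict standing in for the `files` table are all assumptions.

```python
import hashlib
from pathlib import Path


def content_hash(path: Path) -> str:
    """SHA-256 of the file's bytes as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(root: Path, known_hashes: dict[str, str], pattern: str = "*.py") -> list[Path]:
    """Return new or modified files under root, updating known_hashes in place."""
    changed = []
    for path in sorted(root.rglob(pattern)):
        key = str(path.relative_to(root))
        digest = content_hash(path)
        if known_hashes.get(key) != digest:
            changed.append(path)
            known_hashes[key] = digest  # remember the new hash for the next run
    return changed
```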

Using Redis Cache (Optional)

For improved performance with multiple users:

# Install with Redis support
uv tool install --python 3.12 "mcp-server-thoth[cache]"

# Requires Redis server running locally
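
This README does not say what Thoth stores in Redis. As an illustration of the cache-aside pattern that such an option usually implies, the sketch below caches search results under a key with a time-to-live. The key format, the TTL and `cached_search` are hypothetical, and the in-memory `TTLCache` class stands in for a Redis server. redis-py's `Redis` client exposes `get` and `setex(name, time, value)` with the same shape, so one could be passed in instead.

```python
import json
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis GET/SETEX: values expire after ttl seconds."""

    def __init__(self) -> None:
        self._store: dict[str, tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None or entry[0] < time.monotonic():
            self._store.pop(key, None)  # drop expired entries lazily
            return None
        return entry[1]

    def setex(self, key: str, ttl: int, value: str) -> None:
        self._store[key] = (time.monotonic() + ttl, value)


def cached_search(cache, query: str, search: Callable[[str], list], ttl: int = 300) -> list:
    """Cache-aside: serve repeated queries from the cache, otherwise compute and store."""
    key = f"thoth:search:{query}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)
    results = search(query)
    cache.setex(key, ttl, json.dumps(results))
    return results
```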

Dashboard (Coming Soon)

A separate thoth-dashboard package will provide:

  • Web UI for exploring indexed code
  • Interactive dependency graphs
  • Real-time search interface

Development

git clone https://github.com/braininahat/thoth
cd thoth
uv pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy thoth

Token Efficiency

Thoth dramatically reduces the tokens needed for code navigation:

  • Without Thoth: multiple searches + reading entire files = ~50K tokens
  • With Thoth: semantic search + precise results = ~2K tokens
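
Using the approximate figures above, the saving works out as follows. This is illustrative arithmetic only; real numbers vary by codebase and question.

```python
without_thoth = 50_000  # multiple searches + reading entire files
with_thoth = 2_000      # semantic search + precise results

reduction = 1 - with_thoth / without_thoth   # fraction of tokens saved
ratio = without_thoth / with_thoth           # how many times smaller the context is
print(f"{reduction:.0%} fewer tokens, about {ratio:.0f}x less context")
# prints: 96% fewer tokens, about 25x less context
```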

Example:

User: "How does the dashboard update in real-time?"

Without Thoth:
- grep "dashboard" → 50 results
- grep "update" → 200 results  
- Read 10+ files to understand

With Thoth semantic search:
- Returns: WebSocketHandler.send_update(), Dashboard.subscribe_to_changes(), etc.
- Ranked by relevance

Troubleshooting

Python Version Issues

If you see errors about xformers or build failures:

# Ensure Python 3.12 is used
uvx --python 3.12 mcp-server-thoth
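
Before launching the server, you can check whether an interpreter falls in the supported 3.10-3.12 range with a standalone test like the one below. `python_supported` is a hypothetical helper and is not part of Thoth.

```python
import sys

SUPPORTED = ((3, 10), (3, 12))  # inclusive range from the Requirements section


def python_supported(version: tuple = tuple(sys.version_info[:2])) -> bool:
    """True if (major, minor) lies within the supported range."""
    low, high = SUPPORTED
    return low <= tuple(version) <= high


if not python_supported():
    print(f"Python {sys.version.split()[0]} is outside 3.10-3.12; "
          "try: uvx --python 3.12 mcp-server-thoth")
```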

GPU Memory

For systems with limited GPU memory:

  • Embeddings are automatically moved to CPU after computation
  • Set CUDA_VISIBLE_DEVICES=-1 to force CPU-only mode
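
When the server is started from a script, the variable can be set only for the child process, leaving the rest of the environment untouched. The sketch below shows this with the standard `subprocess` module; the helper names are hypothetical. When Claude Desktop starts the server, the same variable can instead go in an `"env"` object inside the thoth entry of claude_desktop_config.json.

```python
import os
import subprocess


def cpu_only_env() -> dict[str, str]:
    """Copy of the current environment with all CUDA devices hidden."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = "-1"
    return env


def launch_cpu_only() -> subprocess.Popen:
    # Start the MCP server with GPUs hidden; stdin/stdout are inherited.
    return subprocess.Popen(["mcp-server-thoth"], env=cpu_only_env())
```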

Model Download

First run downloads the embedding model (~1.2GB). Subsequent runs use the cached model.

License

MIT

Contributing

Contributions welcome! Please check the issues page.

Acknowledgments

  • MCP by Anthropic
  • vLLM for fast inference
  • Qwen for lightweight embeddings

Download files

Source Distribution

mcp_server_thoth-0.2.2.tar.gz (322.0 kB)

Built Distribution

mcp_server_thoth-0.2.2-py3-none-any.whl (36.9 kB)

File details

Details for the file mcp_server_thoth-0.2.2.tar.gz.

File metadata

  • Download URL: mcp_server_thoth-0.2.2.tar.gz
  • Size: 322.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.9

File hashes

Hashes for mcp_server_thoth-0.2.2.tar.gz
Algorithm Hash digest
SHA256 ee77cc56baea4ced27b1c2e2e530869e3a8ff1b60345df61db8e5576db9cfb0c
MD5 920ca9d4d1d470d97b3fe9ab9d3d97ae
BLAKE2b-256 3c63f43333f1b89124d7795993e76ce5bfbb90e270bdfc28821e24a00fbdcaee

File details

Details for the file mcp_server_thoth-0.2.2-py3-none-any.whl.

File hashes

Hashes for mcp_server_thoth-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 3cfe3f05c92c1979da04150bcee9023cf66468dd5aebb802677e640450d82de0
MD5 8391f20368d2f74ad4adebbf546e8d60
BLAKE2b-256 8b3666290310ad2da9f67984b851c38c252d2ae5f5be1806332d392257e57ca9
