
MCP server for persistent codebase memory with semantic search and development tracking


Thoth

MCP server providing persistent codebase memory with semantic search for AI assistants.


Overview

Thoth indexes code repositories using AST parsing and provides tools for symbol lookup, cross-repository navigation, and architecture visualization. Version 0.2.0 added semantic search using local embeddings, and v0.3.0 introduces development memory, which tracks and learns from every coding attempt, including failures.

The index persists in ~/.thoth/, giving Claude and other MCP-compatible assistants memory across conversations.
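Thoth's indexer is described here only as AST-based. As a rough illustration of the technique (not Thoth's actual indexer), Python's standard `ast` module can pull out the kind of data that `get_file_structure` reports: top-level functions, classes with their methods, and imports. The helper name `extract_structure` and its return shape are hypothetical.

```python
import ast


def extract_structure(source: str) -> dict:
    """Return top-level functions, classes (with their methods) and imports in `source`.

    Hypothetical helper illustrating AST-based indexing; not Thoth's code.
    """
    tree = ast.parse(source)
    functions, classes, imports = [], {}, []

    # Walk only the module's top level so nested functions are not reported as top-level.
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            functions.append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            # Methods defined directly in the class body (a parent relationship).
            classes[node.name] = [
                child.name
                for child in node.body
                if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef))
            ]
        elif isinstance(node, ast.Import):
            imports.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # `from . import x` has module None; keep the relative-import dots.
            imports.append("." * node.level + (node.module or ""))
    return {"functions": functions, "classes": classes, "imports": imports}


if __name__ == "__main__":
    sample = '''
import os
from pathlib import Path

class Repo:
    def __init__(self, root):
        self.root = Path(root)

    def files(self):
        return list(self.root.rglob("*.py"))

def main():
    print(Repo(os.getcwd()).files())
'''
    print(extract_structure(sample))
```

A real indexer would also record line ranges, nested scopes and call sites, which is what feeds tables such as `symbols` and `calls`.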

Features

  • 🔍 Semantic Search: Find code using natural language queries with vLLM and Qwen3 embeddings
  • 🧠 Persistent Memory: Code understanding persists between conversations
  • 📝 Development Memory: Track all coding attempts and learn from failures (v0.3.0)
  • 🔗 Cross-Repository: Navigate dependencies across multiple related repositories
  • 📊 Visualizations: Generate architecture diagrams and dependency graphs
  • Fast Indexing: AST-based parsing with incremental updates
  • 🎯 Precise Navigation: Jump to exact definitions, find all callers
  • 🔧 Local-First: All processing happens locally, no cloud dependencies

Installation

Requirements

  • Python 3.10-3.12 (Python 3.13 not yet supported due to vLLM dependencies)
  • For semantic search: ~2GB disk space for embedding model

Quick Start

```bash
# Install Thoth
uv tool install --python 3.12 mcp-server-thoth

# Initialize (downloads model, sets up database)
thoth-cli init

# Index your first repository
thoth-cli index myproject /path/to/repo

# Add to Claude Code
claude mcp add thoth -s user -- uvx --python 3.12 mcp-server-thoth
```

First-Time Setup

Before using Thoth with Claude, run the initialization:

```bash
thoth-cli init
```

This will:

  • ✅ Set up the database
  • ✅ Create necessary directories
  • ✅ Download the embedding model (~460MB, one-time)
  • ✅ Verify the installation

Options:

  • --skip-model - Skip model download (disables semantic search)
  • --model-cache-dir PATH - Custom directory for model weights

Claude Desktop

Add to your configuration file:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json
  • Linux: ~/.config/claude/claude_desktop_config.json

Configuration:

```json
{
  "mcpServers": {
    "thoth": {
      "command": "uvx",
      "args": ["--python", "3.12", "mcp-server-thoth"]
    }
  }
}
```
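Pasting the JSON works for a fresh file, but if the config already lists other MCP servers the `thoth` entry has to be merged in rather than pasted over. A minimal sketch, assuming the file layout shown in the configuration block; `add_thoth_server` is a hypothetical helper, and it is worth backing up your real config before running it against that file.

```python
import json
import tempfile
from pathlib import Path


def add_thoth_server(config_path: Path) -> dict:
    """Merge the `thoth` entry into a Claude Desktop config, keeping other servers and keys."""
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    # Same command/args as the configuration block for Claude Desktop.
    servers["thoth"] = {
        "command": "uvx",
        "args": ["--python", "3.12", "mcp-server-thoth"],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


if __name__ == "__main__":
    # Demonstrate on a throwaway file; pass your real config path instead.
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "claude_desktop_config.json"
        demo.write_text(json.dumps({"mcpServers": {"other": {"command": "other-server"}}}))
        print(json.dumps(add_thoth_server(demo), indent=2))
```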

To index repositories, either:

  1. Use the CLI: thoth-cli index myrepo /path/to/repo
  2. Use the index_repository tool from within Claude

Command Line

```bash
# Install globally
uv tool install --python 3.12 mcp-server-thoth

# Initialize Thoth (first time only)
thoth-cli init

# Index a repository
thoth-cli index myproject /path/to/repo

# Search symbols
thoth-cli search "database connection"

# List indexed repositories
thoth-cli list

# Start MCP server
mcp-server-thoth
```

Tools

Core Tools

  • find_definition - Locate symbol definitions
  • get_file_structure - Extract functions, classes, imports from a file
  • search_symbols - Search symbols by name pattern
  • get_callers - Find callers of a function
  • get_repositories - List indexed repositories
  • index_repository - Index a new repository
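An MCP client such as Claude invokes these tools with a JSON-RPC 2.0 `tools/call` request carrying the tool name and an `arguments` object. The sketch below only builds that message. The argument names (`symbol`, `pattern`) are guesses; the real parameter names come from the input schema the server advertises via `tools/list`.

```python
import json
from itertools import count

_ids = count(1)  # request ids should be unique within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request for the given tool and arguments."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(request)


if __name__ == "__main__":
    # Hypothetical argument names; check the schema returned by `tools/list`.
    print(tool_call("find_definition", {"symbol": "Dashboard.subscribe_to_changes"}))
    print(tool_call("search_symbols", {"pattern": "connection"}))
```

A real session first performs the MCP `initialize` handshake over the server's stdin/stdout, so in practice an MCP client library is usually preferable to hand-built messages.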

Semantic Search (v0.2.0+)

  • search_semantic - Natural language code search using embeddings
    • Example: "function that handles user authentication"
    • Returns relevant symbols ranked by semantic similarity
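Ranking by semantic similarity usually means comparing the query's embedding with each symbol's embedding, most commonly by cosine similarity. The toy sketch below assumes cosine as the metric and uses hand-written 3-dimensional vectors in place of real Qwen3 embeddings (which have far more dimensions). Thoth stores and queries its vectors through ChromaDB, whose configured distance function may differ.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], symbols: dict[str, list[float]], top_k: int = 3):
    """Return (symbol, score) pairs sorted from most to least similar."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in symbols.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy embeddings; real ones would come from the embedding model.
    index = {
        "auth.login_user": [0.9, 0.1, 0.0],
        "auth.verify_token": [0.8, 0.3, 0.1],
        "db.open_connection": [0.1, 0.2, 0.9],
    }
    query = [0.85, 0.2, 0.05]  # stand-in for "function that handles user authentication"
    for name, score in rank(query, index):
        print(f"{score:.3f}  {name}")
```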

Development Memory (v0.3.0+)

  • start_dev_session - Start tracking development attempts
    • Persists across Claude conversations
    • Links attempts to specific tasks
  • track_attempt - Record coding attempts (edit, test, refactor)
    • Automatically captures errors and solutions
    • Builds knowledge base of what works/fails
  • check_approach - See if an approach has been tried before
    • Learn from past attempts
    • Avoid repeating mistakes
  • analyze_failure - Get insights from past failures
    • Find solutions to similar problems
    • See common error patterns
  • analyze_patterns - Analyze failure patterns
    • Identify problematic files
    • Get suggestions based on history
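The development-memory tools are described only by their behavior. To make the idea concrete, here is a hypothetical sketch of recording attempts and looking up earlier ones in a `development_attempts` table, roughly what a `check_approach`-style query could do. The column names, the substring matching, and the helpers `open_memory`, `record_attempt` and `find_prior_attempts` are all assumptions, not Thoth's schema or API.

```python
import sqlite3


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Create a hypothetical attempts table (schema invented for illustration)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS development_attempts (
               id INTEGER PRIMARY KEY,
               task TEXT NOT NULL,
               approach TEXT NOT NULL,
               kind TEXT NOT NULL,          -- 'edit', 'test' or 'refactor'
               succeeded INTEGER NOT NULL,  -- 1 = worked, 0 = failed
               error TEXT
           )"""
    )
    return conn


def record_attempt(conn, task, approach, kind, succeeded, error=None):
    """Store one attempt, including failures and their error messages."""
    conn.execute(
        "INSERT INTO development_attempts (task, approach, kind, succeeded, error) "
        "VALUES (?, ?, ?, ?, ?)",
        (task, approach, kind, int(succeeded), error),
    )
    conn.commit()


def find_prior_attempts(conn, approach):
    """Return (task, succeeded, error) for earlier attempts whose approach mentions `approach`."""
    rows = conn.execute(
        "SELECT task, succeeded, error FROM development_attempts "
        "WHERE approach LIKE ? ORDER BY id",
        (f"%{approach}%",),
    )
    return [(task, bool(ok), err) for task, ok, err in rows]


if __name__ == "__main__":
    db = open_memory()
    record_attempt(db, "fix flaky login test", "patch time.sleep in the test", "test",
                   False, "AssertionError: token expired")
    record_attempt(db, "fix flaky login test", "freeze time with a fixture", "test", True)
    print(find_prior_attempts(db, "time.sleep"))
```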

Visualization Tools

  • generate_module_diagram - Generate Mermaid dependency diagrams
  • generate_system_architecture - Visualize cross-repository relationships
  • trace_api_flow - Trace client-server communication paths
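A Mermaid dependency diagram is plain text: a `graph` declaration followed by one `A --> B` line per edge. A minimal sketch, assuming the module import edges are already known (in Thoth they would come from the indexed imports); `to_mermaid` is a hypothetical helper and does not reproduce the exact output of `generate_module_diagram`.

```python
import re


def _node(module: str) -> str:
    """Mermaid-safe node id plus a quoted label with the real module name."""
    node_id = re.sub(r"\W", "_", module)
    return f'{node_id}["{module}"]'


def to_mermaid(edges: list[tuple[str, str]], direction: str = "TD") -> str:
    """Render (importer, imported) pairs as a Mermaid flowchart."""
    lines = [f"graph {direction}"]
    for src, dst in sorted(set(edges)):  # dedupe and keep output stable
        lines.append(f"    {_node(src)} --> {_node(dst)}")
    return "\n".join(lines)


if __name__ == "__main__":
    imports = [
        ("app.api", "app.db"),
        ("app.api", "app.auth"),
        ("app.auth", "app.db"),
    ]
    print(to_mermaid(imports))
```

Node ids are sanitized because characters such as dots can confuse Mermaid's parser, while the quoted label keeps the readable module name.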

Architecture

Storage Backend

Thoth uses a hybrid storage approach:

  • SQLite (~/.thoth/index.db): Source of truth for structured data

    • symbols - Functions, classes, methods with location and parent relationships
    • imports - Import statements with cross-repository resolution
    • calls - Function call graph (caller → callee mapping)
    • files - File metadata and content hashes for incremental updates
    • development_sessions - Track coding sessions across Claude conversations
    • development_attempts - Record all edit/test/refactor attempts
    • failure_patterns - Identify common failure patterns
    • learned_solutions - Store successful solutions for reuse
  • ChromaDB (~/.thoth/chroma/): Vector storage for semantic search

    • Stores embeddings for all indexed symbols
    • Enables natural language queries
  • NetworkX: In-memory graph for fast relationship traversal
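The `files` table keeps content hashes so that unchanged files can be skipped when a repository is re-indexed. The sketch below shows that general idea, with an in-memory dict standing in for the stored hashes; SHA-256, the `*.py` filter and the `changed_files` helper are assumptions for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def file_hash(path: Path) -> str:
    """SHA-256 of the file's bytes, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(root: Path, stored: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare current hashes with `stored`; return (to_reindex, deleted) relative paths.

    `stored` is updated in place so the next call sees the new hashes.
    """
    current = {
        p.relative_to(root).as_posix(): file_hash(p)
        for p in sorted(root.rglob("*.py"))
        if p.is_file()
    }
    to_reindex = [rel for rel, h in current.items() if stored.get(rel) != h]
    deleted = [rel for rel in stored if rel not in current]
    stored.clear()
    stored.update(current)
    return to_reindex, deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.py").write_text("x = 1\n")
        (root / "b.py").write_text("y = 2\n")
        hashes: dict[str, str] = {}
        print(changed_files(root, hashes))  # first run: every file is new
        (root / "a.py").write_text("x = 42\n")
        (root / "b.py").unlink()
        print(changed_files(root, hashes))  # a.py changed, b.py deleted
```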

Embedding Model

Semantic search uses Qwen3-Embedding-0.6B via vLLM:

  • Lightweight (600M parameters, ~1.2GB on disk)
  • Code-aware embeddings with instruction support
  • Fast inference with GPU acceleration (optional)
  • Falls back to TF-IDF when vLLM is unavailable
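As a rough sketch of the fallback pattern in the list above: check whether the `vllm` package is importable, and if it is not, score symbols with plain TF-IDF weights and cosine similarity. The tokenizer, the smoothed IDF formula and the helper names are assumptions; Thoth's real fallback may be implemented differently.

```python
import importlib.util
import math
import re
from collections import Counter


def vllm_available() -> bool:
    """True if the `vllm` package is importable in this environment."""
    return importlib.util.find_spec("vllm") is not None


def tokenize(text: str) -> list[str]:
    """Split camelCase and snake_case identifiers into lowercase words."""
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_search(query: str, docs: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank docs by cosine similarity between TF-IDF vectors (smoothed IDF)."""
    counts = {name: Counter(tokenize(body)) for name, body in docs.items()}
    n_docs = len(docs)
    # Document frequency: how many docs contain each term.
    df = Counter(term for c in counts.values() for term in c)
    idf = {term: math.log((1 + n_docs) / (1 + d)) + 1 for term, d in df.items()}

    def weigh(c: Counter) -> dict[str, float]:
        # Query terms never seen in the corpus get weight 0.
        return {t: tf * idf.get(t, 0.0) for t, tf in c.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weigh(Counter(tokenize(query)))
    scored = [(name, cosine(q, weigh(c))) for name, c in counts.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    symbols = {
        "WebSocketHandler.send_update": "send_update push dashboard update over websocket",
        "Dashboard.subscribe_to_changes": "subscribe_to_changes register dashboard listener",
        "db.open_connection": "open_connection open database connection",
    }
    print("vllm importable:", vllm_available())
    for name, score in tfidf_search("dashboard update", symbols):
        print(f"{score:.3f}  {name}")
```

Unlike embeddings, TF-IDF only matches shared words, so a query like "real-time refresh" would miss `send_update`; that is the quality trade-off of the fallback.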

Performance

  • Indexing: ~10K symbols/minute
  • Semantic Search: <100ms for typical queries
  • Memory: ~2GB for model + ~100MB per 100K symbols
  • Relevance: code search results typically score 0.7-0.9
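Treating the figures above as rough rules of thumb (they are estimates, not guarantees), the cost of indexing a repository can be estimated up front. `estimate` is a hypothetical helper; it takes the model's ~2GB as fixed and scales the other two figures linearly with the number of symbols.

```python
def estimate(symbols: int) -> dict[str, float]:
    """Back-of-envelope cost from the performance figures (rough, linear scaling)."""
    index_minutes = symbols / 10_000           # ~10K symbols per minute
    memory_gb = 2.0 + 0.1 * symbols / 100_000  # ~2 GB model + ~100 MB per 100K symbols
    return {"index_minutes": index_minutes, "memory_gb": memory_gb}


if __name__ == "__main__":
    for n in (50_000, 500_000):
        e = estimate(n)
        print(f"{n:>7} symbols: ~{e['index_minutes']:.0f} min to index, ~{e['memory_gb']:.2f} GB memory")
```

For a 500K-symbol monorepo this gives roughly 50 minutes of indexing and about 2.5 GB of memory.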

Advanced Usage

Pre-indexing Large Repositories

For large monorepos, pre-index before adding to Claude:

```bash
thoth-cli index myrepo /path/to/large-repo
```
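When several related repositories need pre-indexing (for example, to enable cross-repository navigation), the CLI call can be scripted. A minimal sketch using `subprocess` and only the `thoth-cli index NAME PATH` form shown above; the repository names and paths are placeholders, and `index_commands` and `index_all` are hypothetical helpers.

```python
import os
import shutil
import subprocess


def index_commands(repos: dict[str, str]) -> list[list[str]]:
    """Build one `thoth-cli index NAME PATH` command per repository."""
    return [["thoth-cli", "index", name, os.path.expanduser(path)] for name, path in repos.items()]


def index_all(repos: dict[str, str]) -> None:
    """Run the commands in order, stopping at the first repository that fails."""
    if shutil.which("thoth-cli") is None:
        raise RuntimeError("thoth-cli not found on PATH; install mcp-server-thoth first")
    for cmd in index_commands(repos):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder names and paths: replace with your own checkouts, then call index_all(repos).
    repos = {"backend": "~/src/backend", "frontend": "~/src/frontend"}
    for cmd in index_commands(repos):
        print(" ".join(cmd))
```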

Using Redis Cache (Optional)

For improved performance with multiple users:

```bash
# Install with Redis support
uv tool install "mcp-server-thoth[cache]"

# Requires Redis server running locally
```

Dashboard (Coming Soon)

A separate thoth-dashboard package will provide:

  • Web UI for exploring indexed code
  • Interactive dependency graphs
  • Real-time search interface

Development

```bash
git clone https://github.com/braininahat/thoth
cd thoth
uv pip install -e ".[dev]"

# Run tests
pytest

# Type checking
mypy thoth
```

Token Efficiency

Thoth dramatically reduces the tokens needed for code navigation:

  • Without Thoth: multiple searches + reading entire files = ~50K tokens
  • With Thoth: semantic search + precise results = ~2K tokens
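Taking these two estimates at face value, the saving works out as:

```latex
1 - \frac{2\,000}{50\,000} = 0.96 \quad (\text{about } 96\% \text{ fewer tokens}), \qquad \frac{50\,000}{2\,000} = 25\times
```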

Example:

```text
User: "How does the dashboard update in real-time?"

Without Thoth:
- grep "dashboard" → 50 results
- grep "update" → 200 results
- Read 10+ files to understand

With Thoth semantic search:
- Returns: WebSocketHandler.send_update(), Dashboard.subscribe_to_changes(), etc.
- Ranked by relevance
```

Troubleshooting

Python Version Issues

If you see errors about xformers or build failures:

```bash
# Ensure Python 3.12 is used
uvx --python 3.12 mcp-server-thoth
```

GPU Memory

For systems with limited GPU memory:

  • Embeddings are automatically moved to CPU after computation
  • Set CUDA_VISIBLE_DEVICES=-1 to force CPU-only mode

Model Download

First run downloads the embedding model (~460MB). Use thoth-cli init to pre-download:

```bash
# Download model before using with Claude
thoth-cli init

# Or skip model download (disables semantic search)
thoth-cli init --skip-model
```

MCP Timeouts

If tools time out in Claude, run thoth-cli init first to pre-download the model; the embedding model takes time to load on first use.

License

MIT

Contributing

Contributions welcome! Please check the issues page.

Acknowledgments

  • MCP by Anthropic
  • vLLM for fast inference
  • Qwen for lightweight embeddings


Download files


Source Distribution

mcp_server_thoth-0.3.3.tar.gz (264.4 kB)

Built Distribution

mcp_server_thoth-0.3.3-py3-none-any.whl (53.7 kB)

File details

Details for the file mcp_server_thoth-0.3.3.tar.gz.

File metadata

  • Download URL: mcp_server_thoth-0.3.3.tar.gz
  • Size: 264.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.9

File hashes

Hashes for mcp_server_thoth-0.3.3.tar.gz
Algorithm Hash digest
SHA256 08c638d40c955d65143b27771212787c8cfb81266e6bb1bbdbff28debc152879
MD5 097ad709b19dbdfefe796c7acffbff49
BLAKE2b-256 ff0d470466e5a2893071ea02b67ab1713c82d712bf5d603e14c73df078582458


File details

Details for the file mcp_server_thoth-0.3.3-py3-none-any.whl.

File hashes

Hashes for mcp_server_thoth-0.3.3-py3-none-any.whl
Algorithm Hash digest
SHA256 c649a38237dce4ade0092e066debbd172e1a71d7733564ebf4dd346eac11ebe2
MD5 416bc53341570cdd3cf965b0f170caf0
BLAKE2b-256 c82a31c342677c522a250c860112ebe12aeb8774d4a2a45c2bc8ca05e8b9e997

