Turn internal code libraries into AI-accessible knowledge sources via an MCP server with semantic search

SimpleCodeMCP

Turn your internal code libraries into AI-accessible knowledge sources.

SimpleCodeMCP is an open-source tool that indexes internal code libraries and exposes them through a Model Context Protocol (MCP) server. This enables AI coding agents (Claude, GitHub Copilot, etc.) to provide precise, context-aware assistance for company-internal libraries—even when documentation is sparse or outdated.

The Problem

In many organizations:

  • Team A builds internal libraries (e.g., pythonpackage1)
  • Team B uses these libraries to implement new software
  • Documentation is often incomplete, outdated, or missing
  • As a result, libraries are frequently misused, implementations contain avoidable errors, and the same questions get asked repeatedly

The Solution

SimpleCodeMCP scans and indexes your internal library's:

  • Public and internal APIs
  • Function/class signatures and type hints
  • Docstrings and comments
  • Tests (as usage examples)
  • Code structure and relationships

It then exposes this knowledge through an MCP server that AI agents can query to:

  • List available functions and classes
  • Inspect signatures and behavior
  • Retrieve real usage examples from tests
  • Search relevant parts of the codebase semantically

Architecture

┌─────────────────────────────────────────────────────────┐
│  Your Library Repository                                │
│  ├── src/           (source code)                       │
│  ├── tests/         (usage examples)                    │
│  └── examples/      (additional examples)               │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Indexer Component   │
          │  ─────────────────   │
          │  • AST Parser        │  Extract structure
          │  • Docstring Parser  │  Extract documentation
          │  • Test Parser       │  Find usage patterns
          │  • Static Analyzer   │  Infer types & relationships
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  Storage Layer       │
          │  ─────────────────   │
          │  • ChromaDB          │  Semantic search (embeddings)
          │  • Metadata Store    │  Signatures, paths, etc.
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  MCP Server          │
          │  (FastAPI + MCP SDK) │
          │  ─────────────────   │
          │  Available Tools:    │
          │  • list_api          │
          │  • get_signature     │
          │  • search_code       │
          │  • get_examples      │
          │  • get_tests         │
          └──────────┬───────────┘
                     │
                     ▼
          ┌──────────────────────┐
          │  AI Agent (Client)   │
          │  Claude, Copilot,    │
          │  or any MCP client   │
          └──────────────────────┘
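The Indexer Component's first two stages (AST Parser and Docstring Parser) can be sketched for Python with nothing but the standard-library `ast` module. This is a minimal sketch, not SimpleCodeMCP's implementation: the `Symbol` record and `index_tree` function are hypothetical names, only top-level functions and classes are collected, and the Test Parser, Static Analyzer, and ChromaDB storage are left out.

```python
import ast
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class Symbol:
    """One indexed entry (hypothetical record layout)."""
    module: str
    name: str
    kind: str  # "function" or "class"
    line: int
    docstring: Optional[str]


def index_tree(root: Path) -> List[Symbol]:
    """Parse every .py file under root and collect top-level functions and classes."""
    symbols: List[Symbol] = []
    for path in sorted(root.rglob("*.py")):
        # Dotted module name relative to the root, e.g. "utils.validation".
        module = ".".join(path.relative_to(root).with_suffix("").parts)
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in tree.body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "function"
            elif isinstance(node, ast.ClassDef):
                kind = "class"
            else:
                continue  # skip imports, assignments, etc.
            symbols.append(Symbol(module, node.name, kind, node.lineno, ast.get_docstring(node)))
    return symbols
```

Each `Symbol` carries the name, kind, line, and docstring the storage layer would need. A package's `__init__.py` shows up here as a module ending in `.__init__`, which a fuller version would normalize.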

Features

Multi-Language Support

  • Python (MVP with full AST parsing, type inference)
  • C++ (planned)
  • JavaScript/TypeScript (planned)
  • Java (planned)
  • Go (planned)
  • Extensible architecture for additional languages
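An extensible multi-language design often comes down to a registry that maps file extensions to parsers sharing one interface. The sketch below assumes a hypothetical `LanguageParser` protocol with `register`/`parser_for` helpers; SimpleCodeMCP's real extension mechanism may look different, and only the Python backend is filled in.

```python
import ast
from pathlib import Path
from typing import Dict, List, Protocol


class LanguageParser(Protocol):
    """Interface every language backend would implement (hypothetical)."""

    def symbols(self, source: str) -> List[str]: ...


class PythonParser:
    """Python backend: top-level function and class names via the ast module."""

    def symbols(self, source: str) -> List[str]:
        nodes = ast.parse(source).body
        return [n.name for n in nodes
                if isinstance(n, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef))]


_REGISTRY: Dict[str, LanguageParser] = {}


def register(extension: str, parser: LanguageParser) -> None:
    """Associate a file extension such as ".py" with a parser instance."""
    _REGISTRY[extension.lower()] = parser


def parser_for(path: Path) -> LanguageParser:
    """Look up the parser for a file, failing loudly for unsupported languages."""
    try:
        return _REGISTRY[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"no parser registered for {path.suffix!r}") from None


register(".py", PythonParser())
# A future backend (hypothetical CppParser) would plug in the same way:
# register(".cpp", CppParser())
```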

Multiple Embedding Providers

  • Local (sentence-transformers) - Free, offline, privacy-friendly
  • OpenAI - High quality, fast, cloud-based
  • Azure OpenAI - Enterprise support, data residency control

See EMBEDDING_PROVIDERS.md for detailed comparison and setup.

MCP Tools

The server exposes the following tools to AI agents:

list_api

Lists all available functions, classes, and modules in the library.

Parameters:

  • module (optional): Filter by specific module/namespace

Returns:

{
  "functions": ["calculate_total", "validate_email"],
  "classes": ["User", "Order"],
  "modules": ["core", "utils", "api"]
}
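A response in this shape could be assembled roughly as follows. This sketch assumes an in-memory mapping from module names to Python source (the real server reads from its index) and guesses at the `module` filter's semantics by matching the module itself and its dotted submodules; `list_api` here is a stand-alone function, not the server's tool handler.

```python
import ast
from typing import Dict, List, Optional


def list_api(sources: Dict[str, str], module: Optional[str] = None) -> Dict[str, List[str]]:
    """Build a list_api-style response from {module_name: python_source}."""
    result: Dict[str, List[str]] = {"functions": [], "classes": [], "modules": []}
    for mod_name, source in sorted(sources.items()):
        # Keep the module itself and its submodules, e.g. "core" matches "core.models".
        if module and not (mod_name == module or mod_name.startswith(module + ".")):
            continue
        result["modules"].append(mod_name)
        for node in ast.parse(source).body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                result["functions"].append(node.name)
            elif isinstance(node, ast.ClassDef):
                result["classes"].append(node.name)
    return result
```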

get_signature

Retrieves detailed signature information for a function or class.

Parameters:

  • name: Function or class name

Returns:

{
  "name": "calculate_total",
  "signature": "calculate_total(items: List[Item], tax_rate: float = 0.19) -> Decimal",
  "docstring": "Calculate the total price including tax...",
  "file": "src/billing.py",
  "line": 45,
  "parameters": [
    {"name": "items", "type": "List[Item]", "required": true},
    {"name": "tax_rate", "type": "float", "default": "0.19"}
  ],
  "return_type": "Decimal"
}
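Because the indexer is AST-based, the `signature` and `parameters` fields can be derived statically, without importing the library. The hypothetical `get_signature` below reproduces the JSON above for a top-level function. It is a sketch with clear limits: it handles only plain positional parameters (no `*args`, keyword-only parameters, or classes), and it adds a `"required": false` flag to defaulted parameters, which the example above omits.

```python
import ast
from typing import Any, Dict, List, Optional


def get_signature(source: str, name: str, file: str = "<unknown>") -> Optional[Dict[str, Any]]:
    """Statically describe a top-level function in the shape of the get_signature JSON."""
    for node in ast.parse(source).body:
        if not (isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == name):
            continue
        args = node.args.args
        # Python stores defaults only for the last len(defaults) positional parameters.
        defaults = [None] * (len(args) - len(node.args.defaults)) + list(node.args.defaults)
        params: List[Dict[str, Any]] = []
        rendered: List[str] = []
        for arg, default in zip(args, defaults):
            ann = ast.unparse(arg.annotation) if arg.annotation else None
            param: Dict[str, Any] = {"name": arg.arg, "type": ann, "required": default is None}
            text = arg.arg if ann is None else f"{arg.arg}: {ann}"
            if default is not None:
                param["default"] = ast.unparse(default)
                text += f" = {param['default']}"
            params.append(param)
            rendered.append(text)
        ret = ast.unparse(node.returns) if node.returns else None
        signature = f"{name}({', '.join(rendered)})" + (f" -> {ret}" if ret else "")
        return {
            "name": name,
            "signature": signature,
            "docstring": ast.get_docstring(node),
            "file": file,
            "line": node.lineno,
            "parameters": params,
            "return_type": ret,
        }
    return None  # name not found at module level
```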

search_code

Semantic search across the codebase using natural language.

Parameters:

  • query: Natural language query (e.g., "How do I validate an email?")
  • limit (optional): Maximum results (default: 10)

Returns:

{
  "results": [
    {
      "name": "validate_email",
      "relevance_score": 0.92,
      "signature": "validate_email(email: str) -> bool",
      "docstring": "Validates email format using regex...",
      "file": "src/utils/validation.py"
    }
  ]
}
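SimpleCodeMCP ranks results with embeddings stored in ChromaDB. To show the ranking step in isolation, the sketch below substitutes a much simpler technique, cosine similarity over bag-of-words token counts, and reuses the `limit` default of 10 from the parameter list above. The entry format and helper names are assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Any, Dict, List


def _bag(text: str) -> Counter:
    # Lower-case alphanumeric tokens; "validate_email" -> ["validate", "email"].
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norms = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norms if norms else 0.0


def search_code(query: str, entries: List[Dict[str, Any]], limit: int = 10) -> Dict[str, Any]:
    """Rank indexed entries against a natural-language query (bag-of-words stand-in)."""
    q = _bag(query)
    results = []
    for entry in entries:
        doc = _bag(" ".join([entry["name"], entry["signature"], entry.get("docstring") or ""]))
        score = _cosine(q, doc)
        if score > 0:  # drop entries that share no tokens with the query
            results.append({**entry, "relevance_score": round(score, 2)})
    results.sort(key=lambda r: r["relevance_score"], reverse=True)
    return {"results": results[:limit]}
```

With real embeddings, a query like "How do I validate an email?" could also match a function whose name never mentions "email"; the bag-of-words stand-in only matches shared tokens, which is exactly the gap semantic search is meant to close.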

get_examples

Retrieves usage examples from tests and example files.

Parameters:

  • name: Function or class name

Returns:

{
  "examples": [
    {
      "source": "tests/test_billing.py",
      "code": "result = calculate_total(items=[item1, item2], tax_rate=0.19)\nassert result == Decimal('119.00')",
      "description": "Basic usage with two items"
    }
  ]
}

get_tests

Retrieves all tests related to a function or class.

Parameters:

  • name: Function or class name

Returns:

{
  "tests": [
    {
      "test_name": "test_calculate_total_with_default_tax",
      "file": "tests/test_billing.py",
      "line": 12,
      "code": "..."
    }
  ]
}

Installation

For Users

pip install simplecode-mcp

For Development

This project uses uv for fast Python package management:

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/yourusername/simplecode-mcp.git
cd simplecode-mcp

# Install dependencies and create virtual environment
uv sync

# Activate the virtual environment
source .venv/bin/activate  # On Unix/macOS
# or
.venv\Scripts\activate  # On Windows

Quick Start

1. Index Your Library

Create a configuration file simplecode_mcp.yaml:

library:
  name: "my-internal-lib"
  path: "/path/to/my-internal-lib"
  language: "python"  # python (v0.1); c++, javascript, typescript, java, go are planned
  include_private: true  # Index _internal functions too

indexing:
  trigger: "manual"  # manual | on_commit | watch
  embedding_model: "local"  # local (sentence-transformers) | openai

server:
  host: "localhost"
  port: 8000
  auth: null  # Optional: bearer token settings (see Authentication below)
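Before indexing, a configuration like this could be checked for typos in its enumerated fields. The sketch assumes the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`) and uses a hypothetical `validate_config` helper; the allowed values are copied from the comments in the example, and the real CLI may validate differently or not at all.

```python
from typing import Any, Dict, List

# Allowed values as listed in the example configuration's comments.
ALLOWED = {
    ("library", "language"): {"python", "c++", "javascript", "typescript", "java", "go"},
    ("indexing", "trigger"): {"manual", "on_commit", "watch"},
    ("indexing", "embedding_model"): {"local", "openai"},
}


def validate_config(cfg: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the config looks valid."""
    problems = []
    for (section, key), allowed in ALLOWED.items():
        # `or {}` also covers an empty YAML section, which loads as None.
        value = (cfg.get(section) or {}).get(key)
        if value not in allowed:
            problems.append(f"{section}.{key}: {value!r} is not one of {sorted(allowed)}")
    port = (cfg.get("server") or {}).get("port")
    if not isinstance(port, int) or not 1 <= port <= 65535:
        problems.append(f"server.port: {port!r} is not a valid TCP port")
    if not (cfg.get("library") or {}).get("path"):
        problems.append("library.path is required")
    return problems
```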

Index your library:

simplecode-mcp reindex

2. Start the MCP Server

simplecode-mcp serve

The server will start on http://localhost:8000.

3. Connect Your AI Agent

Add the MCP server to your agent's configuration:

For Claude Desktop (claude_desktop_config.json):

{
  "mcpServers": {
    "my-internal-lib": {
      "command": "simplecode-mcp",
      "args": ["serve"],
      "cwd": "/path/to/my-internal-lib"
    }
  }
}

For GitHub Copilot (mcp.json):

{
  "servers": {
    "my-internal-lib": {
      "url": "http://localhost:8000"
    }
  }
}

4. Use in Your IDE

Your AI agent can now answer questions like:

  • "How do I use the calculate_total function?"
  • "Show me examples of email validation"
  • "What parameters does the User class constructor take?"
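Behind a question such as "How do I use the calculate_total function?", the agent sends MCP `tools/call` requests, which are JSON-RPC 2.0 messages carrying a tool name and its arguments. The hypothetical `tools_call` helper below only builds the request body; the transport (stdio or HTTP) and the `initialize` handshake an MCP client performs first are omitted, and which tools the agent actually picks is up to the agent.

```python
import json
from itertools import count
from typing import Any, Dict

_ids = count(1)  # request ids must not be reused within a session


def tools_call(tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# "How do I use the calculate_total function?" might translate to:
print(tools_call("get_signature", {"name": "calculate_total"}))
print(tools_call("get_examples", {"name": "calculate_total"}))
```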

Configuration Options

Indexing Triggers

  • manual: Run simplecode-mcp reindex manually
  • on_commit: Automatically reindex on git commits (via git hook)
  • watch: Watch for file changes and reindex automatically
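The `watch` trigger, like the incremental indexing planned on the roadmap, hinges on knowing which files changed. A dependency-free way to sketch that is to compare modification-time snapshots between polls. A real watcher would more likely use OS file-system events (for example via the third-party `watchdog` package), so treat the polling approach and the `snapshot`/`diff_snapshots` names as assumptions.

```python
from pathlib import Path
from typing import Dict, Set, Tuple

Snapshot = Dict[str, int]


def snapshot(root: Path, suffix: str = ".py") -> Snapshot:
    """Map each matching file under root to its modification time in nanoseconds."""
    return {str(p): p.stat().st_mtime_ns for p in root.rglob(f"*{suffix}") if p.is_file()}


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (added, modified, removed) paths between two snapshots."""
    added = set(new.keys() - old.keys())
    removed = set(old.keys() - new.keys())
    modified = {p for p in new.keys() & old.keys() if new[p] != old[p]}
    return added, modified, removed
```

A watch loop would take a snapshot every few seconds, reindex only the added and modified files, and drop index entries for removed ones.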

Embedding Models

  • local: Use sentence-transformers (e.g., all-MiniLM-L6-v2)

    • Pros: No external API calls, works offline
    • Cons: Slower, lower quality for complex queries
  • openai: Use OpenAI's embedding API

    • Pros: Fast, high quality
    • Cons: Requires an API key and network access (does not work offline)
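Since `embedding_model` selects between interchangeable backends, the indexer presumably talks to them through a common interface. The sketch below shows one such shape with a hypothetical `EmbeddingProvider` protocol and `make_provider` factory. To stay runnable without model downloads or API keys, its `local` entry is a feature-hashing stand-in, not sentence-transformers, and the `openai` entry is deliberately left unimplemented.

```python
import math
import re
import zlib
from typing import List, Protocol


class EmbeddingProvider(Protocol):
    def embed(self, texts: List[str]) -> List[List[float]]: ...


class HashingEmbedder:
    """Stand-in for a local model: hashes word tokens into a fixed-size, L2-normalized vector."""

    def __init__(self, dim: int = 256) -> None:
        self.dim = dim

    def embed(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            vec = [0.0] * self.dim
            for token in re.findall(r"[a-z0-9]+", text.lower()):
                # crc32 is stable across runs, unlike Python's salted hash().
                vec[zlib.crc32(token.encode()) % self.dim] += 1.0
            norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero
            vectors.append([v / norm for v in vec])
        return vectors


def make_provider(name: str) -> EmbeddingProvider:
    """Pick a backend from the embedding_model config value."""
    if name == "local":
        return HashingEmbedder()
    if name == "openai":
        # A real backend would call the OpenAI embeddings API here (needs an API key).
        raise NotImplementedError("openai provider is not part of this sketch")
    raise ValueError(f"unknown embedding_model: {name!r}")
```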

Authentication

For internal company use, you can enable bearer token authentication:

server:
  auth:
    type: "bearer"
    token: "your-secret-token"
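On the server side, the configured token would be compared against each request's `Authorization: Bearer <token>` header. This is a minimal sketch with a hypothetical `is_authorized` helper; in a FastAPI server this check would typically live in a dependency or middleware. Treating `auth: null` as "authentication disabled" is an assumption based on the configuration comment.

```python
import hmac
from typing import Mapping, Optional


def is_authorized(headers: Mapping[str, str], expected_token: Optional[str]) -> bool:
    """Check an HTTP Authorization header against the configured bearer token."""
    if expected_token is None:  # auth: null -> authentication disabled
        return True
    # HTTP header names are case-insensitive; normalize before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    scheme, _, token = lowered.get("authorization", "").partition(" ")
    if scheme.lower() != "bearer" or not token:
        return False
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(token.encode(), expected_token.encode())
```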

Use Cases

1. Library Owner Perspective

You maintain an internal Python package used by 10 teams. Instead of answering the same questions repeatedly:

  1. Run SimpleCodeMCP once on your library
  2. Share the MCP server endpoint with consumer teams
  3. Their AI agents can now answer questions about your library autonomously

2. Library Consumer Perspective

You're implementing a new feature using an unfamiliar internal library:

  1. Connect your AI agent to the library's MCP server
  2. Ask: "How do I authenticate with the internal API?"
  3. Get instant, accurate examples from the library's tests

3. Onboarding New Developers

New team members can explore internal libraries through their AI assistant without digging through outdated wikis or bothering senior developers.

Roadmap

MVP (v0.1)

  • Python support (AST parsing, docstrings, tests)
  • Manual indexing trigger
  • Local embedding model (sentence-transformers)
  • Basic MCP tools (list_api, get_signature, search_code, get_examples)
  • YAML configuration

Future Versions

  • Incremental indexing (only changed files)
  • C++ support
  • JavaScript/TypeScript support
  • Git hook for automatic reindexing
  • File watcher mode
  • OpenAI embedding support
  • Advanced relevance scoring
  • Multi-version support (index v1.x and v2.x simultaneously)
  • Web UI for browsing indexed libraries
  • Integration with internal documentation systems

Contributing

Contributions are welcome! This project uses uv for dependency management.

Setup Development Environment

# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/yourusername/simplecode-mcp.git
cd simplecode-mcp
uv sync

# Run tests
uv run pytest

# Run the CLI in development mode
uv run simplecode-mcp --help

Areas We'd Love Help With

  • Support for additional languages (JS/TS, Java, Go, Rust)
  • Better test example extraction
  • Performance optimizations for large codebases
  • Alternative embedding models

License

MIT License - see LICENSE for details.

Why "SimpleCodeMCP"?

Because complex internal libraries deserve simple, accessible knowledge interfaces. No more outdated docs, no more digging through source code—just ask your AI agent.


Built for teams that move fast and break things (but want to break fewer things).

Download files

Source Distribution

simplecode_mcp-0.1.0.tar.gz (49.3 kB)

Built Distribution

simplecode_mcp-0.1.0-py3-none-any.whl (31.3 kB)

File details

Details for the file simplecode_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: simplecode_mcp-0.1.0.tar.gz
  • Size: 49.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for simplecode_mcp-0.1.0.tar.gz

  • SHA256: d315dccf8d9a4c6dbf94511b0effefcc2b5ac247cc31fb45b8fa5b81531e0936
  • MD5: ce18801009cc6049e585636b8e175ef4
  • BLAKE2b-256: 718ab7f268f7933254f357a632fbd0b09f370b3ca1d5422914a982bfa4819920

File details

Details for the file simplecode_mcp-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: simplecode_mcp-0.1.0-py3-none-any.whl
  • Size: 31.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.1

File hashes

Hashes for simplecode_mcp-0.1.0-py3-none-any.whl

  • SHA256: b3f118c97a25fd1010e386260d35832642a503a0ba72ab8c25e89e46d9dde820
  • MD5: ce3ddf710341819cb133692cf844d054
  • BLAKE2b-256: 0e727a3e9ed1aac424ca374f3c5ceb9a620ddba20ab73ad1a7a4a5dc72a57e74
