
🔍 Source-MCP

A Model Context Protocol (MCP) server for semantic search and Retrieval-Augmented Generation (RAG) over local codebases and documents.


📖 Overview

Source-MCP leverages the Model Context Protocol to provide AI assistants (like Claude, Gemini, and others) with direct access to local files through semantic search.

Instead of manually copy-pasting code or documentation into your prompts, Source-MCP automatically indexes your local repository, generates vector embeddings, and enables the AI to semantically search and retrieve only the most relevant files.

✨ Key Features

  • Dual Embedding Support:
    • OpenAI: Uses robust text-embedding-3-small (1536 dimensions) for high-quality enterprise embeddings.
    • FastEmbed (Local): Uses sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 (384 dimensions). Runs entirely locally, requires no API keys, and supports multilingual queries.
  • Smart Incremental Indexing: Uses file fingerprints (modified time + size) to only index new or modified files, ensuring lightning-fast startup times.
  • Auto-Migration: Automatically detects embedding dimension changes (e.g., switching from OpenAI to FastEmbed) and safely recreates the vector index.
  • Web Dashboard (Port 8000):
    • Live Logs: View real-time indexing and search activity with auto-scroll.
    • Reindex Base: Force-wipe the vector DB and manifest for a completely fresh full scan.
    • Search Debugging: Special endpoint (/api/search/debug?q=...) to test raw semantic search scores.
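The incremental-indexing idea above — fingerprint each file by modified time plus size, then reindex only files whose fingerprint changed — can be sketched in a few lines of plain Python. The helper names here are hypothetical; the actual implementation in Source-MCP may differ.

```python
import os

def fingerprint(path: str) -> str:
    """Fingerprint a file by modified time + size (hypothetical helper;
    the real project may hash differently)."""
    st = os.stat(path)
    return f"{st.st_mtime_ns}:{st.st_size}"

def files_to_reindex(paths: list[str], manifest: dict[str, str]) -> list[str]:
    """Return only files whose fingerprint changed since the last run.

    `manifest` maps path -> previously recorded fingerprint and is
    updated in place, so a second call with unchanged files returns [].
    """
    changed = []
    for p in paths:
        fp = fingerprint(p)
        if manifest.get(p) != fp:
            changed.append(p)
            manifest[p] = fp
    return changed
```

On a warm start, most files match their stored fingerprint, so almost nothing is re-embedded — which is where the fast startup comes from.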

🤔 Why local embeddings and zvec?

We use zvec, a lightweight, high-performance vector database maintained by Alibaba. zvec is embedded directly into the Python process, eliminating the need to set up or run external vector servers (like Pinecone, Milvus, or Qdrant). Combined with FastEmbed, this allows Source-MCP to build the entire semantic search pipeline fully offline, quickly, and entirely on your local machine.
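zvec's own Python API isn't shown here, but the core idea of an embedded vector store — vectors live inside the same process and are searched by cosine similarity, with no external server — can be illustrated with a small NumPy toy. This is a sketch of the concept only, not the zvec API:

```python
import numpy as np

class InProcessIndex:
    """Toy embedded vector index: all data stays in this process.
    Illustrative only; zvec's real API and storage differ."""

    def __init__(self, dim: int):
        self.dim = dim
        self.ids: list[str] = []
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, doc_id: str, vec: np.ndarray) -> None:
        vec = vec / np.linalg.norm(vec)  # unit-normalize for cosine search
        self.ids.append(doc_id)
        self.vectors = np.vstack([self.vectors, vec.astype(np.float32)])

    def search(self, query: np.ndarray, k: int = 3) -> list[tuple[str, float]]:
        q = query / np.linalg.norm(query)
        scores = self.vectors @ q                # cosine similarity per doc
        top = np.argsort(scores)[::-1][:k]      # highest scores first
        return [(self.ids[i], float(scores[i])) for i in top]
```

Because the index is in-process, a search is just a matrix multiply — no network round-trip, which is what makes the fully offline pipeline practical.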

🚀 Installation & Setup

  1. Prerequisites: Ensure you have Python 3.10+ and uv installed.

  2. Clone the repository:

    git clone https://github.com/AlexShimmy/source-mcp.git
    cd source-mcp
    
  3. Install Dependencies:

    # uv will automatically handle virtual environment creation and dependencies
    uv sync
    

⚙️ Configuration

Create a .env file in the root directory (you can copy .env.example if available).

# Choose your provider: "openai" or "fastembed"
EMBEDDING_PROVIDER=openai

# Required ONLY if using OpenAI
OPENAI_API_KEY=sk-your-openai-api-key

# Optional: Path to store the vector database (Defaults to `.source-mcp/zvec_db` in the index dir)
ZVEC_PATH=./zvec_db

# Optional: Which directory to index (Defaults to current directory)
SOURCE_MCP_INDEX_DIR=/path/to/your/project

# Optional: Port for the Web Dashboard (Defaults to 8000)
WEB_PORT=8000
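How the server might consume these variables can be sketched as a small settings loader. The variable names and defaults mirror the documentation above; the loader itself (and the fallback provider) is an assumption, not the project's actual code:

```python
import os

def load_settings() -> dict:
    """Read Source-MCP settings from the environment (a sketch, not the
    actual loader; the fallback provider here is assumed)."""
    return {
        "provider": os.getenv("EMBEDDING_PROVIDER", "fastembed"),
        "openai_api_key": os.getenv("OPENAI_API_KEY"),  # needed only for OpenAI
        # The real default resolves under the index directory.
        "zvec_path": os.getenv("ZVEC_PATH", ".source-mcp/zvec_db"),
        "index_dir": os.getenv("SOURCE_MCP_INDEX_DIR", "."),
        "web_port": int(os.getenv("WEB_PORT", "8000")),
    }
```

With no `.env` at all, such a loader would fall back to the local FastEmbed provider on port 8000, matching the documented defaults.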

🖱️ Usage

Running Manually (Terminal & Dashboard)

To start the MCP server manually and access the web dashboard:

uv run python -m src.main --path .

🔌 MCP Configuration

The config is the same for all clients (Claude Desktop, Cursor, VS Code / Cline, etc.):

{
  "mcpServers": {
    "source-mcp": {
      "command": "uv",
      "args": ["--directory", "/absolute/path/to/source-mcp", "run", "python", "-m", "src.main"]
    }
  }
}

All other settings (such as SOURCE_MCP_INDEX_DIR, EMBEDDING_PROVIDER, or OPENAI_API_KEY) should be configured via the .env file in the root directory of Source-MCP.

🧪 Testing

The project uses pytest for unit and end-to-end tests. To run the test suite:

uv run python -m pytest tests/ -v

📜 License

This project is licensed under the MIT License - see the LICENSE file for details.
