MCP server for semantic code search using local embeddings

Semantic Search MCP Server

An MCP server that provides semantic code search using local embeddings. Search your codebase with natural language queries like "authentication middleware" or "database connection pooling".

Features

  • Hybrid search: Combines vector similarity (Jina code embeddings) with FTS5 keyword matching using Reciprocal Rank Fusion
  • 165+ languages: Tree-sitter parsing for Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, PHP, and more
  • Incremental indexing: File watcher automatically detects additions, modifications, and deletions
  • Respects .gitignore: Honors your project's .gitignore files (including nested ones)
  • Auto-initialization: Model loads and codebase indexes in the background on server startup
  • Zero external APIs: All embeddings generated locally with FastEmbed
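
The hybrid search bullet above names Reciprocal Rank Fusion (RRF). As a rough illustration of the idea, and not the project's actual code, the sketch below fuses two ranked result lists. The function name, the chunk IDs, and the constant k=60 (a commonly used default for RRF) are assumptions made for this example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of IDs into one list, best first.

    Each item earns 1 / (k + rank) from every list it appears in
    (rank starts at 1). k=60 is an assumed default, not a documented
    setting of this server.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical chunk IDs returned by the two retrievers.
vector_hits = ["auth.py::login", "db.py::connect", "auth.py::verify_token"]
keyword_hits = ["auth.py::verify_token", "auth.py::login", "utils.py::slugify"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Items that rank well in both lists rise to the top, while items found by only one retriever are kept but ranked lower. RRF uses only ranks, so the vector and keyword scores never need to share a scale.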

Installation

uv tool install semantic-search-mcp

Or with pip:

pip install semantic-search-mcp

Or run directly without installing:

uvx semantic-search-mcp

Quick Start

Add to Claude Code

Option A: Project-level config

Create .mcp.json in your project root:

{
  "mcpServers": {
    "semantic-search": {
      "command": "uvx",
      "args": ["semantic-search-mcp"]
    }
  }
}

Option B: CLI

claude mcp add semantic-search -- uvx semantic-search-mcp

Use

The server auto-initializes on startup. Available tools:

  • search_code - Search with natural language queries
  • initialize - Force a full re-index of the codebase if needed
  • reindex_file - Manually reindex a specific file

How It Works

Indexing

On startup, the server:

  1. Scans your codebase for supported file types
  2. Parses code into semantic chunks (functions, classes, methods) using Tree-sitter
  3. Generates embeddings for each chunk using Jina's code embedding model
  4. Stores everything in a local SQLite database with vector search support
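
To make step 2 more concrete, here is a minimal sketch of splitting a source file into function, class, and method chunks. It deliberately swaps Tree-sitter for Python's standard-library ast module so that it runs without extra dependencies, which means it handles Python only. The function name and the chunk fields are hypothetical and do not describe the server's internal schema.

```python
import ast


def chunk_python_source(source):
    """Return one chunk per function, class, or method, in file order."""
    tree = ast.parse(source)
    chunks = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start_line": node.lineno,
                "end_line": node.end_lineno,
                # Exact source text of the definition, ready to be embedded.
                "text": ast.get_source_segment(source, node),
            })
    # ast.walk is breadth-first, so restore file order explicitly.
    return sorted(chunks, key=lambda chunk: chunk["start_line"])


sample = '''class Pool:
    def acquire(self):
        return 1

def connect(url):
    return Pool()
'''

chunks = chunk_python_source(sample)
for chunk in chunks:
    print(chunk["kind"], chunk["name"], chunk["start_line"], chunk["end_line"])
```

In this sketch a method appears both as its own chunk and inside the text of its class chunk. Whether the server deduplicates such overlaps is not stated in this README.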

File Watching

The server monitors your codebase for changes in real time:

Event          Action
File created   Parsed, embedded, and added to index
File modified  Re-indexed if content hash changed
File deleted   Removed from index

Changes are debounced (default 1000 ms, set by SEMANTIC_SEARCH_DEBOUNCE_MS) so that rapid successive modifications to a file are processed as a single batch.
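
The debounce and content-hash behaviour described above could look roughly like the sketch below. It is an illustration under stated assumptions: the class and method names are hypothetical, SHA-256 is an assumed hash choice, and timestamps are passed in explicitly so the example runs without a real file watcher.

```python
import hashlib


class ChangeTracker:
    """Batch file events and skip files whose content did not change."""

    def __init__(self, debounce_ms=1000):
        self.debounce_ms = debounce_ms
        self.pending = {}  # path -> time of the most recent event (ms)
        self.hashes = {}   # path -> content hash at the last index

    def record_event(self, path, now_ms):
        # A newer event for the same path restarts its quiet period.
        self.pending[path] = now_ms

    def due_paths(self, now_ms):
        """Return paths that have been quiet for at least debounce_ms."""
        due = sorted(
            path for path, last in self.pending.items()
            if now_ms - last >= self.debounce_ms
        )
        for path in due:
            del self.pending[path]
        return due

    def needs_reindex(self, path, content):
        """True if the content hash differs from the last indexed one."""
        digest = hashlib.sha256(content).hexdigest()
        if self.hashes.get(path) == digest:
            return False
        self.hashes[path] = digest
        return True


tracker = ChangeTracker(debounce_ms=1000)
tracker.record_event("app.py", now_ms=0)
tracker.record_event("app.py", now_ms=400)   # second save restarts the timer
early = tracker.due_paths(now_ms=1000)       # only 600 ms of quiet so far
late = tracker.due_paths(now_ms=1400)        # a full 1000 ms of quiet

first = tracker.needs_reindex("app.py", b"print('hi')\n")
second = tracker.needs_reindex("app.py", b"print('hi')\n")
print(early, late, first, second)
```

The two saves 400 ms apart collapse into one re-index, and saving a file without changing its bytes triggers no embedding work.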

What Gets Indexed

Included:

  • Files with code extensions: .py, .js, .ts, .tsx, .jsx, .go, .rs, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, and more

Excluded:

  • Files matching .gitignore patterns (all .gitignore files in your project are respected)
  • Common non-code directories: node_modules, __pycache__, .venv, build, dist, .git, vendor, etc.
  • Binary files and non-code file types
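
As a simplified sketch of the include and exclude rules above, the filter below checks a path against the listed extensions and directory names. It covers only the names spelled out in this README (the "and more" and "etc." entries are not guessed), it omits .gitignore matching entirely, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Extensions and directories copied from the lists above.
CODE_EXTENSIONS = {
    ".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".java",
    ".c", ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala",
}
EXCLUDED_DIRS = {
    "node_modules", "__pycache__", ".venv", "build", "dist", ".git", "vendor",
}


def should_index(relative_path):
    """Decide whether a project-relative path (forward slashes) is indexed."""
    path = PurePosixPath(relative_path)
    # Reject the file if any parent directory is on the exclusion list.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix.lower() in CODE_EXTENSIONS


for candidate in ["src/auth/middleware.py", "node_modules/lib/index.js",
                  "docs/logo.png", "build/gen.c"]:
    print(candidate, should_index(candidate))
```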

Configuration

Environment variables:

Variable                         Default                              Description
SEMANTIC_SEARCH_DB_PATH          .semantic-search/index.db            Index database location
SEMANTIC_SEARCH_EMBEDDING_MODEL  jinaai/jina-embeddings-v2-base-code  Embedding model
SEMANTIC_SEARCH_MIN_SCORE        0.3                                  Minimum relevance threshold (0-1)
SEMANTIC_SEARCH_DEBOUNCE_MS      1000                                 File watcher debounce in milliseconds
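
For illustration, the sketch below shows one way these variables and their defaults could be read into a typed settings object. The Settings class and load_settings function are hypothetical and are not the server's own loader. Only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_path: str
    embedding_model: str
    min_score: float
    debounce_ms: int


def load_settings(env=None):
    """Build Settings from a mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return Settings(
        db_path=env.get("SEMANTIC_SEARCH_DB_PATH", ".semantic-search/index.db"),
        embedding_model=env.get(
            "SEMANTIC_SEARCH_EMBEDDING_MODEL",
            "jinaai/jina-embeddings-v2-base-code",
        ),
        min_score=float(env.get("SEMANTIC_SEARCH_MIN_SCORE", "0.3")),
        debounce_ms=int(env.get("SEMANTIC_SEARCH_DEBOUNCE_MS", "1000")),
    )


# An empty mapping yields the documented defaults.
defaults = load_settings({})
# Stricter relevance threshold and a shorter debounce window.
custom = load_settings({
    "SEMANTIC_SEARCH_MIN_SCORE": "0.5",
    "SEMANTIC_SEARCH_DEBOUNCE_MS": "250",
})
print(defaults)
print(custom)
```

Passing a plain dictionary instead of os.environ keeps the example deterministic. Calling load_settings() with no argument reads the real environment.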

Requirements

  • Python 3.11+
  • ~700MB disk for embedding model (downloaded on first run)
  • ~1GB RAM for embedding model

License

MIT
