
MCP server for semantic code search using local embeddings


Semantic Search MCP Server

An MCP server that provides semantic code search using local embeddings. Search your codebase with natural language queries like "authentication middleware" or "database connection pooling".

Features

  • Hybrid search: Combines vector similarity (Jina code embeddings) with FTS5 keyword matching using Reciprocal Rank Fusion
  • 30+ languages: Tree-sitter parsing for Python, TypeScript, JavaScript, Go, Rust, Java, C/C++, Ruby, PHP, and more
  • Incremental indexing: File watcher automatically detects additions, modifications, and deletions
  • Respects .gitignore: Honors your project's .gitignore files (including nested ones)
  • Auto-initialization: The embedding model loads and the codebase is indexed in the background when the server starts
  • Zero external APIs: All embeddings generated locally with FastEmbed
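Reciprocal Rank Fusion merges the vector and keyword result lists using only each hit's rank, so the two very different score scales never have to be compared directly. The sketch below shows the standard formula, score(d) = Σ 1/(k + rank). It assumes k = 60, the value commonly used in the RRF literature; the constant this server actually uses is not documented, and the function name and chunk IDs are hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse several ranked lists of chunk IDs into one ranking.

    Hypothetical helper, not the server's code. Each ID scores
    sum(1 / (k + rank)) over every list it appears in (rank is 1-based).
    k=60 is an assumption; the server's value is undocumented.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["auth.py:login", "auth.py:verify_token", "db.py:connect"]
    keyword_hits = ["auth.py:verify_token", "middleware.py:require_auth", "auth.py:login"]
    for chunk_id, score in reciprocal_rank_fusion([vector_hits, keyword_hits]):
        print(f"{score:.4f}  {chunk_id}")
```

Chunks that both searches return tend to outrank chunks that only one search finds, which is the point of combining semantic and keyword matching.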

Installation

uv tool install semantic-search-mcp

Or with pip:

pip install semantic-search-mcp

Quick Start

Add to Claude Code

Option A: Project-level config

Create .mcp.json in your project root:

{
  "mcpServers": {
    "semantic-search": {
      "command": "uvx",
      "args": ["semantic-search-mcp"]
    }
  }
}

Option B: CLI

claude mcp add semantic-search -- uvx semantic-search-mcp

Usage

The server auto-initializes on startup. Available tools:

  • search_code - Search with natural language queries
  • initialize - Force re-index if needed
  • reindex_file - Manually reindex a specific file
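Claude Code calls these tools for you. For debugging, it can help to know that an MCP tool invocation is a JSON-RPC 2.0 "tools/call" request carrying the tool name and an arguments object. The sketch below only builds that message. It assumes an argument named query, which is a guess, so check the real input schema with a "tools/list" request. A real session must also complete the protocol's initialize handshake first, which is separate from this server's initialize tool.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request IDs.
_request_ids = count(1)


def tool_call_request(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP "tools/call" request as one JSON-RPC 2.0 message.

    Hypothetical helper for illustration; over stdio, each message is
    written as a single line of JSON.
    """
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "query" is an assumed argument name, not confirmed by the README.
    print(tool_call_request("search_code", {"query": "database connection pooling"}))
```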

How It Works

Indexing

On startup, the server:

  1. Scans your codebase for supported file types
  2. Parses code into semantic chunks (functions, classes, methods) using Tree-sitter
  3. Generates embeddings for each chunk using Jina's code embedding model
  4. Stores everything in a local SQLite database with vector search support
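Steps 2 to 4 can be sketched for Python files alone. This sketch swaps in the standard library's ast module for Tree-sitter, takes the embedding function as a parameter (the server itself uses FastEmbed), and stores each vector as raw float32 bytes in a plain SQLite table. The table layout and function names are hypothetical, and the vector-search extension the server relies on is not shown.

```python
import ast
import sqlite3
from array import array
from typing import Callable


def extract_chunks(source: str, path: str) -> list[dict]:
    """Return one chunk per class, function, and method in a Python file.

    Uses the stdlib ast module as a stand-in for Tree-sitter. Class chunks
    include their methods' text, so chunks can overlap.
    """
    lines = source.splitlines()
    chunks = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append({
                "path": path,
                "name": node.name,
                "start": node.lineno,
                "end": node.end_lineno,
                # lineno/end_lineno are 1-based and inclusive.
                "text": "\n".join(lines[node.lineno - 1 : node.end_lineno]),
            })
    chunks.sort(key=lambda c: c["start"])  # ast.walk is breadth-first
    return chunks


def store_chunks(conn: sqlite3.Connection, chunks: list[dict],
                 embed: Callable[[str], list[float]]) -> None:
    """Embed each chunk and insert it into a hypothetical `chunks` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS chunks ("
        " id INTEGER PRIMARY KEY, path TEXT, name TEXT,"
        " start_line INTEGER, end_line INTEGER, text TEXT, embedding BLOB)"
    )
    for chunk in chunks:
        vector = array("f", embed(chunk["text"]))  # float32 values
        conn.execute(
            "INSERT INTO chunks (path, name, start_line, end_line, text, embedding)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (chunk["path"], chunk["name"], chunk["start"], chunk["end"],
             chunk["text"], vector.tobytes()),
        )
    conn.commit()
```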

File Watching

The server monitors your codebase for changes in real-time:

Event           Action
File created    Parsed, embedded, and added to the index
File modified   Re-indexed if its content hash changed
File deleted    Removed from the index
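The "re-indexed if its content hash changed" rule means a save that leaves the bytes unchanged skips re-embedding. A minimal sketch, assuming SHA-256 (the README does not say which hash is used) and an in-memory dict of last-seen hashes; needs_reindex is a hypothetical name.

```python
import hashlib
from pathlib import Path


def needs_reindex(path: Path, known_hashes: dict[str, str]) -> bool:
    """Return True if the file's bytes differ from the last recorded hash.

    Hypothetical helper: known_hashes maps a path string to the SHA-256 hex
    digest recorded at the previous indexing pass, and is updated in place
    whenever a change is detected.
    """
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    key = str(path)
    if known_hashes.get(key) == digest:
        return False  # a modified event fired, but the content is identical
    known_hashes[key] = digest
    return True
```

This sketch records the new hash before re-indexing; a real implementation might record it only after indexing succeeds, so a failed attempt is retried on the next change.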

Changes are debounced (default 1s, configurable via SEMANTIC_SEARCH_DEBOUNCE_MS) so that rapid modifications are processed as a single batch.
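Debouncing can be modeled offline with timestamped events instead of real timers: a batch closes once no new event arrives for the debounce window, and only the latest event per path survives. This is a sketch of the timing logic only; the FileEvent type and debounce function are hypothetical, and the server's actual watcher implementation is not described in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEvent:
    path: str
    kind: str  # "created", "modified" or "deleted"
    time_ms: int


def debounce(events: list[FileEvent], window_ms: int = 1000) -> list[dict[str, str]]:
    """Group events into batches separated by at least window_ms of quiet.

    Hypothetical helper. Within a batch only the latest event per path is
    kept, so a burst of saves to one file triggers a single re-index.
    A real watcher would merge event kinds more carefully (for example,
    created followed by modified should still be treated as an addition).
    """
    batches: list[dict[str, str]] = []
    current: dict[str, str] = {}
    last_time: int | None = None
    for event in sorted(events, key=lambda e: e.time_ms):
        # A gap of at least window_ms closes the current batch.
        if last_time is not None and event.time_ms - last_time >= window_ms:
            batches.append(current)
            current = {}
        current[event.path] = event.kind
        last_time = event.time_ms
    if current:
        batches.append(current)
    return batches
```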

What Gets Indexed

Included:

  • Files with code extensions: .py, .js, .ts, .tsx, .jsx, .go, .rs, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, and more

Excluded:

  • Files matching .gitignore patterns (all .gitignore files in your project are respected)
  • Common non-code directories: node_modules, __pycache__, .venv, build, dist, .git, vendor, etc.
  • Binary files and non-code file types
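The extension and directory rules above amount to a simple path filter. The sketch below uses abbreviated copies of the lists shown (the server's full lists are longer) and leaves out .gitignore matching, which a library such as pathspec can handle. CODE_EXTENSIONS, EXCLUDED_DIRS and should_index are hypothetical names.

```python
from pathlib import PurePosixPath

# Hypothetical, abbreviated versions of the lists in this README.
CODE_EXTENSIONS = {
    ".py", ".js", ".ts", ".tsx", ".jsx", ".go", ".rs", ".java", ".c",
    ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala",
}
EXCLUDED_DIRS = {"node_modules", "__pycache__", ".venv", "build", "dist", ".git", "vendor"}


def should_index(relative_path: str) -> bool:
    """Decide whether a project-relative path is indexed (.gitignore not handled)."""
    path = PurePosixPath(relative_path)
    # Skip anything inside an excluded directory at any depth.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    return path.suffix.lower() in CODE_EXTENSIONS
```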

Configuration

Environment variables:

Variable                          Default                               Description
SEMANTIC_SEARCH_DB_PATH           .semantic-search/index.db             Index database location
SEMANTIC_SEARCH_EMBEDDING_MODEL   jinaai/jina-embeddings-v2-base-code   Embedding model
SEMANTIC_SEARCH_MIN_SCORE         0.3                                   Minimum relevance threshold (0 to 1)
SEMANTIC_SEARCH_DEBOUNCE_MS       1000                                  File watcher debounce window, in milliseconds
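The variables above could be read into a typed settings object that falls back to the documented defaults. This is a sketch, not the server's code; Settings and load_settings are hypothetical names, and the range checks are an added assumption.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    db_path: str
    embedding_model: str
    min_score: float
    debounce_ms: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the SEMANTIC_SEARCH_* variables, falling back to the documented defaults."""
    settings = Settings(
        db_path=env.get("SEMANTIC_SEARCH_DB_PATH", ".semantic-search/index.db"),
        embedding_model=env.get(
            "SEMANTIC_SEARCH_EMBEDDING_MODEL", "jinaai/jina-embeddings-v2-base-code"
        ),
        min_score=float(env.get("SEMANTIC_SEARCH_MIN_SCORE", "0.3")),
        debounce_ms=int(env.get("SEMANTIC_SEARCH_DEBOUNCE_MS", "1000")),
    )
    # Assumed validation; the README only states the 0-1 range for min_score.
    if not 0.0 <= settings.min_score <= 1.0:
        raise ValueError("SEMANTIC_SEARCH_MIN_SCORE must be between 0 and 1")
    if settings.debounce_ms < 0:
        raise ValueError("SEMANTIC_SEARCH_DEBOUNCE_MS must be non-negative")
    return settings
```

Passing a plain dict instead of os.environ makes the defaults easy to check in isolation.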

Requirements

  • Python 3.11+
  • ~700 MB of disk space for the embedding model (downloaded on first run)
  • ~1 GB of RAM for the embedding model

License

MIT

Download files

Source Distribution

semantic_search_mcp-0.1.0.tar.gz (29.1 kB)

Built Distribution

semantic_search_mcp-0.1.0-py3-none-any.whl (22.9 kB)

File details

Details for the file semantic_search_mcp-0.1.0.tar.gz.

File metadata

  • Download URL: semantic_search_mcp-0.1.0.tar.gz
  • Size: 29.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for semantic_search_mcp-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        b1454a179f8051b16c81da7bc19ce6e7b2b7a9fab7a148a4cce2206f05fefe3c
MD5           87fd08e397a2b4cc204cebaae11c316f
BLAKE2b-256   6d521e7514eb80e5ad5b0d37bbadf1dc16856258b40133b37244dd2ba5e5855c

File details

Details for the file semantic_search_mcp-0.1.0-py3-none-any.whl.

File hashes

Hashes for semantic_search_mcp-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        75aba1656da90302a6cc049a746e4e3bc04f25f7f5fc7b33cc9e11fe82745492
MD5           3ea375a9c2a02e7ba94b8f4f550b0499
BLAKE2b-256   3f54233207a2d44fe02fe77d7dc769f77067df3392c3ec7853b3265e7a2ea2aa
