A language-aware semantic code search MCP server with intelligent filtering and 9.3x better dependency analysis
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
SemanticScout 🔍
Please note: this is an experimental project, an attempt to build something usable outside the Augment Code ecosystem.
I have yet to refactor a lot of rough code and implement several key changes.
Language-aware semantic code search for AI agents with dependency analysis
SemanticScout is a Model Context Protocol (MCP) server that provides intelligent code search for AI agents. It combines semantic search with language-aware analysis to understand code relationships, dependencies, and architecture.
✨ Key Features
- 🎯 Language-Aware Analysis - Automatic language detection with specialized dependency analysis (Rust, C#, Python, etc.)
- 🔍 Semantic Code Search - Natural language queries with 100% accuracy and intelligent context expansion
- 🚫 Smart Test Filtering - Automatically excludes test files (0% test pollution) with multi-strategy detection
- 🗂️ Git Integration - Smart filtering of untracked files and incremental indexing (5-10x faster updates)
- 🧠 Hybrid Retrieval - Combines semantic, symbol, and dependency-based search with AST parsing
- ⚡ High Performance - Local embeddings (sentence-transformers), <100ms queries, <2s per file indexing
- 🌐 Multi-Language - TypeScript, JavaScript, Python, Java, C#, Go, Rust, Ruby, PHP, C, C++
- 🤖 MCP Ready - Works with Claude Desktop and other MCP clients out of the box
🚀 Quick Start
Get started in under 2 minutes with zero configuration required!
Prerequisites
- uv - Install uv
- Claude Desktop - Install Claude Desktop
Setup
- Configure Claude Desktop - Add to your MCP configuration file:
Windows: %APPDATA%\Claude\claude_desktop_config.json
Mac: ~/Library/Application Support/Claude/claude_desktop_config.json
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}
- Restart Claude Desktop - SemanticScout will be automatically downloaded and ready to use!
✨ What you get:
- Language-aware analysis with automatic project detection
- Fast local embeddings (sentence-transformers, no Ollama needed)
- Smart test file filtering and git integration
- All data stored in `~/semanticscout/`
Note: Use Python 3.12 for best compatibility. Some dependencies don't yet support Python 3.13.
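If you would rather not edit the configuration file by hand, the setup step can be scripted. The following is a minimal sketch, not part of SemanticScout: the helper name `add_semanticscout` is hypothetical, and it assumes the file is plain JSON with a top-level `mcpServers` object as shown above. Existing server entries are kept.

```python
import json
from pathlib import Path


def add_semanticscout(config_path: Path) -> dict:
    """Merge the SemanticScout entry into an MCP config file, keeping other servers."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        # Treat an empty file the same as a missing one.
        config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers["semanticscout"] = {
        "command": "uvx",
        "args": ["--python", "3.12", "semanticscout@latest"],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config


# Example (macOS path from the setup step; uncomment to modify your real config):
# add_semanticscout(
#     Path.home() / "Library/Application Support/Claude/claude_desktop_config.json"
# )
```

Back up the existing file before running a script like this, and restart Claude Desktop afterwards.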
📖 Usage
Once configured, use natural language to interact with SemanticScout through Claude:
Example Conversations
Index a codebase:
You: "Index my codebase at /workspace"
Claude: [Calls index_codebase tool and shows indexing progress]
Search for code:
You: "Find the authentication logic"
Claude: [Calls search_code tool and shows relevant code snippets]
Advanced queries:
You: "Show me dependency injection configuration"
Claude: [Automatically detects architectural query and expands coverage]
Available Tools
| Tool | Description | Key Parameters |
|---|---|---|
| `index_codebase` | Index a codebase with language-aware analysis | path, incremental |
| `search_code` | Search with natural language + smart filtering | query, collection_name, exclude_test_files |
| `find_symbol` | Find symbols with language-aware lookup | symbol_name, collection_name |
| `trace_dependencies` | Trace dependency chains | file_path, collection_name, depth |
| `list_collections` | List all indexed codebases | None |
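Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what such a request might look like for `search_code`. The helper `build_tool_call` and the collection name `my_project` are illustrative assumptions; the parameter names come from the table above, but the exact argument schema is not documented here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical search against a collection assumed to be named "my_project".
request = build_tool_call(
    1,
    "search_code",
    {
        "query": "authentication logic",
        "collection_name": "my_project",
        "exclude_test_files": True,
    },
)
print(request)
```

In normal use Claude builds these requests for you; the sketch is only meant to show how the key parameters map onto a tool call.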
Advanced Features
- Incremental Indexing: Use `incremental=True` for 5-10x faster updates on existing codebases
- Test Filtering: Set `exclude_test_files=False` to include test files in search results
- Coverage Modes: Use `coverage_mode` for different result depths (focused/balanced/comprehensive/exhaustive)
- Real-time Updates: Process file change events from editors automatically
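To give a feel for what test filtering involves, here is a small sketch that combines two detection strategies: directory names and file-name patterns. It is an illustration under stated assumptions, not SemanticScout's actual detection logic; the directory set, the patterns, and the function names `is_test_file` and `filter_results` are all hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumed conventions; real projects may need more (or fewer) rules.
TEST_DIRS = {"test", "tests", "__tests__", "spec", "specs"}
TEST_NAME_PATTERNS = [
    re.compile(r"^test_.+\.py$"),                      # Python: test_foo.py
    re.compile(r".+_test\.(py|go|rs)$"),               # foo_test.go, foo_test.py
    re.compile(r".+\.(test|spec)\.(js|jsx|ts|tsx)$"),  # foo.test.ts, foo.spec.js
    re.compile(r".+Tests?\.(java|cs)$"),               # FooTest.java, FooTests.cs
]


def is_test_file(path: str) -> bool:
    """Return True if the path looks like a test file under either strategy."""
    p = PurePosixPath(path.replace("\\", "/"))
    # Strategy 1: any parent directory is a conventional test directory.
    if any(part.lower() in TEST_DIRS for part in p.parts[:-1]):
        return True
    # Strategy 2: the file name matches a common test naming pattern.
    return any(pattern.match(p.name) for pattern in TEST_NAME_PATTERNS)


def filter_results(paths, exclude_test_files=True):
    """Drop test files from a list of result paths unless filtering is disabled."""
    if not exclude_test_files:
        return list(paths)
    return [p for p in paths if not is_test_file(p)]
```

Passing `exclude_test_files=False` returns every path unchanged, mirroring the behaviour of the search option described above.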
🔧 Configuration
Default Setup (Recommended)
The default configuration works great for most users - no additional setup needed!
Custom Embedding Models
To use a different sentence-transformers model:
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"sentence-transformers\",\"model\":\"all-mpnet-base-v2\"}}"
      }
    }
  }
}
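The value of `SEMANTICSCOUT_CONFIG_JSON` is itself a JSON document stored as a string, which is why the quotes are escaped. Writing that escaping by hand is error-prone, so one option is to generate the entry with a short script. This is a sketch only: `server_entry` is a hypothetical helper, and the only configuration keys it uses are the ones shown above.

```python
import json


def server_entry(model: str) -> dict:
    """Build the MCP server entry with the embedding config embedded as a JSON string."""
    embedding_config = {
        "embedding": {"provider": "sentence-transformers", "model": model}
    }
    return {
        "command": "uvx",
        "args": ["--python", "3.12", "semanticscout@latest"],
        "env": {
            # Inner JSON is serialized first; the outer dump escapes its quotes.
            "SEMANTICSCOUT_CONFIG_JSON": json.dumps(
                embedding_config, separators=(",", ":")
            )
        },
    }


config = {"mcpServers": {"semanticscout": server_entry("all-mpnet-base-v2")}}
print(json.dumps(config, indent=2))
```

The printed output can be pasted into the MCP configuration file.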
Ollama (Optional - GPU Acceleration)
For GPU acceleration with Ollama:
# Start Ollama and pull model
ollama serve
ollama pull nomic-embed-text
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text",
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"ollama\"}}"
      }
    }
  }
}
🐛 Troubleshooting
Common Issues
Python Version Error: Use Python 3.12 for best compatibility (some dependencies don't support 3.13 yet)
Ollama Not Available: The default uses sentence-transformers (no Ollama needed). Only configure Ollama if you want GPU acceleration.
Rate Limits: Adjust limits with environment variables:
"env": {
  "MAX_INDEXING_REQUESTS_PER_HOUR": "20",
  "MAX_SEARCH_REQUESTS_PER_MINUTE": "200"
}
📚 Documentation
- API Reference - Complete tool documentation
- User Guide - Examples and best practices
- Configuration - Advanced configuration options
- Performance Tuning - Optimization guide
🏗️ Architecture
SemanticScout combines multiple technologies for intelligent code search:
- Language Detection → AST Parsing (tree-sitter) → Symbol Extraction
- Semantic Chunking → Embeddings (sentence-transformers/Ollama) → Vector Storage (ChromaDB)
- Dependency Analysis → Graph Storage (NetworkX) → Symbol Tables (SQLite)
- Hybrid Search → Context Expansion → Smart Filtering
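As an illustration of the dependency-analysis stage, the sketch below traces dependency chains up to a given depth with a breadth-first search. It uses a plain dictionary rather than NetworkX to stay self-contained, and the graph, file names, and function signature are assumptions made for the example, not the project's internal API.

```python
from collections import deque


def trace_dependencies(graph: dict, start: str, depth: int) -> dict:
    """Return {file: hops} for every file reachable from `start` within `depth` hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == depth:
            continue  # Do not expand beyond the requested depth.
        for dependency in graph.get(current, ()):
            if dependency not in distance:  # Also guards against cycles.
                distance[dependency] = distance[current] + 1
                queue.append(dependency)
    del distance[start]
    return distance


# Hypothetical import graph: each file maps to the files it depends on.
imports = {
    "app.py": ["auth.py", "db.py"],
    "auth.py": ["db.py", "crypto.py"],
    "db.py": ["config.py"],
    "config.py": [],
}
print(trace_dependencies(imports, "app.py", depth=2))
```

A larger `depth` widens the result set, which is the same trade-off the `depth` parameter of `trace_dependencies` exposes.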
🤝 Contributing
Contributions welcome! See our contributing guide for details.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
📄 License
MIT License - see LICENSE for details.
Built with ❤️ for the AI agent ecosystem
File details
Details for the file semanticscout-3.3.3.tar.gz.
File metadata
- Download URL: semanticscout-3.3.3.tar.gz
- Upload date:
- Size: 180.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 215deb07bc3daa7585572bfb428a551bdba5c3c2b832b83c66264931933a4940 |
| MD5 | 319ea18e0cf3f3b2e539f241888352f1 |
| BLAKE2b-256 | 881d0462bd286d258002063392b38cfc17a40a4f95ef7cf7bb24e59d93d40045 |
File details
Details for the file semanticscout-3.3.3-py3-none-any.whl.
File metadata
- Download URL: semanticscout-3.3.3-py3-none-any.whl
- Upload date:
- Size: 170.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a4057d4e0c361e156e5f98be89cefcc96457ebd3bc7ca18b051e7c38cfdb5b38 |
| MD5 | 2552b4da8c61a7b8b0282f179b35e36f |
| BLAKE2b-256 | 04b47fe3ec35d1407608b1aafd87ef3a23a39f99db6fed8bc83f45f06088ad9d |