
A language-aware semantic code search MCP server with intelligent filtering and 9.3x better dependency analysis

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


SemanticScout 🔍

Please note: this is an experimental idea project, an attempt to build something usable outside the Augment Code world.

I have yet to refactor a lot of slop and implement a number of key changes.

Language-aware semantic code search for AI agents with dependency analysis


SemanticScout is a Model Context Protocol (MCP) server that provides intelligent code search for AI agents. It combines semantic search with language-aware analysis to understand code relationships, dependencies, and architecture.

✨ Key Features

  • 🎯 Language-Aware Analysis - Automatic language detection with specialized dependency analysis (Rust, C#, Python, etc.)
  • 🔍 Semantic Code Search - Natural language queries with 100% accuracy and intelligent context expansion
  • 🚫 Smart Test Filtering - Automatically excludes test files (0% test pollution) with multi-strategy detection
  • 🗂️ Git Integration - Smart filtering of untracked files and incremental indexing (5-10x faster updates)
  • 🧠 Hybrid Retrieval - Combines semantic, symbol, and dependency-based search with AST parsing
  • High Performance - Local embeddings (sentence-transformers), <100ms queries, <2s per file indexing
  • 🌐 Multi-Language - TypeScript, JavaScript, Python, Java, C#, Go, Rust, Ruby, PHP, C, C++
  • 🤖 MCP Ready - Works with Claude Desktop and other MCP clients out of the box

🚀 Quick Start

Get started in under 2 minutes with zero configuration required!

Prerequisites

Setup

  1. Configure Claude Desktop - Add to your MCP configuration file:

Windows: %APPDATA%\Claude\claude_desktop_config.json
Mac: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}
  2. Restart Claude Desktop - SemanticScout will be automatically downloaded and ready to use!
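
If you would rather script the configuration step than edit the file by hand, the merge can be sketched in Python roughly as follows. The helper names `add_semanticscout` and `update_config_file` are hypothetical; the server entry is the one from the setup step. The sketch keeps any other servers already present in the file.

```python
import json
from pathlib import Path

# Server entry copied from the setup step.
SEMANTICSCOUT_ENTRY = {
    "command": "uvx",
    "args": ["--python", "3.12", "semanticscout@latest"],
}


def add_semanticscout(config: dict) -> dict:
    """Return a copy of `config` with the semanticscout server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["semanticscout"] = SEMANTICSCOUT_ENTRY
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at `path` (if present), merge the entry, write it back."""
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_semanticscout(existing), indent=2), encoding="utf-8")
```

On a Mac, for example, you would call `update_config_file(Path.home() / "Library/Application Support/Claude/claude_desktop_config.json")`. Back up the file first, since this overwrites it.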

✨ What you get:

  • Language-aware analysis with automatic project detection
  • Fast local embeddings (sentence-transformers, no Ollama needed)
  • Smart test file filtering and git integration
  • All data stored in ~/semanticscout/

Note: Use Python 3.12 for best compatibility. Some dependencies don't yet support Python 3.13.

📖 Usage

Once configured, use natural language to interact with SemanticScout through Claude:

Example Conversations

Index a codebase:

You: "Index my codebase at /workspace"
Claude: [Calls index_codebase tool and shows indexing progress]

Search for code:

You: "Find the authentication logic"
Claude: [Calls search_code tool and shows relevant code snippets]

Advanced queries:

You: "Show me dependency injection configuration"
Claude: [Automatically detects architectural query and expands coverage]

Available Tools

Tool               | Description                                     | Key Parameters
index_codebase     | Index a codebase with language-aware analysis   | path, incremental
search_code        | Search with natural language + smart filtering  | query, collection_name, exclude_test_files
find_symbol        | Find symbols with language-aware lookup         | symbol_name, collection_name
trace_dependencies | Trace dependency chains                         | file_path, collection_name, depth
list_collections   | List all indexed codebases                      | None
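
MCP clients invoke tools like these with JSON-RPC `tools/call` requests. As a rough illustration of what such a request looks like on the wire, here is a minimal sketch. The helper `build_tool_call` and the argument values (including the collection name `my_project`) are made up for the example; only the tool and parameter names come from the table above, and the exact argument types the server accepts are an assumption.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument values; only the parameter names come from the table.
request = build_tool_call(
    1,
    "search_code",
    {
        "query": "authentication logic",
        "collection_name": "my_project",
        "exclude_test_files": True,
    },
)
print(request)
```

In normal use you never write these messages yourself; Claude Desktop builds them from your natural-language request.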

Advanced Features

  • Incremental Indexing: Use incremental=True for 5-10x faster updates on existing codebases
  • Test Filtering: Set exclude_test_files=False to include test files in search results
  • Coverage Modes: Use coverage_mode for different result depths (focused/balanced/comprehensive/exhaustive)
  • Real-time Updates: Process file change events from editors automatically
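
To give a feel for what test filtering involves, here is a minimal sketch of path-based test detection. The heuristics, the constants, and the function names `looks_like_test_file` and `filter_results` are assumptions chosen for illustration; SemanticScout's actual multi-strategy detection may use different rules.

```python
from pathlib import PurePosixPath

# Assumed heuristics for illustration; SemanticScout's real rules may differ.
TEST_DIR_NAMES = {"test", "tests", "__tests__", "spec"}
TEST_NAME_PREFIXES = ("test_",)
TEST_NAME_SUFFIXES = ("_test", ".test", ".spec", "Test", "Tests")


def looks_like_test_file(path: str) -> bool:
    """Guess whether `path` is a test file from its directories and file name."""
    p = PurePosixPath(path.replace("\\", "/"))
    # Strategy 1: any parent directory is a conventional test directory.
    if any(part.lower() in TEST_DIR_NAMES for part in p.parts[:-1]):
        return True
    # Strategy 2: the file name (minus its final extension) follows a test naming convention.
    stem = p.stem
    return stem.startswith(TEST_NAME_PREFIXES) or stem.endswith(TEST_NAME_SUFFIXES)


def filter_results(paths: list[str], exclude_test_files: bool = True) -> list[str]:
    """Drop probable test files unless the caller opts in to keeping them."""
    if not exclude_test_files:
        return list(paths)
    return [p for p in paths if not looks_like_test_file(p)]
```

The `exclude_test_files` flag here mirrors the tool parameter of the same name: the default hides probable tests, and passing `False` returns everything.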

🔧 Configuration

Default Setup (Recommended)

The default configuration works great for most users - no additional setup needed!

Custom Embedding Models

To use a different sentence-transformers model:

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"sentence-transformers\",\"model\":\"all-mpnet-base-v2\"}}"
      }
    }
  }
}
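
The value of SEMANTICSCOUT_CONFIG_JSON is a JSON document embedded in a JSON string, so every inner quote has to be escaped. Writing that by hand is error-prone; a small script can generate it instead. The sketch below reuses the settings from the example above and relies only on Python's standard `json` module.

```python
import json

# Nested settings, taken from the example above.
settings = {
    "embedding": {
        "provider": "sentence-transformers",
        "model": "all-mpnet-base-v2",
    }
}

# Compact separators keep the env value on one line without extra spaces.
env_value = json.dumps(settings, separators=(",", ":"))

server_entry = {
    "command": "uvx",
    "args": ["--python", "3.12", "semanticscout@latest"],
    "env": {"SEMANTICSCOUT_CONFIG_JSON": env_value},
}

# Dumping the outer config escapes the inner quotes automatically.
print(json.dumps({"mcpServers": {"semanticscout": server_entry}}, indent=2))
```

The printed output can be pasted into the MCP configuration file as-is.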

Ollama (Optional - GPU Acceleration)

For GPU acceleration with Ollama:

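# Start Ollama (it runs in the foreground), then pull the model from a second terminal
ollama serve
ollama pull nomic-embed-text

Then add the Ollama settings to your MCP configuration:
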
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text",
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"ollama\"}}"
      }
    }
  }
}

🐛 Troubleshooting

Common Issues

Python Version Error: Use Python 3.12 for best compatibility (some dependencies don't support 3.13 yet)

Ollama Not Available: The default uses sentence-transformers (no Ollama needed). Only configure Ollama if you want GPU acceleration.

Rate Limits: Adjust limits with environment variables:

"env": {
  "MAX_INDEXING_REQUESTS_PER_HOUR": "20",
  "MAX_SEARCH_REQUESTS_PER_MINUTE": "200"
}

📚 Documentation

🏗️ Architecture

SemanticScout combines multiple technologies for intelligent code search:

  • Language Detection → AST Parsing (tree-sitter) → Symbol Extraction
  • Semantic Chunking → Embeddings (sentence-transformers/Ollama) → Vector Storage (ChromaDB)
  • Dependency Analysis → Graph Storage (NetworkX) → Symbol Tables (SQLite)
  • Hybrid Search → Context Expansion → Smart Filtering
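
To illustrate the dependency-analysis stage, here is a minimal sketch of depth-limited dependency tracing over a file graph. It is not the project's implementation: it uses a plain dictionary instead of NetworkX to stay dependency-free, the graph contents are hypothetical, and the function merely borrows the name and the `file_path` and `depth` parameters of the trace_dependencies tool.

```python
from collections import deque

# Hypothetical dependency graph: each file maps to the files it imports.
DEPENDENCIES = {
    "app.py": ["auth.py", "db.py"],
    "auth.py": ["db.py", "crypto.py"],
    "db.py": [],
    "crypto.py": [],
}


def trace_dependencies(graph: dict[str, list[str]], file_path: str, depth: int) -> dict[str, int]:
    """Breadth-first walk returning each reachable file and its distance from `file_path`."""
    distances: dict[str, int] = {}
    queue = deque([(file_path, 0)])
    seen = {file_path}
    while queue:
        current, dist = queue.popleft()
        if dist == depth:
            continue  # Do not expand beyond the requested depth.
        for dep in graph.get(current, []):
            if dep not in seen:
                seen.add(dep)
                distances[dep] = dist + 1
                queue.append((dep, dist + 1))
    return distances
```

With this graph, a depth of 1 from `app.py` reaches only its direct imports, while a depth of 2 also picks up `crypto.py` through `auth.py`.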

🤝 Contributing

Contributions welcome! See our contributing guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE for details.


Built with ❤️ for the AI agent ecosystem
