
A language-aware semantic code search MCP server with intelligent filtering and 9.3x better dependency analysis

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.


SemanticScout 🔍

Please note: this is an experimental project, an attempt to build something usable outside the Augment Code ecosystem.

I have yet to refactor a lot of rough code and implement several key changes.

Language-aware semantic code search for AI agents with dependency analysis


SemanticScout is a Model Context Protocol (MCP) server that provides intelligent code search for AI agents. It combines semantic search with language-aware analysis to understand code relationships, dependencies, and architecture.

✨ Key Features

  • 🎯 Language-Aware Analysis - Automatic language detection with specialized dependency analysis (Rust, C#, Python, etc.)
  • 🔍 Semantic Code Search - Natural language queries with 100% accuracy and intelligent context expansion
  • 🚫 Smart Test Filtering - Automatically excludes test files (0% test pollution) with multi-strategy detection
  • 🗂️ Git Integration - Smart filtering of untracked files and incremental indexing (5-10x faster updates)
  • 🧠 Hybrid Retrieval - Combines semantic, symbol, and dependency-based search with AST parsing
  • High Performance - Local embeddings (sentence-transformers), <100ms queries, <2s per file indexing
  • 🌐 Multi-Language - TypeScript, JavaScript, Python, Java, C#, Go, Rust, Ruby, PHP, C, C++
  • 🤖 MCP Ready - Works with Claude Desktop and other MCP clients out of the box

🚀 Quick Start

Get started in under 2 minutes with zero configuration required!

Prerequisites

Setup

  1. Configure Claude Desktop - Add to your MCP configuration file:

Windows: %APPDATA%\Claude\claude_desktop_config.json
Mac: ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}
  2. Restart Claude Desktop - SemanticScout will be automatically downloaded and ready to use!
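If you would rather script the configuration step than edit the file by hand, the sketch below merges the server entry into an existing configuration without touching other servers. The function names `add_semanticscout` and `update_config_file` are hypothetical helpers, not part of SemanticScout, and the paths are the two listed above. Back up your configuration file before trying it.

```python
import json
import os
import sys
from pathlib import Path


def default_config_path() -> Path:
    """Return the Claude Desktop config path listed above for Windows or Mac."""
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    return (Path.home() / "Library" / "Application Support" / "Claude"
            / "claude_desktop_config.json")


def add_semanticscout(config: dict) -> dict:
    """Return a copy of `config` with the semanticscout server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["semanticscout"] = {
        "command": "uvx",
        "args": ["--python", "3.12", "semanticscout@latest"],
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Read the config at `path` (if any), add the entry, and write it back."""
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_semanticscout(current), indent=2), encoding="utf-8")
```

Calling `update_config_file(default_config_path())` applies the change; Claude Desktop still needs a restart afterwards.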

✨ What you get:

  • Language-aware analysis with automatic project detection
  • Fast local embeddings (sentence-transformers, no Ollama needed)
  • Smart test file filtering and git integration
  • All data stored in ~/semanticscout/

Note: Use Python 3.12 for best compatibility. Some dependencies don't yet support Python 3.13.

📖 Usage

Once configured, use natural language to interact with SemanticScout through Claude:

Example Conversations

Index a codebase:

You: "Index my codebase at /workspace"
Claude: [Calls index_codebase tool and shows indexing progress]

Search for code:

You: "Find the authentication logic"
Claude: [Calls search_code tool and shows relevant code snippets]

Advanced queries:

You: "Show me dependency injection configuration"
Claude: [Automatically detects architectural query and expands coverage]

Available Tools

| Tool | Description | Key Parameters |
|------|-------------|----------------|
| index_codebase | Index a codebase with language-aware analysis | path, incremental |
| search_code | Search with natural language + smart filtering | query, collection_name, exclude_test_files |
| find_symbol | Find symbols with language-aware lookup | symbol_name, collection_name |
| trace_dependencies | Trace dependency chains | file_path, collection_name, depth |
| list_collections | List all indexed codebases | None |
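MCP clients invoke tools like these with JSON-RPC `tools/call` requests. The sketch below builds two such requests by hand to show how the parameters fit together. The tool and parameter names come from the table above; the helper `tool_call`, the argument values, the argument types, and the collection name `workspace` are assumptions made for illustration.

```python
import itertools
import json

_ids = itertools.count(1)  # simple source of JSON-RPC request ids


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` request for the named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Index a codebase, reusing an existing index where possible.
index_request = tool_call("index_codebase", path="/workspace", incremental=True)

# Search it with a natural-language query, leaving test files out.
search_request = tool_call(
    "search_code",
    query="authentication logic",
    collection_name="workspace",  # hypothetical collection name
    exclude_test_files=True,
)

print(json.dumps(search_request, indent=2))
```

In normal use Claude Desktop builds these messages for you, so this is only useful for understanding or debugging what the client sends.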

Advanced Features

  • Incremental Indexing: Use incremental=True for 5-10x faster updates on existing codebases
  • Test Filtering: Set exclude_test_files=False to include test files in search results
  • Coverage Modes: Use coverage_mode for different result depths (focused/balanced/comprehensive/exhaustive)
  • Real-time Updates: Process file change events from editors automatically

🔧 Configuration

Default Setup (Recommended)

The default configuration works great for most users - no additional setup needed!

Custom Embedding Models

To use a different sentence-transformers model:

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"sentence-transformers\",\"model\":\"all-mpnet-base-v2\"}}"
      }
    }
  }
}
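The `SEMANTICSCOUT_CONFIG_JSON` value is a JSON document embedded in a JSON string, so every inner quote has to be escaped, which is easy to get wrong by hand. A minimal sketch that generates the `env` block with Python's `json` module follows; the helper name `config_env` is hypothetical, and only the `embedding.provider` and `embedding.model` keys shown above are assumed to exist.

```python
import json
from typing import Optional


def config_env(provider: str, model: Optional[str] = None) -> dict:
    """Return an `env` mapping holding the embedding config as a compact JSON string."""
    embedding = {"provider": provider}
    if model is not None:
        embedding["model"] = model
    # Compact separators reproduce the single-line form used in the example above.
    value = json.dumps({"embedding": embedding}, separators=(",", ":"))
    return {"SEMANTICSCOUT_CONFIG_JSON": value}


env = config_env("sentence-transformers", "all-mpnet-base-v2")

# Dumping the outer object escapes the inner quotes, ready to paste into the config file.
print(json.dumps({"env": env}, indent=2))
```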

Ollama (Optional - GPU Acceleration)

For GPU acceleration with Ollama:

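# Start the Ollama server (it keeps running in the foreground)
ollama serve

# In a second terminal, pull the embedding model
ollama pull nomic-embed-text

Then add the Ollama settings to your MCP configuration:

{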
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text",
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"ollama\"}}"
      }
    }
  }
}

🐛 Troubleshooting

Common Issues

Python Version Error: Use Python 3.12 for best compatibility (some dependencies don't support 3.13 yet)

Ollama Not Available: The default uses sentence-transformers (no Ollama needed). Only configure Ollama if you want GPU acceleration.

Rate Limits: Adjust limits with environment variables:

"env": {
  "MAX_INDEXING_REQUESTS_PER_HOUR": "20",
  "MAX_SEARCH_REQUESTS_PER_MINUTE": "200"
}
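To make the meaning of these limits concrete, here is a sketch of a sliding-window limiter driven by one of the variables above. It is not SemanticScout's implementation: the class `SlidingWindowLimiter` is hypothetical, and the fallback of 200 is simply the example value shown above, not a documented default.

```python
import os
import time
from collections import deque
from typing import Optional


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._timestamps = deque()  # times of the accepted requests, oldest first

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Forget requests that have fallen out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True


# Assumed fallback: the example value from the snippet above.
search_limit = int(os.environ.get("MAX_SEARCH_REQUESTS_PER_MINUTE", "200"))
search_limiter = SlidingWindowLimiter(search_limit, window_seconds=60)
```

Raising the variable raises the number of calls accepted per window; a rejected call would have to wait until older requests age out.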

📚 Documentation

🏗️ Architecture

SemanticScout combines multiple technologies for intelligent code search:

  • Language Detection → AST Parsing (tree-sitter) → Symbol Extraction
  • Semantic Chunking → Embeddings (sentence-transformers/Ollama) → Vector Storage (ChromaDB)
  • Dependency Analysis → Graph Storage (NetworkX) → Symbol Tables (SQLite)
  • Hybrid Search → Context Expansion → Smart Filtering
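As an illustration of the dependency-analysis stage, the sketch below walks a small import graph breadth-first up to a depth limit, which is the kind of traversal a depth-bounded dependency trace implies. It is a stand-in under stated assumptions, not the project's code: the graph is a plain dictionary rather than a NetworkX graph so the example has no dependencies, and the function name `reachable_within` and the file names are invented.

```python
from collections import deque


def reachable_within(graph: dict, start: str, depth: int) -> dict:
    """Return {file: hops} for every file reachable from `start` in at most `depth` hops."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distances[node] == depth:
            continue  # do not expand past the depth limit
        for dep in graph.get(node, []):
            if dep not in distances:  # first visit is the shortest path in BFS
                distances[dep] = distances[node] + 1
                queue.append(dep)
    del distances[start]
    return distances


# Hypothetical import graph: each file maps to the files it depends on.
imports = {
    "app.py": ["auth.py", "db.py"],
    "auth.py": ["crypto.py", "db.py"],
    "crypto.py": ["hashing.py"],
}

print(reachable_within(imports, "app.py", depth=2))
```

With `depth=2`, `hashing.py` is left out because it sits three hops from `app.py`.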

🤝 Contributing

Contributions welcome! See our contributing guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

📄 License

MIT License - see LICENSE for details.


Built with ❤️ for the AI agent ecosystem

Download files


Source Distribution

semanticscout-3.1.4.tar.gz (167.4 kB view details)

Uploaded Source

Built Distribution


semanticscout-3.1.4-py3-none-any.whl (158.1 kB view details)

Uploaded Python 3

File details

Details for the file semanticscout-3.1.4.tar.gz.

File metadata

  • Download URL: semanticscout-3.1.4.tar.gz
  • Upload date:
  • Size: 167.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for semanticscout-3.1.4.tar.gz

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 7059810a896b57b7c17e0e03cdac41d2d8f5e2b7b52fe506b9c4b993422f2ce0 |
| MD5 | 76a50783d89401a4ccadac1d7ea7e82f |
| BLAKE2b-256 | 3f00287769a2bfd98cb813c8fa2e95b2b79270e665815c6026980ca450082869 |

File details

Details for the file semanticscout-3.1.4-py3-none-any.whl.

File metadata

  • Download URL: semanticscout-3.1.4-py3-none-any.whl
  • Upload date:
  • Size: 158.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for semanticscout-3.1.4-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 852bdc3a7f05f23a6a2f2e3e63721e8a59254bc7cf2e15ab69669704eb38c422 |
| MD5 | d78c8175ed1330a3d51cbd5060d5e10b |
| BLAKE2b-256 | 0facc272abfcea734a456ffc5347f1ddb0380c8ab4d941bc1a76df2f46f32d94 |
