Skip to main content

A language-aware semantic code search MCP server with intelligent filtering and 9.3x better dependency analysis

This project has been archived.

The maintainers of this project have marked this project as archived. No new releases are expected.

Project description

SemanticScout ๐Ÿ”

A hybrid code intelligence system for AI agents - combining semantic search with structural understanding

Version Tests Coverage Python License

SemanticScout is a Model Context Protocol (MCP) server that provides hybrid code intelligence by combining semantic search with structural code understanding. It goes beyond simple text matching to understand code relationships, dependencies, and architecture with language-aware analysis and intelligent filtering.

๐ŸŽ‰ What's New in v2.7.0

๐ŸŽฏ Language-Aware Dependency Analysis - 9.3x Better Accuracy!

  • โœ… Project Language Detection - Automatically detects primary languages (Rust, C#, Python, etc.)
  • โœ… Language-Specific Routing - Routes dependency analysis to specialized strategies
  • โœ… Rust Support - Advanced Cargo.toml parsing, mod declarations, crate resolution
  • โœ… C# Support - Namespace resolution, using statements, project references
  • โœ… Python Support - Import analysis, package detection, module resolution
  • โœ… 9.3x Improvement - 100% accuracy vs 10.7% with generic analysis

๐Ÿšซ Intelligent Test Code Filtering - 0% Test Pollution!

  • โœ… Multi-Strategy Detection - Path patterns, file names, AST analysis
  • โœ… Production Code Focus - Automatically excludes test files from search results
  • โœ… Configurable Filtering - Enable/disable via exclude_test_files parameter
  • โœ… Zero False Positives - Comprehensive test detection patterns
  • โœ… 24% โ†’ 0% Test Pollution - Eliminates irrelevant test code from results

๐Ÿ—‚๏ธ Enhanced Git Filtering - Massive Project Support!

  • โœ… Untracked File Detection - Automatically excludes untracked files from indexing
  • โœ… Performance Optimization - 30-second caching of git status results
  • โœ… Configurable Filtering - Enable/disable untracked file filtering
  • โœ… Massive Project Support - Handles large repositories efficiently
  • โœ… Graceful Fallbacks - Works with non-Git repositories

๐Ÿ—๏ธ Architectural Query Detection - Smart Pattern Recognition!

  • โœ… DI Pattern Detection - Recognizes dependency injection queries
  • โœ… Result Boosting - Prioritizes architectural files (Startup.cs, Program.cs)
  • โœ… Context Expansion - Intelligent expansion for architectural queries
  • โœ… Coverage Modes - Focused (5), Balanced (10), Comprehensive (20), Exhaustive (50)
  • โœ… File-Level Deduplication - Eliminates duplicate results from same files

Performance Comparison: Language-Aware vs Generic Analysis

Metric Generic Analysis Language-Aware Improvement
Accuracy 10.7% 100% 9.3x better
Test Pollution 24% 0% Eliminated
Duplicate Results 15% 0% Eliminated
Coverage 3-5 files 10-20 files 2-4x more

๐ŸŽ‰ Previous Major Features

๐Ÿง  LSP Integration (v2.4.0) - 7% More Accurate Symbol Extraction!

  • โœ… Language Server Protocol (LSP) - Uses real language servers for symbol extraction (default)
  • โœ… Multi-Language Support - Python (jedi), C# (omnisharp), TypeScript/JavaScript (tsserver)
  • โœ… Intelligent Fallback - Automatically falls back to tree-sitter if LSP unavailable
  • โœ… Session-Based Lifecycle - Servers stay alive for entire MCP session (no startup overhead)

๐Ÿš€ Incremental Indexing (v2.2.0) - 5-10x Faster Updates!

  • โœ… Incremental Indexing - Only indexes changed files (5-10x speedup for small changes)
  • โœ… Chunk-Level Granularity - Only re-embeds changed code chunks (50%+ reuse rate)
  • โœ… Parallel Processing - Async parallel updates with 4x+ speedup
  • โœ… Hybrid Change Detection - Automatic Git-based or hash-based detection
  • โœ… Model Switching - Reuse indexes when switching embedding models (if dimensions match)
  • โœ… Real-Time Updates - Process file change events from editors via MCP

โœจ Features

Core Capabilities

  • ๐Ÿ” Semantic Code Search - Find code using natural language queries with 100% accuracy
  • ๐ŸŽฏ Symbol Resolution - Precise function/class/variable lookup (95%+ accuracy)
  • ๐Ÿ”— Language-Aware Dependencies - Understand code relationships with specialized analysis (9.3x better)
  • ๐Ÿง  Hybrid Retrieval - Combines semantic, symbol, and dependency-based search
  • ๐Ÿ“Š Context Expansion - Intelligent code context with dependency awareness
  • ๐Ÿšซ Test Code Filtering - Automatically excludes test files (0% test pollution)
  • ๐Ÿ—‚๏ธ Git Integration - Smart filtering of untracked files and git-aware indexing

Technical Features

  • ๐ŸŽฏ Language Detection - Automatic project language detection and specialized routing
  • ๐Ÿง  LSP Integration (Default) - Language Server Protocol for 7% more accurate symbol extraction (Python, C#, TypeScript, JavaScript)
  • ๐Ÿ”ฅ Local Embeddings (Default) - sentence-transformers included (fast, no setup) or Ollama (optional, GPU support)
  • ๐ŸŒณ AST-Based Fallback - tree-sitter for unsupported languages or when LSP unavailable (11 languages)
  • ๐Ÿ—„๏ธ Symbol Tables - SQLite-based symbol storage with FTS5 full-text search
  • ๐Ÿ“ˆ Dependency Graphs - NetworkX-based graph analysis and traversal
  • ๐ŸŒ Multi-Language Support - TypeScript, JavaScript, Python, Java, C#, Go, Rust, Ruby, PHP, C, C++
  • โšก High Performance - <100ms queries, <4s per file indexing (LSP), <1GB memory
  • ๐Ÿ”’ Security Built-in - Path validation, rate limiting, and resource limits
  • ๐Ÿค– MCP Integration - Works with Claude Desktop and other MCP clients
  • ๐Ÿ“Š Coverage Modes - Focused (5), Balanced (10), Comprehensive (20), Exhaustive (50) results

๐Ÿš€ Quick Start

Get started in under 2 minutes with uvx - zero installation, zero configuration required!

Prerequisites

That's it! No Ollama, no language servers, no additional setup needed. Everything is included.

1. Configure Claude Desktop

Add to your Claude Desktop MCP configuration (%APPDATA%\Claude\claude_desktop_config.json on Windows or ~/Library/Application Support/Claude/claude_desktop_config.json on Mac):

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}

That's it! This uses the default configuration:

  • โœ… Language-aware analysis - Automatic language detection and specialized routing
  • โœ… LSP integration - Accurate symbol extraction (Python, C#, TypeScript, JavaScript)
  • โœ… sentence-transformers - Fast local embeddings (no Ollama needed)
  • โœ… Test code filtering - Excludes test files from search results
  • โœ… Git filtering - Smart handling of untracked files
  • โœ… All enhancement features - Symbol tables, dependency graphs, hybrid search

Note: We specify --python 3.12 because some dependencies don't yet support Python 3.13. If you only have Python 3.13, install Python 3.12 with brew install python@3.12 (Mac) or download from python.org (Windows).

2. Restart Claude Desktop

That's it! SemanticScout will be automatically downloaded and run when Claude needs it.

โœจ Benefits:

  • โœ… No manual installation
  • โœ… No Ollama or language server setup required
  • โœ… Always uses latest version
  • โœ… Automatic dependency management
  • โœ… Isolated environment per run
  • โœ… Works on Windows, Mac, and Linux
  • โœ… Data stored in ~/.semanticscout/

Optional: Custom Data Directory

By default, data is stored in ~/.semanticscout/. To use a custom location:

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": [
        "--python", "3.12",
        "semanticscout@latest",
        "--data-dir", "/path/to/your/data"
      ]
    }
  }
}

๐Ÿ”„ Incremental Indexing & Git Integration

SemanticScout v2.7.0 provides advanced Git integration with enhanced filtering and 5-10x faster updates.

Enhanced Git Features

Smart File Filtering:

  • Untracked file detection: Automatically excludes untracked files from indexing
  • Git status caching: 30-second cache for performance optimization
  • Configurable filtering: Enable/disable untracked file filtering
  • Massive project support: Handles large repositories efficiently

Automatic Change Detection:

  • Git repositories: Uses git diff to detect changed files since last index
  • Non-Git directories: Uses MD5 file hashing to detect changes
  • Chunk-level granularity: Only re-embeds changed code chunks (not entire files)

Usage:

# Full indexing (indexes all files)
index_codebase(path="/path/to/project")

# Incremental indexing (only indexes changed files - 5-10x faster!)
index_codebase(path="/path/to/project", incremental=True)

Performance:

  • Small changes (1-10% of files): 5-10x faster
  • Chunk-level reuse: 50%+ fewer embeddings generated
  • Parallel processing: 4x+ speedup with multiple files

When to use:

  • โœ… Incremental: After initial indexing, for regular code updates
  • โœ… Full: First-time indexing, major refactors, model changes

Real-Time File Change Events

Process file changes from editors in real-time:

# Process file change events
process_file_changes(
    collection_name="my_project",
    changes=json.dumps({
        "events": [
            {"type": "modified", "path": "src/main.py", "timestamp": 1234567890}
        ],
        "workspace_root": "/path/to/project",
        "debounce_ms": 500
    }),
    auto_update=True  # Apply changes immediately
)

Security: All file paths are validated to prevent path traversal attacks.


๐ŸŽฏ Language-Aware Analysis Configuration

SemanticScout v2.7.0 provides language-aware dependency analysis with 9.3x better accuracy than generic analysis.

How It Works

Language Detection & Routing:

  • Automatic Detection: Analyzes project structure, config files, and file extensions
  • Specialized Strategies: Routes to language-specific dependency analysis
  • Rust Support: Cargo.toml parsing, mod declarations, crate resolution
  • C# Support: Namespace resolution, using statements, project references
  • Python Support: Import analysis, package detection, module resolution

Performance Comparison:

Language Generic Analysis Language-Aware Improvement
Rust 8% accuracy 100% accuracy 12.5x better
C# 12% accuracy 100% accuracy 8.3x better
Python 15% accuracy 100% accuracy 6.7x better

๐Ÿง  LSP Integration Configuration

SemanticScout uses Language Server Protocol (LSP) by default for more accurate symbol extraction.

LSP vs Tree-sitter:

  • LSP (default): Uses real language servers (jedi, omnisharp, tsserver) for symbol extraction
    • โœ… 7% more symbols extracted (2,722 vs 2,542)
    • โœ… More accurate type information and signatures
    • โœ… Better handling of complex language features
    • โš ๏ธ 2x slower indexing (3.88s vs 1.85s per file)
  • Tree-sitter (fallback): Fast AST-based parsing
    • โœ… Very fast indexing
    • โœ… Works for all languages
    • โš ๏ธ Less accurate symbol extraction

Supported Languages

Language LSP Server Dependency Analysis Status
Python jedi โœ… Specialized โœ… Full support
C# omnisharp โœ… Specialized โœ… Full support
TypeScript tsserver โœ… Specialized โœ… Full support
JavaScript tsserver โœ… Specialized โœ… Full support
Rust tree-sitter โœ… Specialized โœ… Full support
Go, Java, etc. tree-sitter โš ๏ธ Generic โœ… Basic support

Disabling LSP (Use Tree-sitter Only)

If you prefer faster indexing over accuracy, you can disable LSP:

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"enhancement_config\":{\"lsp_integration\":{\"enabled\":false}}}"
      }
    }
  }
}

Per-Language Configuration

Disable LSP for specific languages:

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"enhancement_config\":{\"lsp_integration\":{\"languages\":{\"python\":{\"enabled\":false}}}}}"
      }
    }
  }
}

Note: LSP servers are automatically installed via the multilspy package (included in dependencies).


โšก Advanced Configuration

Default Configuration (Recommended)

No configuration needed! The default setup uses:

  • Language-aware analysis - Automatic language detection and specialized routing
  • LSP integration - Accurate symbol extraction (Python, C#, TypeScript, JavaScript)
  • sentence-transformers - Fast local embeddings (30-60 sec for 500 chunks)
  • Test code filtering - Excludes test files from search results
  • Git filtering - Smart handling of untracked files
  • All enhancement features - Symbol tables, dependency graphs, hybrid search
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"]
    }
  }
}

Embedding Provider Options

SemanticScout supports multiple embedding providers:

Provider Speed Setup Required Use Case
sentence-transformers (default) ~30-60 sec for 500 chunks โœ… None Best for most users
Ollama (async) ~2.6-4.4 min for 500 chunks Ollama server GPU acceleration, larger models
Ollama (sequential) ~26-44 min for 500 chunks Ollama server Legacy/testing

Option 1: sentence-transformers (Default - Recommended)

Already configured! This is the default. Available models:

  • all-MiniLM-L6-v2 - 384 dims, very fast, good quality (default)
  • all-mpnet-base-v2 - 768 dims, higher quality, slower
  • paraphrase-MiniLM-L6-v2 - 384 dims, optimized for paraphrase

To use a different model:

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"sentence-transformers\",\"model\":\"all-mpnet-base-v2\"}}"
      }
    }
  }
}

Option 2: Ollama (Optional - For GPU Acceleration)

Requires Ollama server running locally:

# Start Ollama and pull model
ollama serve
ollama pull nomic-embed-text
{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "OLLAMA_BASE_URL": "http://localhost:11434",
        "OLLAMA_MODEL": "nomic-embed-text",
        "OLLAMA_MAX_CONCURRENT": "10",
        "SEMANTICSCOUT_CONFIG_JSON": "{\"embedding\":{\"provider\":\"ollama\"}}"
      }
    }
  }
}

๐Ÿ“– Usage

Once configured in Claude Desktop, you can use natural language to interact with the MCP server:

Example Conversations

Index a codebase:

You: "Index my codebase at /workspace"
Claude: [Calls index_codebase tool and shows indexing progress]

Search for code:

You: "Find the authentication logic"
Claude: [Calls search_code tool and shows relevant code snippets]

List indexed projects:

You: "What codebases have been indexed?"
Claude: [Calls list_collections tool and shows all indexed projects]

Clear an index:

You: "Delete the index for my old project"
Claude: [Calls clear_index tool after confirmation]

Available MCP Tools

The server exposes these tools to Claude (you don't call them directly):

Core Tools

Tool Description Parameters
index_codebase Index a codebase with language-aware analysis path (required), incremental (optional)
search_code Search with natural language + context expansion query, collection_name, coverage_mode, exclude_test_files
list_collections List all indexed codebases None
get_indexing_status Get statistics for a collection collection_name
clear_index Delete a collection (permanent) collection_name

Enhanced Tools (Symbol & Dependency Analysis)

Tool Description Parameters
find_symbol Find symbols with language-aware lookup symbol_name, collection_name, symbol_type
find_callers Find all functions that call a given symbol symbol_name, collection_name, max_results
trace_dependencies Trace dependency chains with language-specific analysis file_path, collection_name, depth
process_file_changes Process real-time file change events collection_name, changes, auto_update

โš™๏ธ Environment Variables

Most users don't need to configure anything! The defaults work great.

Optional Environment Variables

Variable Default Description
MAX_FILE_SIZE_MB 10.0 Skip files larger than this
MAX_CODEBASE_SIZE_GB 10.0 Maximum total codebase size
MAX_FILES 100000 Maximum number of files
CHUNK_SIZE_MIN 500 Minimum chunk size (chars)
CHUNK_SIZE_MAX 1500 Maximum chunk size (chars)
LOG_LEVEL INFO Logging level

Ollama-Specific Variables (Only if using Ollama)

Variable Default Description
OLLAMA_BASE_URL http://localhost:11434 Ollama server URL
OLLAMA_MODEL nomic-embed-text Embedding model to use
OLLAMA_MAX_CONCURRENT 10 Max concurrent requests

Example with Custom Settings

{
  "mcpServers": {
    "semanticscout": {
      "command": "uvx",
      "args": ["--python", "3.12", "semanticscout@latest"],
      "env": {
        "MAX_FILE_SIZE_MB": "20.0",
        "LOG_LEVEL": "DEBUG"
      }
    }
  }
}

๐Ÿ—๏ธ Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   MCP Client    โ”‚  (Claude Desktop, etc.)
โ”‚  (AI Agent)     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚ JSON-RPC over STDIO
         โ”‚
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚   MCP Server    โ”‚
โ”‚  (FastMCP)      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
         โ”‚
    โ”Œโ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
    โ”‚         โ”‚        โ”‚          โ”‚          โ”‚
โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ” โ”Œโ”€โ”€โ–ผโ”€โ”€โ” โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ” โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”
โ”‚Indexerโ”‚ โ”‚Queryโ”‚ โ”‚Hybrid  โ”‚ โ”‚Vector โ”‚ โ”‚Symbol/ โ”‚
โ”‚       โ”‚ โ”‚Anal โ”‚ โ”‚Retriev โ”‚ โ”‚ Store โ”‚ โ”‚DepGraphโ”‚
โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”ฌโ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”˜ โ””โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”˜
    โ”‚        โ”‚        โ”‚          โ”‚         โ”‚
โ”Œโ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”
โ”‚    ChromaDB + SQLite + NetworkX + Caches     โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Core Components

  • File Discovery: Finds code files, respects .gitignore
  • LSP Processor: Uses Language Server Protocol for accurate symbol extraction (Python, C#, TypeScript, JavaScript)
  • AST Processor: Parses code with tree-sitter, extracts symbols and dependencies (fallback or unsupported languages)
  • Code Chunker: AST-based semantic chunking
  • Embedding Provider: Generates vector embeddings (Ollama or sentence-transformers)
  • Vector Store: Stores and searches embeddings (ChromaDB)
  • Symbol Table: SQLite-based symbol storage with FTS5 search
  • Dependency Graph: NetworkX-based graph analysis
  • Query Analyzer: Classifies queries and routes to optimal strategy
  • Hybrid Retriever: Coordinates semantic, symbol, and dependency search
  • Context Expander: Intelligent context expansion with dependency awareness
  • Security Validators: Path validation, rate limiting, input sanitization

๐Ÿงช Development

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov

# Run specific test file
pytest tests/unit/test_semantic_search.py -v

Test Coverage

Current coverage: 85% (400+ tests passing)

Core Components:

  • File Discovery: 85%
  • Code Chunker: 89%
  • Ollama Provider: 92%
  • Vector Store: 89%
  • Query Processor: 100%
  • Semantic Search: 99%
  • Security Validators: 95%

Enhanced Components:

  • Language Detection: 90%
  • Dependency Router: 88%
  • AST Processor: 82%
  • Symbol Table: 79%
  • Dependency Graph: 84%
  • Query Analyzer: 100%
  • Hybrid Retriever: 97%
  • Context Expander: 82%
  • Git Integration: 85%
  • Test Filtering: 92%

Project Structure

semanticscout/
โ”œโ”€โ”€ src/semanticscout/
โ”‚   โ”œโ”€โ”€ mcp_server.py              # MCP server entry point
โ”‚   โ”œโ”€โ”€ config/                    # Configuration management
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ””โ”€โ”€ enhancement_config.py
โ”‚   โ”œโ”€โ”€ logging_config.py          # Logging setup
โ”‚   โ”œโ”€โ”€ indexer/                   # Indexing components
โ”‚   โ”‚   โ”œโ”€โ”€ file_discovery.py
โ”‚   โ”‚   โ”œโ”€โ”€ file_classifier.py     # NEW: Test file detection
โ”‚   โ”‚   โ”œโ”€โ”€ code_chunker.py
โ”‚   โ”‚   โ”œโ”€โ”€ git_change_detector.py # NEW: Enhanced git filtering
โ”‚   โ”‚   โ””โ”€โ”€ pipeline.py
โ”‚   โ”œโ”€โ”€ language_detection/        # NEW: Language detection (v2.7.0)
โ”‚   โ”‚   โ””โ”€โ”€ project_language_detector.py
โ”‚   โ”œโ”€โ”€ dependency_analysis/       # NEW: Language-aware analysis (v2.7.0)
โ”‚   โ”‚   โ”œโ”€โ”€ dependency_router.py
โ”‚   โ”‚   โ””โ”€โ”€ strategies.py
โ”‚   โ”œโ”€โ”€ lsp/                       # LSP integration (v2.4.0)
โ”‚   โ”‚   โ”œโ”€โ”€ __init__.py
โ”‚   โ”‚   โ”œโ”€โ”€ language_server_manager.py
โ”‚   โ”‚   โ”œโ”€โ”€ lsp_processor.py
โ”‚   โ”‚   โ””โ”€โ”€ lsp_symbol_mapper.py
โ”‚   โ”œโ”€โ”€ ast_processing/            # AST parsing & symbol extraction (fallback)
โ”‚   โ”‚   โ”œโ”€โ”€ ast_processor.py
โ”‚   โ”‚   โ””โ”€โ”€ ast_cache.py
โ”‚   โ”œโ”€โ”€ symbol_table/              # Symbol storage & lookup
โ”‚   โ”‚   โ””โ”€โ”€ symbol_table.py
โ”‚   โ”œโ”€โ”€ dependency_graph/          # Dependency tracking
โ”‚   โ”‚   โ””โ”€โ”€ dependency_graph.py
โ”‚   โ”œโ”€โ”€ query_analysis/            # Query classification
โ”‚   โ”‚   โ””โ”€โ”€ query_analyzer.py
โ”‚   โ”œโ”€โ”€ embeddings/                # Embedding providers
โ”‚   โ”‚   โ”œโ”€โ”€ base.py
โ”‚   โ”‚   โ””โ”€โ”€ ollama_provider.py
โ”‚   โ”œโ”€โ”€ vector_store/              # Vector database
โ”‚   โ”‚   โ””โ”€โ”€ chroma_store.py
โ”‚   โ”œโ”€โ”€ retriever/                 # Search components
โ”‚   โ”‚   โ”œโ”€โ”€ query_processor.py
โ”‚   โ”‚   โ”œโ”€โ”€ semantic_search.py     # Enhanced with test filtering
โ”‚   โ”‚   โ”œโ”€โ”€ hybrid_retriever.py    # Enhanced with deduplication
โ”‚   โ”‚   โ””โ”€โ”€ context_expander.py    # Enhanced with smart expansion
โ”‚   โ”œโ”€โ”€ performance/               # Performance monitoring
โ”‚   โ”‚   โ”œโ”€โ”€ metrics.py
โ”‚   โ”‚   โ”œโ”€โ”€ memory.py
โ”‚   โ”‚   โ””โ”€โ”€ parallel.py
โ”‚   โ””โ”€โ”€ security/                  # Security & validation
โ”‚       โ””โ”€โ”€ validators.py
โ”œโ”€โ”€ tests/                         # Unit & integration tests
โ”‚   โ”œโ”€โ”€ unit/                      # Unit tests (200+ tests)
โ”‚   โ”œโ”€โ”€ integration/               # Integration tests
โ”‚   โ””โ”€โ”€ validation/                # Validation tests
โ”œโ”€โ”€ examples/                      # Example scripts
โ”œโ”€โ”€ docs/                          # Documentation
โ”‚   โ”œโ”€โ”€ API_REFERENCE.md
โ”‚   โ”œโ”€โ”€ USER_GUIDE.md
โ”‚   โ”œโ”€โ”€ CONFIGURATION.md
โ”‚   โ””โ”€โ”€ PERFORMANCE_TUNING.md
โ””โ”€โ”€ config/                        # Configuration files
    โ””โ”€โ”€ enhancement_config.template.json

## ๐Ÿ“ Runtime Data Structure

SemanticScout stores all runtime data in `~/semanticscout/`:

~/semanticscout/ # User's home directory โ”œโ”€โ”€ config/ # Configuration files โ”‚ โ””โ”€โ”€ enhancement_config.json โ”œโ”€โ”€ data/ # Runtime data โ”‚ โ”œโ”€โ”€ chroma_db/ # Vector store database โ”‚ โ”œโ”€โ”€ symbol_tables/ # Symbol databases โ”‚ โ”œโ”€โ”€ dependency_graphs/ # Dependency graph files โ”‚ โ””โ”€โ”€ ast_cache/ # AST parsing cache โ””โ”€โ”€ logs/ # Log files โ””โ”€โ”€ mcp_server.log


## ๐Ÿ“š Documentation

Comprehensive documentation is available in the `docs/` directory:

- **[API_REFERENCE.md](docs/API_REFERENCE.md)** - Complete API documentation for all MCP tools
- **[USER_GUIDE.md](docs/USER_GUIDE.md)** - User guide with examples and best practices
- **[CONFIGURATION.md](docs/CONFIGURATION.md)** - Configuration options and feature flags
- **[PERFORMANCE_TUNING.md](docs/PERFORMANCE_TUNING.md)** - Performance optimization guide

### Examples

See the [examples/](examples/) directory for working examples:

- `test_full_pipeline.py` - Complete indexing and search workflow
- `test_retrieval_system.py` - Advanced search with filtering
- `index_weather_unified.py` - Real-world codebase indexing

## ๐Ÿ› Troubleshooting

### Python Version Issues

**Error:** `No module named 'onnxruntime'` or tree-sitter compatibility issues

**Solution:** Use Python 3.12 (not 3.14). See [PYTHON_VERSION_ISSUE.md](PYTHON_VERSION_ISSUE.md).

### Ollama Not Running (Only if using Ollama)

**Error:** `Ollama server not available`

**Solution:** The default configuration uses sentence-transformers (no Ollama needed). If you explicitly configured Ollama, start it:
```bash
ollama serve
ollama pull nomic-embed-text

Or switch back to the default (sentence-transformers) by removing Ollama configuration.

Rate Limit Exceeded

Error: Rate limit exceeded: Maximum X requests per hour

Solution: Adjust rate limits in .env:

MAX_INDEXING_REQUESTS_PER_HOUR=20
MAX_SEARCH_REQUESTS_PER_MINUTE=200

Path Not Allowed

Error: Path is not within allowed directories

Solution: The server only allows indexing within the current working directory by default.

๐Ÿค Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

๐Ÿ“„ License

MIT License - see LICENSE for details.

๐Ÿ™ Acknowledgments


Built with โค๏ธ for the AI agent ecosystem

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

semanticscout-2.8.0.tar.gz (182.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

semanticscout-2.8.0-py3-none-any.whl (168.2 kB view details)

Uploaded Python 3

File details

Details for the file semanticscout-2.8.0.tar.gz.

File metadata

  • Download URL: semanticscout-2.8.0.tar.gz
  • Upload date:
  • Size: 182.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for semanticscout-2.8.0.tar.gz
Algorithm Hash digest
SHA256 b62812d48e4669910ae1451c1c180647c603e32e4a6ffab790935f8b2df66d31
MD5 64b65cd2ecb8a6ae51d1a6cbf3fd712f
BLAKE2b-256 db766d46e667e2fa92ec2968056d2819b8df66f232059b3ae32ac5ad1949abd8

See more details on using hashes here.

File details

Details for the file semanticscout-2.8.0-py3-none-any.whl.

File metadata

  • Download URL: semanticscout-2.8.0-py3-none-any.whl
  • Upload date:
  • Size: 168.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for semanticscout-2.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 82ac521eca3d3b433a730b00ca1cbf447801a4855d5d44d14c50d2c611707461
MD5 e3f5e0d34d32a2fa1bf3a96e66e37371
BLAKE2b-256 80236f365e09b2a9d238f42d40b8a38a214ba0e103be6bcd6d2d896e1db0b476

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page