MCP Web Search service for AI ecosystem

MCP Web Search

License: MIT

MCP service for web search and content extraction, built on the Model Context Protocol using FastMCP.

Features

Four MCP tools:

  1. search — web search with smart filtering and fallback chain
  2. content — clean text extraction from URLs with SSRF protection
  3. webfetch — agent-based search via LangGraph StateGraph + LLM-as-Judge
  4. llm_health — LLM model health status in failover chain

Architecture

FastMCP (primary server)
├── search tool    → DuckDuckGo + fallback chain + smart filtering
├── content tool   → Trafilatura + SSRF protection + cache
└── webfetch tool  → LangGraph StateGraph (8 nodes) + LLM-as-Judge

Installation

# Clone the repository
git clone https://github.com/M0M0S/mcp-webs.git
cd mcp-webs

# Install dependencies
uv sync

# Configure environment variables
cp .env.example .env
# Fill in .env (LLM_API_KEY, LLM_BASE_URL, etc.)

Usage

Start MCP Server

uv run python -m app.main

Connect to Claude Desktop (example)

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "web-search": {
      "command": "uv",
      "args": ["run", "python", "-m", "app.main"],
      "env": {
        "LLM_API_KEY": "your-key",
        "LLM_BASE_URL": "https://api.openai.com/v1"
      }
    }
  }
}
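
If you would rather not edit the JSON by hand, the same entry can be merged in with a small script. This is a minimal sketch: `add_mcp_server` is a hypothetical helper, not part of this project, and the location of claude_desktop_config.json depends on your platform, so the path is passed in rather than assumed.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, name: str, server: dict) -> dict:
    """Insert or overwrite one entry under "mcpServers" and save the file."""
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8"))
    else:
        config = {}
    # Keep any servers that are already configured.
    config.setdefault("mcpServers", {})[name] = server
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# The same entry as in the JSON example above.
WEB_SEARCH_SERVER = {
    "command": "uv",
    "args": ["run", "python", "-m", "app.main"],
    "env": {
        "LLM_API_KEY": "your-key",
        "LLM_BASE_URL": "https://api.openai.com/v1",
    },
}
```

Call it as `add_mcp_server(Path("path/to/claude_desktop_config.json"), "web-search", WEB_SEARCH_SERVER)`, replacing the path with the real location on your machine.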

MCP Tools

| Tool | Description | Parameters |
|------|-------------|------------|
| search | Web search with fallback chain | query, max_results, provider |
| content | Extract text content from URL | url, token_limit |
| webfetch | Agent-based search via LangGraph | query, max_concurrent |
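
MCP clients invoke tools through the JSON-RPC 2.0 `tools/call` method, so the parameters above travel as the `arguments` object of a request. The sketch below only builds such payloads. `build_tool_call` is a hypothetical helper, and the argument values are illustrative assumptions rather than documented defaults.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Parameter names come from the table above; the values are made up.
search_request = build_tool_call(
    "search", {"query": "model context protocol", "max_results": 5}
)
content_request = build_tool_call(
    "content", {"url": "https://example.com", "token_limit": 2000}
)
webfetch_request = build_tool_call(
    "webfetch", {"query": "compare MCP servers", "max_concurrent": 6}
)

print(json.dumps(search_request, indent=2))
```

In practice an MCP client library sends these messages for you; the payload shape is shown only to make the parameter mapping concrete.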

Development

Project Standards

Commands

# Tests
uv run pytest tests/ -v

# Coverage
uv run pytest tests/ --cov=app --cov-report=term-missing

# Linting
uv run ruff check app/ tests/

# Formatting
uv run ruff format app/ tests/

# Type checking
uv run mypy app/

# Security scan
uv run bandit -r app/

Configuration

Environment variables are documented in docs/standards/configuration.md.

Search Logic

search — search with fallback chain:

  1. Cache lookup (Redis cache-aside)
  2. DuckDuckGo → SearxNG → Tavily → Google (fallback chain)
  3. Smart filtering (SEO spam, clickbait, blacklist)
  4. Result caching
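
The four steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's code: a plain dict stands in for Redis, the providers are stub functions, and filtering is reduced to a host blacklist. `ProviderError`, `search_with_fallback`, and both stubs are hypothetical names.

```python
from urllib.parse import urlparse


class ProviderError(Exception):
    """Raised by a provider when it cannot serve the query."""


def search_with_fallback(query, providers, cache, blacklist=frozenset()):
    """Return filtered results for `query`, trying providers in order."""
    # 1. Cache-aside lookup: serve a cached answer if one exists.
    if query in cache:
        return cache[query]

    # 2. Fallback chain: the first provider that returns results wins.
    results = []
    for provider in providers:
        try:
            results = provider(query)
        except ProviderError:
            continue  # this provider failed, try the next one
        if results:
            break

    # 3. Filtering: only a host blacklist here (assumption); the real
    #    service also filters SEO spam and clickbait.
    filtered = [r for r in results if urlparse(r["url"]).hostname not in blacklist]

    # 4. Populate the cache so the next identical query skips the providers.
    if filtered:
        cache[query] = filtered
    return filtered


def duckduckgo_stub(query):
    raise ProviderError("rate limited")  # simulate a failing first provider


def searxng_stub(query):
    return [
        {"title": "MCP spec", "url": "https://example.org/mcp"},
        {"title": "Spam", "url": "https://spam.example/buy"},
    ]


cache = {}
hits = search_with_fallback(
    "mcp", [duckduckgo_stub, searxng_stub], cache, blacklist={"spam.example"}
)
print(hits)
```

Empty results are deliberately not cached in this sketch, so a transient outage of every provider does not pin an empty answer for later queries.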

content — content extraction:

  1. SSRF protection (whitelist + private IP check)
  2. Trafilatura → readability-lxml → bs4 (fallback chain)
  3. HTML sanitization (bleach)
  4. Caching (TTL: 24h)
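
Step 1 is the part most worth getting right, so here is a minimal sketch of a whitelist plus private-IP check using only the standard library. It assumes the whitelist is a set of hostnames and that every resolved address must be public. `resolve_host`, `is_public_ip`, and `is_url_allowed` are hypothetical names, not the project's API.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolve_host(host: str) -> list[str]:
    """Resolve a hostname to its IP addresses via the system resolver."""
    infos = socket.getaddrinfo(host, None)
    return sorted({info[4][0] for info in infos})


def is_public_ip(ip: str) -> bool:
    """Reject loopback, private, link-local and other non-routable ranges."""
    addr = ipaddress.ip_address(ip.split("%")[0])  # drop any IPv6 scope id
    return not (
        addr.is_private
        or addr.is_loopback
        or addr.is_link_local
        or addr.is_multicast
        or addr.is_reserved
        or addr.is_unspecified
    )


def is_url_allowed(url: str, allowed_hosts=None, resolver=resolve_host) -> bool:
    """Return True only for http(s) URLs whose host resolves to public IPs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    # Optional whitelist: when given, the host must be listed explicitly.
    if allowed_hosts is not None and host not in allowed_hosts:
        return False
    try:
        ips = resolver(host)
    except OSError:  # includes socket.gaierror for unresolvable names
        return False
    # Every resolved address must be public, not just the first one.
    return bool(ips) and all(is_public_ip(ip) for ip in ips)
```

The `resolver` argument is injectable so the check can be exercised without network access. A check like this does not by itself prevent DNS rebinding; the fetch should connect to the same address that was validated.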

webfetch — agent-based search:

  1. Stage 1: Generate queries via LLM
  2. Stage 2: Parallel searches (6 concurrent)
  3. Stage 3: Select URLs for extraction
  4. Stage 4: Judge URLs (LLM-as-Judge, threshold ≥0.85)
  5. Stage 5: Fetch content (Trafilatura)
  6. Stage 6: Generate features (Pydantic models)
  7. Stage 7: Judge features (threshold ≥0.92)
  8. Fallback: Simple search on agent failure
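
The control flow of stages 1 to 4 and the fallback can be sketched with plain asyncio. This is not the LangGraph implementation: the LLM, search, and judge calls are injected as ordinary callables, stages 5 to 7 are omitted, and every function name here is a hypothetical stand-in. Only the concurrency limit (6) and the URL threshold (0.85) come from the list above.

```python
import asyncio

MAX_CONCURRENT = 6          # Stage 2: parallel searches
URL_JUDGE_THRESHOLD = 0.85  # Stage 4: minimum judge score for a URL


async def run_searches(queries, search, max_concurrent=MAX_CONCURRENT):
    """Run one search per query, at most `max_concurrent` at a time."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def bounded(query):
        async with semaphore:
            return await search(query)

    batches = await asyncio.gather(*(bounded(q) for q in queries))
    # Stage 3 (simplified): flatten and de-duplicate, keeping first-seen order.
    unique = {}
    for batch in batches:
        for url in batch:
            unique.setdefault(url, None)
    return list(unique)


def judge_urls(urls, judge, threshold=URL_JUDGE_THRESHOLD):
    """Keep only URLs whose judge score reaches the threshold."""
    return [url for url in urls if judge(url) >= threshold]


async def webfetch(query, generate_queries, search, judge, simple_search):
    """Stages 1-4, falling back to a simple search if anything fails."""
    try:
        queries = generate_queries(query)          # Stage 1
        urls = await run_searches(queries, search)  # Stages 2-3
        accepted = judge_urls(urls, judge)          # Stage 4
        if not accepted:
            raise RuntimeError("no URL passed the judge")
        return accepted
    except Exception:
        # Fallback: any failure in the agent path degrades to a simple search.
        return await simple_search(query)
```

The semaphore caps in-flight searches without limiting how many queries the LLM may generate, which keeps provider load bounded regardless of query count.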

Prometheus Metrics

Implemented metrics (via app/core/metrics.py):

| Metric | Type | Description |
|--------|------|-------------|
| provider_search_total | Counter | Search attempts per provider |
| provider_search_failure_total | Counter | Failed searches per provider |
| provider_health_score | Gauge | Provider health (0.0–1.0) |
| provider_chain_position | Gauge | Provider position in fallback chain |
| llm_failover_total | Counter | LLM failover events (from→to model) |
| llm_failover_duration_seconds | Histogram | Failover duration |
| llm_model_health_score | Gauge | LLM model health (0.0–1.0) |
| llm_active_model_index | Gauge | Active LLM model index |
| webfetch_checkpoint_save_total | Counter | WebFetch checkpoint saves |
| webfetch_checkpoint_resume_total | Counter | WebFetch checkpoint resumes |
| webfetch_checkpoint_size_bytes | Histogram | Checkpoint payload size |
| webfetch_active_checkpoints | Gauge | Active checkpoints per tenant |
| cache_ttl_distribution_seconds | Histogram | Cache TTL distribution |
| cache_stale_invalidations_total | Counter | Cache stale invalidations |
| cache_freshness_avg | Gauge | Average cache freshness |
| knowledge_graph_concepts_count | Gauge | KG concepts count |
| knowledge_graph_terms_count | Gauge | KG related terms count |
| kg_expansion_applied_total | Counter | KG expansion events |
| kg_enriched_concepts_total | Counter | KG enriched concepts |
