MCP server providing web search (DuckDuckGo), HTTP fetch, browser fetch (Playwright), and file download.

www-search-mcp

MCP (Model Context Protocol) server providing web search, HTTP fetch, browser-based fetch (Playwright), file download, and package search (PyPI, GitHub).

Gives AI assistants (Claude Desktop, Cursor, etc.) the ability to search the web, read web pages, download files, and discover Python packages.

This server is optimized for batching: tools accept either a single string or a list. Agents should prefer passing lists (multi-query / multi-URL) to reduce round-trips.
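
As a minimal sketch of the batching convention, a tool can normalize either input form to a list before doing any work. The helper name as_list is hypothetical and is not part of the package; it only illustrates the str | list[str] contract described above.

```python
from typing import Union


def as_list(value: Union[str, list[str]]) -> list[str]:
    """Normalize a single string or a list of strings to a list (hypothetical helper)."""
    if isinstance(value, str):
        return [value]
    return list(value)


# One call carrying three queries instead of three separate round-trips.
batched = as_list(["python mcp protocol", "fastmcp tutorial", "playwright python"])
single = as_list("python mcp protocol")
print(len(batched), len(single))
```

Passing the list form keeps the number of tool calls constant no matter how many queries or URLs the agent needs.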

Project Structure

High-level layout:

  • src/www_search_mcp/server.py: MCP entry point (FastMCP creation, tool registration, shutdown hooks)
  • src/www_search_mcp/info.py: single source of truth for tool docs + web_mcp_info
  • src/www_search_mcp/tools/: one file per MCP tool implementation
  • src/www_search_mcp/search.py: DuckDuckGo search + GitHub/PyPI API logic
  • src/www_search_mcp/utils/: shared reusable building blocks (no MCP wiring)

utils/ modules (shared code):

  • utils/http_async.py: shared niquests.AsyncSession, DNS resolver preset, global outbound request semaphore (WEB_REQUEST_LIMIT)
  • utils/fetch_processing.py: response -> standardized payload helpers (fetch markdown, save-to-file, streamed download limits)
  • utils/paths.py: URL/path validation and small normalizations (validate_url, resolve_path, normalize_fetch_div)
  • utils/search_exec.py: common search execution logic (execute_search, FIELD_MAPS, clamping)
  • utils/cache.py: TTL/LRU caches (sync + async-safe)
  • utils/throttle.py: global throttling (sync + async)
  • utils/html.py: HTML -> Markdown conversion helpers

Note: tools typically import from the www_search_mcp.utils facade, which re-exports the stable public helpers from these modules.
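
To illustrate how a global concurrency cap (WEB_REQUEST_LIMIT) and a minimum interval between requests (WEB_MIN_INTERVAL) can work together, here is a rough sketch built only on asyncio. The class name OutboundGate and its internals are assumptions made for illustration; the actual code in utils/http_async.py and utils/throttle.py may be organized quite differently.

```python
import asyncio
import time


class OutboundGate:
    """Caps concurrent requests and spaces out request start times (illustrative only)."""

    def __init__(self, limit: int, min_interval: float) -> None:
        self._semaphore = asyncio.Semaphore(limit)
        self._min_interval = min_interval
        self._lock = asyncio.Lock()
        self._next_start = 0.0

    async def __aenter__(self) -> "OutboundGate":
        await self._semaphore.acquire()
        try:
            async with self._lock:
                now = time.monotonic()
                delay = max(0.0, self._next_start - now)
                # Reserve the next start slot for whoever enters after us.
                self._next_start = max(now, self._next_start) + self._min_interval
            if delay > 0:
                await asyncio.sleep(delay)
        except BaseException:
            # Do not leak a semaphore slot if the wait is cancelled.
            self._semaphore.release()
            raise
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        self._semaphore.release()


async def demo() -> int:
    """Run five fake requests through the gate and return the peak concurrency seen."""
    gate = OutboundGate(limit=2, min_interval=0.01)
    running = 0
    peak = 0

    async def fake_request() -> None:
        nonlocal running, peak
        async with gate:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.02)  # stands in for the real HTTP call
            running -= 1

    await asyncio.gather(*(fake_request() for _ in range(5)))
    return peak


if __name__ == "__main__":
    print("peak concurrency:", asyncio.run(demo()))
```

The semaphore bounds how many requests are in flight, while the reserved start slots keep consecutive requests at least min_interval apart.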

System Requirements

| Requirement | Version |
|---|---|
| uv | Install guide |
| Python | 3.10+ (managed automatically by uv) |
| Playwright | Chromium browser (installed via post-install script) |

Install uv

macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

macOS (Homebrew):

brew install uv

Installation

Option 1: Run directly with uvx (recommended)

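No clone needed: uvx fetches the package from PyPI and runs it. After the first run uvx reuses its cached version, so append @latest (uvx www-search-mcp@latest) when you want to pick up a new release: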

uvx www-search-mcp

MCP client config:

{
  "mcpServers": {
    "www-search-mcp": {
      "command": "uvx",
      "args": ["www-search-mcp"]
    }
  }
}

Option 2: Install as a uv tool

# From PyPI
uv tool install www-search-mcp

# After installation, the command is available globally:
www-search-mcp

To update or reinstall:

uv tool upgrade www-search-mcp
# or force reinstall latest:
uv tool install --force www-search-mcp@latest

MCP client config (global tool):

{
  "mcpServers": {
    "www-search-mcp": {
      "command": "www-search-mcp"
    }
  }
}

Option 3: Run from local source (for development)

git clone https://github.com/naifs/www-search-mcp.git
cd www-search-mcp
uv sync
uv run www-search-mcp

To update:

git pull && uv sync

MCP client config (local source):

{
  "mcpServers": {
    "www-search-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--project",
        "/absolute/path/to/www-search-mcp",
        "www-search-mcp"
      ]
    }
  }
}

Option 4: Install from built wheel

cd /path/to/www-search-mcp
uv build
uv tool install dist/*.whl

To update:

uv build && uv tool install --force dist/*.whl

Note: Playwright Chromium is installed automatically on first use when a browser tool is called. If auto-install fails (e.g. no network), run manually:

uv run python -m playwright install chromium

Environment Variables

| Variable | Description | Default |
|---|---|---|
| WEB_TIMEOUT | Legacy total request timeout in seconds | 30 |
| WEB_TIMEOUT_TOTAL | Total request timeout in seconds (fallback: WEB_TIMEOUT) | 30 |
| WEB_TIMEOUT_CONNECT | Connect timeout in seconds | 5 |
| WEB_TIMEOUT_READ | Read timeout in seconds | 25 |
| WEB_MAX_RESULTS | Default max search results per query (1..25) | 5 |
| WEB_MAX_FETCH_CHARS | Max characters returned in fetch body | 200000 |
| WEB_RETRIES | Retry attempts on timeout/rate-limit (0..5) | 2 |
| WEB_MIN_INTERVAL | Minimum seconds between outbound requests (throttle) | 1.0 |
| WEB_MAX_DOWNLOAD_MB | Max size per downloaded file, in MB (1 MB = 1024·1024 B) | 50 |
| WEB_DEBUG | Enable debug logging (1/true/yes/on) | false |
| WEB_SESSION_ENABLED | Enable persistent cookies/session by default | false |
| WEB_TRANSPORT | MCP transport: stdio, sse, or streamable-http | stdio |
| WEB_HTTP_HOST | Host for HTTP-based transports | 127.0.0.1 |
| WEB_HTTP_PORT | Port for HTTP-based transports | 8000 |
| WEB_PROXY | HTTP proxy URL (e.g. http://proxy:8080) | |
| WEB_REQUEST_LIMIT | Max concurrent outbound requests (global limit) | 50 |
| WEB_DNS_RESOLVER | DNS preset: google, cloudflare, yandex, quad9, or system. Supports DoH, DoT, DoQ (Quad9 only), and plain DNS formats in the fallback chain | google |
| WEB_DNS_STRATEGY | IPv4/IPv6 strategy: only_ipv4 or only_ipv6 (default: both enabled) | |
| WEB_SSL_VERIFY | Verify TLS certificates (system trust store via truststore) | true |
| WEB_SSL_PATH | Optional path to extra CA certs (PEM file or OpenSSL-hashed dir), loaded in addition to the system trust store | |
| WEB_USER_AGENT | Custom User-Agent string for HTTP requests | Chrome 135 UA |
| WEB_UA_ROTATION | Enable User-Agent rotation across a pool of realistic desktop UAs | false |
| WEB_UA_LIST | Custom User-Agent list separated by \|\|\| (optional) | |
| WEB_GITHUB_TOKEN | Optional GitHub token to increase API rate limits | |
| WEB_REQUEST_TOKEN | Optional default Authorization token for web_request (used if headers.Authorization is not set). If token has no scheme, Bearer is assumed. | |

Notes:

  • Default values are resolved from src/www_search_mcp/config.py.
  • WEB_TIMEOUT is a legacy alias for total timeout; prefer WEB_TIMEOUT_TOTAL.
  • WEB_DNS_RESOLVER=system disables DoH/DoT and uses the OS resolver (useful for corporate/filtered networks).
  • WEB_DNS_STRATEGY=only_ipv4 or only_ipv6 forces single-stack; omit or use empty value for dual-stack (default).
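
The fallback and clamping rules above can be sketched as follows. This is not the code in src/www_search_mcp/config.py: the function names are hypothetical, and treating empty or unparsable values as "unset" is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional


def _to_float(raw: Optional[str]) -> Optional[float]:
    """Parse a float, treating missing, empty, or malformed values as unset (assumption)."""
    if raw is None or raw.strip() == "":
        return None
    try:
        return float(raw)
    except ValueError:
        return None


def resolve_total_timeout(env: Mapping[str, str] = os.environ) -> float:
    """WEB_TIMEOUT_TOTAL wins, WEB_TIMEOUT is the legacy fallback, 30 is the documented default."""
    for name in ("WEB_TIMEOUT_TOTAL", "WEB_TIMEOUT"):
        value = _to_float(env.get(name))
        if value is not None and value > 0:
            return value
    return 30.0


def resolve_max_results(env: Mapping[str, str] = os.environ) -> int:
    """Clamp WEB_MAX_RESULTS into the documented 1..25 range (default 5)."""
    try:
        value = int(env.get("WEB_MAX_RESULTS", "5"))
    except ValueError:
        value = 5
    return max(1, min(25, value))


def resolve_debug(env: Mapping[str, str] = os.environ) -> bool:
    """Accept the documented truthy spellings: 1, true, yes, on."""
    return env.get("WEB_DEBUG", "").strip().lower() in {"1", "true", "yes", "on"}


if __name__ == "__main__":
    print(resolve_total_timeout(), resolve_max_results(), resolve_debug())
```

Passing a plain dict instead of os.environ makes the resolution order easy to check without touching the real environment.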

Provided Tools

Search Tools

  • web_search (general web pages)

    • Input:
      • queries: str | list[str]
      • max_results: int (typical 5..25)
    • Output:
      • status, query, result_count
      • results[] with title, url, snippet
    • Note: safe search is always disabled
  • web_search_images (image URLs)

    • Input:
      • queries: str | list[str]
      • max_results: int (typical 5..25)
    • Output:
      • status, query, result_count
      • results[] with title, image, url, thumbnail, height, width, source
    • Note: safe search is always disabled
  • web_search_github (GitHub repos via API)

    • Input:
      • queries: str | list[str]
      • max_results: int (typical 5..25)
    • Output:
      • status, query, result_count
      • results[] with title, url, description, stars, forks, language
    • Note: uses GitHub REST API (no token needed, but rate-limited to ~10 req/min)
  • web_search_pypi (PyPI packages)

    • Input:
      • queries: str | list[str]
      • max_results: int (typical 5..25)
    • Output:
      • status, query, result_count
      • results[] with name, version, summary, author, license, requires_python, url, repository, py_versions, dependencies
    • Note: uses DuckDuckGo discovery + PyPI JSON API for enriched metadata
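
For orientation, the snippet below shows one way a caller might consume a web_search-style payload using only the documented fields (query, result_count, results[] with title, url, snippet). The sample payload is made up for illustration, including its status value, and format_results is a hypothetical helper rather than part of the package.

```python
from typing import Any


def format_results(payload: dict[str, Any]) -> str:
    """Render a search payload as a numbered plain-text list."""
    lines = [f"{payload.get('query', '')}: {payload.get('result_count', 0)} result(s)"]
    for index, item in enumerate(payload.get("results", []), start=1):
        lines.append(f"{index}. {item.get('title', '')} <{item.get('url', '')}>")
        snippet = item.get("snippet")
        if snippet:
            lines.append(f"   {snippet}")
    return "\n".join(lines)


# Illustrative data only; not real tool output.
sample = {
    "status": "ok",  # placeholder value, the real status strings are not documented here
    "query": "python mcp protocol",
    "result_count": 1,
    "results": [
        {
            "title": "Example title",
            "url": "https://example.com/",
            "snippet": "Example snippet text.",
        }
    ],
}

print(format_results(sample))
```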

Fetch & Download Tools

  • web_fetch (read known URL(s) as markdown)

    • Input:
      • urls: str | list[str] (http/https only)
      • fetch_div: str = "" — optional CSS selector (e.g. article, .post-body)
      • save_file: str = "" — optional absolute file path with extension
      • use_session: bool = False — reuse cookies from previous requests
    • Output:
      • status, http_status, url, truncated, title?, content_type?, body (or saved_to/bytes_written when save_file is used)
  • web_fetch_browser (browser-rendered fetch for JS/login/captcha)

    • Input:
      • urls: str | list[str] (http/https only)
      • fetch_div: str = "" — optional CSS selector
      • save_file: str = "" — optional absolute file path with extension
      • headless: bool = True — show browser window or run hidden
      • wait_seconds: int = 0 — extra wait after page load
      • use_session: bool = False — reuse browser cookies/context
    • Output: same as web_fetch plus title
    • Use for JS-heavy pages or sites that block plain HTTP clients
  • web_download (download bytes to disk)

    • Input:
      • urls: str | list[str] (http/https only)
      • save_files: str | list[str] — required file path(s) with extension
      • use_session: bool = False — reuse cookies from previous requests
    • Output:
      • status, url, saved_to, bytes, content_type
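
All three tools accept http/https URLs only. A check along those lines might look like the sketch below; it uses just the standard library, and is_http_url is a hypothetical stand-in, not the validate_url helper from utils/paths.py.

```python
from urllib.parse import urlsplit


def is_http_url(url: str) -> bool:
    """Return True only for absolute http/https URLs that include a host."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        # urlsplit rejects some malformed inputs, e.g. an unclosed IPv6 bracket.
        return False
    return parts.scheme in {"http", "https"} and bool(parts.netloc)


candidates = [
    "https://example.com/page",
    "ftp://example.com/file.zip",
    "file:///etc/hosts",
    "example.com",
]
accepted = [url for url in candidates if is_http_url(url)]
print(accepted)
```

Filtering a batch up front avoids spending a round-trip on URLs the server would reject anyway.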

API Tool

  • web_request (call REST/GraphQL APIs, optionally load-test)
    • Input: queries: dict | list[dict] (one spec or a batch)
    • Shared spec fields:
      • type: 'rest' | 'graphql'
      • method: 'GET'|'POST'|'PUT'|'PATCH'|'DELETE'|'HEAD'|'OPTIONS'
      • url: str (http/https)
      • headers: dict = {} (optional)
      • requests: int = 1
      • concurrency: int = 1 (async workers, not OS threads)
      • time: float = 0
        • 0: send exactly requests * concurrency requests as fast as possible
        • >0: best-effort pacing at ~requests * concurrency requests/sec for time seconds
    • REST-specific:
      • body: dict|list|str|None
        • dict/list: sent as JSON
        • str: sent as raw body
    • GraphQL-specific:
      • either query: str (recommended) OR body: str|dict
      • optional variables: dict
      • optional operationName: str (only needed when the GraphQL document contains multiple operations)
    • Auth:
      • If headers.Authorization is not provided, WEB_REQUEST_TOKEN is used as default.
    • Output:
      • aggregated stats: status_counts, http_status_counts, latency_ms percentiles
      • for small runs (total_requests <= 3): includes response_samples (truncated)
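
The auth default and the request-count arithmetic described above can be sketched like this. Both functions are hypothetical illustrations: the case-insensitive header lookup and the "a space means a scheme is present" test are assumptions, not confirmed details of web_request.

```python
from typing import Any, Optional


def resolve_authorization(headers: dict[str, str], default_token: Optional[str]) -> dict[str, str]:
    """Apply a default token only when the caller did not set Authorization."""
    merged = dict(headers)
    if any(name.lower() == "authorization" for name in merged) or not default_token:
        return merged
    token = default_token.strip()
    # Assumption: a value like "<scheme> <credentials>" already carries its scheme.
    if " " not in token:
        token = f"Bearer {token}"
    merged["Authorization"] = token
    return merged


def planned_requests(spec: dict[str, Any]) -> int:
    """Exact total for time == 0, approximate target total for time > 0."""
    per_second_or_burst = int(spec.get("requests", 1)) * int(spec.get("concurrency", 1))
    duration = float(spec.get("time", 0))
    if duration > 0:
        return round(per_second_or_burst * duration)
    return per_second_or_burst


spec = {
    "type": "rest",
    "method": "GET",
    "url": "https://httpbin.org/get",
    "requests": 3,
    "concurrency": 2,
    "time": 0,
}
print(planned_requests(spec), resolve_authorization({}, "abc123"))
```

With requests=3 and concurrency=2 the burst mode sends six requests in total, which is above the threshold of three for response_samples.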

Quick Verification

# Search the web
uv run python -c "import asyncio; from www_search_mcp.tools.web_search import web_search; r=asyncio.run(web_search('python mcp protocol', max_results=3)); print(r['status'], r['result_count'])"

# Fetch a page
uv run python -c "import asyncio; from www_search_mcp.tools.web_fetch import web_fetch; r=asyncio.run(web_fetch('https://example.com')); print(r['status'], r['http_status'])"

# Search GitHub
uv run python -c "import asyncio; from www_search_mcp.tools.web_search_github import web_search_github; r=asyncio.run(web_search_github('fastapi', max_results=3)); print(r['status'], r['result_count'])"

# Search PyPI
uv run python -c "import asyncio; from www_search_mcp.tools.web_search_pypi import web_search_pypi; r=asyncio.run(web_search_pypi('httpx', max_results=3)); print(r['status'], r['result_count'])"

# Call a REST API
uv run python -c "import asyncio; from www_search_mcp.tools.web_request import web_request; r=asyncio.run(web_request({'type':'rest','method':'POST','url':'https://httpbin.org/post','body':{'hello':'world'},'requests':1,'concurrency':1,'time':0})); print(r['status'], r['http_status_counts'])"

# Call a GraphQL API
uv run python -c "import asyncio; from www_search_mcp.tools.web_request import web_request; r=asyncio.run(web_request({'type':'graphql','method':'POST','url':'https://countries.trevorblades.com/','query':'{ __typename }','requests':1,'concurrency':1,'time':0})); print(r['status'], r['http_status_counts'])"

Local Development Commands

All commands are intended to be run from the repository root and use uv run (no manual venv activation).

Install / Sync

uv sync --all-groups

Format + Lint

uv run ruff format src/ tests/
uv run ruff check src/ tests/

Type Check

uv run ty check src/

Tests (xdist)

pytest-xdist is enabled by default via pyproject.toml (-n auto).

uv run python -m pytest tests/ -q

# Set the worker count explicitly (here identical to the configured default):
uv run python -m pytest tests/ -q -n auto

Security Scans

uv run bandit -r src/
uv run pip-audit

Build + Install Wheel Locally

rm -rf dist/
uv build
uv tool install --force dist/*.whl

Run MCP Server Locally

uv run www-search-mcp

Troubleshooting

uv not found

Install uv and reopen your terminal. See System Requirements.

Dependencies missing

uv sync

Playwright browser not found

uv run python -m playwright install chromium

GitHub API rate limit exceeded

The GitHub search API allows ~10 requests/minute without authentication. To raise the limit, set a GitHub token:

export WEB_GITHUB_TOKEN=ghp_your_token_here

Binary content error in web_fetch

web_fetch rejects binary content (images, PDFs, etc.). Use web_download instead to save binary files to disk.
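
If the content type is known in advance (for example from an earlier response), a caller can pick the tool up front. The rule below is only a rough heuristic under stated assumptions: which media types web_fetch actually accepts is not documented here, and pick_tool is a hypothetical helper.

```python
TEXTUAL_TYPES = {
    "application/json",
    "application/xml",
    "application/xhtml+xml",
    "application/javascript",
}


def pick_tool(content_type: str) -> str:
    """Guess whether a resource is text (web_fetch) or binary (web_download)."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if (
        media_type.startswith("text/")
        or media_type in TEXTUAL_TYPES
        or media_type.endswith(("+json", "+xml"))
    ):
        return "web_fetch"
    return "web_download"


for header in ("text/html; charset=utf-8", "application/pdf", "image/png"):
    print(header, "->", pick_tool(header))
```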

MCP tools not appearing in client

  1. Check that the MCP client config JSON is valid.
  2. If you run from local source, ensure the --project path is absolute and correct.
  3. Reload the MCP client after config changes.
  4. Set WEB_DEBUG=true to get detailed logs.

Wrong project path in config

The --project argument must point to the root directory of www-search-mcp (where pyproject.toml is located), not to the src/ subdirectory.

Session & Timeouts

Session persistence (use_session / WEB_SESSION_ENABLED)

When session persistence is enabled, cookies are stored per MCP session (not globally) using the FastMCP-injected request Context.

  • web_fetch / web_download: reuse a per-session niquests.AsyncSession (cookie jar + connection pool).
  • web_fetch_browser: reuse a per-session Playwright BrowserContext (browser cookies/storage).

If session persistence is disabled, tools use ephemeral clients/contexts.

Tool execution deadline

Tools are bounded by a tool-level deadline equal to WEB_TIMEOUT_TOTAL (fallback WEB_TIMEOUT). This deadline is independent of the HTTP connect/read timeouts and caps the total duration of a single tool call, even when each individual request stays within its own timeouts.
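
A tool-level deadline of this kind is commonly implemented with asyncio.wait_for. The sketch below shows the general pattern only; run_with_deadline and the status strings it returns are assumptions, not the server's actual wrapper or payload format.

```python
import asyncio
from typing import Any, Awaitable


async def run_with_deadline(work: Awaitable[Any], deadline_seconds: float) -> dict[str, Any]:
    """Await `work`, cancelling it if it exceeds the overall deadline."""
    try:
        result = await asyncio.wait_for(work, timeout=deadline_seconds)
    except asyncio.TimeoutError:
        return {"status": "timeout", "deadline_seconds": deadline_seconds}
    return {"status": "ok", "result": result}


async def slow_tool() -> str:
    """Stands in for a tool call that makes several slow requests in a row."""
    await asyncio.sleep(0.2)
    return "done"


async def fast_tool() -> str:
    return "done"


if __name__ == "__main__":
    print(asyncio.run(run_with_deadline(slow_tool(), 0.05)))
    print(asyncio.run(run_with_deadline(fast_tool(), 0.05)))
```

The deadline wraps the whole call, so several requests that each finish within the read timeout can still be cut off once their combined time exceeds it.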

HTTP Mode (Optional)

By default the server runs via stdio transport (best for desktop clients).

To run with HTTP-based transports, set WEB_TRANSPORT:

export WEB_TRANSPORT=streamable-http
export WEB_HTTP_HOST=127.0.0.1
export WEB_HTTP_PORT=8000
uv run www-search-mcp

Operational endpoints (HTTP transports only):

  • GET /healthz -> { "status": "ok" }
  • GET /readyz -> { "status": "ready", "version": "..." }
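
A small liveness probe against /healthz could look like the following, using only the standard library. The host and port defaults mirror WEB_HTTP_HOST and WEB_HTTP_PORT; the function names are hypothetical, and the probe assumes the server is already running with an HTTP transport.

```python
import json
from urllib.request import urlopen


def endpoint_url(host: str, port: int, path: str) -> str:
    """Build the URL for an operational endpoint."""
    return f"http://{host}:{port}{path}"


def is_healthy(body: bytes) -> bool:
    """Check for the documented { "status": "ok" } response body."""
    try:
        return json.loads(body).get("status") == "ok"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def probe(host: str = "127.0.0.1", port: int = 8000, timeout: float = 2.0) -> bool:
    """Return True when GET /healthz answers 200 with status ok."""
    try:
        with urlopen(endpoint_url(host, port, "/healthz"), timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("healthy" if probe() else "not reachable")
```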

FastMCP endpoints (HTTP transports only):

  • streamable-http: POST /mcp
  • sse: GET /sse and POST /messages/

Note: FastMCP custom routes are not protected by server auth middleware by design; keep them non-sensitive.

MCP Resource

The server exposes a lightweight MCP resource:

  • www-search-mcp://server/info (JSON summary: name, version, tool count, safe env state)

For full tool docs and environment variable details, use the web_mcp_info tool.

License

MIT
