
MCP server providing web search (DuckDuckGo), HTTP fetch, browser fetch (Playwright), and file download.


www-search-mcp

MCP server for web search, HTTP fetch, browser fetch (Playwright), file download, API requests, and package search (PyPI, GitHub).

Optimized for batching: the search, fetch, download, and request tools accept lists (multi-query / multi-URL) to reduce round-trips.

Quick Install

# Run without installing (recommended)
uvx www-search-mcp

# Or install as a global tool
uv tool install www-search-mcp

MCP client config (Cursor-style mcp.json; VS Code's mcp.json uses a top-level "servers" key instead of "mcpServers"):

{
  "mcpServers": {
    "www-search": {
      "command": "uvx",
      "args": ["www-search-mcp"]
    }
  }
}
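If your client already has other servers configured, the entry above has to be merged into the existing file rather than pasted over it. A minimal Python sketch of that merge; the helper name `add_www_search` is hypothetical, and the config file location depends on your client:

```python
import json
from pathlib import Path


def add_www_search(config_path: Path, key: str = "mcpServers") -> dict:
    """Add the www-search entry to an MCP client config, keeping other servers.

    `key` is the top-level map name; "mcpServers" matches the snippet above.
    """
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault(key, {})
    # Overwrites only our own entry; other servers under `key` are untouched.
    servers["www-search"] = {"command": "uvx", "args": ["www-search-mcp"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Pass the path to your client's config file; if the file does not exist yet, it is created with just this entry.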

Tools Overview

Tool                 What it does
web_search           General web search (DuckDuckGo)
web_search_images    Image search
web_search_github    GitHub repo search
web_search_pypi      PyPI package search
web_fetch            Fetch URL as Markdown
web_fetch_browser    Browser-rendered fetch (JS sites)
web_download         Download files to disk
web_request          REST/GraphQL API calls + load tests
web_mcp_info         Server config and tool docs
web_mcp_status       Real-time diagnostics

Batching: pass queries=["q1", "q2"] or urls=["url1", "url2"] instead of calling one-by-one.
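MCP clients normally build the JSON-RPC message for you, but it helps to see where the list goes. A sketch of the `tools/call` request (the MCP JSON-RPC method for invoking a tool) for one batched search and one batched fetch; the helper name and the example queries and URLs are illustrative:

```python
import json


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP JSON-RPC 2.0 `tools/call` request."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# One call carries two searches instead of two separate round-trips.
batched = tools_call(
    1, "web_search", {"queries": ["python asyncio tutorial", "playwright python docs"], "max_results": 3}
)
# Two pages fetched in the same call.
fetches = tools_call(2, "web_fetch", {"urls": ["https://example.com", "https://example.org"]})
```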


Search Tools

web_search

queries: str | list[str]
max_results: int = 5

Returns title, url, snippet. Safe search disabled.

web_search_images

queries: str | list[str]
max_results: int = 5

Returns title, image, thumbnail, height, width, source.

web_search_github

queries: str | list[str]
max_results: int = 5

Returns title, url, stars, forks, language. Uses the GitHub REST API; search is rate-limited to ~10 requests/min without a token (set WEB_GITHUB_TOKEN to raise the limit).
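The README does not say which GitHub endpoint is called. Assuming it is the REST `GET /search/repositories` endpoint, the request and the mapping to the returned fields could look like the sketch below; `github_search_request`, `to_result`, and the use of `full_name` as the title are assumptions, not the server's code:

```python
import os
import urllib.parse
import urllib.request


def github_search_request(query: str, max_results: int = 5) -> urllib.request.Request:
    """Build (but do not send) a repository search request."""
    params = urllib.parse.urlencode({"q": query, "per_page": max_results})
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("WEB_GITHUB_TOKEN")
    if token:
        # Authenticated search requests get a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        f"https://api.github.com/search/repositories?{params}", headers=headers
    )


def to_result(item: dict) -> dict:
    """Map one GitHub search item to the fields web_search_github reports."""
    return {
        "title": item["full_name"],
        "url": item["html_url"],
        "stars": item["stargazers_count"],
        "forks": item["forks_count"],
        "language": item.get("language"),
    }
```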

web_search_pypi

queries: str | list[str]
max_results: int = 5

Returns name, version, summary, author, license, requires_python, dependencies.
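PyPI exposes per-package metadata through its JSON API at `https://pypi.org/pypi/<name>/json`, and the fields above line up with that response's `info` object. The sketch assumes that is where they come from (how the server finds candidate packages for a query is not documented); `pypi_json_url` and `summarize` are hypothetical names:

```python
def pypi_json_url(package: str) -> str:
    """URL of PyPI's JSON metadata endpoint for one package."""
    return f"https://pypi.org/pypi/{package}/json"


def summarize(payload: dict) -> dict:
    """Reduce a PyPI JSON API response to the fields web_search_pypi returns."""
    info = payload["info"]
    return {
        "name": info["name"],
        "version": info["version"],
        "summary": info.get("summary"),
        "author": info.get("author"),
        "license": info.get("license"),
        "requires_python": info.get("requires_python"),
        # requires_dist is null for packages that declare no dependencies.
        "dependencies": info.get("requires_dist") or [],
    }
```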


Fetch & Download Tools

web_fetch

urls: str | list[str]
fetch_div: str = ""          # CSS selector (e.g. "article")
save_file: str = ""         # Absolute path to save
use_session: bool = False   # Reuse cookies

Returns Markdown body. Rejects binary content — use web_download for files.

web_fetch_browser

urls: str | list[str]
fetch_div: str = ""
save_file: str = ""
headless: bool = True
wait_seconds: int = 0
use_session: bool = False

Same output as web_fetch but renders JS. Use for SPAs, login walls, bot-blocked sites.

web_download

urls: str | list[str]
save_files: str | list[str]  # Required target path(s)
use_session: bool = False

Returns saved_to, bytes, content_type.


API Tool

web_request

queries: dict | list[dict]

Spec fields per query:

  • type: "rest" | "graphql"
  • method: "GET" | "POST" | ...
  • url: target URL
  • headers: optional dict
  • requests: repeat count (default 1)
  • concurrency: async workers (default 1)
  • time: run duration in seconds (0 = stop after the fixed count set by requests)

Payload fields by type:

  • REST: body: dict | list | str
  • GraphQL: query: str, variables: dict, operationName: str

Auth: if headers has no Authorization entry, WEB_REQUEST_TOKEN is used as the default Authorization header.

Output: status_counts, http_status_counts, and latency_ms percentiles. Runs of 3 or fewer requests also include response samples.
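The README does not say how the latency percentiles are computed. This sketch uses the nearest-rank method over per-request `(http_status, latency_ms)` pairs; the input shape and `summarize_run` are assumptions, and the server may interpolate instead:

```python
import math
from collections import Counter


def percentile(sorted_ms: list[float], p: int) -> float:
    """Nearest-rank percentile of an ascending, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_ms) / 100))
    return sorted_ms[rank - 1]


def summarize_run(results: list[tuple[int, float]]) -> dict:
    """results holds one (http_status, latency_ms) pair per completed request."""
    latencies = sorted(ms for _, ms in results)
    return {
        "http_status_counts": dict(Counter(status for status, _ in results)),
        "latency_ms": {f"p{p}": percentile(latencies, p) for p in (50, 95, 99)} if latencies else {},
    }
```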


Diagnostic Tools

web_mcp_info

Server configuration, tool descriptions, environment variables.

web_mcp_status

Real-time diagnostics:

  • uptime_seconds, pid, python_version
  • throttle: last request, interval
  • sessions: total, with browser, stale
  • connections: niquests version, pool size
  • resources: FD limit, memory RSS, event loop tasks
  • counters: requests, errors, timeouts
  • config: transport, timeouts
  • health: DDGS, Playwright availability
  • metrics: latency percentiles (p50/p95/p99), subtask stats

Privileged status server: A separate Starlette-based HTTP server runs on a daemon thread (default port 8081), providing /status, /healthz, and /tasks endpoints. This server remains responsive even under heavy load because it operates independently from the main event loop. Configure via WEB_STATUS_PORT (set to 0 to disable).

Diagnostic HTTP Routes

When running in HTTP transport mode (WEB_TRANSPORT=streamable-http):

  • GET /healthz — Health check
  • GET /readyz — Readiness probe (200/503)
  • GET /status — Same JSON as web_mcp_status
  • GET /tasks — Active event loop tasks (debugging)
  • GET /memory — Memory breakdown (RSS, arenas)
  • GET /error-types — Error hierarchy for client introspection

Configuration

Environment Variables

Variable                      Default     Description
WEB_TIMEOUT_TOTAL             30          Total timeout (sec)
WEB_TIMEOUT_CONNECT           5           Connect timeout (sec)
WEB_TIMEOUT_READ              25          Read timeout (sec)
WEB_MAX_RESULTS               5           Default search results
WEB_REQUEST_LIMIT             50          Max concurrent requests
WEB_MIN_INTERVAL              1.0         Throttle gap (sec)
WEB_RETRIES                   2           Retry attempts
WEB_MAX_FETCH_CHARS           200000      Max fetch body length (characters)
WEB_MAX_DOWNLOAD_MB           50          Max download size (MB)
WEB_DEBUG                     false       Debug logging
WEB_LOG_FORMAT                text        text or json
WEB_SESSION_ENABLED           false       Persistent cookies
WEB_TRANSPORT                 stdio       stdio or streamable-http
WEB_HTTP_HOST                 127.0.0.1   HTTP bind host
WEB_HTTP_PORT                 8000        HTTP bind port
WEB_MCP_IDLE_LIFETIME         300         stdio idle timeout (sec), 0 to disable
WEB_DNS_RESOLVER              system      google, cloudflare, yandex, quad9, or system
WEB_DNS_STRATEGY              -           only_ipv4, only_ipv6, or dual-stack
WEB_PROXY                     -           HTTP proxy URL
WEB_SSL_VERIFY                true        TLS verification
WEB_SSL_PATH                  -           Extra CA certs
WEB_USER_AGENT                Chrome 135  Custom UA
WEB_UA_ROTATION               false       Rotate UA pool
WEB_UA_LIST                   -           Custom UA list (||| separated)
WEB_GITHUB_TOKEN              -           GitHub API token
WEB_REQUEST_TOKEN             -           Default auth token
WEB_STATUS_PORT               8081        Privileged status server port (0 to disable)
WEB_MAX_URLS_PER_CALL         50          Maximum URLs per web_fetch call
WEB_FETCH_SUBTASK_TIMEOUT     15          Per-subtask timeout for fetch (sec)
WEB_SUBTASK_TIMEOUT           5           Per-subtask timeout for generic ops (sec)
WEB_SUBTASK_RETRIES           3           Retry attempts for subtasks
WEB_STREAM_CHUNK_SIZE         65536       HTTP stream chunk size (bytes)
WEB_SESSION_IDLE_TIMEOUT      5           Session idle timeout (sec)
WEB_SESSION_CLEANUP_INTERVAL  5           Session cleanup interval (sec)
WEB_MAX_LATENCY_SAMPLES       20000       Maximum latency samples to track
WEB_METRICS_MAX_SAMPLES       1000        Maximum metrics samples to track
WEB_SEARCH_CACHE_TTL          300         Search cache TTL (sec)
WEB_SEARCH_CACHE_MAXSIZE      100         Search cache max entries
WEB_FETCH_CACHE_TTL           60          Fetch cache TTL (sec)
WEB_FETCH_CACHE_MAXSIZE       50          Fetch cache max entries
WEB_API_CACHE_TTL             300         API cache TTL (sec)
WEB_API_CACHE_MAXSIZE         200         API cache max entries
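A sketch of reading a few of these variables with their documented defaults. It covers only a subset, the `WebConfig`/`load_config` names are hypothetical, and the accepted boolean spellings are an assumption (this README uses both `false` and `=1`):

```python
import os
from dataclasses import dataclass

# Assumed truthy spellings for boolean variables.
_TRUE = {"1", "true", "yes", "on"}


def _flag(env, name: str, default: bool) -> bool:
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in _TRUE


@dataclass(frozen=True)
class WebConfig:
    timeout_total: float
    min_interval: float
    retries: int
    session_enabled: bool
    transport: str
    status_port: int
    ua_list: list[str]


def load_config(env=os.environ) -> WebConfig:
    """Build a config from environment variables, falling back to the table's defaults."""
    return WebConfig(
        timeout_total=float(env.get("WEB_TIMEOUT_TOTAL", "30")),
        min_interval=float(env.get("WEB_MIN_INTERVAL", "1.0")),
        retries=int(env.get("WEB_RETRIES", "2")),
        session_enabled=_flag(env, "WEB_SESSION_ENABLED", False),
        transport=env.get("WEB_TRANSPORT", "stdio"),
        status_port=int(env.get("WEB_STATUS_PORT", "8081")),
        # WEB_UA_LIST entries are separated by "|||"; blanks are dropped.
        ua_list=[ua.strip() for ua in env.get("WEB_UA_LIST", "").split("|||") if ua.strip()],
    )
```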

Session Persistence

When use_session=True or WEB_SESSION_ENABLED=1:

  • web_fetch / web_download: reuse per-session niquests.AsyncSession
  • web_fetch_browser: reuse per-session Playwright BrowserContext

Sessions are scoped to the MCP session (FastMCP Context), not global.
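The scoping rule, together with WEB_SESSION_IDLE_TIMEOUT and WEB_SESSION_CLEANUP_INTERVAL, suggests a registry keyed by MCP session id with periodic eviction. The sketch below is a hypothetical `SessionRegistry`, not the server's code; the real resources (niquests.AsyncSession, BrowserContext) would also need to be closed asynchronously when evicted:

```python
import time
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class SessionRegistry(Generic[T]):
    """One resource per MCP session id, evicted after `idle_timeout` seconds unused."""

    def __init__(self, factory: Callable[[], T], idle_timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self._factory = factory
        self._idle = idle_timeout
        self._clock = clock
        self._items: dict[str, tuple[T, float]] = {}

    def get(self, session_id: str) -> T:
        """Return this session's resource, creating it on first use."""
        item = self._items.get(session_id)
        resource = item[0] if item else self._factory()
        self._items[session_id] = (resource, self._clock())  # refresh last-used time
        return resource

    def sweep(self) -> list[T]:
        """Drop and return idle resources; call every cleanup interval."""
        now = self._clock()
        stale = [sid for sid, (_, last) in self._items.items() if now - last > self._idle]
        return [self._items.pop(sid)[0] for sid in stale]
```

Because resources are looked up by session id, two clients never share cookies or a browser context.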


HTTP Mode

export WEB_TRANSPORT=streamable-http
export WEB_HTTP_HOST=127.0.0.1
export WEB_HTTP_PORT=8000
www-search-mcp

Operational endpoints:

  • GET /healthz → { "status": "ok" }
  • GET /readyz — readiness probe (200/503)
  • GET /status — same JSON as web_mcp_status
  • GET /error-types — error hierarchy

MCP endpoint:

  • POST /mcp — streamable-http
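A sketch of the first request a client sends to `POST /mcp`: a JSON-RPC `initialize` message with the Accept header the streamable HTTP transport requires. It only builds the request; `mcp_post` is a hypothetical helper and the `protocolVersion` value is an assumption to match against what the server supports. If the response includes an `Mcp-Session-Id` header, later requests must send it back.

```python
import json
import urllib.request


def mcp_post(payload: dict, url: str = "http://127.0.0.1:8000/mcp") -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC POST to the streamable-http endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients must accept both plain JSON and SSE responses.
            "Accept": "application/json, text/event-stream",
        },
    )


init = mcp_post({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "initialize",
    "params": {
        "protocolVersion": "2025-03-26",
        "capabilities": {},
        "clientInfo": {"name": "readme-example", "version": "0.1"},
    },
})
```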

Idle Timeout (stdio)

stdio process auto-terminates after WEB_MCP_IDLE_LIFETIME seconds of inactivity (default 5 min). Set to 0 to disable.
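A minimal asyncio sketch of this kind of idle timer, assuming activity is signalled by one call per incoming message; `IdleWatchdog` is hypothetical and not the server's implementation:

```python
import asyncio
import time


class IdleWatchdog:
    """`expired()` returns after `lifetime` seconds without `touch()`; 0 disables it."""

    def __init__(self, lifetime: float):
        self.lifetime = lifetime
        self._last = time.monotonic()

    def touch(self) -> None:
        """Record activity (e.g. an incoming MCP message)."""
        self._last = time.monotonic()

    async def expired(self, poll: float = 1.0) -> None:
        if self.lifetime <= 0:
            await asyncio.Event().wait()  # disabled: never returns
        while (idle := time.monotonic() - self._last) < self.lifetime:
            await asyncio.sleep(min(poll, self.lifetime - idle))
```

A server would run `await watchdog.expired()` in a background task, call `touch()` on every message, and shut down when that task completes.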


Development

# Setup
git clone https://github.com/naifs/www-search-mcp.git
cd www-search-mcp
uv sync --all-groups

# Format + lint
uv run ruff format src/ tests/
uv run ruff check src/ tests/

# Type check
uv run ty check src/

# Tests
uv run pytest tests/ -q          # parallel (default)
uv run pytest tests/ -q -n0      # sequential

# Security
uv run bandit -r src/

# Build + install
rm -rf dist/
uv build
uv tool install --force dist/*.whl

Troubleshooting

Problem              Fix
uv not found         Install from astral.sh
Browser not found    uv run python -m playwright install chromium
GitHub rate limit    Set WEB_GITHUB_TOKEN
Binary in web_fetch  Use web_download instead
Tools not showing    Check config JSON, reload client, enable WEB_DEBUG

License

MIT



Download files


Source Distribution

www_search_mcp-1.3.2.tar.gz (178.3 kB)

Built Distribution


www_search_mcp-1.3.2-py3-none-any.whl (76.7 kB)

File details

Details for the file www_search_mcp-1.3.2.tar.gz.

File metadata

  • Download URL: www_search_mcp-1.3.2.tar.gz
  • Upload date:
  • Size: 178.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for www_search_mcp-1.3.2.tar.gz
Algorithm    Hash digest
SHA256       63023ad012c56fb31b32254859ad88afd03d8fdb50b12a1e3288624e2603d402
MD5          612374f9d2ed817dd9c6329e6c9be7ee
BLAKE2b-256  8879a7db663869d483d2a5884b01327c1c5dc0be9ebce19457883796f19cf927
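To check a downloaded sdist against the SHA256 digest above with the standard library (where you saved the file is up to you; `sha256_of` and `verify` are hypothetical helpers):

```python
import hashlib
from pathlib import Path

# SHA256 of www_search_mcp-1.3.2.tar.gz, as listed above.
EXPECTED_SDIST_SHA256 = "63023ad012c56fb31b32254859ad88afd03d8fdb50b12a1e3288624e2603d402"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large archives never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check at install time via `--hash=sha256:...` entries in a requirements file installed with `--require-hashes`.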


Provenance

The following attestation bundles were made for www_search_mcp-1.3.2.tar.gz:

Publisher: ci-cd.yml on Naifs/www-search-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file www_search_mcp-1.3.2-py3-none-any.whl.

File metadata

  • Download URL: www_search_mcp-1.3.2-py3-none-any.whl
  • Upload date:
  • Size: 76.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for www_search_mcp-1.3.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       1d9fd3c3a8788bb2f400e3fba925b75d85b0d3181ddfb0322d7154d12c1a9b0f
MD5          a0f2f2d32c05ad9e90bf19545ae7974b
BLAKE2b-256  90c649f1d631312c989076412a9ab81d77112640f9a3f2440cc34a246365d686


Provenance

The following attestation bundles were made for www_search_mcp-1.3.2-py3-none-any.whl:

Publisher: ci-cd.yml on Naifs/www-search-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
