MCP server providing web search (DuckDuckGo), HTTP fetch, browser fetch (Playwright), and file download.


www-search-mcp

MCP (Model Context Protocol) server providing web search, HTTP fetch, browser-based fetch (Playwright), file download, package search (PyPI, GitHub), and REST/GraphQL API requests.

Gives AI assistants (Claude Desktop, Cursor, etc.) the ability to search the web, read web pages, download files, call HTTP APIs, and discover Python packages and GitHub repositories.

This server is optimized for batching: most tools accept either a single string or a list (web_request accepts a dict or a list of dicts). Agents should prefer passing lists (multiple queries or URLs) to reduce round-trips.

Project Structure

High-level layout:

src/www_search_mcp/server.py: MCP entry point (FastMCP creation, tool registration, shutdown hooks)
src/www_search_mcp/info.py: single source of truth for tool docs + web_mcp_info
src/www_search_mcp/tools/: one file per MCP tool implementation
src/www_search_mcp/search.py: DuckDuckGo search + GitHub/PyPI API logic
src/www_search_mcp/utils/: shared reusable building blocks (no MCP wiring)

utils/ modules (shared code):

utils/http_async.py: shared niquests.AsyncSession, DNS resolver preset, global outbound request semaphore (WEB_REQUEST_LIMIT)
utils/fetch_processing.py: response -> standardized payload helpers (fetch markdown, save-to-file, streamed download limits)
utils/paths.py: URL/path validation and small normalizations (validate_url, resolve_path, normalize_fetch_div)
utils/search_exec.py: common search execution logic (execute_search, FIELD_MAPS, clamping)
utils/cache.py: TTL/LRU caches (sync + async-safe)
utils/throttle.py: global throttling (sync + async)
utils/html.py: HTML -> Markdown conversion helpers

Note: tools typically import from the www_search_mcp.utils facade, which re-exports the stable public helpers from these modules.
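utils/cache.py is described only as "TTL/LRU caches". A minimal synchronous sketch of that combination, assuming an OrderedDict-based design; the class name `TTLLRUCache` and its injectable clock are hypothetical, and the module's async-safe variants are not covered here.

```python
import time
from collections import OrderedDict
from collections.abc import Callable
from typing import Any


class TTLLRUCache:
    """Entries expire after `ttl` seconds; the least recently used entry
    is evicted once more than `maxsize` entries are stored."""

    def __init__(self, maxsize: int = 128, ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.maxsize = maxsize
        self.ttl = ttl
        self._clock = clock
        self._data: OrderedDict[Any, tuple[float, Any]] = OrderedDict()

    def get(self, key: Any, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self._clock() >= expires_at:
            del self._data[key]  # expired: drop it and report a miss
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key: Any, value: Any) -> None:
        self._data[key] = (self._clock() + self.ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.maxsize:
            self._data.popitem(last=False)  # evict least recently used

    def __len__(self) -> int:
        return len(self._data)
```

Injecting the clock keeps expiry testable without sleeping; production code would keep the `time.monotonic` default so wall-clock adjustments cannot resurrect stale entries.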

System Requirements

Requirement	Version / notes
uv	See Install uv
Python	3.10+ (managed automatically by `uv`)
Playwright	Chromium browser (installed automatically the first time a browser tool is called)

Install uv

macOS / Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

macOS (Homebrew):

brew install uv

Installation

Option 1: Run directly with `uvx` (recommended)

No clone needed. Runs straight from PyPI in a cached environment; use `uvx www-search-mcp@latest` to make sure you get the newest release:

uvx www-search-mcp

MCP client config:

{
  "mcpServers": {
    "www-search-mcp": {
      "command": "uvx",
      "args": ["www-search-mcp"]
    }
  }
}

Option 2: Install as a `uv tool`

# From PyPI
uv tool install www-search-mcp

# After installation, the command is available globally:
www-search-mcp

To update or reinstall:

uv tool upgrade www-search-mcp
# or force reinstall latest:
uv tool install --force www-search-mcp@latest

MCP client config (global tool):

{
  "mcpServers": {
    "www-search-mcp": {
      "command": "www-search-mcp"
    }
  }
}

Option 3: Run from local source (for development)

git clone https://github.com/naifs/www-search-mcp.git
cd www-search-mcp
uv sync
uv run www-search-mcp

To update:

git pull && uv sync

MCP client config (local source):

{
  "mcpServers": {
    "www-search-mcp": {
      "command": "uv",
      "args": [
        "run",
        "--project",
        "/absolute/path/to/www-search-mcp",
        "www-search-mcp"
      ]
    }
  }
}

Option 4: Install from built wheel

cd /path/to/www-search-mcp
uv build
uv tool install dist/*.whl

To update:

uv build && uv tool install --force dist/*.whl

Note: Playwright Chromium is installed automatically on first use when a browser tool is called. If auto-install fails (e.g. no network), run manually:
uv run python -m playwright install chromium

Environment Variables

Variable	Description	Default
`WEB_TIMEOUT`	Legacy total request timeout in seconds	`30`
`WEB_TIMEOUT_TOTAL`	Total request timeout in seconds (fallback: `WEB_TIMEOUT`)	`30`
`WEB_TIMEOUT_CONNECT`	Connect timeout in seconds	`5`
`WEB_TIMEOUT_READ`	Read timeout in seconds	`25`
`WEB_MAX_RESULTS`	Default max search results per query (1..25)	`5`
`WEB_MAX_FETCH_CHARS`	Max characters returned in fetch body	`200000`
`WEB_RETRIES`	Retry attempts on timeout/rate-limit (0..5)	`2`
`WEB_MIN_INTERVAL`	Minimum seconds between outbound requests (throttle)	`1.0`
`WEB_MAX_DOWNLOAD_MB`	Max size per downloaded file, in MB (1 MB = 1024·1024 B)	`50`
`WEB_DEBUG`	Enable debug logging (`1`/`true`/`yes`/`on`)	`false`
`WEB_SESSION_ENABLED`	Enable persistent cookies/session by default	`false`
`WEB_TRANSPORT`	MCP transport: `stdio` | `sse` | `streamable-http`	`stdio`
`WEB_HTTP_HOST`	Host for HTTP-based transports	`127.0.0.1`
`WEB_HTTP_PORT`	Port for HTTP-based transports	`8000`
`WEB_PROXY`	HTTP proxy URL (e.g. `http://proxy:8080`)	—
`WEB_REQUEST_LIMIT`	Max concurrent outbound requests (global limit)	`50`
`WEB_DNS_RESOLVER`	DNS preset: `google` | `cloudflare` | `yandex` | `quad9` | `system`. Supports DoH, DoT, DoQ (Quad9 only), and plain DNS formats in the fallback chain	`google`
`WEB_DNS_STRATEGY`	IPv4/IPv6 strategy: `only_ipv4` | `only_ipv6` (default: both enabled)	—
`WEB_SSL_VERIFY`	Verify TLS certificates (system trust store via `truststore`)	`true`
`WEB_SSL_PATH`	Optional path to extra CA certs (PEM file or OpenSSL-hashed dir), loaded in addition to the system trust store	—
`WEB_USER_AGENT`	Custom User-Agent string for HTTP requests	Chrome 135 UA
`WEB_UA_ROTATION`	Enable User-Agent rotation across a pool of realistic desktop UAs	`false`
`WEB_UA_LIST`	Custom User-Agent list separated by `|||` (optional)	—
`WEB_GITHUB_TOKEN`	Optional GitHub token to increase API rate limits	—
`WEB_REQUEST_TOKEN`	Optional default Authorization token for `web_request` (used if `headers.Authorization` is not set). If token has no scheme, `Bearer` is assumed.	—

Notes:

Default values are resolved from src/www_search_mcp/config.py.
WEB_TIMEOUT is a legacy alias for total timeout; prefer WEB_TIMEOUT_TOTAL.
WEB_DNS_RESOLVER=system disables DoH/DoT and uses the OS resolver (useful for corporate/filtered networks).
WEB_DNS_STRATEGY=only_ipv4 or only_ipv6 forces single-stack; omit or use empty value for dual-stack (default).
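The actual parsing lives in src/www_search_mcp/config.py. A minimal sketch of how a few of the rules in the table could be resolved, assuming out-of-range WEB_MAX_RESULTS values are clamped rather than rejected; the helper names are hypothetical.

```python
import os
from collections.abc import Mapping

TRUTHY = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Parse flags such as WEB_DEBUG; only 1/true/yes/on count as true."""
    raw = env.get(name, "").strip().lower()
    return default if raw == "" else raw in TRUTHY


def total_timeout(env: Mapping[str, str], default: float = 30.0) -> float:
    """WEB_TIMEOUT_TOTAL wins; WEB_TIMEOUT is the legacy fallback."""
    for name in ("WEB_TIMEOUT_TOTAL", "WEB_TIMEOUT"):
        raw = env.get(name, "").strip()
        if raw:
            return float(raw)
    return default


def max_results(env: Mapping[str, str], default: int = 5) -> int:
    """Clamp WEB_MAX_RESULTS into the documented 1..25 range (assumption)."""
    raw = env.get("WEB_MAX_RESULTS", "").strip()
    value = int(raw) if raw else default
    return max(1, min(25, value))


def ua_list(env: Mapping[str, str]) -> list[str]:
    """Split WEB_UA_LIST on the '|||' separator, dropping empty entries."""
    raw = env.get("WEB_UA_LIST", "")
    return [ua.strip() for ua in raw.split("|||") if ua.strip()]


if __name__ == "__main__":
    print(total_timeout(os.environ), max_results(os.environ), env_bool(os.environ, "WEB_DEBUG"))
```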

Provided Tools

Search Tools

web_search (general web pages)
- Input:
  - queries: str | list[str]
  - max_results: int (typical 5..25)
- Output:
  - status, query, result_count
  - results[] with title, url, snippet
- Note: safe search is always disabled
web_search_images (image URLs)
- Input:
  - queries: str | list[str]
  - max_results: int (typical 5..25)
- Output:
  - status, query, result_count
  - results[] with title, image, url, thumbnail, height, width, source
- Note: safe search is always disabled
web_search_github (GitHub repos via API)
- Input:
  - queries: str | list[str]
  - max_results: int (typical 5..25)
- Output:
  - status, query, result_count
  - results[] with title, url, description, stars, forks, language
- Note: uses GitHub REST API (no token needed, but rate-limited to ~10 req/min)
web_search_pypi (PyPI packages)
- Input:
  - queries: str | list[str]
  - max_results: int (typical 5..25)
- Output:
  - status, query, result_count
  - results[] with name, version, summary, author, license, requires_python, url, repository, py_versions, dependencies
- Note: uses DuckDuckGo discovery + PyPI JSON API for enriched metadata

Fetch & Download Tools

web_fetch (read known URL(s) as markdown)
- Input:
  - urls: str | list[str] (http/https only)
  - fetch_div: str = "" — optional CSS selector (e.g. article, .post-body)
  - save_file: str = "" — optional absolute file path with extension
  - use_session: bool = False — reuse cookies from previous requests
- Output:
  - status, http_status, url, truncated, title?, content_type?, body (or saved_to/bytes_written when save_file is used)
web_fetch_browser (browser-rendered fetch for JS/login/captcha)
- Input:
  - urls: str | list[str] (http/https only)
  - fetch_div: str = "" — optional CSS selector
  - save_file: str = "" — optional absolute file path with extension
  - headless: bool = True (runs hidden; set False to show the browser window)
  - wait_seconds: int = 0 — extra wait after page load
  - use_session: bool = False — reuse browser cookies/context
- Output: same as web_fetch plus title
- Use for JS-heavy pages or sites that block plain HTTP clients
web_download (download bytes to disk)
- Input:
  - urls: str | list[str] (http/https only)
  - save_files: str | list[str] — required file path(s) with extension
  - use_session: bool = False — reuse cookies from previous requests
- Output:
  - status, url, saved_to, bytes, content_type
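Downloads are capped by WEB_MAX_DOWNLOAD_MB. A minimal sketch of enforcing that cap while streaming, assuming the body arrives as an iterable of byte chunks; `write_capped` and `DownloadTooLarge` are hypothetical names, and a real tool would also delete the partially written file when the limit is hit.

```python
import io
from collections.abc import Iterable
from typing import BinaryIO

MB = 1024 * 1024  # the README defines 1 MB as 1024 * 1024 bytes


class DownloadTooLarge(Exception):
    """Raised when a streamed body grows past the configured cap."""


def write_capped(chunks: Iterable[bytes], out: BinaryIO, max_mb: int = 50) -> int:
    """Copy chunks into `out`; stop before writing anything past max_mb."""
    limit = max_mb * MB
    written = 0
    for chunk in chunks:
        if written + len(chunk) > limit:
            raise DownloadTooLarge(f"body exceeds {max_mb} MB ({limit} bytes)")
        out.write(chunk)
        written += len(chunk)
    return written


if __name__ == "__main__":
    buffer = io.BytesIO()
    print(write_capped([b"hello ", b"world"], buffer, max_mb=1))
```

Checking the running total per chunk, rather than trusting a Content-Length header, also catches servers that omit or understate the size.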

API Tool

web_request (call REST/GraphQL APIs, optionally load-test)
- Input: queries: dict | list[dict] (one spec or a batch)
- Shared spec fields:
  - type: 'rest' | 'graphql'
  - method: 'GET'|'POST'|'PUT'|'PATCH'|'DELETE'|'HEAD'|'OPTIONS'
  - url: str (http/https)
  - headers: dict = {} (optional)
  - requests: int = 1
  - concurrency: int = 1 (async workers, not OS threads)
  - time: float = 0
    - 0: send exactly requests * concurrency requests as fast as possible
    - >0: best-effort pacing at ~requests * concurrency requests/sec for time seconds
- REST-specific:
  - body: dict|list|str|None
    - dict/list: sent as JSON
    - str: sent as raw body
- GraphQL-specific:
  - either query: str (recommended) OR body: str|dict
  - optional variables: dict
  - optional operationName: str (only needed when the GraphQL document contains multiple operations)
- Auth:
  - If headers.Authorization is not provided, WEB_REQUEST_TOKEN is used as default.
- Output:
  - aggregated stats: status_counts, http_status_counts, latency_ms percentiles
  - for small runs (total_requests <= 3): includes response_samples (truncated)

Quick Verification

# Search the web
uv run python -c "import asyncio; from www_search_mcp.tools.web_search import web_search; r=asyncio.run(web_search('python mcp protocol', max_results=3)); print(r['status'], r['result_count'])"

# Fetch a page
uv run python -c "import asyncio; from www_search_mcp.tools.web_fetch import web_fetch; r=asyncio.run(web_fetch('https://example.com')); print(r['status'], r['http_status'])"

# Search GitHub
uv run python -c "import asyncio; from www_search_mcp.tools.web_search_github import web_search_github; r=asyncio.run(web_search_github('fastapi', max_results=3)); print(r['status'], r['result_count'])"

# Search PyPI
uv run python -c "import asyncio; from www_search_mcp.tools.web_search_pypi import web_search_pypi; r=asyncio.run(web_search_pypi('httpx', max_results=3)); print(r['status'], r['result_count'])"

# Call a REST API
uv run python -c "import asyncio; from www_search_mcp.tools.web_request import web_request; r=asyncio.run(web_request({'type':'rest','method':'POST','url':'https://httpbin.org/post','body':{'hello':'world'},'requests':1,'concurrency':1,'time':0})); print(r['status'], r['http_status_counts'])"

# Call a GraphQL API
uv run python -c "import asyncio; from www_search_mcp.tools.web_request import web_request; r=asyncio.run(web_request({'type':'graphql','method':'POST','url':'https://countries.trevorblades.com/','query':'{ __typename }','requests':1,'concurrency':1,'time':0})); print(r['status'], r['http_status_counts'])"

Local Development Commands

All commands are intended to be run from the repository root and use uv run (no manual venv activation).

Install / Sync

uv sync --all-groups

Format + Lint

uv run ruff format src/ tests/
uv run ruff check src/ tests/

Type Check

uv run ty check src/

Tests (xdist)

pytest-xdist is enabled by default via pyproject.toml (-n auto).

uv run python -m pytest tests/ -q

# Same as the default, written out; replace auto with a worker count, or use -n 0 to run serially:
uv run python -m pytest tests/ -q -n auto

Security Scans

uv run bandit -r src/
uv run pip-audit

Build + Install Wheel Locally

rm -rf dist/
uv build
uv tool install --force dist/*.whl

Run MCP Server Locally

uv run www-search-mcp

Troubleshooting

`uv` not found

Install uv and reopen your terminal. See System Requirements.

Dependencies missing

uv sync

Playwright browser not found

uv run python -m playwright install chromium

GitHub API rate limit exceeded

The GitHub search API allows ~10 requests/minute without authentication. To increase the limit, set a GitHub token:

export WEB_GITHUB_TOKEN=ghp_your_token_here

Binary content error in `web_fetch`

web_fetch rejects binary content (images, PDFs, etc.). Use web_download instead to save binary files to disk.

MCP tools not appearing in client

Check that the MCP client config JSON is valid.
If running from local source (Option 3), ensure the --project path is absolute and correct.
Reload the MCP client after config changes.
Set WEB_DEBUG=true to get detailed logs.

Wrong project path in config

The --project argument must point to the root directory of www-search-mcp (where pyproject.toml is located), not to the src/ subdirectory.

Session & Timeouts

Session persistence (`use_session` / `WEB_SESSION_ENABLED`)

When session persistence is enabled, cookies are stored per MCP session (not globally) using the FastMCP-injected request Context.

web_fetch / web_download: reuse a per-session niquests.AsyncSession (cookie jar + connection pool).
web_fetch_browser: reuse a per-session Playwright BrowserContext (browser cookies/storage).

If session persistence is disabled, tools use ephemeral clients/contexts.
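A minimal sketch of the per-session pattern described above, assuming clients are keyed by some session identifier derived from the injected Context; `SessionRegistry` and `session_id` are hypothetical. A real implementation would also close niquests sessions or Playwright contexts when a session ends, and close ephemeral ones after each call.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Generic, TypeVar

T = TypeVar("T")


class SessionRegistry(Generic[T]):
    """One long-lived client per MCP session; fresh clients otherwise."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._clients: dict[str, T] = {}

    def get(self, session_id: str, use_session: bool) -> T:
        if not use_session:
            return self._factory()  # ephemeral: new client for this call only
        if session_id not in self._clients:
            self._clients[session_id] = self._factory()  # created lazily
        return self._clients[session_id]

    def drop(self, session_id: str) -> T | None:
        """Forget a session's client (caller is responsible for closing it)."""
        return self._clients.pop(session_id, None)
```

Keying by session rather than storing one global client is what keeps cookies from one MCP session from leaking into another.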

Tool execution deadline

Each tool call is bounded by a tool-level deadline equal to WEB_TIMEOUT_TOTAL (fallback: WEB_TIMEOUT). This deadline is independent of the HTTP connect/read timeouts and keeps a single tool call from running indefinitely.
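A minimal sketch of a tool-level deadline using `asyncio.wait_for`, assuming a timed-out call is reported as an error payload rather than raised; the `with_deadline` name and the payload shape are hypothetical.

```python
import asyncio
from collections.abc import Awaitable
from typing import Any


async def with_deadline(call: Awaitable[dict[str, Any]], seconds: float) -> dict[str, Any]:
    """Cancel the wrapped tool call if it runs longer than `seconds`."""
    try:
        return await asyncio.wait_for(call, timeout=seconds)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the inner task at this point
        return {"status": "error", "error": f"tool deadline of {seconds}s exceeded"}
```

Wrapping the whole call, not each request, is what makes the deadline independent of connect/read timeouts: retries and throttling delays all count against it.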

HTTP Mode (Optional)

By default the server runs via stdio transport (best for desktop clients).

To run with HTTP-based transports, set WEB_TRANSPORT:

export WEB_TRANSPORT=streamable-http
export WEB_HTTP_HOST=127.0.0.1
export WEB_HTTP_PORT=8000
uv run www-search-mcp

Operational endpoints (HTTP transports only):

GET /healthz -> { "status": "ok" }
GET /readyz -> { "status": "ready", "version": "..." }

FastMCP endpoints (HTTP transports only):

streamable-http: POST /mcp
sse: GET /sse and POST /messages/

Note: FastMCP custom routes are not protected by server auth middleware by design; keep them non-sensitive.

MCP Resource

The server exposes a lightweight MCP resource:

www-search-mcp://server/info (JSON summary: name, version, tool count, safe env state)

For full tool docs and environment variable details, use the web_mcp_info tool.

License

MIT

Project details

Maintainers

naifs

Release history

- 1.3.2 (May 12, 2026)
- 1.3.1 (May 11, 2026)
- 1.3.0 (May 9, 2026)
- 1.2.1 (May 9, 2026)
- 1.2.0 (May 8, 2026), this version
- 1.1.2 (May 6, 2026)
- 1.1.1 (May 6, 2026)
- 1.1.0 (May 5, 2026)
- 1.0.3 (May 4, 2026)
- 1.0.2 (May 1, 2026)
- 1.0.1 (May 1, 2026)
- 1.0.0 (Apr 29, 2026)

Download files

Source Distribution

www_search_mcp-1.2.0.tar.gz (176.5 kB)

Uploaded May 8, 2026 Source

Built Distribution

www_search_mcp-1.2.0-py3-none-any.whl (61.2 kB)

Uploaded May 8, 2026 Python 3

File details

Details for the file www_search_mcp-1.2.0.tar.gz.

File metadata

Download URL: www_search_mcp-1.2.0.tar.gz
Upload date: May 8, 2026
Size: 176.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for www_search_mcp-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`3bd73e1790658a71b07f85462a6a68d4197817cf5ef2588d4c9ffd787fe6a3ed`
MD5	`8feea1975a120aad8f9234b42e0e9962`
BLAKE2b-256	`5acf4233c6ff937d11922c111fc8e61bd61d4d0eeb3cc4e3a323a77efa0360bc`


Provenance

The following attestation bundles were made for www_search_mcp-1.2.0.tar.gz:

Publisher: ci-cd.yml on Naifs/www-search-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: www_search_mcp-1.2.0.tar.gz
- Subject digest: 3bd73e1790658a71b07f85462a6a68d4197817cf5ef2588d4c9ffd787fe6a3ed
- Sigstore transparency entry: 1476969362
- Sigstore integration time: May 8, 2026
Source repository:
- Permalink: Naifs/www-search-mcp@8e28f37401dd3f38ec7c12a120b9def1dc7a1269
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/Naifs
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-cd.yml@8e28f37401dd3f38ec7c12a120b9def1dc7a1269
- Trigger Event: push

File details

Details for the file www_search_mcp-1.2.0-py3-none-any.whl.

File metadata

Download URL: www_search_mcp-1.2.0-py3-none-any.whl
Upload date: May 8, 2026
Size: 61.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for www_search_mcp-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9a7ad7560bb8144000d206815e24a6fe317e06fcf1715a7575758a5ea843eb7b`
MD5	`fbd2cba87e356f0221ddd554d9300685`
BLAKE2b-256	`da2aabc1b437cfa06ffa765b4b3161fb26a9c24e48f7204ab457c8d9fbe68baf`


Provenance

The following attestation bundles were made for www_search_mcp-1.2.0-py3-none-any.whl:

Publisher: ci-cd.yml on Naifs/www-search-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: www_search_mcp-1.2.0-py3-none-any.whl
- Subject digest: 9a7ad7560bb8144000d206815e24a6fe317e06fcf1715a7575758a5ea843eb7b
- Sigstore transparency entry: 1476969493
- Sigstore integration time: May 8, 2026
Source repository:
- Permalink: Naifs/www-search-mcp@8e28f37401dd3f38ec7c12a120b9def1dc7a1269
- Branch / Tag: refs/tags/v1.2.0
- Owner: https://github.com/Naifs
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci-cd.yml@8e28f37401dd3f38ec7c12a120b9def1dc7a1269
- Trigger Event: push
