MCP server providing web search (DuckDuckGo), HTTP fetch, browser fetch (Playwright), and file download.
www-search-mcp
MCP (Model Context Protocol) server providing web search, HTTP fetch, browser-based fetch (Playwright), file download, and package search (PyPI, GitHub).
Gives AI assistants (Claude Desktop, Cursor, etc.) the ability to search the web, read web pages, download files, and discover Python packages.
This server is optimized for batching: tools accept either a single string or a list. Agents should prefer passing lists (multi-query / multi-URL) to reduce round-trips.
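The string-or-list convention can be pictured with a small normalizer. This is a minimal sketch, not the server's actual code; the helper name as_list is hypothetical.

```python
from typing import Union


def as_list(value: Union[str, list[str]]) -> list[str]:
    """Normalize a single string or a list of strings into a list (hypothetical helper)."""
    if isinstance(value, str):
        return [value]
    return list(value)


# One call carrying two queries instead of two separate round-trips.
single = as_list("python mcp protocol")
batch = as_list(["python mcp protocol", "fastmcp tutorial"])
print(single)
print(batch)
```

An agent that already has several queries or URLs in hand would pass them as one list rather than calling the tool once per item.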
Project Structure
High-level layout:
- src/www_search_mcp/server.py: MCP entry point (FastMCP creation, tool registration, shutdown hooks)
- src/www_search_mcp/info.py: single source of truth for tool docs + web_mcp_info
- src/www_search_mcp/tools/: one file per MCP tool implementation
- src/www_search_mcp/search.py: DuckDuckGo search + GitHub/PyPI API logic
- src/www_search_mcp/utils/: shared reusable building blocks (no MCP wiring)
utils/ modules (shared code):
- utils/http_async.py: shared niquests.AsyncSession, DNS resolver preset, global outbound request semaphore (WEB_REQUEST_LIMIT)
- utils/fetch_processing.py: response -> standardized payload helpers (fetch markdown, save-to-file, streamed download limits)
- utils/paths.py: URL/path validation and small normalizations (validate_url, resolve_path, normalize_fetch_div)
- utils/search_exec.py: common search execution logic (execute_search, FIELD_MAPS, clamping)
- utils/cache.py: TTL/LRU caches (sync + async-safe)
- utils/throttle.py: global throttling (sync + async)
- utils/html.py: HTML -> Markdown conversion helpers
Note: tools typically import from the www_search_mcp.utils facade, which re-exports the stable public helpers from these modules.
System Requirements
| Requirement | Version |
|---|---|
| uv | Install guide |
| Python | 3.10+ (managed automatically by uv) |
| Playwright | Chromium browser (installed via post-install script) |
Install uv
macOS / Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
macOS (Homebrew):
brew install uv
Installation
Option 1: Run directly with uvx (recommended)
No clone needed; uvx fetches the package from PyPI and runs it. A cached copy may be reused on later runs; uvx www-search-mcp@latest forces the newest release:
uvx www-search-mcp
MCP client config:
{
"mcpServers": {
"www-search-mcp": {
"command": "uvx",
"args": ["www-search-mcp"]
}
}
}
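Settings from the Environment Variables section can usually be passed through the client config. The sketch below generates such a config; it assumes the MCP client accepts an env mapping on each server entry (common in desktop clients, but check your client's documentation), and the helper name client_config is hypothetical.

```python
import json


def client_config(env: dict) -> str:
    """Build an MCP client config that passes environment variables to the server."""
    config = {
        "mcpServers": {
            "www-search-mcp": {
                "command": "uvx",
                "args": ["www-search-mcp"],
                # Assumption: the client supports a per-server "env" mapping.
                "env": env,
            }
        }
    }
    return json.dumps(config, indent=2)


print(client_config({"WEB_MAX_RESULTS": "10", "WEB_DNS_RESOLVER": "system"}))
```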
Option 2: Install as a uv tool
# From PyPI
uv tool install www-search-mcp
# After installation, the command is available globally:
www-search-mcp
To update or reinstall:
uv tool upgrade www-search-mcp
# or force reinstall latest:
uv tool install --force www-search-mcp@latest
MCP client config (global tool):
{
"mcpServers": {
"www-search-mcp": {
"command": "www-search-mcp"
}
}
}
Option 3: Run from local source (for development)
git clone https://github.com/naifs/www-search-mcp.git
cd www-search-mcp
uv sync
uv run www-search-mcp
To update:
git pull && uv sync
MCP client config (local source):
{
"mcpServers": {
"www-search-mcp": {
"command": "uv",
"args": [
"run",
"--project",
"/absolute/path/to/www-search-mcp",
"www-search-mcp"
]
}
}
}
Option 4: Install from built wheel
cd /path/to/www-search-mcp
uv build
uv tool install dist/*.whl
To update:
uv build && uv tool install --force dist/*.whl
Note: Playwright Chromium is installed automatically on first use when a browser tool is called. If auto-install fails (e.g. no network), run manually:
uv run python -m playwright install chromium
Environment Variables
| Variable | Description | Default |
|---|---|---|
| WEB_TIMEOUT | Legacy total request timeout in seconds | 30 |
| WEB_TIMEOUT_TOTAL | Total request timeout in seconds (fallback: WEB_TIMEOUT) | 30 |
| WEB_TIMEOUT_CONNECT | Connect timeout in seconds | 5 |
| WEB_TIMEOUT_READ | Read timeout in seconds | 25 |
| WEB_MAX_RESULTS | Default max search results per query (1..25) | 5 |
| WEB_MAX_FETCH_CHARS | Max characters returned in fetch body | 200000 |
| WEB_RETRIES | Retry attempts on timeout/rate-limit (0..5) | 2 |
| WEB_MIN_INTERVAL | Minimum seconds between outbound requests (throttle) | 1.0 |
| WEB_MAX_DOWNLOAD_MB | Max size per downloaded file, in MB (1 MB = 1024·1024 B) | 50 |
| WEB_DEBUG | Enable debug logging (1/true/yes/on) | false |
| WEB_SESSION_ENABLED | Enable persistent cookies/session by default | false |
| WEB_TRANSPORT | MCP transport: stdio, sse, or streamable-http | stdio |
| WEB_HTTP_HOST | Host for HTTP-based transports | 127.0.0.1 |
| WEB_HTTP_PORT | Port for HTTP-based transports | 8000 |
| WEB_PROXY | HTTP proxy URL (e.g. http://proxy:8080) | (unset) |
| WEB_REQUEST_LIMIT | Max concurrent outbound requests (global limit) | 50 |
| WEB_DNS_RESOLVER | DNS preset: google, cloudflare, yandex, quad9, or system. Supports DoH, DoT, DoQ (Quad9 only), and plain DNS formats in the fallback chain | google |
| WEB_DNS_STRATEGY | IPv4/IPv6 strategy: only_ipv4 or only_ipv6 (default: both enabled) | (unset) |
| WEB_SSL_VERIFY | Verify TLS certificates (system trust store via truststore) | true |
| WEB_SSL_PATH | Optional path to extra CA certs (PEM file or OpenSSL-hashed dir), loaded in addition to the system trust store | (unset) |
| WEB_USER_AGENT | Custom User-Agent string for HTTP requests | Chrome 135 UA |
| WEB_UA_ROTATION | Enable User-Agent rotation across a pool of realistic desktop UAs | false |
| WEB_UA_LIST | Custom User-Agent list separated by \|\|\| (three pipe characters; optional) | (unset) |
| WEB_GITHUB_TOKEN | Optional GitHub token to increase API rate limits | (unset) |
| WEB_REQUEST_TOKEN | Optional default Authorization token for web_request (used if headers.Authorization is not set). If token has no scheme, Bearer is assumed. | (unset) |
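A few of the value formats above can be sketched in plain Python. These helpers are illustrative only and their names are hypothetical; the real parsing lives in the package's config module and may differ.

```python
def parse_bool(raw: str) -> bool:
    """Treat 1/true/yes/on (case-insensitive) as enabled, as WEB_DEBUG documents."""
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def download_limit_bytes(raw_mb: str) -> int:
    """Convert WEB_MAX_DOWNLOAD_MB to bytes using 1 MB = 1024 * 1024 B."""
    return int(raw_mb) * 1024 * 1024


def parse_ua_list(raw: str) -> list[str]:
    """Split a WEB_UA_LIST value on the ||| separator, dropping empty entries."""
    return [ua.strip() for ua in raw.split("|||") if ua.strip()]


print(parse_bool("yes"))
print(download_limit_bytes("50"))
print(parse_ua_list("AgentOne/1.0 ||| AgentTwo/2.0"))
```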
Notes:
- Default values are resolved from src/www_search_mcp/config.py.
- WEB_TIMEOUT is a legacy alias for total timeout; prefer WEB_TIMEOUT_TOTAL.
- WEB_DNS_RESOLVER=system disables DoH/DoT and uses the OS resolver (useful for corporate/filtered networks).
- WEB_DNS_STRATEGY=only_ipv4 or only_ipv6 forces single-stack; omit or use empty value for dual-stack (default).
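The timeout fallback and the documented ranges can be expressed as a short sketch. It assumes the lookup order WEB_TIMEOUT_TOTAL, then WEB_TIMEOUT, then the default of 30; the function names are hypothetical and not taken from config.py.

```python
from collections.abc import Mapping


def resolve_total_timeout(env: Mapping[str, str], default: float = 30.0) -> float:
    """Prefer WEB_TIMEOUT_TOTAL, fall back to the legacy WEB_TIMEOUT, then the default."""
    for name in ("WEB_TIMEOUT_TOTAL", "WEB_TIMEOUT"):
        raw = env.get(name, "").strip()
        if raw:
            return float(raw)
    return default


def clamp(value: int, low: int, high: int) -> int:
    """Keep a setting inside its documented range, e.g. 1..25 for WEB_MAX_RESULTS."""
    return max(low, min(high, value))


print(resolve_total_timeout({"WEB_TIMEOUT": "10"}))
print(clamp(100, 1, 25))
```

In practice env would be os.environ; a plain dict is used here so the behavior is easy to check.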
Provided Tools
Search Tools
- web_search (general web pages)
  - Input:
    - queries: str | list[str]
    - max_results: int (typical 5..25)
  - Output:
    - status, query, result_count
    - results[] with title, url, snippet
  - Note: safe search is always disabled
- web_search_images (image URLs)
  - Input:
    - queries: str | list[str]
    - max_results: int (typical 5..25)
  - Output:
    - status, query, result_count
    - results[] with title, image, url, thumbnail, height, width, source
  - Note: safe search is always disabled
- web_search_github (GitHub repos via API)
  - Input:
    - queries: str | list[str]
    - max_results: int (typical 5..25)
  - Output:
    - status, query, result_count
    - results[] with title, url, description, stars, forks, language
  - Note: uses GitHub REST API (no token needed, but rate-limited to ~10 req/min)
- web_search_pypi (PyPI packages)
  - Input:
    - queries: str | list[str]
    - max_results: int (typical 5..25)
  - Output:
    - status, query, result_count
    - results[] with name, version, summary, author, license, requires_python, url, repository, py_versions, dependencies
  - Note: uses DuckDuckGo discovery + PyPI JSON API for enriched metadata
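To show how the documented output fields fit together, here is a minimal sketch that formats a web_search-style result. The sample dict is invented for illustration (including the status value), and format_results is a hypothetical helper, not part of the package.

```python
def format_results(result: dict) -> list[str]:
    """Turn the results[] entries of one search result into 'title <url>' lines."""
    lines = []
    for item in result.get("results", []):
        title = item.get("title") or "(untitled)"
        url = item.get("url", "")
        lines.append(f"{title} <{url}>")
    return lines


# Hypothetical payload shaped like the documented web_search output.
sample = {
    "status": "success",  # assumed value; the real status strings are not documented here
    "query": "python mcp protocol",
    "result_count": 1,
    "results": [
        {"title": "Example", "url": "https://example.com", "snippet": "..."},
    ],
}
print("\n".join(format_results(sample)))
```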
Fetch & Download Tools
- web_fetch (read known URL(s) as markdown)
  - Input:
    - urls: str | list[str] (http/https only)
    - fetch_div: str = "" (optional CSS selector, e.g. article, .post-body)
    - save_file: str = "" (optional absolute file path with extension)
    - use_session: bool = False (reuse cookies from previous requests)
  - Output:
    - status, http_status, url, truncated, title?, content_type?, body (or saved_to / bytes_written when save_file is used)
- web_fetch_browser (browser-rendered fetch for JS/login/captcha)
  - Input:
    - urls: str | list[str] (http/https only)
    - fetch_div: str = "" (optional CSS selector)
    - save_file: str = "" (optional absolute file path with extension)
    - headless: bool = True (show browser window or run hidden)
    - wait_seconds: int = 0 (extra wait after page load)
    - use_session: bool = False (reuse browser cookies/context)
  - Output: same as web_fetch plus title
  - Use for JS-heavy pages or sites that block plain HTTP clients
- web_download (download bytes to disk)
  - Input:
    - urls: str | list[str] (http/https only)
    - save_files: str | list[str] (required file path(s) with extension)
    - use_session: bool = False (reuse cookies from previous requests)
  - Output:
    - status, url, saved_to, bytes, content_type
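The http/https restriction and the urls/save_files inputs of web_download can be sketched as a pre-flight check. This assumes a one-to-one pairing between URLs and file paths, which the tool description implies but does not state; both helper names are hypothetical.

```python
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Accept only absolute http/https URLs."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def pair_downloads(urls, save_files) -> list[tuple[str, str]]:
    """Pair each URL with its target path (assumed one-to-one) after validating schemes."""
    url_list = [urls] if isinstance(urls, str) else list(urls)
    file_list = [save_files] if isinstance(save_files, str) else list(save_files)
    if len(url_list) != len(file_list):
        raise ValueError("urls and save_files must have the same length")
    bad = [u for u in url_list if not is_http_url(u)]
    if bad:
        raise ValueError(f"unsupported URL(s): {bad}")
    return list(zip(url_list, file_list))


print(pair_downloads("https://example.com/a.pdf", "/tmp/a.pdf"))
```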
API Tool
- web_request (call REST/GraphQL APIs, optionally load-test)
  - Input: queries: dict | list[dict] (one spec or a batch)
  - Shared spec fields:
    - type: 'rest' | 'graphql'
    - method: 'GET'|'POST'|'PUT'|'PATCH'|'DELETE'|'HEAD'|'OPTIONS'
    - url: str (http/https)
    - headers: dict = {} (optional)
    - requests: int = 1
    - concurrency: int = 1 (async workers, not OS threads)
    - time: float = 0
      - 0: send exactly requests * concurrency requests as fast as possible
      - >0: best-effort pacing at ~requests * concurrency requests/sec for time seconds
  - REST-specific:
    - body: dict|list|str|None
      - dict/list: sent as JSON
      - str: sent as raw body
  - GraphQL-specific:
    - either query: str (recommended) OR body: str|dict
    - optional variables: dict
    - optional operationName: str (only needed when the GraphQL document contains multiple operations)
  - Auth:
    - If headers.Authorization is not provided, WEB_REQUEST_TOKEN is used as default.
  - Output:
    - aggregated stats: status_counts, http_status_counts, latency_ms percentiles
    - for small runs (total_requests <= 3): includes response_samples (truncated)
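Two of the web_request rules lend themselves to a short sketch: the request count for time = 0, and the default Authorization header. The helpers are hypothetical, and treating "contains a space" as "already has a scheme" is an assumption made for illustration, not the server's documented rule.

```python
from typing import Optional


def total_requests(spec: dict) -> int:
    """For time = 0, the tool sends requests * concurrency requests in total."""
    return int(spec.get("requests", 1)) * int(spec.get("concurrency", 1))


def with_default_auth(headers: dict, token: Optional[str]) -> dict:
    """Add an Authorization header from the default token unless one is already set."""
    merged = dict(headers)
    has_auth = any(key.lower() == "authorization" for key in merged)
    if token and not has_auth:
        token = token.strip()
        # Assumption: a token containing a space already carries a scheme ("Token abc").
        merged["Authorization"] = token if " " in token else f"Bearer {token}"
    return merged


print(total_requests({"requests": 5, "concurrency": 4}))
print(with_default_auth({}, "abc123"))
```

With requests = 5 and concurrency = 4, a time = 0 run therefore issues 20 requests, which is above the threshold of 3 for including response_samples.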
Quick Verification
# Search the web
uv run python -c "import asyncio; from www_search_mcp.tools.web_search import web_search; r=asyncio.run(web_search('python mcp protocol', max_results=3)); print(r['status'], r['result_count'])"
# Fetch a page
uv run python -c "import asyncio; from www_search_mcp.tools.web_fetch import web_fetch; r=asyncio.run(web_fetch('https://example.com')); print(r['status'], r['http_status'])"
# Search GitHub
uv run python -c "import asyncio; from www_search_mcp.tools.web_search_github import web_search_github; r=asyncio.run(web_search_github('fastapi', max_results=3)); print(r['status'], r['result_count'])"
# Search PyPI
uv run python -c "import asyncio; from www_search_mcp.tools.web_search_pypi import web_search_pypi; r=asyncio.run(web_search_pypi('httpx', max_results=3)); print(r['status'], r['result_count'])"
# Call a REST API
uv run python -c "import asyncio; from www_search_mcp.tools.web_request import web_request; r=asyncio.run(web_request({'type':'rest','method':'POST','url':'https://httpbin.org/post','body':{'hello':'world'},'requests':1,'concurrency':1,'time':0})); print(r['status'], r['http_status_counts'])"
# Call a GraphQL API
uv run python -c "import asyncio; from www_search_mcp.tools.web_request import web_request; r=asyncio.run(web_request({'type':'graphql','method':'POST','url':'https://countries.trevorblades.com/','query':'{ __typename }','requests':1,'concurrency':1,'time':0})); print(r['status'], r['http_status_counts'])"
Local Development Commands
All commands are intended to be run from the repository root and use uv run (no manual venv activation).
Install / Sync
uv sync --all-groups
Format + Lint
uv run ruff format src/ tests/
uv run ruff check src/ tests/
Type Check
uv run ty check src/
Tests (xdist)
pytest-xdist is enabled by default via pyproject.toml (-n auto).
uv run python -m pytest tests/ -q
# Explicit override:
uv run python -m pytest tests/ -q -n auto
Security Scans
uv run bandit -r src/
uv run pip-audit
Build + Install Wheel Locally
rm -rf dist/
uv build
uv tool install --force dist/*.whl
Run MCP Server Locally
uv run www-search-mcp
Troubleshooting
uv not found
Install uv and reopen your terminal. See System Requirements.
Dependencies missing
uv sync
Playwright browser not found
uv run python -m playwright install chromium
GitHub API rate limit exceeded
The GitHub API allows ~10 requests/minute without authentication. To increase the limit, set a GitHub token:
export WEB_GITHUB_TOKEN=ghp_your_token_here
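For context, a token typically reaches the GitHub REST API as a Bearer Authorization header. The sketch below shows that general convention; github_headers is a hypothetical helper, and how the server builds its own headers internally is not documented here.

```python
import os
from typing import Optional


def github_headers(token: Optional[str] = None) -> dict:
    """Build GitHub REST API headers, adding Authorization only when a token is set."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


# Reads the same variable the server uses; the result has no Authorization key if it is unset.
print(sorted(github_headers(os.environ.get("WEB_GITHUB_TOKEN"))))
```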
Binary content error in web_fetch
web_fetch rejects binary content (images, PDFs, etc.). Use web_download instead to save binary files to disk.
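A client that knows the Content-Type in advance could route between the two tools along these lines. This is a rough heuristic under stated assumptions: the list of text-like media types is invented for illustration, and the server's own binary detection may work differently.

```python
# Assumed set of media-type prefixes that are safe to read as text.
TEXT_LIKE = ("text/", "application/json", "application/xml", "application/xhtml+xml")


def pick_tool(content_type: str) -> str:
    """Suggest web_fetch for text-like content and web_download for everything else."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith(TEXT_LIKE):
        return "web_fetch"
    return "web_download"


print(pick_tool("text/html; charset=utf-8"))
print(pick_tool("application/pdf"))
```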
MCP tools not appearing in client
- Check that the MCP client config JSON is valid.
- Ensure the --project path is absolute and correct.
- Reload the MCP client after config changes.
- Set WEB_DEBUG=true for detailed logs.
Wrong project path in config
The --project argument must point to the root directory of www-search-mcp (where pyproject.toml is located), not to the src/ subdirectory.
Session & Timeouts
Session persistence (use_session / WEB_SESSION_ENABLED)
When session persistence is enabled, cookies are stored per MCP session (not globally) using the FastMCP-injected request Context.
- web_fetch / web_download: reuse a per-session niquests.AsyncSession (cookie jar + connection pool).
- web_fetch_browser: reuse a per-session Playwright BrowserContext (browser cookies/storage).
If session persistence is disabled, tools use ephemeral clients/contexts.
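The per-session versus ephemeral distinction can be modeled with a small registry. This is a conceptual sketch only: SessionStore and its session_id argument are hypothetical, and a plain dict stands in for the real HTTP session or browser context.

```python
class SessionStore:
    """Hand out one long-lived client per MCP session, or a throwaway one."""

    def __init__(self, factory):
        self._factory = factory  # creates a new client (cookie jar, connection pool, ...)
        self._clients = {}

    def get(self, session_id: str, use_session: bool):
        if not use_session:
            return self._factory()  # ephemeral: nothing is remembered
        if session_id not in self._clients:
            self._clients[session_id] = self._factory()
        return self._clients[session_id]


store = SessionStore(dict)  # dict stands in for a real client object
first = store.get("session-a", use_session=True)
first["cookie"] = "abc"
print(store.get("session-a", use_session=True))   # same object, cookie kept
print(store.get("session-b", use_session=True))   # different session, empty
```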
Tool execution deadline
Tools are bounded by a tool-level deadline equal to WEB_TIMEOUT_TOTAL (fallback WEB_TIMEOUT).
This deadline is independent of HTTP connect/read timeouts and prevents long-running tool calls.
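A tool-level deadline of this kind is commonly built on asyncio.wait_for. The following is a minimal sketch of the idea, not the server's implementation; run_with_deadline, slow_tool, and the returned status strings are hypothetical.

```python
import asyncio


async def run_with_deadline(coro, deadline_s: float) -> dict:
    """Cancel the tool call if it runs longer than the overall deadline."""
    try:
        result = await asyncio.wait_for(coro, timeout=deadline_s)
    except asyncio.TimeoutError:
        return {"status": "error", "error": f"tool deadline of {deadline_s}s exceeded"}
    return {"status": "ok", "result": result}


async def slow_tool(delay_s: float) -> str:
    """Stand-in for a tool body that takes delay_s seconds."""
    await asyncio.sleep(delay_s)
    return "done"


print(asyncio.run(run_with_deadline(slow_tool(0.01), deadline_s=1.0)))
print(asyncio.run(run_with_deadline(slow_tool(1.0), deadline_s=0.05)))
```

The deadline wraps the whole call, so it still fires when every individual HTTP request stays within its connect and read timeouts but the retries and batching add up.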
HTTP Mode (Optional)
By default the server runs via stdio transport (best for desktop clients).
To run with HTTP-based transports, set WEB_TRANSPORT:
export WEB_TRANSPORT=streamable-http
export WEB_HTTP_HOST=127.0.0.1
export WEB_HTTP_PORT=8000
uv run www-search-mcp
Operational endpoints (HTTP transports only):
- GET /healthz -> { "status": "ok" }
- GET /readyz -> { "status": "ready", "version": "..." }
FastMCP endpoints (HTTP transports only):
- streamable-http: POST /mcp
- sse: GET /sse and POST /messages/
Note: FastMCP custom routes are not protected by server auth middleware by design; keep them non-sensitive.
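Once the server runs with an HTTP transport, the operational endpoints can be probed with the standard library. This sketch assumes the default host and port from the table of environment variables; endpoint_url and probe are hypothetical helpers, and probe only succeeds while the server is actually running.

```python
import json
import urllib.request


def endpoint_url(host: str, port: int, path: str) -> str:
    """Build the URL of an operational endpoint such as /healthz or /readyz."""
    return f"http://{host}:{port}{path}"


def probe(url: str, timeout_s: float = 5.0) -> dict:
    """GET the endpoint and decode its JSON body (requires a running server)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return json.loads(resp.read().decode("utf-8"))


print(endpoint_url("127.0.0.1", 8000, "/healthz"))
# With the server started as shown above:
# probe(endpoint_url("127.0.0.1", 8000, "/healthz"))
```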
MCP Resource
The server exposes a lightweight MCP resource:
- www-search-mcp://server/info (JSON summary: name, version, tool count, safe env state)
For full tool docs and environment variable details, use the web_mcp_info tool.
License
MIT