Fully local MCP server and CLI for web research
SourceWeave Web Search
Search-first MCP server and CLI for web research.
[!NOTE]
sourceweave-search-mcp is the default local entrypoint. When explicit SOURCEWEAVE_SEARCH_* endpoint variables are absent, it discovers or starts the local Docker-backed stack automatically. If you already run the services yourself, set explicit endpoints and it will use them instead.
Overview • Getting started • Managed local runtime • MCP client setup • CLI • Container deployments • OpenWebUI • Runtime configuration • Development
Overview
SourceWeave Web Search gives MCP clients a compact three-tool contract for web research:
- search_web(query, domains?, urls?, effort?) discovers sources and returns compact results with stable page_id handles.
- read_pages(page_ids, focus?) reads stored pages by page_id.
- read_urls(urls, focus?) reads direct URLs without searching first.
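As a rough illustration of the search-then-read flow, the sketch below builds the JSON-RPC `tools/call` requests an MCP client would send for these three tools. The tool and argument names come from the contract above; the `tool_call` helper, the example query, the list shape of `domains`, and the placeholder `page_id` values are assumptions made for illustration. Responses are not shown because this README does not specify their format.

```python
import json


def tool_call(request_id: int, name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request (hypothetical helper)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: discover sources. Passing `domains` as a list is an assumption.
search = tool_call(1, "search_web", {
    "query": "python packaging guide",
    "domains": ["packaging.python.org"],
})

# Step 2: read stored pages by handle. These page_id values are placeholders;
# real handles come from the search_web result.
read = tool_call(2, "read_pages", {"page_ids": ["<page_id-1>", "<page_id-2>"]})

# Alternative: skip the search and read a known URL directly.
direct = tool_call(3, "read_urls", {
    "urls": ["https://packaging.python.org/en/latest/"],
})

if __name__ == "__main__":
    for request in (search, read, direct):
        print(json.dumps(request, indent=2))
```

In practice an MCP client library builds these messages for you; the point here is only the order of calls: search first, then read by `page_id`, or read a URL directly when you already know it.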
It combines:
| Component | Role |
|---|---|
| SearXNG | Search discovery |
| Crawl4AI | Clean HTML extraction |
| Redis or Valkey | Persisted page cache and page_id store |
| MarkItDown | Document conversion for PDFs and other supported files |
Getting started
Requirements
- Python 3.12+
- Docker with Compose support for the default managed local runtime
- Explicit SOURCEWEAVE_SEARCH_* endpoints only if you want hosted or self-managed services
Managed local runtime
Run the server from the published package:
uvx --from sourceweave-web-search sourceweave-search-mcp
Or start the MCP server over HTTP:
uvx --from sourceweave-web-search sourceweave-search-mcp \
--transport streamable-http \
--host 127.0.0.1 \
--port 8000
When no endpoint environment variables are set, sourceweave-search-mcp selects one of three modes:
| Mode | What happens |
|---|---|
| Managed stack found | Join the existing SourceWeave-managed stack for the current runtime state directory |
| Healthy external stack found | Reuse the canonical local ports 19080, 19235, and 16379 without ownership |
| No reusable stack | Start and supervise a Docker-backed stack on canonical or free local ports |
Managed state lives under ~/.sourceweave-local/managed-runtime. Multiple MCP processes on the same machine share one managed stack per state directory.
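The mode selection in the table above can be pictured as a small decision function. This is a minimal sketch, not the project's actual code: the `Mode` enum and `choose_mode` function are hypothetical names, and the precedence (managed stack first, then external stack, then a fresh start) is assumed from the row order of the table.

```python
from enum import Enum


class Mode(Enum):
    JOIN_MANAGED = "join existing managed stack"
    REUSE_EXTERNAL = "reuse healthy external stack"
    START_MANAGED = "start and supervise a new stack"


def choose_mode(managed_stack_found: bool, external_stack_healthy: bool) -> Mode:
    """Pick a startup mode; precedence follows the table's row order (assumed)."""
    if managed_stack_found:
        return Mode.JOIN_MANAGED
    if external_stack_healthy:
        return Mode.REUSE_EXTERNAL
    return Mode.START_MANAGED


if __name__ == "__main__":
    print(choose_mode(False, True).value)
```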
[!IMPORTANT] Managed runtime removes containers only when the last active SourceWeave-managed process exits. Named volumes are preserved, so cache data survives restarts. If the original owning process dies, a later process can recover the same stack from Docker project identity and persisted runtime state.
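One way to think about the "last active process" rule is as a set of leases on the shared stack. The sketch below is purely illustrative: the `StackLeases` class is a hypothetical in-memory model, while the real runtime persists its state on disk and recovers it from Docker project identity, as described above.

```python
class StackLeases:
    """Track which processes currently use a shared stack (illustrative only)."""

    def __init__(self) -> None:
        self._holders: set[int] = set()

    def acquire(self, pid: int) -> None:
        """Register a process as an active user of the stack."""
        self._holders.add(pid)

    def release(self, pid: int) -> bool:
        """Drop a holder; return True only when it was the last active one."""
        was_holder = pid in self._holders
        self._holders.discard(pid)
        return was_holder and not self._holders


if __name__ == "__main__":
    leases = StackLeases()
    leases.acquire(101)
    leases.acquire(202)
    print(leases.release(101))  # another holder remains
    print(leases.release(202))  # last holder left; containers could be removed
```

In this model only the final `release` signals that containers can be removed; named volumes would be left alone in either case.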
Explicit endpoint mode
If you already run SearXNG, Crawl4AI, and Redis or Valkey yourself, or want to point at hosted services, set explicit endpoints and the MCP entrypoint will bypass managed Docker startup:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp
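Before launching in explicit endpoint mode, it can help to check which of the three variables are actually set. The following is a small preflight sketch; the `explicit_endpoints` helper is hypothetical, and this README does not say how the server treats a partially configured set, so the script only reports what it finds.

```python
import os

# Variable names as documented in this README.
ENDPOINT_VARS = (
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL",
)


def explicit_endpoints(env=None) -> dict:
    """Return the endpoint variables that are set to a non-empty value."""
    env = os.environ if env is None else env
    return {name: env[name] for name in ENDPOINT_VARS if env.get(name)}


if __name__ == "__main__":
    found = explicit_endpoints()
    missing = [name for name in ENDPOINT_VARS if name not in found]
    print("set:", sorted(found))
    print("missing:", missing)
```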
Direct CLI
sourceweave-search runs the tool directly. Use it when the supporting services are already available or when you provide explicit endpoints. It does not start Docker.
sourceweave-search --query "python programming" --read-first-pages 2
sourceweave-search --read-url "https://packaging.python.org/en/latest/"
[!TIP] The direct CLI also accepts --searxng-base-url, --crawl4ai-base-url, and --cache-redis-url overrides.
MCP client setup
OpenCode
Example opencode.json / opencode.jsonc / ~/.config/opencode/opencode.json:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "local",
      "command": [
        "uvx",
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ],
      "enabled": true,
      "timeout": 300000
    }
  }
}
For a shared HTTP endpoint instead:
{
  "$schema": "https://opencode.ai/config.json",
  "mcp": {
    "sourceweave": {
      "type": "remote",
      "url": "http://127.0.0.1:18000/mcp",
      "enabled": true,
      "timeout": 300000
    }
  }
}
VS Code Copilot
Example .vscode/mcp.json:
{
  "servers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ]
    }
  }
}
For a shared HTTP endpoint instead:
{
  "servers": {
    "sourceweave": {
      "type": "http",
      "url": "http://127.0.0.1:18000/mcp"
    }
  }
}
Claude Code
Example .mcp.json:
{
  "mcpServers": {
    "sourceweave": {
      "type": "stdio",
      "command": "uvx",
      "args": [
        "--from",
        "sourceweave-web-search",
        "sourceweave-search-mcp"
      ]
    }
  }
}
For a project-scoped shared config, place the same block in .mcp.json at the repo root.
CLI
The direct CLI is useful once the supporting services are already reachable. It gives you the same search-first workflow without the MCP wrapper.
sourceweave-search --query "react useEffect cleanup example" --read-first-page
sourceweave-search --query "HTTP overview" --domain developer.mozilla.org --read-first-page
sourceweave-search --read-url "https://packaging.python.org/en/latest/"
Container deployments
The managed local runtime is for host-side uvx or uv run launches. Containerized deployments still use explicit endpoint wiring.
- Image: ghcr.io/mrnaqa/sourceweave-web-search-mcp
- Repo-local compose entrypoint: docker compose up -d --build mcp
Example container run:
docker run --rm -p 8000:8000 \
-e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
-e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
-e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest
OpenWebUI
This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.
From a repo checkout, verify it is in sync with the canonical implementation:
uv run sourceweave-build-openwebui --check
Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path. The generated file rewrites the default endpoints to the repo-local compose service names so it matches the container deployment path out of the box.
Runtime configuration
Optional environment variables:
| Variable | Purpose |
|---|---|
| SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL | SearXNG URL template. Must contain <query>. |
| SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL | Crawl4AI base URL. |
| SOURCEWEAVE_SEARCH_CACHE_REDIS_URL | Redis or Valkey URL used for caching. |
| FASTMCP_HOST | Host for sse or streamable-http transport. |
| FASTMCP_PORT | Port for sse or streamable-http transport. |
If the endpoint variables are unset, sourceweave-search-mcp defaults to managed local runtime.
- Canonical host endpoints remain the preferred defaults and the external-reuse probe targets.
- A SourceWeave-managed stack may use different free host ports when the canonical defaults are already occupied.
- Multiple MCP processes on the same machine share one managed stack per local runtime state directory.
Default endpoint values:
- SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
- Crawl4AI: http://127.0.0.1:19235
- Redis: redis://127.0.0.1:16379/2
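The SearXNG value is a template rather than a plain base URL, so the `<query>` placeholder has to be filled in per search. Here is a minimal sketch of that expansion, assuming simple string substitution with URL encoding; the `searxng_url` helper is hypothetical and the tool's actual substitution logic may differ.

```python
import os
from urllib.parse import quote_plus

# Default template as listed in this README.
DEFAULT_SEARXNG_TEMPLATE = "http://127.0.0.1:19080/search?format=json&q=<query>"


def searxng_url(query: str, env=None) -> str:
    """Expand the SearXNG template for one query (assumed substitution rule)."""
    env = os.environ if env is None else env
    template = (
        env.get("SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL") or DEFAULT_SEARXNG_TEMPLATE
    )
    if "<query>" not in template:
        raise ValueError("SearXNG URL template must contain <query>")
    # quote_plus encodes spaces as '+' and escapes reserved characters.
    return template.replace("<query>", quote_plus(query))


if __name__ == "__main__":
    print(searxng_url("python programming", env={}))
```

The explicit check mirrors the documented requirement that the template must contain `<query>`, so a misconfigured value fails early instead of sending every search to the same URL.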
Default preferred host ports for managed startup:
- SearXNG: 19080
- Crawl4AI: 19235
- Redis: 16379
- MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service
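The idea of preferring canonical ports and falling back to free ones can be sketched with the standard `socket` module. This is an illustration under stated assumptions, not the project's implementation: `port_open` and `pick_port` are hypothetical helpers, a TCP connect is only a weak stand-in for whatever health check the runtime performs, and a port reported as free can still be taken by another process before you use it.

```python
import socket


def port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_port(preferred: int, host: str = "127.0.0.1") -> int:
    """Return the preferred port if it can be bound, else an OS-assigned one."""
    for candidate in (preferred, 0):  # port 0 asks the OS for any free port
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, candidate))
            except OSError:
                continue
            return sock.getsockname()[1]
    raise RuntimeError("no free port available")


if __name__ == "__main__":
    for name, port in (("SearXNG", 19080), ("Crawl4AI", 19235), ("Redis", 16379)):
        state = "in use" if port_open("127.0.0.1", port) else "not listening"
        print(f"{name} {port}: {state}; candidate port {pick_port(port)}")
```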
Development
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp
Useful checks:
uv run sourceweave-build-openwebui --check
uv run sourceweave-search-mcp --help
uv run pytest tests/test_config.py tests/test_packaging.py tests/test_tool.py tests/test_managed_runtime.py -m "not integration"