SourceWeave Web Search
SourceWeave Web Search is a fully local MCP server and CLI for web research.
It uses SearXNG for discovery, Crawl4AI for cleaned page extraction, and Redis or Valkey as the canonical persisted page cache.
For most users, the setup is simple:
- run the supporting services locally in containers, or point at existing external endpoints
- start the MCP server with uvx
- connect your MCP client to the running server over stdio or local HTTP
Key Features
- MCP server with stdio, sse, and streamable-http transports
- fully local web research workflow with source discovery and stable follow-up reads for MCP clients
- automatic document conversion for PDFs and other supported documents when detected
- lean MCP contract with search_web, read_pages, and read_urls
- publishable Python package, container image, and generated OpenWebUI artifact
- compatible with OpenCode, VS Code Copilot, and other MCP clients
Requirements
- Python 3.12+
- a reachable SearXNG endpoint
- a reachable Crawl4AI endpoint
- a reachable Redis or Valkey instance
Optional:
- Docker and Docker Compose for the repo-local stack
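Since every required service has to be reachable before the server can do useful work, a quick preflight check can save debugging time. The following is a minimal sketch, not part of the package: the helper names host_and_port and is_reachable are hypothetical, and a successful TCP connection only shows that something is listening, not that the service behind it is healthy.

```python
import socket
from urllib.parse import urlsplit

# Fallback ports by URL scheme, used when the URL does not name a port.
DEFAULT_PORTS = {"http": 80, "https": 443, "redis": 6379, "rediss": 6379}


def host_and_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from an endpoint URL (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in URL: {url!r}")
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    if port is None:
        raise ValueError(f"cannot infer a port for URL: {url!r}")
    return parts.hostname, port


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = host_and_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Endpoints taken from the repo-local defaults documented in this README.
    for endpoint in (
        "http://127.0.0.1:19080/search?format=json&q=<query>",
        "http://127.0.0.1:19235",
        "redis://127.0.0.1:16379/2",
    ):
        print(endpoint, "reachable" if is_reachable(endpoint) else "NOT reachable")
```

Run it after starting the supporting services; any endpoint reported as not reachable is worth fixing before connecting an MCP client.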
Recommended Local Deployment
Start the supporting services locally:
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng
Then start the MCP server from the published package with uvx and point it at those local endpoints:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp
For a local HTTP MCP endpoint instead of stdio:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
--transport streamable-http \
--host 127.0.0.1 \
--port 8000
You can also point the same uvx command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.
Installation Options
Python package
Published releases can be installed from PyPI:
pip install sourceweave-web-search
Or run directly without a global install:
uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"
Repo checkout
For local development or source-based runs:
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp
Container image
The release workflow can publish a container image to:
ghcr.io/mrnaqa/sourceweave-web-search-mcp
Example runtime:
docker run --rm -p 8000:8000 \
-e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
-e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
-e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest
Example docker compose recipe:
services:
  redis:
    image: valkey/valkey:9-alpine
    command: ["redis-server", "--appendonly", "no"]
  crawl4ai:
    image: unclecode/crawl4ai:0.8.6
  searxng:
    image: searxng/searxng:2026.4.11-9e08a6771
  sourceweave-mcp:
    image: ghcr.io/mrnaqa/sourceweave-web-search-mcp:latest
    depends_on:
      - redis
      - crawl4ai
      - searxng
    environment:
      SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL: http://searxng:8080/search?format=json&q=<query>
      SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL: http://crawl4ai:11235
      SOURCEWEAVE_SEARCH_CACHE_REDIS_URL: redis://redis:6379/2
      FASTMCP_HOST: 0.0.0.0
      FASTMCP_PORT: 8000
    ports:
      - "8000:8000"
That gives you a local HTTP MCP endpoint at http://127.0.0.1:8000/mcp with the SourceWeave container linked to the supporting services by container name.
The repo's own docker compose up -d --build mcp path also builds and runs this same publishable image locally.
Runtime Configuration
Set these environment variables:
| Variable | Purpose |
|---|---|
| SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL | SearXNG URL template. Must contain <query>. |
| SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL | Crawl4AI base URL. |
| SOURCEWEAVE_SEARCH_CACHE_REDIS_URL | Redis or Valkey URL used for caching. |
| FASTMCP_HOST | Host for sse or streamable-http transport. |
| FASTMCP_PORT | Port for sse or streamable-http transport. |
Example:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2
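The requirement that the SearXNG URL template contain <query> suggests plain placeholder substitution. The package's actual substitution code is not shown here, so the following is only a sketch of how such a template could be filled; build_search_url is a hypothetical name and the use of quote_plus for encoding is an assumption.

```python
from urllib.parse import quote_plus

PLACEHOLDER = "<query>"


def build_search_url(template: str, query: str) -> str:
    """Fill the <query> placeholder with a URL-encoded search string.

    Hypothetical helper: illustrates why the template must contain the
    placeholder, not how the package itself performs the substitution.
    """
    if PLACEHOLDER not in template:
        raise ValueError(f"template must contain {PLACEHOLDER}: {template!r}")
    return template.replace(PLACEHOLDER, quote_plus(query))


url = build_search_url(
    "http://127.0.0.1:19080/search?format=json&q=<query>",
    "python programming",
)
print(url)  # http://127.0.0.1:19080/search?format=json&q=python+programming
```

A template without the placeholder would send the same request for every query, which is why a check like the one above fails fast instead.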
Quick Start
The CLI is useful for smoke testing the runtime outside an MCP client.
Search and immediately read the first results:
sourceweave-search --query "python programming" --read-first-pages 2
Verified live examples from the repo-local stack:
- sourceweave-search --read-url https://en.wikipedia.org/wiki/Comparison_of_HTTP_server_software ...returned cleaned page content
- sourceweave-search --query 'HTTP overview' --domain developer.mozilla.org --read-first-page ...returned compact search results plus a focused page read
Constrain search to a specific host with --domain:
sourceweave-search \
--query "react useEffect cleanup example" \
--domain developer.mozilla.org \
--read-first-page
Read a direct URL without running search_web first:
sourceweave-search \
--read-url "https://packaging.python.org/en/latest/"
Read a document URL directly without extra flags:
sourceweave-search \
--query "guide pdf" \
--url "https://example.com/guide.pdf"
MCP Server
Run over stdio:
sourceweave-search-mcp
Run as a local HTTP endpoint:
sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000
What MCP Clients Get
MCP clients receive a lean three-tool contract:
- search_web(query, domains?, urls?): discover relevant sources and get compact results with stable page_id handles
- read_pages(page_ids, focus?): read stored pages by page_id
- read_urls(urls, focus?): read one or more direct URLs without searching first
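On the wire, an MCP client invokes these tools with JSON-RPC tools/call requests. The sketch below builds such requests by hand to show the search-then-read flow; tool_call is a hypothetical helper, and the argument types (lists of strings for domains and page_ids) are assumptions inferred from the parameter names rather than taken from the server's schema.

```python
import json
from itertools import count

_request_ids = count(1)


def tool_call(name: str, **arguments) -> dict:
    """Build an MCP `tools/call` JSON-RPC request, dropping unset (None) arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {
            "name": name,
            "arguments": {k: v for k, v in arguments.items() if v is not None},
        },
    }


# Step 1: discover sources, optionally restricted to one domain.
search = tool_call("search_web", query="HTTP overview", domains=["developer.mozilla.org"])

# Step 2: follow up using a page_id from the search results.
# The id below is a placeholder, not a real handle.
read = tool_call("read_pages", page_ids=["<page_id from search_web>"], focus=None)

print(json.dumps(search, indent=2))
print(json.dumps(read, indent=2))
```

In practice an MCP client library sends these requests for you; the point is only that follow-up reads reuse the page_id handles instead of repeating the search.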
Public result shapes are intentionally small:
- search_web returns page_id, url, title, summary, and key_points
- read_pages and read_urls return page_id, url, title, and content
- content_type is only included when the content is not HTML, and truncated is only included when true
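Because content_type and truncated are omitted in the common case, consumers need to treat them as optional. The following sketch models the documented shapes as typed dictionaries; the field types (for example key_points as a list of strings) and the example content_type value are assumptions, and describe is a hypothetical helper.

```python
from typing import TypedDict


class SearchResult(TypedDict):
    page_id: str
    url: str
    title: str
    summary: str
    key_points: list[str]  # assumed type; the README only names the field


class _PageRequired(TypedDict):
    page_id: str
    url: str
    title: str
    content: str


class PageResult(_PageRequired, total=False):
    content_type: str  # present only when the content is not HTML
    truncated: bool    # present only when true


def describe(page: PageResult) -> str:
    """Summarize a page result, applying defaults for the optional fields."""
    content_type = page.get("content_type", "html")  # absent means HTML
    suffix = " (truncated)" if page.get("truncated", False) else ""
    return f"{page['title']} [{content_type}] {len(page['content'])} chars{suffix}"


html_page: PageResult = {
    "page_id": "p1",
    "url": "https://example.com",
    "title": "Example",
    "content": "hello",
}
pdf_page: PageResult = {
    "page_id": "p2",
    "url": "https://example.com/guide.pdf",
    "title": "Guide",
    "content": "abc",
    "content_type": "application/pdf",  # illustrative value, not confirmed
    "truncated": True,
}

print(describe(html_page))  # Example [html] 5 chars
print(describe(pdf_page))   # Guide [application/pdf] 3 chars (truncated)
```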
Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.
MCP Client Setup
OpenCode
Example opencode.json / opencode.jsonc / ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"sourceweave": {
"type": "local",
"command": [
"uvx",
"--from",
"sourceweave-web-search",
"sourceweave-search-mcp"
],
"environment": {
"SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
"SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
"SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
},
"enabled": true,
"timeout": 30000
}
}
}
For a shared HTTP endpoint instead:
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"sourceweave": {
"type": "remote",
"url": "http://127.0.0.1:18000/mcp",
"enabled": true,
"timeout": 30000
}
}
}
VS Code Copilot
Example .vscode/mcp.json:
{
"servers": {
"sourceweave": {
"type": "stdio",
"command": "uvx",
"args": [
"--from",
"sourceweave-web-search",
"sourceweave-search-mcp"
],
"env": {
"SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
"SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
"SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
}
}
}
}
For a shared HTTP endpoint instead:
{
"servers": {
"sourceweave": {
"type": "http",
"url": "http://127.0.0.1:18000/mcp"
}
}
}
OpenWebUI
This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.
From a repo checkout, verify it is in sync with the canonical implementation:
uv run sourceweave-build-openwebui --check
Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path.
Defaults
Default host-side endpoints used by the package:
- SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
- Crawl4AI: http://127.0.0.1:19235
- Redis: redis://127.0.0.1:16379/2
Default repo-local ports:
- SearXNG: 19080
- Crawl4AI: 19235
- Redis: 16379
- MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service
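The relationship between the environment variables and these defaults can be pictured as a simple fallback lookup. This is a sketch under the assumption that an unset or empty variable falls back to the documented default; resolve_endpoints is a hypothetical name and the package's real settings loader may behave differently.

```python
import os
from collections.abc import Mapping

# Documented host-side defaults, keyed by the environment variable that overrides them.
DEFAULTS = {
    "SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
    "SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2",
}


def resolve_endpoints(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return each endpoint setting, preferring a non-empty environment value."""
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


# Example: override only the cache URL, keep the other defaults.
settings = resolve_endpoints({"SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://cache.internal:6379/0"})
for name, value in settings.items():
    print(f"{name}={value}")
```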