MCP server and CLI for web search and page reading with SearXNG, Crawl4AI, and Redis
SourceWeave Web Search
SourceWeave Web Search is an MCP server and CLI for web search plus follow-up page reading.
It uses SearXNG for search, Crawl4AI for HTML extraction, and Redis or Valkey for caching.
For most users, the setup is simple:
- run the supporting services locally in containers, or point at existing external endpoints
- start the MCP server with uvx
- connect your MCP client to the running server over stdio or local HTTP
Key Features
- MCP server with stdio, sse, and streamable-http transports
- lean search plus follow-up page reading for MCP clients
- explicit per-URL document conversion for PDFs and other supported documents
- focused reads, related-link limits, image metadata, and page-quality hints
- publishable Python package, container image, and generated OpenWebUI artifact
- compatible with OpenCode, VS Code Copilot, and other MCP clients
Requirements
- Python 3.12+
- a reachable SearXNG endpoint
- a reachable Crawl4AI endpoint
- a reachable Redis or Valkey instance
Optional:
- Docker and Docker Compose for the repo-local stack
Recommended Local Deployment
Start the supporting services locally:
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
cp .env.example .env
docker compose up -d redis crawl4ai searxng
Then start the MCP server from the published package with uvx and point it at those local endpoints:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp
For a local HTTP MCP endpoint instead of stdio:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
uvx --from sourceweave-web-search sourceweave-search-mcp \
--transport streamable-http \
--host 127.0.0.1 \
--port 8000
You can also point the same uvx command at externally hosted SearXNG, Crawl4AI, and Redis or Valkey endpoints by changing the environment variables.
Installation Options
Python package
Published releases can be installed from PyPI:
pip install sourceweave-web-search
Or run directly without a global install:
uvx --from sourceweave-web-search sourceweave-search-mcp
uvx --from sourceweave-web-search sourceweave-search --query "python programming"
Repo checkout
For local development or source-based runs:
git clone https://github.com/MRNAQA/sourceweave-web-search.git
cd sourceweave-web-search
uv sync --locked --group dev
uv run sourceweave-search-mcp
Container image
The release workflow can publish a container image to:
- ghcr.io/mrnaqa/sourceweave-web-search
- optionally docker.io/mrnaqa/sourceweave-web-search when Docker Hub publishing is configured
Example runtime:
docker run --rm -p 8000:8000 \
-e SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://host.docker.internal:19080/search?format=json&q=<query>" \
-e SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://host.docker.internal:19235" \
-e SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://host.docker.internal:16379/2" \
ghcr.io/mrnaqa/sourceweave-web-search:latest
Runtime Configuration
Set these environment variables:
| Variable | Purpose |
|---|---|
| SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL | SearXNG URL template. Must contain <query>. |
| SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL | Crawl4AI base URL. |
| SOURCEWEAVE_SEARCH_CACHE_REDIS_URL | Redis or Valkey URL used for caching. |
| FASTMCP_HOST | Host for sse or streamable-http transport. |
| FASTMCP_PORT | Port for sse or streamable-http transport. |
Example:
SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL="http://127.0.0.1:19080/search?format=json&q=<query>" \
SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL="http://127.0.0.1:19235" \
SOURCEWEAVE_SEARCH_CACHE_REDIS_URL="redis://127.0.0.1:16379/2" \
sourceweave-search --query "python programming" --read-first-pages 2
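The SearXNG variable is a URL template rather than a plain base URL, so a quick check before starting the server can catch a missing `<query>` placeholder. The sketch below illustrates the idea only: `build_search_url` is a hypothetical helper, and the package's own substitution and encoding may differ.

```python
from urllib.parse import quote_plus

QUERY_PLACEHOLDER = "<query>"


def build_search_url(template: str, query: str) -> str:
    """Substitute a URL-encoded query into a SearXNG URL template.

    Illustrative helper; not the package's implementation.
    """
    if QUERY_PLACEHOLDER not in template:
        raise ValueError(f"SearXNG URL template must contain {QUERY_PLACEHOLDER}")
    return template.replace(QUERY_PLACEHOLDER, quote_plus(query))


template = "http://127.0.0.1:19080/search?format=json&q=<query>"
print(build_search_url(template, "python programming"))
# -> http://127.0.0.1:19080/search?format=json&q=python+programming
```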
Quick Start
The CLI is useful for smoke testing the runtime outside an MCP client.
Search and immediately read the first two results:
sourceweave-search --query "python programming" --read-first-pages 2
Read a discovered page and include stored related links:
sourceweave-search \
--query "react useEffect cleanup example" \
--read-first-page \
--related-links-limit 3
Force document conversion for an explicit URL:
sourceweave-search \
--query "guide pdf" \
--url '{"url": "https://example.com/guide.pdf", "convert_document": true}'
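Hand-quoting the JSON value for --url is easy to get wrong in a shell. A small Python sketch can build the argument and a copy-pasteable command line. The helper names here are hypothetical, and only the `url` and `convert_document` keys shown above are assumed to exist.

```python
import json
import shlex


def url_argument(url: str, convert_document: bool = False) -> str:
    """Return the JSON string passed to --url, in the shape used in the example above."""
    return json.dumps({"url": url, "convert_document": convert_document})


def build_command(query: str, url: str, convert_document: bool) -> list[str]:
    """Assemble the CLI argv; passing a list to subprocess avoids shell quoting entirely."""
    return [
        "sourceweave-search",
        "--query", query,
        "--url", url_argument(url, convert_document),
    ]


cmd = build_command("guide pdf", "https://example.com/guide.pdf", True)
# shlex.join produces a shell-safe string if you want to paste it into a terminal.
print(shlex.join(cmd))
```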
MCP Server
Run over stdio:
sourceweave-search-mcp
Run as a local HTTP endpoint:
sourceweave-search-mcp --transport streamable-http --host 127.0.0.1 --port 8000
What MCP Clients Get
MCP clients receive a simple two-step flow:
- a search step that returns compact results plus page_id handles
- a follow-up page-read step that returns stored content, focused excerpts, related-link summaries, image metadata, and page-quality hints when relevant
Human operators usually only need to know how to run the server and where to point the runtime endpoints. MCP clients handle the exact tool parameters.
MCP Client Setup
OpenCode
Example opencode.json / opencode.jsonc / ~/.config/opencode/opencode.json:
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"sourceweave": {
"type": "local",
"command": [
"uvx",
"--from",
"sourceweave-web-search",
"sourceweave-search-mcp"
],
"environment": {
"SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
"SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
"SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
},
"enabled": true,
"timeout": 30000
}
}
}
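For a shared HTTP endpoint instead (this example targets the repo's mcp compose service on port 18000; a server started directly with uvx uses port 8000 by default):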
{
"$schema": "https://opencode.ai/config.json",
"mcp": {
"sourceweave": {
"type": "remote",
"url": "http://127.0.0.1:18000/mcp",
"enabled": true,
"timeout": 30000
}
}
}
VS Code Copilot
Example .vscode/mcp.json:
{
"servers": {
"sourceweave": {
"type": "stdio",
"command": "uvx",
"args": [
"--from",
"sourceweave-web-search",
"sourceweave-search-mcp"
],
"env": {
"SOURCEWEAVE_SEARCH_SEARXNG_BASE_URL": "http://127.0.0.1:19080/search?format=json&q=<query>",
"SOURCEWEAVE_SEARCH_CRAWL4AI_BASE_URL": "http://127.0.0.1:19235",
"SOURCEWEAVE_SEARCH_CACHE_REDIS_URL": "redis://127.0.0.1:16379/2"
}
}
}
}
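For a shared HTTP endpoint instead (this example targets the repo's mcp compose service on port 18000; a server started directly with uvx uses port 8000 by default):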
{
"servers": {
"sourceweave": {
"type": "http",
"url": "http://127.0.0.1:18000/mcp"
}
}
}
Publishing
The manual release workflow at .github/workflows/release.yml accepts a changelog and can optionally:
- publish the wheel and sdist to PyPI
- publish the container image to GHCR
- mirror the container image to Docker Hub when Docker Hub credentials are configured
Releases always attach the built distributions and artifacts/sourceweave_web_search.py to the GitHub release.
For contributor setup and publishing requirements, see CONTRIBUTING.md.
OpenWebUI
This repo also ships a generated standalone OpenWebUI tool file at artifacts/sourceweave_web_search.py.
From a repo checkout, verify it is in sync with the canonical implementation:
uv run sourceweave-build-openwebui --check
Paste that artifact into OpenWebUI when you want the standalone tool-file deployment path.
Defaults
Default host-side endpoints used by the package:
- SearXNG: http://127.0.0.1:19080/search?format=json&q=<query>
- Crawl4AI: http://127.0.0.1:19235
- Redis: redis://127.0.0.1:16379/2
Default repo-local ports:
- SearXNG: 19080
- Crawl4AI: 19235
- Redis: 16379
- MCP: 8000 when run directly with uvx; 18000 at /mcp when using the repo's mcp compose service
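When the server cannot reach a backing service, it helps to confirm that something is listening on each default port. The following is a minimal sketch using only the standard library. `host_port` and `is_reachable` are hypothetical helpers, and an open TCP port only shows that a listener exists, not that the service behind it is healthy.

```python
import socket
from urllib.parse import urlsplit

# Default host-side endpoints listed above.
DEFAULTS = {
    "searxng": "http://127.0.0.1:19080/search?format=json&q=<query>",
    "crawl4ai": "http://127.0.0.1:19235",
    "redis": "redis://127.0.0.1:16379/2",
}


def host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from an endpoint URL that carries an explicit port."""
    parts = urlsplit(url)
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"URL has no explicit host and port: {url}")
    return parts.hostname, parts.port


def is_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, url in DEFAULTS.items():
        host, port = host_port(url)
        status = "up" if is_reachable(host, port) else "unreachable"
        print(f"{name}: {host}:{port} {status}")
```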
Contributing
See CONTRIBUTING.md for local development, verification, packaging notes, and release workflow details.
License
This project is licensed under the MIT License. See LICENSE.