MCP Web Search service for AI ecosystem
MCP Web Search
An MCP service for web search and content extraction, built on the Model Context Protocol with FastMCP.
Features
MCP tools:
- search: web search with smart filtering and a fallback chain
- content: clean text extraction from URLs with SSRF protection
- webfetch: agent-based search via LangGraph StateGraph + LLM-as-Judge
- llm_health: health status of the LLM models in the failover chain
Architecture
FastMCP (primary server)
├── search tool → DuckDuckGo + fallback chain + smart filtering
├── content tool → Trafilatura + SSRF protection + cache
└── webfetch tool → LangGraph StateGraph (8 nodes) + LLM-as-Judge
Installation
# Clone the repository
git clone https://github.com/M0M0S/mcp-webs.git
cd mcp-webs
# Install dependencies
uv sync
# Configure environment variables
cp .env.example .env
# Fill in .env (LLM_API_KEY, LLM_BASE_URL, etc.)
Usage
Start MCP Server
uv run python -m app.main
Connect to Claude Desktop (example)
Add to claude_desktop_config.json:
{
"mcpServers": {
"web-search": {
"command": "uv",
"args": ["run", "python", "-m", "app.main"],
"env": {
"LLM_API_KEY": "your-key",
"LLM_BASE_URL": "https://api.openai.com/v1"
}
}
}
}
MCP Tools
| Tool | Description | Parameters |
|---|---|---|
| search | Web search with fallback chain | query, max_results, provider |
| content | Extract text content from URL | url, token_limit |
| webfetch | Agent-based search via LangGraph | query, max_concurrent |
Authorization (MCP_ENCRYPTION_KEY)
When MCP_ENCRYPTION_KEY is configured, the server enables user-based authorization.
API key format
API keys are issued via the user_manage tool. The token presented to the server has the format:
key_<key_id>
where <key_id> is a unique identifier (for example, the full token key_abc123def456). The raw key is delivered only once, at user creation or key rotation, and is never stored in the database or logs after delivery.
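The token shape described above can be checked with a few lines of Python. This is a minimal sketch: `parse_token` is a hypothetical helper, and only the `key_` prefix comes from this README. The character set and length of `<key_id>` are not specified here, so the sketch does not constrain them.

```python
def parse_token(token: str) -> str:
    """Return the <key_id> part of a token shaped like key_<key_id>.

    Hypothetical helper: only the "key_" prefix is taken from the README.
    """
    prefix = "key_"
    if not token.startswith(prefix):
        raise ValueError("token must start with 'key_'")
    key_id = token[len(prefix):]
    if not key_id:
        raise ValueError("token has an empty key_id")
    return key_id
```

A check like this only rejects malformed input early. Whether a key is valid or revoked still has to be decided by the server.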
Configuration
| Variable | Required | Description |
|---|---|---|
| MCP_ENCRYPTION_KEY | Yes (for auth) | Fernet encryption key (44-char base64, 32 bytes). Validated at startup. |
| MCP_ENCRYPTION_KEY_BACKUP | No | Optional backup Fernet key for recovery after primary key loss. |
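A Fernet key is 32 random bytes encoded as URL-safe base64, which yields 44 characters. The sketch below uses only the standard library to generate a key of that shape and to check it. Both function names are hypothetical, and the server's own startup validation may differ. `cryptography.fernet.Fernet.generate_key()` produces keys in the same format.

```python
import base64
import os


def generate_fernet_style_key() -> bytes:
    """Return 32 random bytes encoded as URL-safe base64 (44 ASCII chars)."""
    return base64.urlsafe_b64encode(os.urandom(32))


def is_valid_key_shape(key: bytes) -> bool:
    """Check that the key is 44 chars of URL-safe base64 decoding to 32 bytes."""
    if len(key) != 44:
        return False
    try:
        raw = base64.urlsafe_b64decode(key)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return False
    return len(raw) == 32
```

The generated value can be placed in .env as MCP_ENCRYPTION_KEY after decoding it to text with `.decode()`.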
Rate limits (defaults per user)
| Tier | Default | Redis TTL |
|---|---|---|
| Daily | 100 | 86400s |
| Weekly | 500 | 604800s |
| Monthly | 2000 | 2592000s |
Rate limits are configurable per user via user_manage update_limits.
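One way to picture the tiers above is a set of fixed-window counters that expire after the listed TTL. The sketch below assumes that model and uses an in-memory dictionary in place of Redis. The class name and the fixed-window choice are assumptions made for illustration, not a description of the server's internals.

```python
import time

# Defaults and TTLs from the table above: tier -> (limit, ttl_seconds).
TIERS = {
    "daily": (100, 86_400),
    "weekly": (500, 604_800),
    "monthly": (2000, 2_592_000),
}


class InMemoryRateLimiter:
    """Fixed-window counters with expiry; a stand-in for Redis INCR + EXPIRE."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._counters = {}  # (user_id, tier) -> (count, expires_at)

    def allow(self, user_id: str) -> bool:
        now = self._clock()
        current = {}
        for tier, (limit, ttl) in TIERS.items():
            count, expires_at = self._counters.get((user_id, tier), (0, now + ttl))
            if now >= expires_at:  # window elapsed, start a new one
                count, expires_at = 0, now + ttl
            if count >= limit:
                return False  # hard block: no counter is incremented
            current[tier] = (count, expires_at)
        # Every tier has room, so count this request against all of them.
        for tier, (count, expires_at) in current.items():
            self._counters[(user_id, tier)] = (count + 1, expires_at)
        return True
```

With the defaults, the 101st call inside one daily window is rejected. Once the daily TTL has elapsed, calls are accepted again until the weekly or monthly counter fills up.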
Token cost tracking
Per-user LLM token consumption is tracked per tier (daily/weekly/monthly) for billing and quota visibility:
- Input tokens — query length + context
- Output tokens — response + extracted content
- Token limits default to unlimited (NULL) and are configurable via user_manage update_token_limits
- Token cost tracking is informational only (warning on limit exceeded, not hard block)
- Rate limits enforce actual usage (hard block on limit exceeded)
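The difference between a soft token limit and a hard rate limit can be sketched as follows. `TokenUsage` and `record_tokens` are hypothetical names. The sketch assumes a single running total per tier and returns a warning string instead of raising, which mirrors the "warning, not hard block" behavior listed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


def record_tokens(usage: TokenUsage, input_tokens: int, output_tokens: int,
                  limit: Optional[int] = None) -> Optional[str]:
    """Add tokens to the running usage and return a warning if over the limit.

    limit=None models the unlimited (NULL) default. The call is never blocked.
    """
    usage.input_tokens += input_tokens
    usage.output_tokens += output_tokens
    if limit is not None and usage.total > limit:
        return f"token limit exceeded: {usage.total} > {limit}"
    return None
```

The usage is recorded even when the limit is exceeded, so reporting stays accurate while the request still goes through.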
user_manage tool (admin scope required)
| Action | Parameters | Output |
|---|---|---|
| create | name, rate_limits (opt), token_limits (opt) | user_id, key_id, raw key (one-time) |
| list | status filter, page, page_size | Paginated user list |
| revoke | user_id (confirmation) | Status → revoked, Redis cache cleared |
| rotate_key | user_id (confirmation) | New key_id, raw key (one-time), old key revoked |
| check_limits | user_id | Current usage per tier (rate + token cost) |
| check_token_usage | user_id | Current token usage per tier (input/output/total) |
| update_limits | user_id, new rate limits | Updated config |
| update_token_limits | user_id, new token limits | Updated config |
Audit Log Examples
Operators can reference these structured log entries for troubleshooting and compliance:
User creation:
2026-05-23T10:15:30Z INFO user_created user_id=a1b2c3d4e5f6... user_name=api-client-01 rate_limits={"daily":100,"weekly":500,"monthly":2000} actor=admin
Key rotation:
2026-05-23T14:32:01Z INFO key_rotated user_id=a1b2c3d4e5f6... user_name=api-client-01 key_version=2 actor=admin
Rate limit update:
2026-05-23T14:32:01Z INFO limits_updated user_id=a1b2c3d4e5f6... user_name=api-client-01 daily=200 weekly=1000 monthly=5000 actor=admin
Token limit update:
2026-05-23T14:32:02Z INFO token_limits_updated user_id=a1b2c3d4e5f6... user_name=api-client-01 daily=5000000 weekly=25000000 monthly=100000000 actor=admin
Invalid token (revoked key used):
2026-05-23T14:35:17Z WARN invalid_token user_id=a1b2c3d4e5f6... key_id=key_old_revoked tool_name=search result=denied timestamp=2026-05-23T14:35:17Z
Rate limit exceeded:
2026-05-23T15:01:42Z WARN rate_limit_exceeded user_id=f6e5d4c3b2a1... key_id=key_7f8g9h0i1j2k tool_name=content result=denied timestamp=2026-05-23T15:01:42Z
User disabled:
2026-05-23T16:20:05Z WARN user_disabled user_id=c3b2a1f6e5d4... key_id=key_revoked_01 tool_name=webfetch result=denied timestamp=2026-05-23T16:20:05Z
Note: token cost data (input/output tokens) is NOT included in audit logs — audit logs contain event metadata only.
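Because each entry above follows a `timestamp LEVEL event key=value ...` layout, the lines are easy to split for filtering or alerting. The parser below is a minimal sketch. `parse_audit_line` is a hypothetical helper, and it assumes that values never contain spaces, which holds for the examples shown here.

```python
def parse_audit_line(line: str) -> dict:
    """Split an audit log line into timestamp, level, event, and key=value fields.

    Assumes values contain no spaces, as in the examples above.
    """
    timestamp, level, event, *pairs = line.split()
    # Split on the first "=" only, so JSON-like values stay intact.
    fields = dict(pair.split("=", 1) for pair in pairs)
    return {"timestamp": timestamp, "level": level, "event": event, "fields": fields}
```

For example, filtering on `level == "WARN"` and `fields["result"] == "denied"` selects the rejected calls from the samples above.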
Development
Project Standards
- CONTRIBUTING.md — how to contribute, process, standards
- SECURITY.md — security policy, SSRF, secret handling
- docs/standards/ — detailed standards reference
Commands
# Tests
uv run pytest tests/ -v
# Coverage
uv run pytest tests/ --cov=app --cov-report=term-missing
# Linting
uv run ruff check app/ tests/
# Formatting
uv run ruff format app/ tests/
# Type checking
uv run mypy app/
# Security scan
uv run bandit -r app/
Configuration
Environment variables are documented in docs/standards/configuration.md.
Search Logic
search — search with fallback chain:
- Cache lookup (Redis cache-aside)
- DuckDuckGo → SearxNG → Tavily → Google (fallback chain)
- Smart filtering (SEO spam, clickbait, blacklist)
- Result caching for subsequent requests
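The search flow above combines a cache-aside lookup with an ordered provider chain. The sketch below illustrates that pattern under stated assumptions: providers are plain callables, the cache is a dictionary standing in for Redis, and the filtering step is left out. The name `search_with_fallback` is hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], List[str]]]


def search_with_fallback(query: str, providers: Sequence[Provider],
                         cache: Dict[str, List[str]]) -> List[str]:
    """Cache-aside lookup followed by an ordered provider fallback chain."""
    if query in cache:  # cache hit: skip all providers
        return cache[query]
    errors = []
    for name, provider in providers:
        try:
            results = provider(query)
        except Exception as exc:  # any provider error moves to the next one
            errors.append(f"{name}: {exc}")
            continue
        if results:
            cache[query] = results  # populate the cache on success
            return results
        errors.append(f"{name}: no results")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Providers are tried strictly in order, so the first entry in the sequence plays the role DuckDuckGo has in the chain above.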
content — content extraction:
- SSRF protection (whitelist + private IP check)
- Trafilatura → readability-lxml → bs4 (fallback chain)
- HTML sanitization (bleach)
- Caching (TTL: 24h)
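The private IP check in the first bullet can be approximated with the standard `ipaddress` module. This is a simplified sketch with hypothetical function names. It only inspects IP literals in the URL and omits the whitelist step. A real check would also resolve hostnames and test every resolved address, since a public-looking name can point at an internal host.

```python
import ipaddress
from urllib.parse import urlparse


def is_blocked_address(host: str) -> bool:
    """Return True for IP literals that point at non-public address space."""
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # not an IP literal; DNS resolution is out of scope here
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_multicast or ip.is_reserved or ip.is_unspecified)


def is_url_allowed(url: str, allowed_schemes=("http", "https")) -> bool:
    """Allow only http(s) URLs whose host is not a blocked IP literal."""
    parsed = urlparse(url)
    if parsed.scheme not in allowed_schemes or not parsed.hostname:
        return False
    return not is_blocked_address(parsed.hostname)
```

Under this sketch, a URL targeting 127.0.0.1, 10.0.0.5, or the 169.254.169.254 metadata address is rejected before any request is made.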
webfetch — agent-based search:
- Stage 1: Generate queries via LLM
- Stage 2: Parallel searches (6 concurrent)
- Stage 3: Select URLs for extraction
- Stage 4: Judge URLs (LLM-as-Judge, threshold ≥0.85)
- Stage 5: Fetch content (Trafilatura)
- Stage 6: Generate features (Pydantic models)
- Stage 7: Judge features (threshold ≥0.92)
- Fallback: Simple search on agent failure
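The two judging stages above both reduce to keeping items whose score meets a threshold. The sketch below shows that gating step in plain Python. It assumes the judge has already produced a score between 0 and 1 for each item. `passes_judge` is a hypothetical helper, and the LangGraph wiring is not shown.

```python
from typing import Dict, List

URL_THRESHOLD = 0.85      # Stage 4 threshold from the list above
FEATURE_THRESHOLD = 0.92  # Stage 7 threshold from the list above


def passes_judge(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep items whose judge score is at or above the threshold, best first."""
    kept = [item for item, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: scores[item], reverse=True)
```

An empty result from either gate is one case where an agent could fall back to the simple search path.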
Prometheus Metrics
Implemented metrics (via app/core/metrics.py):
| Metric | Type | Description |
|---|---|---|
| provider_search_total | Counter | Search attempts per provider |
| provider_search_failure_total | Counter | Failed searches per provider |
| provider_health_score | Gauge | Provider health (0.0–1.0) |
| provider_chain_position | Gauge | Provider position in fallback chain |
| llm_failover_total | Counter | LLM failover events (from→to model) |
| llm_failover_duration_seconds | Histogram | Failover duration |
| llm_model_health_score | Gauge | LLM model health (0.0–1.0) |
| llm_active_model_index | Gauge | Active LLM model index |
| webfetch_checkpoint_save_total | Counter | WebFetch checkpoint saves |
| webfetch_checkpoint_resume_total | Counter | WebFetch checkpoint resumes |
| webfetch_checkpoint_size_bytes | Histogram | Checkpoint payload size |
| webfetch_active_checkpoints | Gauge | Active checkpoints per tenant |
| cache_ttl_distribution_seconds | Histogram | Cache TTL distribution |
| cache_stale_invalidations_total | Counter | Cache stale invalidations |
| cache_freshness_avg | Gauge | Average cache freshness |
| knowledge_graph_concepts_count | Gauge | KG concepts count |
| knowledge_graph_terms_count | Gauge | KG related terms count |
| kg_expansion_applied_total | Counter | KG expansion events |
| kg_enriched_concepts_total | Counter | KG enriched concepts |
See Also
- CHANGELOG.md — version history
- pyproject.toml — dependencies and configuration