PowerSearch MCP

PowerSearch MCP helps AI agents search and retrieve content from the public web with fewer broken fetches and clean, AI-friendly outputs ready to cite.

Feature Roadmap:

  • ✅ SearXNG-backed meta search with configurable engines, language, safe-search, and pagination
  • ✅ Strong anti-bot fetching implementation via Scrapling and Camoufox
  • ✅ Search response caching at the tool-level to memory, disk, and Redis storage backends
  • ✅ Automatic retries with exponential backoff for both search and fetch operations
  • ✅ AI Agent-friendly responses: HTML pages are converted to markdown automatically via Trafilatura
  • ✅ Support for STDIO and streaming HTTP transports
  • ✅ Health check endpoint for HTTP transport
  • ✅ Extensive configuration suitable for many deployment scenarios
  • ✅ Authentication support for both JWT and opaque tokens
  • ✅ Authorization support for embedded Eunomia policies
  • 🗓️ (Future) Auto summarization of search results via MCP sampling
  • 🗓️ (Future) Client-selectable synchronous (current behavior) or asynchronous SEP-1686 execution for search / fetch tools
  • 🗓️ (Future) Containerization, publish public image
  • 🗓️ (Future) Prometheus metrics exporter
  • 🗓️ (Future) Helm chart

Setup

If you haven't already, go ahead and run make init to set up the Python virtual environment and dependencies.

Next, initialize Camoufox:

camoufox fetch

Finally, run a local instance of SearXNG.

docker run --rm -it \
    --name searxng-local \
    -p 127.0.0.1:9876:8080 \
    --tmpfs /etc/searxng:rw,noexec,nosuid,size=16m \
    --tmpfs /tmp:rw,noexec,nosuid,size=512m \
    --cap-drop=ALL \
    --security-opt=no-new-privileges:true \
    --env=SEARXNG_SETTINGS_PATH=/settings.yml \
    --volume="$(pwd)/searxng.yaml:/settings.yml:ro" \
    searxng/searxng

Running the server

PowerSearch now relies entirely on the FastMCP CLI and the checked-in configuration files. Runtime behavior still comes from POWERSEARCH_ environment variables (or a .env file).

  • STDIO (default): fastmcp run fastmcp.json --skip-env --project . — best for Claude Desktop and Inspector.
  • Streamable HTTP example: fastmcp run fastmcp-http.json --skip-env --project . — binds to 0.0.0.0:8092/mcp with CORS enabled.
  • Override deployment settings at launch with flags (for example --transport stdio, --host 0.0.0.0, --port 8912, --path /custom). CLI flags override the deployment block in the chosen config.

Both configs bake in the runtime dependencies to make first-time installs predictable; uv reuses the local project via --project . and the editable setting, so local edits take effect. The HTTP app still exposes a /health endpoint and honors all POWERSEARCH_ environment variables for search behavior.
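A quick way to confirm the HTTP transport is up is to probe the /health endpoint. The sketch below is illustrative only: it assumes the server is reachable on 127.0.0.1:8092 (the port from the Streamable HTTP example) and that /health is served at the root of the app rather than under /mcp. The helper names health_url and is_healthy are hypothetical and not part of PowerSearch.

```python
import urllib.request


def health_url(host: str = "127.0.0.1", port: int = 8092) -> str:
    """Build the health check URL (assumes /health lives at the app root)."""
    return f"http://{host}:{port}/health"


def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True when the endpoint answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError both derive from OSError, so refused
        # connections, timeouts, and non-2xx answers all land here.
        return False


if __name__ == "__main__":
    url = health_url()
    print(url, "healthy" if is_healthy(url) else "unreachable")
```

Adjust the host, port, and path if you override the deployment settings at launch.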

To run the search backend in the background:

docker run -d \
    --name searxng-local \
    --pull=always \
    --restart unless-stopped \
    -p 127.0.0.1:9876:8080 \
    --tmpfs /etc/searxng:rw,noexec,nosuid,size=16m \
    --tmpfs /tmp:rw,noexec,nosuid,size=512m \
    --cap-drop=ALL \
    --security-opt=no-new-privileges:true \
    --health-cmd='python3 -c "import urllib.request; urllib.request.urlopen(\"http://127.0.0.1:8080/\", timeout=3).read(1)"' \
    --health-interval=10s \
    --health-timeout=3s \
    --health-retries=10 \
    --health-start-period=15s \
    --env SEARXNG_SETTINGS_PATH=/settings.yml \
    --volume "$(pwd)/searxng.yaml:/settings.yml:ro" \
    searxng/searxng

How Are Search Results Ranked?

SearXNG returns each hit with a score that already blends engine weight and position. PowerSearch keeps that score and applies two passes: a percentile cut and a top-K trim. By default it keeps results at or above the 75th percentile, then retains only the top 10. That combination aggressively drops weak hits while keeping a predictable result count.

If you set POWERSEARCH_FILTER_SCORE_PERCENTILE to None, the percentile cut is skipped and only the top-K pass runs. Increasing POWERSEARCH_FILTER_TOP_K widens the net but may slow things down if content fetching is enabled.
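The two passes described above can be sketched as follows. This is a minimal illustration, not the PowerSearch implementation: the function names, the dictionary shape of a result, and the linear-interpolation percentile convention are all assumptions made for the example.

```python
import math
from typing import Optional


def percentile(values: list[float], pct: float) -> float:
    """Percentile with linear interpolation between neighbouring ranks (assumed convention)."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def filter_results(
    results: list[dict],
    score_percentile: Optional[float] = 75.0,
    top_k: int = 10,
) -> list[dict]:
    """Apply the percentile cut (if enabled), then keep the top_k highest scores."""
    kept = list(results)
    if score_percentile is not None and kept:
        cutoff = percentile([r["score"] for r in kept], score_percentile)
        kept = [r for r in kept if r["score"] >= cutoff]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:top_k]


# Hypothetical hits with scores 1.0 through 8.0.
hits = [{"url": f"https://example.com/{i}", "score": float(i)} for i in range(1, 9)]
print([h["score"] for h in filter_results(hits)])  # percentile cut, then top-K
print([h["score"] for h in filter_results(hits, score_percentile=None, top_k=3)])  # top-K only
```

With eight evenly spaced scores, the 75th-percentile cut leaves only the two strongest hits, while passing None for the percentile keeps whatever fits in the top-K budget.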

Content strategy matters too. With fetch, the tool will fetch each retained URL and run Trafilatura over it; higher K or looser filters mean more network work. With quick, PowerSearch leaves content as the SearXNG snippets, which is faster but less complete.

Configuration

PowerSearch reads environment variables with the POWERSEARCH_ prefix (a .env file is honored as well). By design, configuration exists only as environment variables, which keeps the PowerSearch tool as simple as possible for AI agents to use.
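As an illustration of environment-only configuration, the sketch below prepares a STDIO launch with a few overrides. It is a hedged example: build_launch and launch are hypothetical helpers, the override values are examples rather than defaults, and the base URL assumes the local SearXNG container from the Setup section is listening on 127.0.0.1:9876.

```python
import os
import subprocess


def build_launch(overrides: dict[str, str]) -> tuple[list[str], dict[str, str]]:
    """Return the fastmcp command and an environment with POWERSEARCH_ overrides applied."""
    env = dict(os.environ)
    for name, value in overrides.items():
        if not name.startswith("POWERSEARCH_"):
            raise ValueError(f"{name} does not use the POWERSEARCH_ prefix")
        env[name] = value
    command = ["fastmcp", "run", "fastmcp.json", "--skip-env", "--project", "."]
    return command, env


def launch(overrides: dict[str, str]) -> None:
    """Start the server in the foreground (requires the fastmcp CLI on PATH)."""
    command, env = build_launch(overrides)
    subprocess.run(command, env=env, check=True)


# Example overrides; the values are illustrative, not project defaults.
example = {
    "POWERSEARCH_BASE_URL": "http://127.0.0.1:9876/search",
    "POWERSEARCH_CONTENT_STRATEGY": "quick",
    "POWERSEARCH_FILTER_TOP_K": "5",
}
command, env = build_launch(example)
print(" ".join(command))
```

Calling launch(example) would run the server with those settings; the same variables could equally be placed in a .env file.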

Search Behavior

| Setting | What it does | When to change |
| --- | --- | --- |
| POWERSEARCH_BASE_URL | SearXNG search endpoint (should end with /search). | Point at your own SearXNG host or a different port. |
| POWERSEARCH_ENGINES | Comma-separated SearXNG engines. | Limit to a trusted subset (e.g., duckduckgo,bing). |
| POWERSEARCH_LANGUAGE | IETF language tag for queries. | Bias results toward a locale (e.g., en, fr). |
| POWERSEARCH_SAFE_SEARCH | Safe-search level (0, 1, 2). | Tweak content filtering; defaults to 1. |
| POWERSEARCH_MAX_PAGE | How many result pages to request. | Raise for broader coverage when latency is acceptable. |
| POWERSEARCH_FILTER_SCORE_PERCENTILE | Drops results below a score percentile. | Lower or disable (None) if you need long-tail hits. |
| POWERSEARCH_FILTER_TOP_K | Keep only the top K after scoring. | Increase for more results; decrease for faster downstream fetches. |
| POWERSEARCH_CONTENT_STRATEGY | fetch pulls pages; quick uses SearXNG snippets only. | Use quick when you cannot fetch pages or need speed. |
| POWERSEARCH_CONTENT_LIMIT | Character cap per result. | Raise to keep more text; set None to disable trimming. |
| POWERSEARCH_TIMEOUT_SEC | Total budget for search + fetch. | Increase on slow networks; decrease to fail fast. |
| POWERSEARCH_HTTP2 | Enables HTTP/2 upstream. | Turn on if your network and SearXNG support it. |
| POWERSEARCH_VERIFY | TLS certificate verification. | Disable only for trusted dev setups with self-signed certs. |

Content Extraction Behavior

| Setting | What it does | When to change |
| --- | --- | --- |
| POWERSEARCH_TRAFILATURA_EXTRACTION_TIMEOUT | Max seconds Trafilatura spends extracting (0 = no limit). | Add a cap if extractions hang on heavy pages. |
| POWERSEARCH_TRAFILATURA_MIN_EXTRACTED_SIZE | Minimum size of accepted text. | Raise to drop ultra-short pages; lower if small blurbs matter. |
| POWERSEARCH_TRAFILATURA_MIN_DUPLCHECK_SIZE | Minimum size for duplicate checking. | Bump up to reduce near-duplicate fragments. |
| POWERSEARCH_TRAFILATURA_MAX_REPETITIONS | Repetition cap for repeated blocks. | Lower to aggressively prune boilerplate. |
| POWERSEARCH_TRAFILATURA_EXTENSIVE_DATE_SEARCH | Enables extra date heuristics. | Turn off for speed if dates are irrelevant. |
| POWERSEARCH_TRAFILATURA_INCLUDE_LINKS | Keep hyperlinks in markdown output. | Enable if you want inline links retained. |
| POWERSEARCH_TRAFILATURA_INCLUDE_IMAGES | Keep image references. | Enable when image context is important. |
| POWERSEARCH_TRAFILATURA_INCLUDE_TABLES | Keep tables. | Disable only if tables bloat token counts. |
| POWERSEARCH_TRAFILATURA_INCLUDE_COMMENTS | Keep comment sections found on the page (for example, reader comments under an article). | Rarely needed; enable when the discussion below the main text matters. |
| POWERSEARCH_TRAFILATURA_INCLUDE_FORMATTING | Preserve formatting markup. | Enable if you need bold/italic cues; off for terser text. |
| POWERSEARCH_TRAFILATURA_DEDUPLICATE | Removes near-identical blocks. | Disable only if de-duplication cuts useful repeated info. |
| POWERSEARCH_TRAFILATURA_FAVOR_PRECISION | Prefers precision over recall. | Turn off to capture more content at the expense of noise. |

Middleware & Reliability

| Setting | What it does | When to change |
| --- | --- | --- |
| POWERSEARCH_LOG_LEVEL | Logging level for middleware; falls back to FASTMCP_LOG_LEVEL when unset. | Raise to DEBUG/INFO while troubleshooting; lower to WARNING/ERROR in production. |
| POWERSEARCH_INCLUDE_PAYLOADS | Include full MCP request/response bodies in logs. | Enable temporarily for debugging only; can expose user data. |
| POWERSEARCH_INCLUDE_PAYLOAD_LENGTH | Log payload length alongside metadata. | Pair with payload logging when sizes matter but full bodies are off. |
| POWERSEARCH_ESTIMATE_PAYLOAD_TOKENS | Log approximate token counts (length // 4). | Enable when monitoring token budgets. |
| POWERSEARCH_MAX_PAYLOAD_LENGTH | Cap logged payload characters. | Lower to reduce log volume; raise when debugging truncated bodies. |
| POWERSEARCH_ERRORHANDLING_TRACEBACK | Include tracebacks in error responses. | Enable only in non-production environments. |
| POWERSEARCH_ERRORHANDLING_TRANSFORM | Convert exceptions into MCP-friendly error responses. | Leave on unless you need raw exceptions for debugging. |
| POWERSEARCH_RETRY_RETRIES | Max retry attempts applied by retry middleware. | Increase for flaky upstreams; set to 0 to disable retries. |
| POWERSEARCH_RETRY_BASE_DELAY | Initial delay between retries (seconds). | Tune for backoff aggressiveness. |
| POWERSEARCH_RETRY_MAX_DELAY | Upper bound on backoff delay (seconds). | Prevent excessively long waits. |
| POWERSEARCH_RETRY_BACKOFF_MULTIPLIER | Exponential backoff multiplier. | Lower for gentler backoff; raise for faster escalation. |
| FASTMCP_DOCKET_URL | Session docket store for Streamable HTTP (e.g., memory://, redis://host:port/db). | Switch to Redis or another backend when you need persistent/distributed HTTP sessions. |
| FASTMCP_DOCKET_CONCURRENCY | Max concurrent docket operations. | Increase for higher HTTP session throughput; lower to limit resource use. |
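To see how the four POWERSEARCH_RETRY_ settings interact, the sketch below computes a delay schedule using the common capped exponential formula. This is an assumption for illustration: the retry middleware may add jitter or compute delays differently, backoff_delays is a hypothetical helper, and the numbers passed in are examples rather than defaults.

```python
def backoff_delays(
    retries: int,
    base_delay: float,
    max_delay: float,
    multiplier: float,
) -> list[float]:
    """Delay before each retry: base_delay * multiplier**attempt, capped at max_delay."""
    return [min(base_delay * multiplier**attempt, max_delay) for attempt in range(retries)]


# Example values only: 5 retries, 1 s base delay, 10 s cap, doubling each time.
print(backoff_delays(retries=5, base_delay=1.0, max_delay=10.0, multiplier=2.0))
# Setting retries to 0 yields an empty schedule, i.e. no retries at all.
print(backoff_delays(retries=0, base_delay=1.0, max_delay=10.0, multiplier=2.0))
```

Summing the schedule gives a rough worst-case wait added by retries, which is worth comparing against POWERSEARCH_TIMEOUT_SEC.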

Caching

PowerSearch can cache tool responses (search and fetch_url) via FastMCP's response caching middleware. Caching is off by default.

| Setting | What it does | When to change |
| --- | --- | --- |
| POWERSEARCH_CACHE | Storage backend selector: memory, null (no-op, good for tests), file:///path/to/dir, or redis://host:port/db. Empty/None disables caching. | Enable for repeat queries or to avoid refetching the same URLs. Use memory for local dev, file:// for lightweight persistence, and redis:// for shared/distributed deployments. |
| POWERSEARCH_CACHE_TTL_SEC (alias: POWERSEARCH_CACHE_TTL_SECONDS) | TTL for cached tool responses (seconds). Defaults to 3600. | Shorten for fresher results; lengthen when upstream data changes rarely. |
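The selector forms accepted by POWERSEARCH_CACHE can be told apart with a small parser. The sketch below is illustrative and not the project's own parsing code: describe_cache_backend is a hypothetical helper, and treating the literal string "None" the same as an empty value is an assumption.

```python
from typing import Optional
from urllib.parse import urlparse


def describe_cache_backend(selector: Optional[str]) -> tuple[str, Optional[str]]:
    """Classify a cache selector as (kind, detail)."""
    value = (selector or "").strip()
    if value in ("", "None"):  # assumption: the string "None" also disables caching
        return ("disabled", None)
    if value in ("memory", "null"):
        return (value, None)
    parsed = urlparse(value)
    if parsed.scheme == "file":
        return ("file", parsed.path)  # directory that holds the cache files
    if parsed.scheme == "redis":
        return ("redis", value)  # full URL, ready to hand to a Redis client
    raise ValueError(f"unsupported cache selector: {selector!r}")


for example in ("", "memory", "file:///var/cache/powersearch", "redis://localhost:6379/0"):
    print(repr(example), "->", describe_cache_backend(example))
```

The example paths and hosts are placeholders; substitute the directory or Redis instance used in your deployment.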

Authentication & Authorization

See docs/auth.md for full details.

| Setting | What it does | When to change |
| --- | --- | --- |
| FASTMCP_SERVER_AUTH | Selects the FastMCP auth provider (e.g., fastmcp.server.auth.providers.auth0.Auth0Provider for interactive OAuth, fastmcp.server.auth.providers.jwt.JWTVerifier for headless JWT validation). | Choose the provider that matches how your tokens are obtained (interactive vs headless). |
| FASTMCP_SERVER_AUTH_AUTH0_CONFIG_URL | OIDC discovery URL for Auth0/Keycloak (scenario 1). | Set when using the interactive OAuth flow so clients can discover the IdP. |
| FASTMCP_SERVER_AUTH_AUTH0_CLIENT_ID | OAuth client ID registered for PowerSearch MCP. | Provide when using Auth0/Keycloak OAuth. |
| FASTMCP_SERVER_AUTH_AUTH0_AUDIENCE | API audience that tokens must target. | Set to the audience configured in your IdP for PowerSearch MCP. |
| FASTMCP_SERVER_AUTH_AUTH0_CLIENT_SECRET | OAuth client secret for the MCP server registration. | Required for Auth0/Keycloak OAuth server-side flow. |
| FASTMCP_SERVER_AUTH_AUTH0_BASE_URL | Public base URL of the MCP server (no path) for OAuth redirects. | Set when using the interactive OAuth flow. |
| FASTMCP_SERVER_AUTH_JWT_JWKS_URI | JWKS endpoint used to verify JWT signatures (scenario 2). | Set for headless JWT validation when tokens are pre-issued. |
| FASTMCP_SERVER_AUTH_JWT_ISSUER | Expected iss claim for JWTs. | Match to your identity provider's issuer to block tokens from other issuers. |
| FASTMCP_SERVER_AUTH_JWT_AUDIENCE | Expected aud claim for JWTs. | Set to the audience your IdP issues for this server. |
| FASTMCP_SERVER_AUTH_JWT_REQUIRED_SCOPES | Scopes that must appear on accepted JWTs. | Use to enforce least privilege for headless JWT flows. |
| POWERSEARCH_AUTHZ_POLICY_PATH | Path to the Eunomia JSON policy file; server refuses to start if set and missing. | Provide when enabling authorization and point at the policy you want enforced. |
| POWERSEARCH_ENABLE_AUDIT_LOGGING | Turns on Eunomia audit logging when authz middleware is enabled. | Enable for compliance or incident review; leave off to reduce log volume. |
