
Zero-dependency MCP server for unobstructed web reading: direct browser-header fetch past robots.txt/bot blocks, full-page HTML->Markdown with ad/boilerplate stripping, plus web search


NetLens


An MCP server for unobstructed web reading. It fetches any URL directly with browser-like headers — past robots.txt and naive bot blocks — and returns the full page as clean, ad-stripped Markdown, not a summary. Plus web search that returns real links. Zero dependencies: pure Python standard library.

Built for AI Agents

AI agents constantly hit pages their built-in tools can't read. NetLens addresses the four usual reasons a fetch comes back empty or useless:

| Native web tools | NetLens |
| --- | --- |
| Honor robots.txt, so crawler-disallowed pages return nothing | Reads like the browser you'd open yourself; does not consult robots.txt |
| Blocked by header/User-Agent bot filters (403/202 to non-browser clients) | Sends real browser headers via the system curl; commonly turns 403 → 200 |
| Return a summary of the page | Returns the full page content as Markdown |
| Leave ads, cookie banners, nav, and related-links chrome in the output | Strips boilerplate locally so only the content reaches your context |

It does not try to defeat JavaScript/Cloudflare challenge pages or CAPTCHAs — that's out of scope by design. When a page is a hard block, the HTTP status is surfaced honestly rather than faked.
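
The header-based fetch and honest status reporting described above could look roughly like the sketch below. It is an illustration under stated assumptions, not NetLens's actual code: the function names, the specific header values, the status marker, and the choice of curl flags are all made up for this example.

```python
import shutil
import subprocess
import urllib.error
import urllib.request

# Illustrative browser-like headers (assumed; the real header set is not shown in this README).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/124.0 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}

# Hypothetical marker used to separate the body from the status code in curl's output.
STATUS_MARKER = "\n__HTTP_STATUS__:"


def build_curl_argv(url, headers=None):
    """Build a curl command line that sends browser-like headers."""
    headers = headers or BROWSER_HEADERS
    argv = ["curl", "-sS", "-L", "--compressed", "--max-time", "30"]
    for name, value in headers.items():
        argv += ["-H", f"{name}: {value}"]
    # -w appends the final HTTP status after the body so it can be reported as-is.
    argv += ["-w", STATUS_MARKER + "%{http_code}", url]
    return argv


def split_status(text):
    """Split curl output into (status, body); status is 0 when no marker is found."""
    body, sep, status = text.rpartition(STATUS_MARKER)
    if not sep:
        return 0, text
    return int(status), body


def fetch(url):
    """Return (status, body), using curl when available and urllib otherwise."""
    if shutil.which("curl"):
        proc = subprocess.run(build_curl_argv(url), capture_output=True, check=False)
        return split_status(proc.stdout.decode("utf-8", "replace"))
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # A hard block keeps its real status code instead of being masked.
        return err.code, err.read().decode("utf-8", "replace")
```

Calling `fetch("https://example.com")` returns the status alongside the body, so a 403 from a challenge page stays visible to the caller.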

Installation

npm (via npx):

{
  "mcpServers": {
    "netlens": {
      "command": "npx",
      "args": ["-y", "netlens-mcp"]
    }
  }
}

PyPI (via uvx):

{
  "mcpServers": {
    "netlens": {
      "command": "uvx",
      "args": ["netlens-mcp"]
    }
  }
}

Add either to your MCP client config (e.g. .mcp.json for Claude Code), then restart the session so the tools load.
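
If a config file already exists, the entry can be merged in rather than pasted by hand. The following is a minimal sketch: the helper names `add_netlens` and `update_file` are hypothetical, and the `.mcp.json` path is the Claude Code example from above; other clients may keep their config elsewhere.

```python
import json
from pathlib import Path

# The two launcher entries shown in the installation snippets above.
ENTRIES = {
    "npx": {"command": "npx", "args": ["-y", "netlens-mcp"]},
    "uvx": {"command": "uvx", "args": ["netlens-mcp"]},
}


def add_netlens(config, runner="uvx"):
    """Return a copy of an MCP client config with the netlens entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["netlens"] = ENTRIES[runner]
    merged["mcpServers"] = servers
    return merged


def update_file(path=".mcp.json", runner="uvx"):
    """Read the config file if present, add the entry, and write it back."""
    target = Path(path)
    config = json.loads(target.read_text(encoding="utf-8")) if target.exists() else {}
    updated = add_netlens(config, runner)
    target.write_text(json.dumps(updated, indent=2) + "\n", encoding="utf-8")
```

Existing servers in the file are preserved; only the `netlens` key is added or overwritten.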

Tools

web_search

Search the web and return real result links (title, URL, snippet), parsed locally — links, not summaries. Follow up with web_fetch to read a result.

| Argument | Type | Description |
| --- | --- | --- |
| query | string (required) | The search query |
| limit | integer | Max results (default 8) |
| engine | string | auto (default), duckduckgo, bing, mojeek, searxng |

web_fetch

Fetch any page and return its full content as clean Markdown.

| Argument | Type | Description |
| --- | --- | --- |
| url | string (required) | URL to fetch (scheme optional; https assumed) |
| mode | string | article (main content only, default), full (whole body), raw (unconverted HTML) |
| max_chars | integer | Optional cap on returned characters (truncates with a note) |
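
Two of these argument behaviors, the assumed https scheme and the max_chars cap, can be sketched in a few lines. The function names and the exact wording of the truncation note are assumptions for illustration; the README only states that a note is added.

```python
import re

# Matches a leading URL scheme such as "https://" or "http://".
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*://")


def normalize_url(url):
    """Assume https when the caller omits the scheme."""
    url = url.strip()
    return url if SCHEME_RE.match(url) else "https://" + url


def cap_output(text, max_chars=None):
    """Truncate to max_chars and append a note; leave short text untouched."""
    if max_chars is None or len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    # Hypothetical note format.
    return text[:max_chars] + f"\n\n[truncated: {omitted} more characters omitted]"
```

For example, `normalize_url("example.com/docs")` yields `https://example.com/docs`, while a URL that already carries a scheme is returned as given.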

Workflow: web_search to find pages, then web_fetch to read them.
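
At the protocol level, this workflow is two MCP `tools/call` requests. The sketch below only builds the JSON-RPC messages; an MCP client normally does this for you after the initialization handshake, and the query, URL, and request ids here are placeholder values.

```python
import json


def tool_call(request_id, name, arguments):
    """Build an MCP tools/call JSON-RPC request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: find candidate pages.
search = tool_call(1, "web_search", {"query": "python html.parser tutorial", "limit": 3})

# Step 2: read one of the results (placeholder URL).
read = tool_call(2, "web_fetch", {"url": "https://example.com", "mode": "article", "max_chars": 4000})

# Over the stdio transport each message is sent as one line of JSON.
for message in (search, read):
    print(json.dumps(message))
```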

Search engines

Search is a pluggable, selectable registry. In auto mode NetLens tries engines in order and returns the first with results, so a rate-limit/challenge page on one falls through to the next.

| Engine | Notes |
| --- | --- |
| duckduckgo | Default; html.duckduckgo.com endpoint |
| bing | Automatic fallback |
| mojeek | Independent index; automatic fallback |
| searxng | Self-hosted/public SearXNG JSON API; set NETLENS_SEARXNG_URL |

Pick per call with the engine argument, or set a default with NETLENS_SEARCH_ENGINE.
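
The auto-mode fallthrough can be pictured as a loop over a registry of engine functions. This is a sketch with stub backends, not the real implementation: the `search_auto` name, the registry shape, and the exact order (taken from the table above) are assumptions, and whether searxng takes part in auto mode is not stated here, so it is left out.

```python
def search_auto(query, engines, order=("duckduckgo", "bing", "mojeek"), limit=8):
    """Try engines in order; return (name, results) for the first non-empty result list."""
    for name in order:
        backend = engines.get(name)
        if backend is None:
            continue
        try:
            results = backend(query)
        except Exception:
            # A rate limit or challenge page counts as "no results"; move on.
            continue
        if results:
            return name, results[:limit]
    return None, []


# Stub backends standing in for real engines.
def blocked(query):
    raise RuntimeError("rate limited")


def empty(query):
    return []


def stub(query):
    return [{"title": "Example", "url": "https://example.com", "snippet": query}]


engines = {"duckduckgo": blocked, "bing": empty, "mojeek": stub}
name, results = search_auto("test query", engines)
```

Here the first engine raises, the second returns nothing, and the third supplies the results, so `name` ends up as the engine that actually answered.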

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| NETLENS_SEARCH_ENGINE | auto | Default search backend |
| NETLENS_SEARXNG_URL | (none) | SearXNG base URL for engine=searxng |
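
One plausible way to combine the per-call argument with these variables is shown below. The precedence (explicit argument, then NETLENS_SEARCH_ENGINE, then auto) follows the description above; the `resolve_engine` name and the error behavior for unknown engines or a missing SearXNG URL are assumptions.

```python
import os

KNOWN_ENGINES = ("auto", "duckduckgo", "bing", "mojeek", "searxng")


def resolve_engine(engine=None, env=None):
    """Pick the search engine: explicit argument, then environment, then auto."""
    env = os.environ if env is None else env
    choice = (engine or env.get("NETLENS_SEARCH_ENGINE") or "auto").strip().lower()
    if choice not in KNOWN_ENGINES:
        raise ValueError(f"unknown search engine: {choice!r}")
    if choice == "searxng" and not env.get("NETLENS_SEARXNG_URL"):
        # Assumed behavior: fail early rather than query a missing endpoint.
        raise ValueError("engine=searxng requires NETLENS_SEARXNG_URL")
    return choice
```

Passing `env` explicitly keeps the function easy to test; in normal use it reads from the process environment.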

How it works

  • Direct fetch. Requests go straight to the target site via the system curl, whose TLS, HTTP/2, and compression behavior looks more like a real browser's, falling back to urllib when curl is unavailable. No third-party proxy or reader is involved.
  • Local conversion. HTML → Markdown happens in-process with a hand-rolled html.parser converter — headings, lists, links (relative URLs resolved), code blocks, and GFM tables with colspan/rowspan.
  • Boilerplate stripping. Ads, cookie/consent banners, nav, footers, sidebars, social/share and related/recommended widgets, and hidden elements are removed. In article mode NetLens also isolates the main content region (<main> / <article> / [role=main]).
  • Response charset is honored (from Content-Type or <meta>), so non-UTF-8 pages don't come back garbled.
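
To make the conversion and stripping steps concrete, here is a deliberately small html.parser sketch. It handles only headings, paragraphs, list items, and links, and drops a fixed set of boilerplate tags; tables, code blocks, main-region isolation, and charset handling are omitted. The class name, tag sets, and whitespace rules are assumptions, not the project's converter.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tags whose whole subtree is dropped (assumed subset of the boilerplate rules).
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MiniMarkdown(HTMLParser):
    def __init__(self, base_url=""):
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.out = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.out.append("\n\n" + HEADINGS[tag])
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            # Resolve relative links against the page URL.
            self.href = urljoin(self.base_url, dict(attrs).get("href") or "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a" and self.href is not None:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(re.sub(r"\s+", " ", data))


def to_markdown(html, base_url=""):
    """Convert a small HTML fragment to Markdown, dropping boilerplate tags."""
    parser = MiniMarkdown(base_url)
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    kept = []
    for line in lines:
        # Collapse runs of blank lines into a single blank line.
        if line or (kept and kept[-1]):
            kept.append(line)
    return "\n".join(kept).strip()


sample = (
    '<nav><a href="/home">Home</a></nav>'
    '<main><h1>Title</h1><p>Read the <a href="/docs">docs</a> now.</p>'
    "<ul><li>One</li><li>Two</li></ul></main><script>x()</script>"
)
print(to_markdown(sample, "https://example.com/guide/"))
```

The nav link and the script body never reach the output, and the relative `/docs` link is resolved against the base URL passed in.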

Usage from the CLI

The server is also a plain script — handy for testing before a client loads it:

python -m netlens_mcp.server search "best bg3 starting class"
python -m netlens_mcp.server fetch  https://www.ign.com/wikis/baldurs-gate-3
python -m netlens_mcp.server full   https://example.com   # whole body
python -m netlens_mcp.server raw    https://example.com   # unconverted HTML

python -m netlens_mcp runs the stdio MCP server; python -m netlens_mcp.server <cmd> runs the CLI.

Development

pip install -e ".[dev]"
python -m pytest        # run the test suite
ruff check .            # lint

Requirements

  • Python 3.10+ (and the system curl, which ships with modern Windows/macOS/Linux; falls back to urllib if absent)

License

Apache License 2.0 — see LICENSE and NOTICE.

