Zero-dependency MCP server for unobstructed web reading: direct browser-header fetch past robots.txt/bot blocks, full-page HTML->Markdown with ad/boilerplate stripping, plus web search

NetLens

An MCP server for unobstructed web reading. It fetches any URL directly with browser-like headers, without consulting robots.txt and getting past naive bot blocks, and returns the full page as clean, ad-stripped Markdown rather than a summary. It also provides web search that returns real links. Zero Python dependencies: it uses only the standard library, plus the system curl when available.

Built for AI Agents

AI agents constantly hit pages their built-in tools can't read. NetLens addresses the usual reasons a fetch comes back empty or useless:

| Native web tools | NetLens |
| --- | --- |
| Honor robots.txt, so crawler-disallowed pages return nothing | Reads like the browser you'd open yourself; does not consult robots.txt |
| Blocked by header/User-Agent bot filters (403/202 to non-browser clients) | Sends real browser headers via the system curl; commonly turns 403 → 200 |
| Return a summary of the page | Returns the full page content as Markdown |
| Leave ads, cookie banners, nav, and related-links chrome in the output | Strips boilerplate locally so only the content reaches your context |

It does not try to defeat JavaScript/Cloudflare challenge pages or CAPTCHAs — that's out of scope by design. When a page is a hard block, the HTTP status is surfaced honestly rather than faked.

Installation

npm (via npx):

{
  "mcpServers": {
    "netlens": {
      "command": "npx",
      "args": ["-y", "netlens-mcp"]
    }
  }
}

PyPI (via uvx):

{
  "mcpServers": {
    "netlens": {
      "command": "uvx",
      "args": ["netlens-mcp"]
    }
  }
}

Add either to your MCP client config (e.g. .mcp.json for Claude Code), then restart the session so the tools load.

Tools

web_search

Search the web and return real result links (title, URL, snippet), parsed locally — links, not summaries. Follow up with web_fetch to read a result.

| Argument | Type | Description |
| --- | --- | --- |
| query | string (required) | The search query |
| limit | integer | Max results (default 8) |
| engine | string | auto (default), duckduckgo, bing, mojeek, searxng |
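
As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` JSON-RPC request using the argument names from the table above. The helper `make_tool_call` and the sample query are hypothetical; a real MCP client library normally constructs and sends this message for you.

```python
import json


def make_tool_call(request_id, name, arguments):
    """Build an MCP `tools/call` JSON-RPC 2.0 request as a plain dict.

    Hypothetical helper for illustration; it is not part of NetLens.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument names (query, limit, engine) come from the table above;
# the values are made up for the example.
request = make_tool_call(
    1,
    "web_search",
    {"query": "python html.parser tutorial", "limit": 5, "engine": "duckduckgo"},
)
print(json.dumps(request, indent=2))
```

Only `query` is required; leaving out `limit` and `engine` falls back to the documented defaults (8 and auto).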

web_fetch

Fetch any page and return its full content as clean Markdown.

| Argument | Type | Description |
| --- | --- | --- |
| url | string (required) | URL to fetch (scheme optional; https assumed) |
| mode | string | article (main content only, default), full (whole body), raw (unconverted HTML) |
| max_chars | integer | Optional cap on returned characters (truncates with a note) |
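
Two of these arguments describe small transformations that are easy to picture in code: assuming https when the scheme is missing, and capping the output with a note. The sketch below is one possible way to do both. The function names and the wording of the truncation note are assumptions made for illustration, not NetLens's actual implementation.

```python
def normalize_url(url):
    """Prepend https:// when the URL has no scheme (hypothetical helper)."""
    url = url.strip()
    if "://" not in url:
        url = "https://" + url
    return url


def cap_text(text, max_chars=None):
    """Truncate text to max_chars and append a note (hypothetical helper).

    The note's wording is an assumption; the source only says that
    truncation happens "with a note".
    """
    if max_chars is None or len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"\n\n[truncated: {omitted} more characters]"


print(normalize_url("example.com/page"))
print(cap_text("abcdef", max_chars=3))
```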

Workflow: web_search to find pages, then web_fetch to read them.

Search engines

Search is a pluggable, selectable registry. In auto mode NetLens tries engines in order and returns the first with results, so a rate-limit/challenge page on one falls through to the next.
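
The fall-through behaviour of auto mode can be sketched as a loop over an ordered list of engines. This is a minimal illustration, assuming each engine is a callable that returns a list of result dicts; `search_auto` and the stub engines below are hypothetical and stand in for the real backends.

```python
def search_auto(query, engines, limit=8):
    """Try each (name, search_fn) pair in order; return the first non-empty result list."""
    for name, search_fn in engines:
        try:
            results = search_fn(query)
        except Exception:
            # Treat a rate limit, challenge page, or network error as "no results"
            # and move on to the next engine.
            continue
        if results:
            return name, results[:limit]
    return None, []


# Stub engines (hypothetical, for illustration only).
def rate_limited(query):
    raise RuntimeError("429 Too Many Requests")


def empty(query):
    return []


def working(query):
    return [{"title": "Example", "url": "https://example.com", "snippet": query}]


engine_used, hits = search_auto(
    "netlens",
    [("duckduckgo", rate_limited), ("bing", empty), ("mojeek", working)],
)
print(engine_used, hits)
```

Here the first engine fails and the second returns nothing, so the results come from the third.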

| Engine | Notes |
| --- | --- |
| duckduckgo | Default; html.duckduckgo.com endpoint |
| bing | Automatic fallback |
| mojeek | Independent index; automatic fallback |
| searxng | Self-hosted/public SearXNG JSON API; set NETLENS_SEARXNG_URL |

Pick per call with the engine argument, or set a default with NETLENS_SEARCH_ENGINE.

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| NETLENS_SEARCH_ENGINE | auto | Default search backend |
| NETLENS_SEARXNG_URL | (unset) | SearXNG base URL for engine=searxng |
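
For a sense of how these two variables might be consumed, here is a small sketch that reads them from the environment and applies the documented defaults. The `load_config` function, its return shape, and the validation errors are assumptions for illustration; the source does not say how NetLens handles an unknown engine name or a missing SearXNG URL.

```python
import os

# Engine names as listed in the Search engines section.
KNOWN_ENGINES = {"auto", "duckduckgo", "bing", "mojeek", "searxng"}


def load_config(env=None):
    """Read NetLens-style settings from an environment mapping (hypothetical helper)."""
    env = os.environ if env is None else env
    engine = env.get("NETLENS_SEARCH_ENGINE", "auto").strip().lower() or "auto"
    searxng_url = env.get("NETLENS_SEARXNG_URL", "").strip().rstrip("/") or None
    # Assumed behaviour: fail loudly on misconfiguration.
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown search engine: {engine!r}")
    if engine == "searxng" and searxng_url is None:
        raise ValueError("engine=searxng requires NETLENS_SEARXNG_URL")
    return {"engine": engine, "searxng_url": searxng_url}


print(load_config({}))
print(load_config({
    "NETLENS_SEARCH_ENGINE": "searxng",
    "NETLENS_SEARXNG_URL": "https://searx.example.org/",
}))
```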

How it works

  • Direct fetch. Requests go straight to the target site via the system curl (better TLS, HTTP/2, and compression support, so the request looks like it came from a real browser), falling back to urllib. No third-party proxy or reader is involved.
  • Local conversion. HTML → Markdown happens in-process with a hand-rolled html.parser converter — headings, lists, links (relative URLs resolved), code blocks, and GFM tables with colspan/rowspan.
  • Boilerplate stripping. Ads, cookie/consent banners, nav, footers, sidebars, social/share and related/recommended widgets, and hidden elements are removed. In article mode NetLens also isolates the main content region (<main> / <article> / [role=main]).
  • Response charset is honored (from Content-Type or <meta>), so non-UTF-8 pages don't come back garbled.
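
To make the local conversion and boilerplate stripping steps more concrete, here is a deliberately tiny converter built on the standard library's html.parser. It is a sketch of the general technique only, not NetLens's converter: the class name `MiniMarkdown`, the set of skipped tags, and the handful of supported elements are all assumptions, and it omits relative-URL resolution, code blocks, tables, and main-content isolation.

```python
import re
from html.parser import HTMLParser

# Tags whose entire subtree is dropped (a small, assumed subset of "boilerplate").
SKIP_TAGS = {"script", "style", "nav", "footer", "aside"}
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}


class MiniMarkdown(HTMLParser):
    """Toy HTML to Markdown converter: headings, paragraphs, bullets, links."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a skipped subtree
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        if tag in HEADINGS:
            self.out.append("\n\n" + HEADINGS[tag])
        elif tag == "p":
            self.out.append("\n\n")
        elif tag in ("ul", "ol"):
            self.out.append("\n")  # blank line before the list
        elif tag == "li":
            self.out.append("\n- ")  # simplification: ordered lists become bullets
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse whitespace runs but keep word boundaries around inline tags.
            self.out.append(re.sub(r"\s+", " ", data))


def to_markdown(html):
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).split("\n")]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()


sample = (
    '<nav><a href="/">Home</a></nav><h1>Title</h1>'
    '<p>Read the <a href="https://example.com/docs">docs</a> now.</p>'
    "<ul><li>one</li><li>two</li></ul><footer>(c) site</footer>"
)
print(to_markdown(sample))
```

The nav and footer content never reaches the output because everything inside a skipped tag is ignored until its closing tag is seen.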

Usage from the CLI

The server is also a plain script — handy for testing before a client loads it:

python -m netlens_mcp.server search "best bg3 starting class"
python -m netlens_mcp.server fetch  https://www.ign.com/wikis/baldurs-gate-3
python -m netlens_mcp.server full   https://example.com   # whole body
python -m netlens_mcp.server raw    https://example.com   # unconverted HTML

python -m netlens_mcp runs the stdio MCP server; python -m netlens_mcp.server <cmd> runs the CLI.

Development

pip install -e ".[dev]"
python -m pytest        # run the test suite
ruff check .            # lint

Requirements

  • Python 3.10+ (and the system curl, which ships with modern Windows/macOS/Linux; falls back to urllib if absent)

License

Apache License 2.0 — see LICENSE and NOTICE.
