
Zero-dependency MCP server for unobstructed web reading: direct browser-header fetch past robots.txt/bot blocks, full-page HTML->Markdown with ad/boilerplate stripping, plus web search


NetLens


An MCP server for unobstructed web reading. It fetches any URL directly with browser-like headers, ignoring robots.txt and getting past naive bot blocks, and returns the full page as clean, ad-stripped Markdown rather than a summary. It also provides web search that returns real links. Zero dependencies: it uses only the Python standard library (plus the system curl when available).

Built for AI Agents

AI agents constantly hit pages their built-in tools can't read. Compared with native web tools, NetLens addresses the usual reasons a fetch comes back empty or useless:

  • robots.txt: native tools honor it, so crawler-disallowed pages return nothing. NetLens reads like the browser you'd open yourself and doesn't consult robots.txt.
  • Bot filters: native tools are blocked by header/User-Agent filters that answer non-browser clients with 403 or 202. NetLens sends real browser headers via the system curl, which commonly turns a 403 into a 200.
  • Content: native tools return a summary of the page. NetLens returns the full page content as Markdown.
  • Boilerplate: native tools leave ads, cookie banners, nav, and related-links chrome in the output. NetLens strips it locally so only the content reaches your context.

It does not try to defeat JavaScript/Cloudflare challenge pages or CAPTCHAs — that's out of scope by design. When a page is a hard block, the HTTP status is surfaced honestly rather than faked.

Installation

npm (via npx):

{
  "mcpServers": {
    "netlens": {
      "command": "npx",
      "args": ["-y", "netlens-mcp"]
    }
  }
}

PyPI (via uvx):

{
  "mcpServers": {
    "netlens": {
      "command": "uvx",
      "args": ["netlens-mcp"]
    }
  }
}

Add either to your MCP client config (e.g. .mcp.json for Claude Code), then restart the session so the tools load.
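If your client config already lists other servers, hand-editing the JSON is easy to get wrong. The sketch below is a hypothetical helper (add_netlens is not part of NetLens) that merges the npx entry from above into an existing .mcp.json and leaves every other server untouched. It assumes the top-level mcpServers key shown in the examples.

```python
import json
from pathlib import Path

# Server entry copied from the npx example above (hypothetical helper default).
NETLENS_ENTRY = {"command": "npx", "args": ["-y", "netlens-mcp"]}


def add_netlens(config_path: Path, entry: dict = NETLENS_ENTRY) -> dict:
    """Merge a "netlens" server into an MCP client config, keeping other servers.

    Hypothetical helper for illustration; it is not shipped with NetLens.
    """
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["netlens"] = dict(entry)  # overwrite only the netlens key
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling add_netlens(Path(".mcp.json")) creates the file if it is missing. For the PyPI variant, pass {"command": "uvx", "args": ["netlens-mcp"]} as entry.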

Tools

web_search

Search the web and return real result links (title, URL, snippet), parsed locally — links, not summaries. Follow up with web_fetch to read a result.

Argument   Type                Description
query      string (required)   The search query
limit      integer             Optional cap; by default the full first page (~10 results) is returned
engine     string              auto (default), duckduckgo, bing, mojeek, or searxng

A search fetches a single result page (~10 results) and, by default, returns all of it, so results at positions 9 and 10 aren't dropped. There's no deep pagination: if the answer isn't on the first page, refine the query.
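Under the hood, an MCP client invokes these tools with JSON-RPC 2.0 tools/call requests. As a sketch, assuming the client has already completed the MCP initialize handshake and talks to NetLens over stdio (one JSON message per line), a web_search request could be built like this. The helper names are hypothetical; the message shape follows the MCP specification.

```python
import json
from typing import Any, Optional


def tools_call(req_id: int, name: str, arguments: dict[str, Any]) -> str:
    """Encode an MCP tools/call request as one newline-terminated JSON-RPC line."""
    message = {
        "jsonrpc": "2.0",
        "id": req_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # The stdio transport frames one message per line, so no embedded newlines.
    return json.dumps(message, separators=(",", ":")) + "\n"


def web_search_call(req_id: int, query: str, limit: Optional[int] = None,
                    engine: Optional[str] = None) -> str:
    # Omit optional arguments so the server applies its own defaults
    # (full first page; engine from NETLENS_SEARCH_ENGINE or auto).
    args: dict[str, Any] = {"query": query}
    if limit is not None:
        args["limit"] = limit
    if engine is not None:
        args["engine"] = engine
    return tools_call(req_id, "web_search", args)
```

The same tools_call helper covers web_fetch, e.g. tools_call(3, "web_fetch", {"url": "https://example.com", "mode": "article"}).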

web_fetch

Fetch any page and return its full content as clean Markdown.

Argument    Type                Description
url         string (required)   URL to fetch (scheme optional; https is assumed)
mode        string              article (main content only; default), full (whole body), or raw (unconverted HTML)
max_chars   integer             Optional cap on returned characters (output is truncated with a note)
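The max_chars cap is easiest to reason about as a post-processing step on the converted Markdown. This is a hypothetical sketch, not NetLens's code: the note wording is an assumption, and it backs off to the last line break so a table row or list item isn't cut in half.

```python
from typing import Optional


def cap_markdown(markdown: str, max_chars: Optional[int] = None) -> str:
    """Truncate Markdown to at most max_chars characters, then append a visible note."""
    if max_chars is None or len(markdown) <= max_chars:
        return markdown
    cut = markdown[:max_chars]
    # Back off to the last line break so a table row or list item isn't split,
    # unless that would discard more than half of the budget.
    newline = cut.rfind("\n")
    if newline >= max_chars // 2:
        cut = cut[:newline]
    # Hypothetical note format; NetLens's own wording may differ.
    return f"{cut}\n\n[truncated: showing {len(cut)} of {len(markdown)} characters]"
```

In this sketch the note sits on top of the cap, so the returned string can be slightly longer than max_chars.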

Workflow: web_search to find pages, then web_fetch to read them.

Search engines

Search is a pluggable, selectable registry. In auto mode NetLens tries engines in order and returns the first with results, so a rate-limit/challenge page on one falls through to the next.
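The auto-mode fallthrough amounts to "first engine with non-empty results wins." A minimal sketch, assuming each engine is a plain function returning a list of title/url/snippet dicts (a hypothetical interface, not NetLens's internal registry): a challenge page that parses to zero results and an engine that raises both fall through to the next one.

```python
from typing import Callable, Optional

Result = dict[str, str]  # {"title": ..., "url": ..., "snippet": ...}
Engine = Callable[[str], list[Result]]


def search_auto(query: str,
                engines: list[tuple[str, Engine]]) -> tuple[Optional[str], list[Result]]:
    """Try engines in order and return (engine_name, results) for the first hit."""
    for name, engine in engines:
        try:
            results = engine(query)
        except Exception:  # network error, parse failure, etc.: try the next engine
            continue
        if results:  # a rate-limit or challenge page typically yields no results
            return name, results
    return None, []
```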

Engine       Notes
duckduckgo   Default; uses the html.duckduckgo.com endpoint
bing         Automatic fallback
mojeek       Independent index; automatic fallback
searxng      Self-hosted or public SearXNG JSON API; set NETLENS_SEARXNG_URL

Pick per call with the engine argument, or set a default with NETLENS_SEARCH_ENGINE.

Configuration

Environment variable    Default    Description
NETLENS_SEARCH_ENGINE   auto       Default search backend
NETLENS_SEARXNG_URL     (unset)    SearXNG base URL, needed for engine=searxng
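Engine selection layers the per-call engine argument over NETLENS_SEARCH_ENGINE, with auto as the final default. A sketch of that precedence, assuming unknown names and a searxng choice without NETLENS_SEARXNG_URL should fail loudly (NetLens's actual error handling may differ); resolve_engine is a hypothetical name.

```python
import os
from collections.abc import Mapping
from typing import Optional

ENGINES = ("auto", "duckduckgo", "bing", "mojeek", "searxng")


def resolve_engine(requested: Optional[str] = None,
                   env: Mapping[str, str] = os.environ) -> tuple[str, Optional[str]]:
    """Pick the engine: per-call argument, then NETLENS_SEARCH_ENGINE, then "auto"."""
    engine = (requested or env.get("NETLENS_SEARCH_ENGINE") or "auto").strip().lower()
    if engine not in ENGINES:
        raise ValueError(f"unknown engine {engine!r}; expected one of {ENGINES}")
    searxng_url = env.get("NETLENS_SEARXNG_URL") or None
    # Assumed behavior: searxng has no public default, so a base URL is mandatory.
    if engine == "searxng" and not searxng_url:
        raise ValueError("engine=searxng needs NETLENS_SEARXNG_URL")
    return engine, searxng_url
```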

How it works

  • Direct fetch. Requests go straight to the target site via the system curl, whose TLS, HTTP/2, and compression handling looks more like a real browser, falling back to urllib when curl is unavailable. No third-party proxy or reader is involved.
  • Local conversion. HTML → Markdown happens in-process with a hand-rolled html.parser converter — headings, lists, links (relative URLs resolved), code blocks, and GFM tables with colspan/rowspan.
  • Boilerplate stripping. Ads, cookie/consent banners, nav, footers, sidebars, social/share and related/recommended widgets, and hidden elements are removed. In article mode NetLens also isolates the main content region (<main> / <article> / [role=main]).
  • Charset handling. The response charset is honored (from Content-Type or <meta>), so non-UTF-8 pages don't come back garbled.
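To make the conversion and stripping steps concrete, here is a much-reduced sketch built on the same standard-library html.parser module. It is not NetLens's converter: the skip list is abbreviated, only headings, paragraphs, list items, and links are handled, and ArticleExtractor and to_markdown are hypothetical names. It drops nav/footer/aside/script/style and hidden elements, resolves relative links with urljoin, and prefers text inside <main>, <article>, or [role=main], falling back to the cleaned whole body when no such region exists.

```python
import re
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin

# Boilerplate containers dropped wholesale (an assumed, abbreviated list).
SKIP_TAGS = {"script", "style", "noscript", "template", "nav", "footer", "aside", "form"}
# Void elements never get an end tag, so they must not go on the open-element stack.
VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class ArticleExtractor(HTMLParser):
    def __init__(self, base_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.stack: list[tuple[str, bool, bool]] = []  # (tag, starts_skip, starts_main)
        self.skip_depth = 0
        self.main_depth = 0
        self.href: Optional[str] = None
        self.full: list[str] = []  # everything outside boilerplate
        self.main: list[str] = []  # only text inside a main-content region

    def _emit(self, text: str) -> None:
        if self.skip_depth:
            return
        self.full.append(text)
        if self.main_depth:
            self.main.append(text)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if tag == "br":
                self._emit("\n")
            return
        a = {k: (v or "") for k, v in attrs}
        style = a.get("style", "").replace(" ", "").lower()
        skip = (tag in SKIP_TAGS or "hidden" in a or a.get("aria-hidden") == "true"
                or "display:none" in style)
        is_main = tag in ("main", "article") or a.get("role") == "main"
        self.stack.append((tag, skip, is_main))
        self.skip_depth += skip
        self.main_depth += is_main
        if tag in HEADINGS:
            self._emit("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("p", "div"):
            self._emit("\n\n")
        elif tag == "li":
            self._emit("\n- ")
        elif tag == "a" and a.get("href") and not self.skip_depth:
            self.href = urljoin(self.base_url, a["href"])  # resolve relative URLs
            self._emit("[")

    def handle_endtag(self, tag):
        # Find the matching open element; stray end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                break
        else:
            return
        if tag in HEADINGS:
            self._emit("\n")
        elif tag == "a" and self.href is not None:
            self._emit(f"]({self.href})")
            self.href = None
        # Pop this element and any unclosed children above it.
        for _, skip, is_main in self.stack[i:]:
            self.skip_depth -= skip
            self.main_depth -= is_main
        del self.stack[i:]

    def handle_data(self, data):
        self._emit(re.sub(r"\s+", " ", data))

    def markdown(self) -> str:
        # Prefer the main region; fall back to the whole cleaned body.
        text = "".join(self.main if "".join(self.main).strip() else self.full)
        text = re.sub(r"[ \t]+\n", "\n", text)  # trailing spaces
        text = re.sub(r"\n[ \t]+", "\n", text)  # leading spaces
        return re.sub(r"\n{3,}", "\n\n", text).strip()


def to_markdown(html: str, base_url: str) -> str:
    parser = ArticleExtractor(base_url)
    parser.feed(html)
    parser.close()
    return parser.markdown()
```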

Usage from the CLI

The server is also a plain script — handy for testing before a client loads it:

python -m netlens_mcp.server search "best bg3 starting class"
python -m netlens_mcp.server fetch  https://www.ign.com/wikis/baldurs-gate-3
python -m netlens_mcp.server full   https://example.com   # whole body
python -m netlens_mcp.server raw    https://example.com   # unconverted HTML

python -m netlens_mcp runs the stdio MCP server; python -m netlens_mcp.server <cmd> runs the CLI.

Development

pip install -e ".[dev]"
python -m pytest        # run the test suite
ruff check .            # lint

Requirements

  • Python 3.10+ (and the system curl, which ships with modern Windows/macOS/Linux; falls back to urllib if absent)
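The curl-first, urllib-fallback behavior can be sketched with subprocess and urllib.request. The header values and the fetch_bytes name are illustrative assumptions rather than NetLens's exact request, and the sketch returns raw bytes without the HTTP status reporting or charset decoding NetLens performs.

```python
import shutil
import subprocess
import urllib.request

# A browser-like header set (illustrative values, not NetLens's exact headers).
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}


def fetch_bytes(url: str, timeout: float = 20.0) -> bytes:
    """Fetch with the system curl when present, else fall back to urllib."""
    curl = shutil.which("curl")
    if curl:
        cmd = [curl, "--silent", "--show-error", "--location", "--compressed",
               "--max-time", str(timeout)]
        for name, value in BROWSER_HEADERS.items():
            cmd += ["--header", f"{name}: {value}"]
        cmd.append(url)
        done = subprocess.run(cmd, capture_output=True, check=False)
        if done.returncode == 0:
            return done.stdout
        # Non-zero exit (network error, unsupported option, ...): try urllib instead.
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```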

License

Apache License 2.0 — see LICENSE and NOTICE.



Download files


Source Distribution

netlens_mcp-0.1.2.tar.gz (29.6 kB)

Uploaded Source

Built Distribution


netlens_mcp-0.1.2-py3-none-any.whl (23.8 kB)

Uploaded Python 3

File details

Details for the file netlens_mcp-0.1.2.tar.gz.

File metadata

  • Download URL: netlens_mcp-0.1.2.tar.gz
  • Size: 29.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for netlens_mcp-0.1.2.tar.gz
Algorithm Hash digest
SHA256 dd42b3a897888ca35ee1196c9823449f613f27ddef8d7e87d07fa36021e3f966
MD5 9cb0d44d470c1f6930ffade0f0197ab0
BLAKE2b-256 48c1b8f14160225505334fa862de84ff320561ef7be8e9a282fdb1fef71caeb2


File details

Details for the file netlens_mcp-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: netlens_mcp-0.1.2-py3-none-any.whl
  • Size: 23.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for netlens_mcp-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 35578bec93fb8f9b447ef15041a3b34bde443d07fa47d76af63bafcc250ef68a
MD5 0c19cf179eea083feb1ca2a2b95c64c5
BLAKE2b-256 0dc0f4f9cd3290975bc6e46c4d5fd68ad0d90f6d6b27ac98a18a3ddf73e45566

