MCP server that gives any LLM agent ad-free Markdown web scraping and search.

ai-first-scraper-mcp

Plug Claude Desktop, Cursor, or Cline straight into an ad-free web scraper + search engine. Three tools, one line of config.


What it does

Adds three tools to any MCP-compatible agent:

| Tool | What it does |
| --- | --- |
| fetch_page | Fetch one URL (HTML or PDF) and return it as clean Markdown. |
| fetch_pages_batch | Fetch up to 25 URLs in parallel and return Markdown for each. |
| search_web | Run a web search and return the top-k result pages, already converted to Markdown. |

No more "the model called curl and then tried to parse 80 kB of ad-laden HTML." Your agent receives clean Markdown that it can reason about directly.

Backed by the ai-first-scraper and ai-first-search APIs.
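
Because fetch_pages_batch accepts at most 25 URLs per call, a longer list has to be split into several calls. The agent normally handles this itself; the sketch below only illustrates the limit. The helper name chunk_urls and the constant MAX_BATCH are illustrative and not part of this package.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 25  # per-call limit stated for fetch_pages_batch


def chunk_urls(urls: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


if __name__ == "__main__":
    # Hypothetical input: 60 placeholder URLs.
    urls = [f"https://example.com/page/{i}" for i in range(60)]
    for batch in chunk_urls(urls):
        # Each batch would be the argument of one fetch_pages_batch call.
        print(len(batch))
```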


Install

Fastest — uvx (no install, runs from PyPI on demand)

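Add the following to claude_desktop_config.json, cline_mcp_settings.json, or ~/.cursor/mcp.json. (JSON does not allow comments, so keep the file names out of the config itself.)
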
{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"]
    }
  }
}
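
If the config file already lists other MCP servers, pasting the snippet over it would drop them. One way to merge the entry instead is sketched below, using only the Python standard library. The function name add_server is an illustration, not something this package ships, and the sketch assumes the existing file is valid JSON.

```python
import json
from pathlib import Path

SERVER_NAME = "ai-first-scraper"


def add_server(config_path: Path) -> dict:
    """Merge the uvx server entry into a config file, keeping other servers intact."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)  # raises ValueError on malformed JSON

    # Create the "mcpServers" object if missing, then add or overwrite our entry.
    servers = config.setdefault("mcpServers", {})
    servers[SERVER_NAME] = {"command": "uvx", "args": ["ai-first-scraper-mcp"]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


if __name__ == "__main__":
    # Example target: the Cursor config path from this README.
    print(json.dumps(add_server(Path.home() / ".cursor" / "mcp.json"), indent=2))
```

Back up the file first if it holds settings you care about, since the script rewrites it in place.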

Restart your client (Claude Desktop / Cursor / Cline). The three tools above will appear automatically.

Alternative — pip install

pip install ai-first-scraper-mcp

Then point your MCP config at the installed command:

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "ai-first-scraper-mcp"
    }
  }
}

Where the config file lives

| Client | Config path |
| --- | --- |
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json |
| Cline (VS Code, macOS) | ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json |
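
The same lookup can be written as a small Python function, which is handy in setup scripts. This is a sketch that covers only the paths listed above; the client keys ("claude-desktop", "cursor", "cline") and the function name config_path are labels chosen for this example, and any client and platform pair not listed raises an error rather than guessing a path.

```python
import os
import sys
from pathlib import Path


def config_path(client: str, platform: str = sys.platform) -> Path:
    """Return the MCP config path for a client, limited to the documented cases."""
    home = Path.home()
    mac_support = home / "Library" / "Application Support"

    if client == "cursor":
        return home / ".cursor" / "mcp.json"
    if client == "claude-desktop" and platform == "darwin":
        return mac_support / "Claude" / "claude_desktop_config.json"
    if client == "claude-desktop" and platform == "win32":
        appdata = os.environ.get("APPDATA")
        if appdata is None:
            raise ValueError("APPDATA is not set")
        return Path(appdata) / "Claude" / "claude_desktop_config.json"
    if client == "cline" and platform == "darwin":
        return (mac_support / "Code" / "User" / "globalStorage"
                / "saoudrizwan.claude-dev" / "settings" / "cline_mcp_settings.json")

    raise ValueError(f"no documented config path for {client!r} on {platform!r}")


if __name__ == "__main__":
    print(config_path("cursor"))
```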

Point at your own backend (optional)

By default this server calls the public ai-first-scraper.onrender.com and ai-first-search.onrender.com instances. To use self-hosted backends instead, set the following environment variables in your MCP config:

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"],
      "env": {
        "SCRAPER_URL": "https://your-scraper.example.com",
        "SEARCH_URL":  "https://your-search.example.com",
        "AFS_TIMEOUT": "60"
      }
    }
  }
}
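
If you generate this config from a script, the entry can be built programmatically. The sketch below assumes only what the example above shows: the three variable names, and that values in the "env" object are strings. The function name server_entry is hypothetical.

```python
import json
from typing import Dict, Optional


def server_entry(scraper_url: Optional[str] = None,
                 search_url: Optional[str] = None,
                 timeout: Optional[int] = None) -> Dict[str, object]:
    """Build the mcpServers entry, adding "env" only for values that were given."""
    entry: Dict[str, object] = {"command": "uvx", "args": ["ai-first-scraper-mcp"]}
    env: Dict[str, str] = {}
    if scraper_url:
        env["SCRAPER_URL"] = scraper_url
    if search_url:
        env["SEARCH_URL"] = search_url
    if timeout is not None:
        env["AFS_TIMEOUT"] = str(timeout)  # written as a string, as in the example above
    if env:
        entry["env"] = env
    return entry


if __name__ == "__main__":
    entry = server_entry(
        scraper_url="https://your-scraper.example.com",
        search_url="https://your-search.example.com",
        timeout=60,
    )
    print(json.dumps({"mcpServers": {"ai-first-scraper": entry}}, indent=2))
```

Leaving a variable out keeps the server's default for that setting.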

Verify it works

Open your MCP client and ask the agent:

"Use the search_web tool to find the top 3 recent articles about MCP and summarize them in 5 bullets each."

You should see the agent call search_web, get back Markdown for each result, and produce the summary without ever touching raw HTML.
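
Under the hood, a tool call travels from the client to the server as a JSON-RPC 2.0 request with the method "tools/call", as defined by the MCP specification. The sketch below builds such a message so you can recognize it in client logs. The argument names "query" and "top_k" are assumptions made for illustration; this README does not document the tool's input schema, so check the schema your client displays for search_web.

```python
import json
from typing import Any, Dict


def tool_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # "query" and "top_k" are hypothetical argument names, not confirmed by this README.
    print(tool_call_request(1, "search_web", {"query": "MCP", "top_k": 3}))
```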


Companion projects

  • ai-first-scraper — the per-URL Markdown cleaner this MCP server fans out to.
  • ai-first-search — search → scrape → markdown pipeline.
  • mcp-rec — record & replay any MCP server's traffic for tests and bug reports.
  • llm-cache-proxy — local cache for OpenAI/Anthropic API calls.
  • promptlocker — lockfile for prompts.
  • context-diff — see what blew up your Claude Code context window.
  • agentwatch — overlay for browser AI agents.

Develop locally

git clone https://github.com/yubinkim444/ai-first-scraper-mcp.git
cd ai-first-scraper-mcp

uv sync                    # or: pip install -e .
ai-first-scraper-mcp       # speaks MCP over stdio

To test against a local client, point its MCP config at the same command.


License

MIT © yubinkim444
