MCP server that gives any LLM agent ad-free Markdown web scraping and search.

ai-first-scraper-mcp

Plug Claude Desktop, Cursor, or Cline straight into an ad-free web scraper + search engine. Three tools, one line of config.


What it does

Adds three tools to any MCP-compatible agent:

| Tool | What it does |
| --- | --- |
| fetch_page | Fetch one URL (HTML or PDF) and return clean Markdown. |
| fetch_pages_batch | Fetch up to 25 URLs in parallel and return Markdown for each. |
| search_web | Run a web search and return the top-k result pages, already converted to Markdown. |

No more "the model called curl and then tried to parse 80kB of ad HTML." Your agent receives clean Markdown ready to reason about.

Backed by the ai-first-scraper and ai-first-search APIs.
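
Since fetch_pages_batch accepts at most 25 URLs per call, a caller holding a longer list has to split it first. The following is a minimal sketch of that splitting step only; `chunk_urls` and `MAX_BATCH` are hypothetical names for illustration and are not part of this package.

```python
from typing import Iterator, List, Sequence

MAX_BATCH = 25  # per-call URL limit stated in the tool table


def chunk_urls(urls: Sequence[str], size: int = MAX_BATCH) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

Each yielded list could then be passed as the argument of one fetch_pages_batch call. In practice an agent usually decides the batching itself, so this mainly matters when the tools are driven from a script.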


Install

Fastest — uvx (no install, runs from PyPI on demand)

Add this to claude_desktop_config.json, cline_mcp_settings.json, or ~/.cursor/mcp.json (strict JSON does not allow comments, so paste only the object below):

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"]
    }
  }
}

Restart your client (Claude Desktop / Cursor / Cline). The three tools above will appear automatically.
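
If the config file already lists other servers, the new entry has to be merged into the existing "mcpServers" map rather than pasted over it. Below is a rough sketch of such a merge using only the Python standard library; `add_server` and `UVX_ENTRY` are hypothetical helper names, and the script assumes the file is plain JSON without comments.

```python
import json
from pathlib import Path

# The uvx entry shown above, as a Python dict.
UVX_ENTRY = {"command": "uvx", "args": ["ai-first-scraper-mcp"]}


def add_server(config_path: Path, name: str, entry: dict) -> dict:
    """Merge one server entry into the "mcpServers" map of a JSON config file."""
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():  # treat an empty file like a missing one
            config = json.loads(text)
    # Keep every existing key and server; add or overwrite only `name`.
    config.setdefault("mcpServers", {})[name] = entry
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For Cursor, for example, this could be called as `add_server(Path.home() / ".cursor" / "mcp.json", "ai-first-scraper", UVX_ENTRY)`. The file is rewritten with two-space indentation, so keeping a backup copy before running it is a sensible precaution.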

Alternative — pip install

pip install ai-first-scraper-mcp

Then point your MCP config at the installed command:

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "ai-first-scraper-mcp"
    }
  }
}

Where the config file lives

| Client | Config path |
| --- | --- |
| Claude Desktop (macOS) | ~/Library/Application Support/Claude/claude_desktop_config.json |
| Claude Desktop (Windows) | %APPDATA%\Claude\claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json |
| Cline (VS Code, macOS) | ~/Library/Application Support/Code/User/globalStorage/saoudrizwan.claude-dev/settings/cline_mcp_settings.json |
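
For scripted setups, the paths in the table above can be resolved in code. The sketch below covers only the combinations the table lists and raises for anything else instead of guessing; `config_path` and the client keys ("claude-desktop", "cursor", "cline") are hypothetical names chosen for this example, and the platform strings follow Python's `sys.platform` values.

```python
import os
import sys
from pathlib import Path


def config_path(client: str, platform: str = sys.platform) -> Path:
    """Return the config path listed for `client`, or raise if none is listed."""
    home = Path(os.path.expanduser("~"))
    mac_support = home / "Library" / "Application Support"
    if client == "cursor":
        # The table gives a single home-relative path for Cursor.
        return home / ".cursor" / "mcp.json"
    if client == "claude-desktop" and platform == "darwin":
        return mac_support / "Claude" / "claude_desktop_config.json"
    if client == "claude-desktop" and platform == "win32":
        # %APPDATA% is set by Windows; this raises KeyError if it is missing.
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    if client == "cline" and platform == "darwin":
        return (mac_support / "Code" / "User" / "globalStorage"
                / "saoudrizwan.claude-dev" / "settings" / "cline_mcp_settings.json")
    raise LookupError(f"no config path listed for {client!r} on {platform!r}")
```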

Point at your own backend (optional)

By default this server calls the public ai-first-scraper.onrender.com and ai-first-search.onrender.com instances. If you want to self-host, set env vars in your MCP config:

{
  "mcpServers": {
    "ai-first-scraper": {
      "command": "uvx",
      "args": ["ai-first-scraper-mcp"],
      "env": {
        "SCRAPER_URL": "https://your-scraper.example.com",
        "SEARCH_URL":  "https://your-search.example.com",
        "AFS_TIMEOUT": "60"
      }
    }
  }
}
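
To illustrate how these three variables might be consumed, here is a hedged sketch of settings resolution with fallbacks to the public instances. It is not the package's actual code: `Settings`, `load_settings`, the `https://` scheme on the default hosts, the 30-second fallback timeout, and the reading of AFS_TIMEOUT as seconds are all assumptions made for this example.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Assumption: the public hosts named above are served over HTTPS.
DEFAULT_SCRAPER_URL = "https://ai-first-scraper.onrender.com"
DEFAULT_SEARCH_URL = "https://ai-first-search.onrender.com"
# Assumption: the README does not state the default timeout; 30 is a placeholder.
ASSUMED_DEFAULT_TIMEOUT = 30.0


@dataclass(frozen=True)
class Settings:
    scraper_url: str
    search_url: str
    timeout: float  # assumed to be seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables from `env`, falling back to the defaults above."""
    return Settings(
        scraper_url=env.get("SCRAPER_URL", DEFAULT_SCRAPER_URL).rstrip("/"),
        search_url=env.get("SEARCH_URL", DEFAULT_SEARCH_URL).rstrip("/"),
        timeout=float(env.get("AFS_TIMEOUT", ASSUMED_DEFAULT_TIMEOUT)),
    )
```

Passing the mapping in as a parameter keeps the function easy to test without touching the real process environment.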

Verify it works

Open your MCP client and ask the agent:

"Use the search_web tool to find the top 3 recent articles about MCP and summarize them in 5 bullets each."

You should see the agent call search_web, get back Markdown for each result, and produce the summary without ever touching raw HTML.


Develop locally

git clone https://github.com/yubinkim444/ai-first-scraper-mcp.git
cd ai-first-scraper-mcp

uv sync                    # or: pip install -e .
ai-first-scraper-mcp       # speaks MCP over stdio

To test against a local client, point its MCP config at the same command.
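
For a quick smoke test without a full client, it helps to know what travels over stdio: MCP uses JSON-RPC 2.0, with one JSON message per line on the stdio transport. The sketch below only builds the opening messages a client would send; it does not start the server. The helper names, the client name "smoke-test", and the protocol version string are assumptions for illustration, and a real client must wait for the `initialize` response before sending the rest.

```python
import json
from typing import List, Optional


def jsonrpc_request(msg_id: int, method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request as one newline-terminated line."""
    message = {"jsonrpc": "2.0", "id": msg_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message) + "\n"


def jsonrpc_notification(method: str) -> str:
    """Serialize a JSON-RPC 2.0 notification (no id, so no response is expected)."""
    return json.dumps({"jsonrpc": "2.0", "method": method}) + "\n"


def handshake_messages() -> List[str]:
    """Return the lines a client writes to the server's stdin, in order."""
    return [
        jsonrpc_request(1, "initialize", {
            "protocolVersion": "2024-11-05",  # assumed; use what your client supports
            "capabilities": {},
            "clientInfo": {"name": "smoke-test", "version": "0.0.1"},
        }),
        # Sent only after the server has answered the initialize request.
        jsonrpc_notification("notifications/initialized"),
        # Asks the server to enumerate its tools.
        jsonrpc_request(2, "tools/list"),
    ]
```

If the server behaves as described in this README, the reply to `tools/list` should name fetch_page, fetch_pages_batch, and search_web.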


License

MIT © yubinkim444
