A robust MCP server for fetching and extracting web content using Trafilatura

FetchV2 MCP Server

Model Context Protocol (MCP) server for web content fetching and extraction.

This MCP server provides tools to fetch webpages, extract clean content using Trafilatura, and discover links for batch processing.

Features

  • Fetch Webpages: Extract clean markdown content from any URL
  • Batch Fetching: Fetch up to 10 URLs in a single request
  • Link Discovery: Find and filter links on any webpage
  • Smart Extraction: Trafilatura removes boilerplate (navbars, ads, footers)
  • Robots.txt Compliance: Respects robots.txt with graceful timeout handling
  • Pagination Support: Read large pages in chunks with the start_index parameter
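
To illustrate what robots.txt compliance means in practice, here is a minimal sketch using Python's standard urllib.robotparser. It is not the server's actual implementation, which this README does not show; the rules, the user agent name fetchv2-example, and the helper is_allowed are assumptions made for the example.

```python
from urllib.robotparser import RobotFileParser

# Example rules (assumed for illustration): everything under /private/ is off limits.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# "fetchv2-example" is a hypothetical user agent string.
docs_ok = is_allowed(ROBOTS_TXT, "fetchv2-example", "https://example.com/docs/intro")
private_ok = is_allowed(ROBOTS_TXT, "fetchv2-example", "https://example.com/private/report")
```

A fetcher that respects robots.txt would skip the second URL unless the check is explicitly bypassed.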

Prerequisites

  1. Install uv from Astral
  2. Install Python 3.10 or newer using uv python install 3.10

Installation

Configure the server in your MCP client:

{
  "mcpServers": {
    "fetchv2": {
      "command": "uvx",
      "args": ["fetchv2-mcp-server@latest"],
      "disabled": false,
      "autoApprove": []
    }
  }
}
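
If the config file already lists other servers, the fetchv2 entry should be merged into the existing mcpServers object rather than replacing the file. A minimal sketch of that merge, assuming the file is plain JSON; the helper name add_fetchv2_server is hypothetical and not part of this package.

```python
import copy
import json
from pathlib import Path

# The server entry from the configuration shown above.
FETCHV2_ENTRY = {
    "command": "uvx",
    "args": ["fetchv2-mcp-server@latest"],
    "disabled": False,
    "autoApprove": [],
}


def add_fetchv2_server(config_path: Path) -> dict:
    """Merge the fetchv2 entry into an MCP client config, keeping other servers."""
    config: dict = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Create "mcpServers" if it is missing, then add or overwrite only "fetchv2".
    servers = config.setdefault("mcpServers", {})
    servers["fetchv2"] = copy.deepcopy(FETCHV2_ENTRY)

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path for your client from the list of config file locations, for example add_fetchv2_server(Path(".kiro/settings/mcp.json")).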

Config file locations:

  • Claude Desktop (macOS): ~/Library/Application Support/Claude/claude_desktop_config.json
  • Claude Desktop (Windows): %APPDATA%\Claude\claude_desktop_config.json
  • Windsurf: ~/.codeium/windsurf/mcp_config.json
  • Kiro: .kiro/settings/mcp.json in your project

Install from PyPI

# Using uv
uv add fetchv2-mcp-server

# Using pip
pip install fetchv2-mcp-server

Basic Usage

Example prompts to try:

  • "Fetch the documentation from <URL>"
  • "Find all links on <docs URL> that contain 'tutorial'"
  • "Read these three pages and summarize the differences: [url1, url2, url3]"

Available Tools

fetch

Fetches a webpage and extracts its main content as clean markdown.

fetch(url: str, max_length: int = 5000, start_index: int = 0) -> str

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The webpage URL to fetch |
| max_length | int | 5000 | Maximum characters to return |
| start_index | int | 0 | Character offset for pagination |
| get_raw_html | bool | false | Skip extraction, return raw HTML |
| include_metadata | bool | true | Include title, author, date |
| include_tables | bool | true | Preserve tables in markdown |
| include_links | bool | false | Preserve hyperlinks |
| bypass_robots_txt | bool | false | Skip robots.txt check |

fetch_batch

Fetches multiple webpages in a single request.

fetch_batch(urls: list[str], max_length_per_url: int = 2000) -> str

| Parameter | Type | Default | Description |
|---|---|---|---|
| urls | list[str] | required | List of URLs (max 10) |
| max_length_per_url | int | 2000 | Character limit per URL |
| get_raw_html | bool | false | Skip extraction for all URLs |
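
Because a single call accepts at most 10 URLs, a longer list has to be split across several calls. A minimal sketch of that splitting step; chunk_urls is a hypothetical helper, not part of the server.

```python
from typing import Iterator, Sequence

MAX_BATCH_SIZE = 10  # limit stated in the parameter table above


def chunk_urls(urls: Sequence[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Split urls into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


# 23 example URLs end up in batches of 10, 10, and 3.
urls = [f"https://docs.example.com/page/{n}" for n in range(23)]
batches = list(chunk_urls(urls))
```

Each inner list can then be passed as the urls argument of one fetch_batch call.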

discover_links

Discovers all links on a webpage with optional filtering.

discover_links(url: str, filter_pattern: str = "") -> str

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | The webpage URL to scan |
| filter_pattern | str | "" | Regex to filter links (e.g., /docs/) |
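
To get a feel for how a regex filter_pattern narrows a list of links, the sketch below applies a pattern to a few example URLs. It assumes the pattern is matched anywhere in the URL (re.search semantics) and that an empty pattern keeps every link; the server's exact matching rules are not documented here, and filter_links is a hypothetical helper.

```python
import re


def filter_links(links: list[str], filter_pattern: str = "") -> list[str]:
    """Keep the links that match filter_pattern anywhere in the URL."""
    if not filter_pattern:
        return list(links)  # empty pattern: no filtering
    pattern = re.compile(filter_pattern)
    return [link for link in links if pattern.search(link)]


links = [
    "https://docs.example.com/docs/intro",
    "https://docs.example.com/blog/release-notes",
    "https://docs.example.com/docs/tutorial/setup",
]
docs_only = filter_links(links, "/docs/")
tutorials = filter_links(links, r"/docs/tutorial/")
```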

Workflow Example

Step 1: Discover relevant documentation pages

discover_links(url="https://docs.example.com/", filter_pattern="/guide/")

Step 2: Batch fetch the pages you need

fetch_batch(urls=["https://docs.example.com/guide/intro", "https://docs.example.com/guide/setup"])
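
Under the hood, an MCP client sends each of these steps to the server as a JSON-RPC tools/call request. The sketch below only builds the two request payloads for the workflow above. A real client also performs the initialize handshake and handles the transport, both omitted here; tool_call is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests need unique ids


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP tools/call request for the given tool and arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Step 1: discover relevant documentation pages.
step1 = tool_call(
    "discover_links",
    {"url": "https://docs.example.com/", "filter_pattern": "/guide/"},
)

# Step 2: batch fetch the pages you need.
step2 = tool_call(
    "fetch_batch",
    {
        "urls": [
            "https://docs.example.com/guide/intro",
            "https://docs.example.com/guide/setup",
        ]
    },
)

print(json.dumps(step1, indent=2))
```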

Prompts

  • fetch_manual - User-initiated fetch that bypasses robots.txt
  • research_topic - Research a topic by fetching multiple relevant URLs

Development

# Clone and install
git clone https://github.com/praveenc/fetchv2-mcp-server.git
cd fetchv2-mcp-server
uv sync --dev
source .venv/bin/activate

# Run tests
uv run pytest

# Run with MCP Inspector
mcp dev src/fetchv2_mcp_server/server.py

# Linting and type checking
uv run ruff check .
uv run pyright

License

MIT - see LICENSE for details.

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Support

For issues and questions, use the GitHub issue tracker.
