A robust MCP server for fetching and extracting web content using Trafilatura

FetchV2 MCP Server

A robust Model Context Protocol server for fetching and extracting web content using Trafilatura. Optimized for AI agents with clean markdown output.

Why FetchV2?

Trafilatura is the real star. Unlike basic HTML-to-markdown converters, Trafilatura is specifically designed for web content extraction:

Removes boilerplate (navbars, footers, ads, cookie banners)
Preserves article structure and tables
Extracts metadata (title, author, date) automatically
Handles edge cases like minimal-content SPAs

Graceful robots.txt handling. Instead of failing hard when robots.txt is unreachable, FetchV2 treats a timed-out or unavailable robots.txt as "allowed", which is more practical for real-world use.
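
The fallback described above can be sketched with the standard library's `urllib.robotparser`. This is a minimal illustration, not FetchV2's actual code: the helper names `robots_url_for` and `is_allowed` and the injectable `load_robots` callable are assumptions made so the example runs without network access.

```python
from typing import Callable
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(url: str) -> str:
    """Build the robots.txt URL for the site that hosts `url`."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(url: str, user_agent: str, load_robots: Callable[[str], str]) -> bool:
    """Return True if `url` may be fetched.

    `load_robots` is a hypothetical callable that downloads robots.txt and
    returns its text, raising TimeoutError or OSError when it cannot.
    """
    try:
        robots_txt = load_robots(robots_url_for(url))
    except OSError:
        # TimeoutError is a subclass of OSError, so this covers both an
        # unreachable and a timed-out robots.txt: fall back to "allowed".
        return True

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A strict crawler would return False in the `except` branch; the permissive choice mirrors the behavior this README describes.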

Features

Superior Content Extraction: Uses Trafilatura for high-quality HTML-to-markdown conversion
Robots.txt Compliance: Respects robots.txt by default, gracefully handles timeouts
Pagination Support: Handles large pages with the start_index parameter
Multi-URL Fetching: Fetch up to 10 URLs in a single request
Link Discovery: Extract and filter links from any webpage
Raw Mode: Get unprocessed content when needed
Markdown Detection: Automatically handles .md files without extraction
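
As a rough illustration of the Markdown Detection feature, a server could decide to skip extraction by looking at the URL path. The function name `looks_like_markdown` is hypothetical, and checking only the `.md` suffix is an assumption; the real server may also inspect response headers or content.

```python
from urllib.parse import urlsplit


def looks_like_markdown(url: str) -> bool:
    """Return True when the URL path ends in .md (query string ignored)."""
    path = urlsplit(url).path.lower()
    return path.endswith(".md")
```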

Installation

# Clone the repo
git clone https://github.com/praveenc/fetchv2-mcp-server.git
cd fetchv2-mcp-server

# Using uv (recommended)
uv sync
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Or using pip
python -m venv .venv
source .venv/bin/activate
pip install -e .

Available Tools

`fetch`

Fetch a single webpage and extract its main content as clean markdown.

Use when: Reading an article, documentation page, or blog post.

Parameters:

url (required): The webpage URL to fetch
max_length (default: 5000): Maximum characters to return (use 1000-2000 for summaries)
start_index (default: 0): Character offset for pagination
get_raw_html (default: false): Skip extraction, return original HTML
include_metadata (default: true): Include title, author, date at top
include_tables (default: true): Preserve tables in markdown format
include_links (default: false): Preserve hyperlinks in output
bypass_robots_txt (default: false): Skip robots.txt check (user-initiated only)
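
To make the interplay of max_length and start_index concrete, here is a small sketch of character-offset pagination. The `Page` type and `paginate` function are hypothetical names used for illustration, and returning the next offset is an assumption about how a client might continue reading; the defaults match the parameter list above.

```python
from typing import NamedTuple, Optional


class Page(NamedTuple):
    text: str
    next_start_index: Optional[int]  # None when nothing is left to read


def paginate(content: str, start_index: int = 0, max_length: int = 5000) -> Page:
    """Return one window of `content` plus the offset of the next window."""
    if start_index < 0 or max_length <= 0:
        raise ValueError("start_index must be >= 0 and max_length must be > 0")

    chunk = content[start_index:start_index + max_length]
    end = start_index + len(chunk)
    # Only advertise a next offset if characters remain beyond this window.
    next_start = end if end < len(content) else None
    return Page(chunk, next_start)
```

A caller would pass next_start_index back as start_index on the following request until it comes back as None.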

`fetch_batch`

Fetch multiple webpages in a single request. Fewer round trips mean faster workflows.

Use when: You have 2-10 URLs to read (e.g., from discover_links results).

Parameters:

urls (required): List of URLs (max 10)
max_length_per_url (default: 2000): Character limit per URL
get_raw_html (default: false): Skip extraction for all URLs
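
The limits above (at most 10 URLs, a per-URL character cap) can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: `fetch_batch_sketch` and the injectable `fetch_one` callable are hypothetical, the use of a thread pool is a design choice made for the example, and reporting failures inline per URL is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

MAX_URLS = 10  # limit stated in the parameter list


def fetch_batch_sketch(
    urls: Sequence[str],
    fetch_one: Callable[[str], str],
    max_length_per_url: int = 2000,
) -> str:
    """Fetch every URL concurrently and join the results into one document."""
    if not urls:
        raise ValueError("urls must not be empty")
    if len(urls) > MAX_URLS:
        raise ValueError(f"at most {MAX_URLS} URLs per request")

    def safe_fetch(url: str) -> str:
        # One failing URL should not discard the other results (assumption).
        try:
            return fetch_one(url)[:max_length_per_url]
        except Exception as exc:
            return f"Error: {exc}"

    with ThreadPoolExecutor(max_workers=len(urls)) as pool:
        bodies = list(pool.map(safe_fetch, urls))  # map preserves input order

    sections = [f"## {url}\n\n{body}" for url, body in zip(urls, bodies)]
    return "\n\n---\n\n".join(sections)
```

Passing the fetcher in as a parameter keeps the sketch testable offline; a real server would plug in its own HTTP and extraction step.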

`discover_links`

Discover all links on a webpage. Use before fetch_batch to find relevant URLs.

Use when: Exploring a site to find relevant pages before fetching.

Parameters:

url (required): The webpage URL to scan for links
filter_pattern (optional): Regex to filter links (e.g., /docs/, \.pdf$)
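
To show how filter_pattern narrows the result, here is a standard-library sketch of link discovery over an HTML string. The names `discover_links_sketch` and `_LinkCollector` are hypothetical, and resolving relative links, dropping duplicates, and matching the regex with `re.search` against the absolute URL are assumptions about reasonable behavior rather than documented details.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect absolute, de-duplicated href values from <a> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute = urljoin(self.base_url, href)
            if absolute not in self.links:
                self.links.append(absolute)


def discover_links_sketch(
    html: str, base_url: str, filter_pattern: Optional[str] = None
) -> List[str]:
    """Return links found in `html`, optionally filtered by a regex."""
    collector = _LinkCollector(base_url)
    collector.feed(html)
    if filter_pattern is None:
        return collector.links
    regex = re.compile(filter_pattern)
    return [link for link in collector.links if regex.search(link)]
```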

Real-World Use Cases

Discovery → Batch Fetch Workflow

First, discover what pages exist:

discover_links(url="https://kiro.dev/docs/", filter_pattern="/docs/")

Tool Output:

# Links from https://kiro.dev/docs/
Found 11 links

- https://kiro.dev/docs/getting-started/installation/
- https://kiro.dev/docs/getting-started/first-project/
- https://kiro.dev/docs/specs/
- https://kiro.dev/docs/hooks/
- https://kiro.dev/docs/chat/
- https://kiro.dev/docs/steering/
- https://kiro.dev/docs/mcp/
...

Then fetch multiple pages at once:

fetch_batch(
  urls=["https://kiro.dev/docs/specs/", "https://kiro.dev/docs/hooks/", "https://kiro.dev/docs/steering/"],
  max_length_per_url=1500
)

Tool Output:

## https://kiro.dev/docs/specs/
<!-- Type: markdown (extracted) -->

Specs or specifications are structured artifacts that formalize the development
process for complex features in your application...

---

## https://kiro.dev/docs/hooks/
<!-- Type: markdown (extracted) -->

Agent hooks are powerful automation tools that streamline your development
workflow by automatically executing predefined agent actions...

---

## https://kiro.dev/docs/steering/
<!-- Type: markdown (extracted) -->

Steering gives Kiro persistent knowledge about your workspace through markdown
files. Instead of explaining your conventions in every chat...

Use Case Examples

discover_links:

Docs crawling - Find all pages before scraping
Competitive research - Extract blog post links from a site
API discovery - Find all API endpoint documentation pages

fetch_batch:

Comparison research - Fetch React, Vue, and Svelte docs to compare approaches
Onboarding context - Grab multiple docs pages to understand a new tool
Multi-source fact-checking - Get the same topic from different sources

Key value: fewer round trips. Instead of 10 separate fetch calls (10 tool invocations, 10 approvals in supervised mode), you get everything in 1-2 calls.

Configuration

Kiro / VS Code

Add to .kiro/settings/mcp.json:

{
  "mcpServers": {
    "fetchv2": {
      "command": "uv",
      "args": ["--directory", "/path/to/fetchv2-mcp-server", "run", "python", "-m", "fetchv2_mcp_server"]
    }
  }
}

Claude Desktop

Add to claude_desktop_config.json:

{
  "mcpServers": {
    "fetchv2": {
      "command": "uv",
      "args": ["--directory", "/path/to/fetchv2-mcp-server", "run", "python", "-m", "fetchv2_mcp_server"]
    }
  }
}

Prompts

fetch_manual - User-initiated fetch that bypasses robots.txt
research_topic - Research a topic by fetching multiple relevant URLs

Development

# Install dev dependencies
uv sync --dev

# Run with MCP Inspector
mcp dev src/fetchv2_mcp_server/server.py

# Type checking
uv run pyright

# Linting
uv run ruff check .

Project Structure

fetchv2_mcp_server/
├── pyproject.toml
├── README.md
└── src/
    └── fetchv2_mcp_server/
        ├── __init__.py
        ├── __main__.py
        └── server.py

License

MIT

Project details

Maintainers

praveenc

Release history

1.1.0: Dec 6, 2025
1.0.1: Dec 5, 2025
1.0.0 (this version): Dec 5, 2025

Download files

Source Distribution

fetchv2_mcp_server-1.0.0.tar.gz (14.2 kB)

Uploaded Dec 5, 2025 Source

Built Distribution

fetchv2_mcp_server-1.0.0-py3-none-any.whl (12.3 kB)

Uploaded Dec 5, 2025 Python 3

File details

Details for the file fetchv2_mcp_server-1.0.0.tar.gz.

File metadata

Download URL: fetchv2_mcp_server-1.0.0.tar.gz
Upload date: Dec 5, 2025
Size: 14.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fetchv2_mcp_server-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`2dc65bd19210ecb4a98709357fdd8520c763837af76e24cb5b77b5af33f8f050`
MD5	`21cd54a4f0f0a9fdb38b05acc3798747`
BLAKE2b-256	`eb2e447b383c399e2167749ed7b574b5a71db073ceb11559d832593bf4232177`

Provenance

The following attestation bundles were made for fetchv2_mcp_server-1.0.0.tar.gz:

Publisher: publish.yml on praveenc/fetchv2-mcp-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fetchv2_mcp_server-1.0.0.tar.gz
- Subject digest: 2dc65bd19210ecb4a98709357fdd8520c763837af76e24cb5b77b5af33f8f050
- Sigstore transparency entry: 742322078
- Sigstore integration time: Dec 5, 2025
Source repository:
- Permalink: praveenc/fetchv2-mcp-server@f61f4f5515032af84fa442f63127ab74cb5eac78
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/praveenc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f61f4f5515032af84fa442f63127ab74cb5eac78
- Trigger Event: release

File details

Details for the file fetchv2_mcp_server-1.0.0-py3-none-any.whl.

File metadata

Download URL: fetchv2_mcp_server-1.0.0-py3-none-any.whl
Upload date: Dec 5, 2025
Size: 12.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fetchv2_mcp_server-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`89c63353e115fcd572d8c28e9e03f2b0f4a7e1da7eb8c3c48a71dc12bc1449ba`
MD5	`1e1264623da3c4d9ef6c56265fc394e3`
BLAKE2b-256	`e6e7b588f72aaf54e3f1f068710171d6e2962e2706847ca0d8813acbb94d558c`

Provenance

The following attestation bundles were made for fetchv2_mcp_server-1.0.0-py3-none-any.whl:

Publisher: publish.yml on praveenc/fetchv2-mcp-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fetchv2_mcp_server-1.0.0-py3-none-any.whl
- Subject digest: 89c63353e115fcd572d8c28e9e03f2b0f4a7e1da7eb8c3c48a71dc12bc1449ba
- Sigstore transparency entry: 742322082
- Sigstore integration time: Dec 5, 2025
Source repository:
- Permalink: praveenc/fetchv2-mcp-server@f61f4f5515032af84fa442f63127ab74cb5eac78
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/praveenc
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@f61f4f5515032af84fa442f63127ab74cb5eac78
- Trigger Event: release
