A dedicated web content fetching and conversion service built on the Model Context Protocol (MCP).

huoshui-fetch

A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.

Features

Fetching Tools

  • fetch_url: Fetch content from URLs with customizable timeout, redirect handling, and user-agent
  • fetch_with_headers: Fetch URLs with custom headers for authenticated requests
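To illustrate the kind of request these two tools describe, here is a minimal sketch using only the Python standard library. It is not the project's implementation: the function names `build_request` and `fetch`, the default user-agent string, and the default timeout are all assumptions made for this example.

```python
import urllib.request

# Hypothetical default; the real server's user-agent is not documented here.
DEFAULT_USER_AGENT = "example-fetcher/0.1"


def build_request(url, user_agent=DEFAULT_USER_AGENT, headers=None):
    """Build a request with a user-agent plus optional custom headers."""
    merged = {"User-Agent": user_agent}
    merged.update(headers or {})  # custom headers, e.g. Authorization
    return urllib.request.Request(url, headers=merged)


def fetch(url, timeout=30.0, user_agent=DEFAULT_USER_AGENT, headers=None):
    """Fetch a URL and return the decoded body (urlopen follows redirects by default)."""
    request = build_request(url, user_agent, headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `fetch("https://example.com", timeout=10)` would return the page body as a string; passing `headers={"Authorization": "Bearer ..."}` corresponds to the authenticated case.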

Conversion Tools

  • html_to_markdown_tool: Convert HTML to clean Markdown format
  • html_to_text_tool: Extract plain text from HTML
  • clean_html_tool: Remove scripts/styles and sanitize HTML
  • json_to_markdown_tool: Convert JSON data to readable Markdown
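As a rough picture of what plain-text extraction involves, the sketch below strips tags and skips the contents of script and style elements using the standard library's `html.parser`. The class `TextExtractor` and the function `html_to_text` are hypothetical names for this example; the actual tools may rely on different libraries and produce different whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring anything inside <script> or <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```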

Extraction Tools

  • extract_article_tool: Extract main article content using readability
  • extract_links_tool: Extract all links with filtering options
  • extract_metadata_tool: Extract page metadata (title, description, OG tags)
  • extract_images_tool: Extract images with size filtering
  • extract_structured_data_tool: Extract JSON-LD and microdata
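Link extraction with a filter can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not the server's code: `LinkCollector`, `extract_links`, and the `same_host_only` option are hypothetical, and the real tool's filtering options are not specified in this document.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


def extract_links(html, base_url, same_host_only=False):
    """Return absolute links; optionally keep only those on the base URL's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    if not same_host_only:
        return collector.links
    host = urlparse(base_url).netloc
    return [link for link in collector.links if urlparse(link).netloc == host]
```

Resolving each `href` against the page URL first means relative links such as `/about` can be filtered by host in the same way as absolute ones.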

Installation

# Using uv (recommended)
uv sync

# Or install from GitHub
pip install git+https://github.com/yourusername/huoshui-fetch.git

Usage

Run with uvx (recommended for one-time use)

# From the repository
uvx --from . huoshui-fetch

# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch

Run directly

# Using uv
uv run python -m huoshui_fetch

# Or if installed
python -m huoshui_fetch

The server communicates over standard input/output (stdio), so it can be launched directly by Claude Desktop and other MCP-compatible clients.
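For readers curious what travels over stdio: MCP's stdio transport exchanges JSON-RPC 2.0 messages, one per line. The sketch below only builds such a line for a `tools/call` request. The helper names are hypothetical, and the argument name `url` is an assumption about the `fetch_url` tool's schema. A real client would also perform the MCP initialization handshake before calling any tool.

```python
import json


def encode_message(method, params, request_id):
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        "params": params,
    }
    # json.dumps escapes newlines inside strings, so the line stays intact.
    return json.dumps(message, separators=(",", ":")) + "\n"


def tool_call(name, arguments, request_id):
    """Build a tools/call request for the named tool."""
    return encode_message("tools/call", {"name": name, "arguments": arguments}, request_id)


# "url" is an assumed argument name, used here for illustration only.
line = tool_call("fetch_url", {"url": "https://example.com"}, request_id=1)
```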

Configuration for Claude Desktop

Add the following to your Claude Desktop configuration file:

{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--no-cache", "--from", ".", "huoshui-fetch"],
      "cwd": "/path/to/huoshui-fetch"
    }
  }
}

Or, to run the server directly from GitHub:

{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"]
    }
  }
}
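If the configuration file already lists other servers, the new entry has to be merged in rather than pasted over them. A small sketch of that merge, assuming the configuration has already been loaded into a dictionary (the helper name `add_server` is hypothetical, and locating and writing the file is left out because its path varies by platform):

```python
import json


def add_server(config, name, command, args):
    """Return a copy of config with one entry added under "mcpServers"."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))  # keep existing servers
    servers[name] = {"command": command, "args": list(args)}
    updated["mcpServers"] = servers
    return updated


existing = {"mcpServers": {"other-server": {"command": "other", "args": []}}}
merged = add_server(
    existing,
    "huoshui-fetch",
    "uvx",
    ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"],
)
print(json.dumps(merged, indent=2))
```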

Example Usage

Once configured, you can use the tools in Claude Desktop:

// Fetch a webpage
fetch_url("https://example.com")

// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")

// Extract article content
extract_article_tool(html_content, "https://example.com/article")

Requirements

  • Python 3.11+
  • Dependencies listed in pyproject.toml

DXT Extension

This project supports the DXT (Desktop Extensions) format for easy distribution and installation.

To build the DXT extension:

python build_dxt.py

This creates a huoshui-fetch-{version}.dxt file that can be installed in compatible AI desktop applications.

License

MIT
