A dedicated web content fetching and conversion service based on the MCP philosophy.
huoshui-fetch
A dedicated web content fetching and conversion MCP (Model Context Protocol) server that provides tools for fetching, converting, and extracting data from web pages.
Features
Fetching Tools
- fetch_url: Fetch content from URLs with a configurable timeout, redirect handling, and User-Agent header
- fetch_with_headers: Fetch URLs with custom headers for authenticated requests
Conversion Tools
- html_to_markdown_tool: Convert HTML to clean Markdown format
- html_to_text_tool: Extract plain text from HTML
- clean_html_tool: Remove scripts/styles and sanitize HTML
- json_to_markdown_tool: Convert JSON data to readable Markdown
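The following sketch shows, roughly, how HTML can be converted to Markdown while scripts and styles are dropped. It is built on Python's `html.parser` and is not the project's converter. The class and function names are hypothetical. It handles only headings, paragraphs and links, while a full converter would also cover lists, tables, emphasis and code.

```python
from html.parser import HTMLParser


class MarkdownSketch(HTMLParser):
    """Tiny HTML-to-Markdown converter: headings, paragraphs and links; drops <script>/<style>."""

    HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}
    SKIPPED = {"script", "style"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.blocks: list[str] = []   # finished Markdown blocks
        self.current: list[str] = []  # pieces of the block being built
        self.skip_depth = 0           # > 0 while inside <script> or <style>
        self.href: str | None = None  # href of the currently open <a>, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.HEADING_LEVELS:
            self.flush_block()
            self.current.append("#" * self.HEADING_LEVELS[tag] + " ")
        elif tag == "p":
            self.flush_block()
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.current.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in self.HEADING_LEVELS or tag == "p":
            self.flush_block()
        elif tag == "a":
            self.current.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.current.append(data)

    def flush_block(self) -> None:
        # Collapse runs of whitespace, then keep the block only if it has text.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []


def html_to_markdown(html: str) -> str:
    parser = MarkdownSketch()
    parser.feed(html)
    parser.close()        # process any buffered trailing text
    parser.flush_block()  # emit text not closed by a block-level tag
    return "\n\n".join(parser.blocks)
```

For the sample input used in this README, `html_to_markdown("<h1>Hello</h1><p>World</p>")` returns `"# Hello\n\nWorld"`.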
Extraction Tools
- extract_article_tool: Extract main article content using readability
- extract_links_tool: Extract all links with filtering options
- extract_metadata_tool: Extract page metadata (title, description, OG tags)
- extract_images_tool: Extract images with size filtering
- extract_structured_data_tool: Extract JSON-LD and microdata
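Link and metadata extraction can be sketched as a single `html.parser` pass. This illustration rests on assumptions and does not reproduce the project's implementation:

- `extract_links`, `extract_metadata` and the `same_domain_only` filter are hypothetical.
- Relative links are resolved with `urljoin`.
- Only `<title>`, `description` and `og:*` tags are kept.

Readability-style article extraction and JSON-LD parsing are not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PageScanner(HTMLParser):
    """Collect <a href> links, the <title> text, and <meta> name/property tags in one pass."""

    def __init__(self, base_url: str) -> None:
        super().__init__(convert_charrefs=True)
        self.base_url = base_url
        self.links: list[str] = []
        self.meta: dict[str, str] = {}
        self._in_title = False
        self._title_parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attributes["href"]))
        elif tag == "meta":
            key = attributes.get("property") or attributes.get("name")
            if key and attributes.get("content") is not None:
                self.meta[key.lower()] = attributes["content"]
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    @property
    def title(self) -> str:
        return " ".join("".join(self._title_parts).split())


def extract_links(html: str, base_url: str, same_domain_only: bool = False) -> list[str]:
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    links = list(dict.fromkeys(scanner.links))  # de-duplicate while keeping order
    if same_domain_only:
        host = urlparse(base_url).netloc
        links = [link for link in links if urlparse(link).netloc == host]
    return links


def extract_metadata(html: str, base_url: str = "") -> dict[str, str]:
    scanner = PageScanner(base_url)
    scanner.feed(html)
    scanner.close()
    result = {"title": scanner.title}
    result.update({k: v for k, v in scanner.meta.items() if k == "description" or k.startswith("og:")})
    return result
```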
Installation
```bash
# Using uv (recommended)
uv sync

# Or install from GitHub
pip install git+https://github.com/yourusername/huoshui-fetch.git
```
Usage
Run with uvx (recommended for one-time use)
```bash
# From the repository
uvx --from . huoshui-fetch

# From GitHub (once published)
uvx --from git+https://github.com/yourusername/huoshui-fetch.git huoshui-fetch
```
Run directly
```bash
# Using uv
uv run python -m huoshui_fetch

# Or if installed
python -m huoshui_fetch
```
The server communicates over standard input/output (stdio), which makes it suitable for Claude Desktop and other MCP-compatible clients.
Configuration for Claude Desktop
Add the following to your Claude Desktop configuration file (claude_desktop_config.json):
```json
{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--no-cache", "--from", ".", "huoshui-fetch"],
      "cwd": "/path/to/huoshui-fetch"
    }
  }
}
```
Or, to run the server directly from GitHub:
```json
{
  "mcpServers": {
    "huoshui-fetch": {
      "command": "uvx",
      "args": ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"]
    }
  }
}
```
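If you prefer to update the configuration file from a script, the merge step can be sketched as follows. `merge_mcp_server`, `update_config_file` and `HUOSHUI_FETCH_ENTRY` are hypothetical helpers. The sketch assumes only that the file is JSON with a top-level `mcpServers` object, as in the snippets above. Other servers and settings are kept, and the file path is passed in because its location depends on the operating system.

```python
import json
from pathlib import Path
from typing import Any


def merge_mcp_server(config: dict[str, Any], name: str, entry: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of config with mcpServers[name] set to entry; other servers are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path, name: str, entry: dict[str, Any]) -> None:
    """Read the JSON config at path (if any), merge the entry, and write the result back."""
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    path.write_text(json.dumps(merge_mcp_server(config, name, entry), indent=2) + "\n", encoding="utf-8")


# Same entry as the GitHub-based configuration shown above.
HUOSHUI_FETCH_ENTRY = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/yourusername/huoshui-fetch.git", "huoshui-fetch"],
}
```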
Example Usage
Once configured, you can use the tools in Claude Desktop, for example:
```
// Fetch a webpage
fetch_url("https://example.com")
// Convert HTML to Markdown
html_to_markdown_tool("<h1>Hello</h1><p>World</p>")
// Extract article content (html_content holds previously fetched HTML)
extract_article_tool(html_content, "https://example.com/article")
```
Requirements
- Python 3.11+
- Dependencies listed in pyproject.toml
DXT Extension
This project supports DXT (Desktop Extensions) format for easy distribution and installation.
To build the DXT extension:
```bash
python build_dxt.py
```
This will create a huoshui-fetch-{version}.dxt file that can be installed in compatible AI desktop applications.
License
MIT
Download files
File details
Details for the file huoshui_fetch-0.1.1.tar.gz.
File metadata
- Download URL: huoshui_fetch-0.1.1.tar.gz
- Upload date:
- Size: 51.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b9327e8a9d5d2d5cf04ddcf385bafaef617e1506bf17270a9246bb77a08997f |
| MD5 | 5a8c0e950eb1b37c41be7bf3600c6655 |
| BLAKE2b-256 | 596d8985db00f89dd14ce040688f8081d70aaaee9a83dd6f9002fb947a553f06 |
File details
Details for the file huoshui_fetch-0.1.1-py3-none-any.whl.
File metadata
- Download URL: huoshui_fetch-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77f9607a8cd19cf87530f67a2a75b6a6ad04170fe84ff83d11fbb92413509484 |
| MD5 | 6ad68e2980051bf738cec84f864b0b02 |
| BLAKE2b-256 | b93df5b72a60c53828831d498aa26869655f4d671b9674e44b2a456fe09395b4 |
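To check a downloaded file against the published hashes, compute the same digests locally. This sketch uses `hashlib`, where "BLAKE2b-256" corresponds to BLAKE2b with a 32-byte digest. The helper names are hypothetical, and the path you pass to `matches_sha256` should point at your own download.

```python
import hashlib
from collections.abc import Iterable
from pathlib import Path


def digests(chunks: Iterable[bytes]) -> dict[str, str]:
    """Compute the three digests listed above, feeding the data chunk by chunk."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(usedforsecurity=False),        # listed for reference, not security
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),   # BLAKE2b with a 256-bit digest
    }
    for chunk in chunks:
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def file_digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    # Read in fixed-size chunks so large archives are not loaded into memory at once.
    with open(path, "rb") as fh:
        return digests(iter(lambda: fh.read(chunk_size), b""))


def matches_sha256(path: Path, expected: str) -> bool:
    return file_digests(path)["SHA256"] == expected.strip().lower()


# Published SHA256 of huoshui_fetch-0.1.1-py3-none-any.whl, copied from the table above.
WHEEL_SHA256 = "77f9607a8cd19cf87530f67a2a75b6a6ad04170fe84ff83d11fbb92413509484"
```

`matches_sha256(Path("huoshui_fetch-0.1.1-py3-none-any.whl"), WHEEL_SHA256)` returns `True` only if the local wheel is byte-identical to the published one.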