Fetch a web page and convert it into cleaned Markdown.

fetch-markdown

fetch-markdown is a lightweight Python tool that reuses the content extraction logic from Anthropic's mcp_server_fetch project to turn web pages into cleaned Markdown. It can be used either as a small library or through a command-line interface. Upstream code lives at https://github.com/modelcontextprotocol/servers/tree/main/src/fetch.

Installation

pip install -r requirements-dev.txt  # includes runtime deps and pytest/ruff

Library usage

from pathlib import Path
from fetch_markdown import fetch_markdown

markdown = fetch_markdown("https://huggingface.co/unsloth/GLM-4.6-GGUF")
print(markdown[:200])

output_path = Path("/tmp/model-card.md")
fetch_markdown(
    "https://huggingface.co/unsloth/GLM-4.6-GGUF",
    output_path=output_path,
)

CLI usage

python -m fetch_markdown https://huggingface.co/unsloth/GLM-4.6-GGUF

# or
fetch-markdown --output output.md https://huggingface.co/unsloth/GLM-4.6-GGUF

Parameters

The library function and CLI share the same core arguments/options:

  • url (positional for CLI / first argument for library): target page.
  • output_path / -o/--output PATH: optional destination file; when omitted, the CLI writes to stdout and the library function simply returns the Markdown string.
  • force_raw / --raw: skip simplification and emit the response body verbatim.
  • user_agent / --user-agent STRING: override the default identifier.
  • ignore_robots_txt / --ignore-robots: skip robots.txt checks (use sparingly).
  • proxy_url / --proxy URL: HTTP(S) proxy forwarded to httpx.
  • timeout / --timeout SECONDS: request timeout (default 30 seconds).

Development

  • Lint with ruff check fetch_markdown tests.
  • Run tests with pytest --cov=fetch_markdown --cov-report=term-missing.

The tests depend on the Hugging Face website being reachable. They will be skipped automatically if the network call fails.
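
One way such a skip could be arranged is sketched below. This is not the project's actual test code: the helper network_available is hypothetical, and it probes connectivity with a plain TCP connection instead of catching a failed fetch, which is a different technique from the one described above.

```python
import socket


def network_available(host="huggingface.co", port=443, timeout=3.0):
    """Hypothetical helper: return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

In a pytest suite, a module-level marker such as pytest.mark.skipif(not network_available(), reason="network unreachable") would then skip the network-dependent tests.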

Download files

Source Distribution

fetch_markdown-0.0.1.tar.gz (8.5 kB)

Uploaded Source

Built Distribution

fetch_markdown-0.0.1-py3-none-any.whl (6.9 kB)

Uploaded Python 3

File details

Details for the file fetch_markdown-0.0.1.tar.gz.

File metadata

  • Download URL: fetch_markdown-0.0.1.tar.gz
  • Upload date:
  • Size: 8.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fetch_markdown-0.0.1.tar.gz
Algorithm Hash digest
SHA256 9e37843da02d36c711c3eebcf2e46a8026f220696b98a53a55ea01782ec30084
MD5 47ae4b3b2e9335194b0eec0cb86697dd
BLAKE2b-256 398f43f177455492e004b5e04d515d1bea9d4132aa37641f12bc81992ef779fc

Provenance

The following attestation bundles were made for fetch_markdown-0.0.1.tar.gz:

Publisher: ci.yml on Wuodan/fetch-markdown

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fetch_markdown-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: fetch_markdown-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 6.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fetch_markdown-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 8f7f01df796676761e010d286731ad2e22946c076606565906142b007d353bc6
MD5 cb7e65fd4f9143b3e3184ef04062a3cf
BLAKE2b-256 706b326404eafbc6b1da8bb77d2a950ae87fed1e29ee10b4c39835216a9abdf2

Provenance

The following attestation bundles were made for fetch_markdown-0.0.1-py3-none-any.whl:

Publisher: ci.yml on Wuodan/fetch-markdown

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
