Fetch a web page and convert it into cleaned Markdown.

fetch-markdown

fetch_markdown is all about “HTML in → Markdown out.” You can start from a live URL, a file on disk, or an already-loaded HTML string.

It can be used from the command line or as a Python library.

Installation

pip install fetch-markdown

Prerequisites:

  • Python 3.10+ runtime
  • Node.js (recommended for best results; powers Readability.js content extraction)
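
As a quick way to confirm the prerequisites above, the following sketch checks the Python version and looks for a Node.js binary. The helper name check_prerequisites is hypothetical and not part of fetch_markdown. It also consults the FETCH_MARKDOWN_NODE_PATH variable described in the Notes, but how the package itself resolves Node.js is an assumption here.

```python
import os
import shutil
import sys


def check_prerequisites(env=None):
    """Return a list of human-readable problems; an empty list means none found.

    Hypothetical helper for illustration; not part of fetch_markdown.
    """
    env = os.environ if env is None else env
    problems = []

    # The prerequisites list asks for Python 3.10 or newer.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")

    # Node.js is recommended rather than mandatory. The override variable is
    # trusted as-is (its existence on disk is not checked); otherwise PATH
    # is searched for a "node" executable.
    override = env.get("FETCH_MARKDOWN_NODE_PATH")
    node = override or shutil.which("node", path=env.get("PATH"))
    if not node:
        problems.append("Node.js not found on PATH")

    return problems


for problem in check_prerequisites():
    print("warning:", problem)
```

An empty result only means both checks passed on this machine; it does not prove that Readability.js extraction will work.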

CLI usage

1. Fetch a URL and display Markdown

fetch-markdown https://www.iana.org/help/example-domains

2. Fetch and write to a file

fetch-markdown --output sample-output.md https://www.iana.org/help/example-domains

3. Convert previously saved HTML (files or stdin)

# convert file
fetch-markdown sample-page.html
# or from stdin
cat sample-page.html | fetch-markdown -

4. Skip Markdown conversion and emit the HTML verbatim

fetch-markdown --raw https://example.com

Parameters

  • source: URL, filesystem path, or - to read HTML from stdin.
  • -o/--output PATH: optional destination file (stdout is the default).
  • --raw: bypass HTML-to-Markdown conversion and emit the response body.
  • --user-agent STRING: override the default identifier.
  • --ignore-robots: skip robots.txt validation (use sparingly).
  • --proxy URL: HTTP(S) proxy forwarded to httpx.
  • --timeout SECONDS: request timeout (default 30 seconds).
  • --rewrite-relative-urls/--no-rewrite-relative-urls:
    enable or disable rewriting relative href/src attributes to absolute links (default on).
  • --base-url URL: optional base URL for rewriting relative URLs (defaults to source).
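
When calling the CLI from a script, it can help to assemble the arguments programmatically. The sketch below builds an argument list using only the flags listed above. The function build_command and its keyword names are hypothetical conveniences, not part of the package, and the accepted format of the --timeout value (integer or float) is an assumption.

```python
import shlex


def build_command(source, output=None, raw=False, user_agent=None,
                  ignore_robots=False, proxy=None, timeout=None,
                  rewrite_relative_urls=None, base_url=None):
    """Assemble an argv list for the fetch-markdown CLI (illustrative helper)."""
    argv = ["fetch-markdown"]
    if output is not None:
        argv += ["--output", str(output)]
    if raw:
        argv.append("--raw")
    if user_agent is not None:
        argv += ["--user-agent", user_agent]
    if ignore_robots:
        argv.append("--ignore-robots")
    if proxy is not None:
        argv += ["--proxy", proxy]
    if timeout is not None:
        argv += ["--timeout", str(timeout)]
    # None leaves the CLI default untouched; True/False pick an explicit flag.
    if rewrite_relative_urls is True:
        argv.append("--rewrite-relative-urls")
    elif rewrite_relative_urls is False:
        argv.append("--no-rewrite-relative-urls")
    if base_url is not None:
        argv += ["--base-url", base_url]
    argv.append(source)  # positional source goes last, as in the CLI examples
    return argv


cmd = build_command(
    "https://www.iana.org/help/example-domains",
    output="sample-output.md",
    timeout=10,
)
print(shlex.join(cmd))
```

If the package is installed, the resulting list can be passed to subprocess.run(cmd, check=True). Passing a list rather than a shell string avoids quoting problems with URLs that contain characters such as & or ?.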

Python Library usage

fetch_markdown can also be used as a Python library.

1. Fetch a URL and get Markdown

from fetch_markdown import fetch_to_markdown

markdown = fetch_to_markdown("https://www.iana.org/help/example-domains")

2. Convert a previously saved HTML file

from fetch_markdown import file_to_markdown

markdown_from_file = file_to_markdown("sample-page.html")
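
For a folder of saved pages, the single-file call above extends naturally to a loop. This is a minimal sketch: convert_directory is a hypothetical helper, and the converter is passed in as a parameter so the loop can be tried without the package or a network connection.

```python
from pathlib import Path


def convert_directory(src_dir, dst_dir, convert):
    """Convert each *.html file directly inside src_dir into a .md file in dst_dir.

    `convert` is any callable that takes a file path (str) and returns
    Markdown text. Returns the list of files written.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() makes the processing order deterministic across platforms.
    for html_path in sorted(src_dir.glob("*.html")):
        target = dst_dir / (html_path.stem + ".md")
        target.write_text(convert(str(html_path)), encoding="utf-8")
        written.append(target)
    return written
```

With the package installed, the call would look like convert_directory("saved-pages", "markdown-out", file_to_markdown), where both directory names are placeholders.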

3. Convert an HTML string you already have

from fetch_markdown import html_to_markdown

html = "<html><body><h1>Offline HTML</h1></body></html>"
markdown_from_html = html_to_markdown(html)

# Optionally disable replacing relative links with absolute URLs
markdown_custom = html_to_markdown(
    html,
    rewrite_relative_urls=False,
)

# Or resolve relative links against a custom base URL
markdown_custom_base = html_to_markdown(
    html,
    rewrite_relative_urls=True,
    base_url="https://example.com/docs/",
)
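
To make the effect of a base URL concrete, the sketch below resolves a few typical href/src values against https://example.com/docs/ with the standard library. It illustrates ordinary URL resolution; whether fetch_markdown uses urljoin internally is an assumption, and the sample links are invented for the example.

```python
from urllib.parse import urljoin

base_url = "https://example.com/docs/"

# Invented href/src values of the kinds commonly found in a saved page.
links = ["guide.html", "../img/logo.png", "/about", "https://other.example/x", "#top"]

# Relative values are resolved against the base; absolute URLs pass through.
resolved = {link: urljoin(base_url, link) for link in links}
for link, absolute in resolved.items():
    print(f"{link!r:28} -> {absolute}")
```

Note that the trailing slash on the base matters: against https://example.com/docs (no slash), guide.html resolves to https://example.com/guide.html, because the last path segment is treated as a file name and replaced.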

Additional public functions

Need to store markup or run your own converter? Use fetch and skip the Markdown step entirely:

from fetch_markdown import fetch

raw_html, content_type = fetch("https://example.com/docs")
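
Before handing the raw body to your own converter, it can be worth checking what kind of content came back. The sketch below assumes that content_type is a string shaped like an HTTP Content-Type header value (for example "text/html; charset=utf-8"); the source only shows that fetch returns a two-element tuple, so that shape is an assumption. Both helper names are hypothetical.

```python
def split_content_type(content_type):
    """Return (media_type, charset) from a value like 'text/html; charset=utf-8'.

    Either element is None when it is missing from the input.
    """
    if not content_type:
        return None, None
    media_type, *params = [part.strip() for part in content_type.split(";")]
    charset = None
    for param in params:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            charset = value.strip().strip('"').lower() or None
    return media_type.lower() or None, charset


def looks_like_html(content_type):
    """True when the media type is one of the common HTML types."""
    media_type, _ = split_content_type(content_type)
    return media_type in {"text/html", "application/xhtml+xml"}


# Hypothetical usage, assuming fetch_markdown is installed and the site is reachable:
#   from fetch_markdown import fetch
#   raw_html, content_type = fetch("https://example.com/docs")
#   if looks_like_html(content_type):
#       ...  # run your own converter on raw_html
print(split_content_type("text/html; charset=UTF-8"))
```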

Notes

  • The CLI and library both fetch live webpages from URLs; network availability and site rate limits apply.
  • Set the FETCH_MARKDOWN_NODE_PATH environment variable to the Node.js binary (or its directory) if Readability.js cannot find node on your PATH.
  • Inspired by the Fetch MCP Server.
  • Thanks go to these libraries for the heavy lifting:

Download files

Source Distribution

fetch_markdown-0.1.0.tar.gz (14.1 kB view details)

Uploaded Source

Built Distribution

fetch_markdown-0.1.0-py3-none-any.whl (10.7 kB view details)

Uploaded Python 3

File details

Details for the file fetch_markdown-0.1.0.tar.gz.

File metadata

  • Download URL: fetch_markdown-0.1.0.tar.gz
  • Upload date:
  • Size: 14.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fetch_markdown-0.1.0.tar.gz
  • SHA256: e397b9429b108cc2da76c862e0a4a152dd1ca3c24827aeb2c9420619d3e27b12
  • MD5: 7225d781686a7a2b6b445fc2ed53b7c0
  • BLAKE2b-256: 3521a1ca04000fdfd0ea119410a41df5b77b28b4f2bb20233e7ae4f981f789f0

Provenance

The following attestation bundles were made for fetch_markdown-0.1.0.tar.gz:

Publisher: ci.yml on Wuodan/fetch-markdown

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fetch_markdown-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: fetch_markdown-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 10.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fetch_markdown-0.1.0-py3-none-any.whl
  • SHA256: 8d7e373d618315bf3599c83cf26f971f69911a78794a6a6d07de5ae716514b23
  • MD5: e7c9f5e0dd8a8a78bb33779eaa77dcd0
  • BLAKE2b-256: 86b7b6739e5310c61ca4c1724254e8d0cc1ad40e94d514705ff442a52ea52fee

Provenance

The following attestation bundles were made for fetch_markdown-0.1.0-py3-none-any.whl:

Publisher: ci.yml on Wuodan/fetch-markdown

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
