Fetch a web page and convert it into cleaned Markdown.
fetch-markdown
fetch_markdown is all about “HTML in → Markdown out.” You can start from a live
URL, a file on disk, or an already-loaded HTML string.
It can be used from the command line (CLI) or as a Python library.
Installation
pip install fetch-markdown
Prerequisites:
- Python 3.10+ runtime
- Node.js (recommended for best results; powers Readability.js content extraction)
CLI usage
1. Fetch a URL and display Markdown
fetch-markdown https://www.iana.org/help/example-domains
2. Fetch and write to a file
fetch-markdown --output sample-output.md https://www.iana.org/help/example-domains
3. Convert previously saved HTML (files or stdin)
# convert file
fetch-markdown sample-page.html
# or from stdin
cat sample-page.html | fetch-markdown -
4. Skip Markdown conversion and emit the HTML verbatim
fetch-markdown --raw https://example.com
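The CLI examples above pass a URL, a file path, or - (stdin) as the single source argument. The sketch below shows one way such an argument could be told apart. classify_source is a hypothetical helper written for illustration; it is not part of fetch_markdown, and the package may decide differently.

```python
from urllib.parse import urlparse


def classify_source(source: str) -> str:
    """Return "stdin", "url", or "file" for a CLI-style source argument.

    Hypothetical helper: an assumption about how the three input kinds
    could be distinguished, not fetch_markdown's actual logic.
    """
    if source == "-":
        return "stdin"
    parsed = urlparse(source)
    # Only treat http(s) values with a host part as live URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    # Everything else is assumed to be a path on disk.
    return "file"


for value in ("https://www.iana.org/help/example-domains", "sample-page.html", "-"):
    print(value, "->", classify_source(value))
```

Checking the scheme explicitly keeps Windows-style paths such as C:\pages\a.html from being mistaken for URLs.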
Parameters
- source: URL, filesystem path, or - to read HTML from stdin.
- -o/--output PATH: optional destination file (stdout is the default).
- --raw: bypass HTML-to-Markdown conversion and emit the response body.
- --user-agent STRING: override the default identifier.
- --ignore-robots: skip robots.txt validation (use sparingly).
- --proxy URL: HTTP(S) proxy forwarded to httpx.
- --timeout SECONDS: request timeout (default 30 seconds).
- --rewrite-relative-urls/--no-rewrite-relative-urls: enable or disable rewriting relative href/src attributes to absolute links (default on).
- --base-url URL: optional base URL for rewriting relative URLs (defaults to source).
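To make the effect of relative-URL rewriting and a base URL concrete, here is a minimal sketch using only the standard library. It shows how relative href/src values resolve against a base URL. The absolutize helper is hypothetical, and this is not necessarily how fetch_markdown implements the rewrite.

```python
from urllib.parse import urljoin


def absolutize(link: str, base_url: str) -> str:
    """Resolve a relative href/src value against base_url.

    Illustrative helper only; an assumption, not fetch_markdown's code.
    """
    return urljoin(base_url, link)


base = "https://example.com/docs/guide/"
print(absolutize("images/logo.png", base))          # https://example.com/docs/guide/images/logo.png
print(absolutize("/about", base))                   # https://example.com/about
print(absolutize("../faq.html", base))              # https://example.com/docs/faq.html
print(absolutize("https://other.example/x", base))  # https://other.example/x
```

Links that are already absolute pass through unchanged, which is why rewriting is usually safe to leave on.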
Python Library usage
fetch_markdown can also be used as a Python library.
1. Fetch a URL and get Markdown
from fetch_markdown import fetch_to_markdown
markdown = fetch_to_markdown("https://www.iana.org/help/example-domains")
2. Convert a previously saved HTML file
from fetch_markdown import file_to_markdown
markdown_from_file = file_to_markdown("sample-page.html")
3. Convert an HTML string you already have
from fetch_markdown import html_to_markdown
html = "<html><body><h1>Offline HTML</h1></body></html>"
markdown_from_html = html_to_markdown(html)
# Optionally disable replacing relative links with absolute URLs
markdown_custom = html_to_markdown(
    html,
    rewrite_relative_urls=False,
)
# Or rewrite relative links against a custom base URL
markdown_custom = html_to_markdown(
    html,
    rewrite_relative_urls=True,
    base_url="https://example.com/docs/",
)
Additional public methods
Need to store markup or run your own converter? Use fetch and skip the Markdown
step entirely:
from fetch_markdown import fetch
raw_html, content_type = fetch("https://example.com/docs")
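As a sketch of what "run your own converter" might look like, the block below turns raw HTML into a heading outline with the standard library's html.parser. HeadingCollector and headings_to_outline are hypothetical names introduced for this example; they are not part of fetch_markdown.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for h1-h6 elements."""

    def __init__(self) -> None:
        super().__init__()
        self.headings: list[tuple[int, str]] = []
        self._level: int | None = None  # level of the heading being read
        self._buffer: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Match h1..h6 only (so "hr" is ignored).
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


def headings_to_outline(raw_html: str) -> str:
    """Render the collected headings as Markdown-style heading lines."""
    collector = HeadingCollector()
    collector.feed(raw_html)
    collector.close()
    return "\n".join(f"{'#' * level} {text}" for level, text in collector.headings)


sample = "<html><body><h1>Offline HTML</h1><p>x</p><h2>Part <em>one</em></h2></body></html>"
print(headings_to_outline(sample))
```

With the library installed, the raw_html value returned by fetch could be passed straight to headings_to_outline. The exact format of content_type is not documented here, so inspect it before relying on it.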
Notes
- The CLI and library both fetch live webpages from URLs; network availability and site rate limits apply.
- Set the FETCH_MARKDOWN_NODE_PATH environment variable to the Node.js binary (or its directory) if Readability.js cannot find node on your PATH.
- Inspired by the Fetch MCP Server.
- Thanks go to these libraries for the heavy lifting:
  - ReadabiliPy with Mozilla's Readability.js Node.js package
  - Markdownify
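The FETCH_MARKDOWN_NODE_PATH note above says the variable may point at the Node.js binary or at its directory. The sketch below shows one plausible way to resolve such a setting, falling back to PATH when it is unset. resolve_node is a hypothetical helper, and the lookup order is an assumption rather than the package's documented behavior.

```python
import os
import shutil
from pathlib import Path


def resolve_node(env=None):
    """Return a path to a node executable, or None if none is found.

    Hypothetical helper: the lookup order here is an assumption, not
    fetch_markdown's actual implementation.
    """
    env = os.environ if env is None else env
    configured = env.get("FETCH_MARKDOWN_NODE_PATH")
    if configured:
        candidate = Path(configured)
        if candidate.is_dir():
            # Directory given: look for an executable named "node" inside it.
            return shutil.which("node", path=str(candidate))
        if candidate.is_file():
            # Binary given directly.
            return str(candidate)
        return None
    # Variable unset: fall back to the normal PATH search.
    return shutil.which("node", path=env.get("PATH"))


print(resolve_node() or "node not found; Readability.js extraction may be unavailable")
```

Passing the environment as a parameter keeps the helper easy to exercise without touching the real process environment.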