Fetch a web page and convert it into cleaned Markdown.
fetch-markdown
fetch-markdown is a lightweight Python tool that reuses the content
extraction logic from Anthropic's mcp_server_fetch project to turn web pages
into cleaned Markdown. It can be used either as a small library or through a
command-line interface. Upstream code lives at
https://github.com/modelcontextprotocol/servers/tree/main/src/fetch.
Installation
pip install -r requirements-dev.txt # includes runtime deps and pytest/ruff
Library usage
from pathlib import Path

from fetch_markdown import fetch_markdown

# Fetch the page and get the cleaned Markdown back as a string.
markdown = fetch_markdown("https://huggingface.co/unsloth/GLM-4.6-GGUF")
print(markdown[:200])

# Pass output_path to write the Markdown to a file.
output_path = Path("/tmp/model-card.md")
fetch_markdown(
    "https://huggingface.co/unsloth/GLM-4.6-GGUF",
    output_path=output_path,
)
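For more than one page, the same call can be wrapped in a small loop. The sketch below is illustrative only: `filename_for` and `save_pages` are hypothetical helpers that are not part of fetch-markdown, and the fetch function is passed in as an argument so the loop can be tried without network access. It assumes only what the usage above shows, namely that the URL is the first argument and `output_path` is accepted as a keyword.

```python
from pathlib import Path
from typing import Callable, Iterable
from urllib.parse import urlparse


def filename_for(url: str) -> str:
    """Derive a flat .md file name from a URL's host and path (hypothetical helper)."""
    parts = urlparse(url)
    slug = (parts.netloc + parts.path).strip("/").replace("/", "_")
    return f"{slug or 'index'}.md"


def save_pages(
    urls: Iterable[str],
    out_dir: Path,
    fetch: Callable[..., object],
) -> list[Path]:
    """Call fetch(url, output_path=...) once per URL and return the target paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        target = out_dir / filename_for(url)
        # Assumption: fetch behaves like fetch_markdown and writes to output_path.
        fetch(url, output_path=target)
        written.append(target)
    return written
```

With the package installed, `save_pages(urls, Path("pages"), fetch_markdown)` would be the intended call; error handling and rate limiting are left out of this sketch.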
CLI usage
python -m fetch_markdown https://huggingface.co/unsloth/GLM-4.6-GGUF
# or
fetch-markdown --output output.md https://huggingface.co/unsloth/GLM-4.6-GGUF
Parameters
The library function and CLI share the same core arguments/options:
- url (positional for CLI / first argument for library): target page.
- output_path / -o / --output PATH: optional destination file; stdout is used when omitted.
- force_raw / --raw: skip simplification and emit the response body verbatim.
- user_agent / --user-agent STRING: override the default identifier.
- ignore_robots_txt / --ignore-robots: skip robots.txt checks (use sparingly).
- proxy_url / --proxy URL: HTTP(S) proxy forwarded to httpx.
- timeout / --timeout SECONDS: request timeout (default 30 seconds).
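To make the correspondence between CLI options and library arguments concrete, here is a minimal argparse sketch. It is not the project's actual parser: `build_parser` and `cli_to_kwargs` are hypothetical names, the `None` defaults and the `float` type for the timeout are assumptions, and only the flag names, keyword names, and the 30-second default come from the list above.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser whose destinations match the library keyword names."""
    parser = argparse.ArgumentParser(prog="fetch-markdown")
    parser.add_argument("url", help="target page")
    parser.add_argument("-o", "--output", dest="output_path", type=Path, default=None)
    parser.add_argument("--raw", dest="force_raw", action="store_true")
    # Assumption: None means "use the default identifier".
    parser.add_argument("--user-agent", dest="user_agent", default=None)
    parser.add_argument("--ignore-robots", dest="ignore_robots_txt", action="store_true")
    parser.add_argument("--proxy", dest="proxy_url", default=None)
    # Assumption: the timeout is parsed as a float number of seconds.
    parser.add_argument("--timeout", type=float, default=30.0)
    return parser


def cli_to_kwargs(argv: list[str]) -> dict:
    """Parse CLI arguments into a dict keyed by the library argument names."""
    return vars(build_parser().parse_args(argv))
```

Because each `dest` matches a library keyword, the resulting dictionary could in principle be splatted into the library call, for example `fetch_markdown(**cli_to_kwargs(sys.argv[1:]))`, assuming the function accepts all of these names as keywords.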
Development
- Lint with: ruff check fetch_markdown tests
- Run tests with: pytest --cov=fetch_markdown --cov-report=term-missing
The tests depend on the Hugging Face website being reachable. They will be skipped automatically if the network call fails.
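One way to get this skip-on-failure behaviour is to route the network call through a helper that converts connection errors into a skip. The sketch below shows the general pattern using the standard library only; `fetch_or_skip` is a hypothetical name, and catching `OSError` is an assumption (the project's own tests may catch httpx-specific exceptions and use pytest's skip mechanism instead).

```python
import unittest
from typing import Callable


def fetch_or_skip(fetch: Callable[[], str]) -> str:
    """Run a network-dependent callable; signal a skip if it fails with OSError."""
    try:
        return fetch()
    except OSError as exc:
        # unittest.SkipTest marks the running test as skipped rather than failed.
        raise unittest.SkipTest(f"network unavailable: {exc}") from exc
```

A test would then wrap its fetch in a zero-argument callable, for example `fetch_or_skip(lambda: fetch_markdown(url))`, and assert on the returned Markdown only when the call succeeds.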
File details
Details for the file fetch_markdown-0.0.1.tar.gz.
File metadata
- Download URL: fetch_markdown-0.0.1.tar.gz
- Upload date:
- Size: 8.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9e37843da02d36c711c3eebcf2e46a8026f220696b98a53a55ea01782ec30084 |
| MD5 | 47ae4b3b2e9335194b0eec0cb86697dd |
| BLAKE2b-256 | 398f43f177455492e004b5e04d515d1bea9d4132aa37641f12bc81992ef779fc |
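A published SHA256 digest can be compared against a locally downloaded copy of the file. The following is a minimal sketch using hashlib; `sha256_hex` and `matches_published` are hypothetical helper names, and the local file path is whatever location the archive was downloaded to.

```python
import hashlib
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published hex digest, ignoring case and whitespace."""
    return sha256_hex(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_published(Path("fetch_markdown-0.0.1.tar.gz"), "9e37843da02d36c711c3eebcf2e46a8026f220696b98a53a55ea01782ec30084")` should return True for an unmodified download.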
Provenance
The following attestation bundles were made for fetch_markdown-0.0.1.tar.gz:
Publisher: ci.yml on Wuodan/fetch-markdown
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fetch_markdown-0.0.1.tar.gz
  - Subject digest: 9e37843da02d36c711c3eebcf2e46a8026f220696b98a53a55ea01782ec30084
- Sigstore transparency entry: 706912298
- Sigstore integration time:
- Permalink: Wuodan/fetch-markdown@723967ae4123baf07f3a04257812109386a363db
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/Wuodan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@723967ae4123baf07f3a04257812109386a363db
- Trigger Event: push
File details
Details for the file fetch_markdown-0.0.1-py3-none-any.whl.
File metadata
- Download URL: fetch_markdown-0.0.1-py3-none-any.whl
- Upload date:
- Size: 6.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8f7f01df796676761e010d286731ad2e22946c076606565906142b007d353bc6 |
| MD5 | cb7e65fd4f9143b3e3184ef04062a3cf |
| BLAKE2b-256 | 706b326404eafbc6b1da8bb77d2a950ae87fed1e29ee10b4c39835216a9abdf2 |
Provenance
The following attestation bundles were made for fetch_markdown-0.0.1-py3-none-any.whl:
Publisher: ci.yml on Wuodan/fetch-markdown
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: fetch_markdown-0.0.1-py3-none-any.whl
  - Subject digest: 8f7f01df796676761e010d286731ad2e22946c076606565906142b007d353bc6
- Sigstore transparency entry: 706912304
- Sigstore integration time:
- Permalink: Wuodan/fetch-markdown@723967ae4123baf07f3a04257812109386a363db
- Branch / Tag: refs/tags/0.0.1
- Owner: https://github.com/Wuodan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@723967ae4123baf07f3a04257812109386a363db
- Trigger Event: push