Extract article content from web platforms and return it as clean Markdown.
mdfetch
A Python library that extracts article content from web platforms and returns it as clean Markdown.
Install
```bash
pip install mdfetch
```
Usage
```python
from mdfetch import extract

# Works with any supported platform: just pass the URL
markdown = extract("https://medium.com/some-publication/article-slug-abc123")
print(markdown)

markdown = extract("https://dev.to/username/article-slug")
print(markdown)
```
Error handling
```python
from mdfetch import (
    extract,
    InvalidURLError,
    UnsupportedPlatformError,
    UnsupportedContentTypeError,
    FetchError,
    HTTPStatusError,
    EmptyContentError,
)

url = "https://medium.com/some-publication/article-slug-abc123"

try:
    markdown = extract(url)
except InvalidURLError as e:
    print(f"Bad URL: {e.message}")
except UnsupportedPlatformError as e:
    print(f"Platform not supported: {e.message}")
except UnsupportedContentTypeError as e:
    print(f"Not an article page: {e.message}")
except HTTPStatusError as e:
    print(f"HTTP {e.status_code}: {e.message}")
except FetchError as e:
    print(f"Network error: {e.message}")
except EmptyContentError as e:
    print(f"No content: {e.message}")
```
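When extracting many URLs in a batch, it can help to treat possibly transient failures (such as `FetchError`) differently from failures that will not change on retry (such as `InvalidURLError` or `UnsupportedPlatformError`). The following is a minimal sketch, not part of mdfetch: `extract_all` is a hypothetical helper that receives the extraction function and the retryable exception classes as parameters, so it makes no assumptions about mdfetch's exception hierarchy.

```python
import time
from typing import Callable


def extract_all(
    urls: list[str],
    extract: Callable[[str], str],
    retryable: tuple[type[Exception], ...],
    attempts: int = 3,
    delay: float = 1.0,
) -> tuple[dict[str, str], dict[str, Exception]]:
    """Hypothetical batch helper: extract each URL, retrying only on `retryable`.

    Returns (results, failures): results maps URL -> Markdown,
    failures maps URL -> the last exception raised for that URL.
    """
    results: dict[str, str] = {}
    failures: dict[str, Exception] = {}
    for url in urls:
        for attempt in range(1, attempts + 1):
            try:
                results[url] = extract(url)
                break
            except retryable as exc:
                # Possibly transient: wait a little longer each time, then retry.
                if attempt == attempts:
                    failures[url] = exc
                else:
                    time.sleep(delay * attempt)
            except Exception as exc:
                # Anything else is treated as permanent: record it and move on.
                failures[url] = exc
                break
    return results, failures
```

With mdfetch this might be called as `extract_all(urls, extract, retryable=(FetchError,))`. Whether HTTP errors deserve a retry (a 503 perhaps, a 404 probably not) is a judgment call; if `HTTPStatusError` turns out to be a subclass of `FetchError`, this sketch would retry it as well.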
Supported platforms
| Platform | Domains |
|---|---|
| Medium | medium.com, *.medium.com |
| dev.to | dev.to |
Development
Requires uv.
```bash
make setup        # install dependencies
make test         # run unit tests
make integration  # run integration tests (requires network access)
make lint         # ruff check
make format       # ruff format
make build        # build wheel + sdist
make upgrade-deps # upgrade all dependencies
make clean        # remove build artifacts
```
Requirements
- Python 3.12+
File details
Details for the file mdfetch-0.2.2.tar.gz.
File metadata
- Download URL: mdfetch-0.2.2.tar.gz
- Size: 274.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 48edec42d57a09854bae6873188128e81c9af27de59f9242a6e6fc2278f4c42c |
| MD5 | 91b9d46c2705de8c036cff43fd85f399 |
| BLAKE2b-256 | 4b077b33815c4be37790635f4a0787f332b2dc988e5318b2787e12c032bebf7a |
Provenance
The following attestation bundles were made for mdfetch-0.2.2.tar.gz:
- Publisher: publish.yml on stn1slv/md-fetch
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mdfetch-0.2.2.tar.gz
- Subject digest: 48edec42d57a09854bae6873188128e81c9af27de59f9242a6e6fc2278f4c42c
- Sigstore transparency entry: 1547145322
- Permalink: stn1slv/md-fetch@2d742bc5778c036f40f023956522b30baf53730b
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/stn1slv
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2d742bc5778c036f40f023956522b30baf53730b
- Trigger Event: release
File details
Details for the file mdfetch-0.2.2-py3-none-any.whl.
File metadata
- Download URL: mdfetch-0.2.2-py3-none-any.whl
- Size: 10.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e9a87c3175a4358eaf36b38805d1eb301670d300f554fa6ae6baee5be49d400 |
| MD5 | 330e59f2072b474a100d62c1eeeaf26c |
| BLAKE2b-256 | a35f081692dbb18f7645d723084955c136901c098ae412d1a2dc45edf731e919 |
Provenance
The following attestation bundles were made for mdfetch-0.2.2-py3-none-any.whl:
- Publisher: publish.yml on stn1slv/md-fetch
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mdfetch-0.2.2-py3-none-any.whl
- Subject digest: 4e9a87c3175a4358eaf36b38805d1eb301670d300f554fa6ae6baee5be49d400
- Sigstore transparency entry: 1547145344
- Permalink: stn1slv/md-fetch@2d742bc5778c036f40f023956522b30baf53730b
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/stn1slv
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2d742bc5778c036f40f023956522b30baf53730b
- Trigger Event: release