mdfetch

A Python library that extracts article content from web platforms and returns it as clean Markdown.

Install

pip install mdfetch

CLI Usage

The package includes an md-fetch command for use from a terminal:

# Fetch and print Markdown to standard output
md-fetch https://medium.com/example/article

# Fetch and save Markdown to a file
md-fetch https://dev.to/example/article --output article.md

Python Usage

from mdfetch import extract

# Works with any supported platform: just pass the article URL
urls = [
    "https://medium.com/some-publication/article-slug-abc123",
    "https://dev.to/username/article-slug",
    "https://example.substack.com/p/article-slug",
    "https://thenewstack.io/article-slug",
    "https://dzone.com/articles/article-slug",
]

for url in urls:
    markdown = extract(url)
    print(markdown)
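
To mirror the CLI's --output option from Python, the extracted text can be written to disk. The following is a minimal sketch, not part of mdfetch: `slug_from_url` and `save_article` are hypothetical helper names, and the sketch assumes that `extract` returns the Markdown as a `str`. The extractor is passed in as a parameter so the helper can be exercised without network access.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse


def slug_from_url(url: str) -> str:
    """Use the last non-empty path segment of the URL as a file name stem."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] if segments else "article"


def save_article(url: str, out_dir: Path, extract_fn: Callable[[str], str]) -> Path:
    """Extract one article and write it to <out_dir>/<slug>.md (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / f"{slug_from_url(url)}.md"
    # Assumption: extract_fn returns the article as a Markdown string.
    target.write_text(extract_fn(url), encoding="utf-8")
    return target


# Usage with the real extractor (requires mdfetch and network access):
#   from mdfetch import extract
#   save_article("https://dev.to/username/article-slug", Path("articles"), extract)
```

Two URLs that end in the same path segment would map to the same file, so a real script may want a collision check.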

Error handling

from mdfetch import (
    extract,
    InvalidURLError,
    UnsupportedPlatformError,
    UnsupportedContentTypeError,
    FetchError,
    HTTPStatusError,
    EmptyContentError,
)

url = "https://medium.com/some-publication/article-slug-abc123"

try:
    markdown = extract(url)
except InvalidURLError as e:
    print(f"Bad URL: {e.message}")
except UnsupportedPlatformError as e:
    print(f"Platform not supported: {e.message}")
except UnsupportedContentTypeError as e:
    print(f"Not an article page: {e.message}")
except HTTPStatusError as e:
    print(f"HTTP {e.status_code}: {e.message}")
except FetchError as e:
    print(f"Network error: {e.message}")
except EmptyContentError as e:
    print(f"No content: {e.message}")
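
When processing many URLs, it is often more convenient to collect failures than to stop at the first one. The sketch below shows one way to do that. `extract_many` is a hypothetical helper, not part of mdfetch; it takes the extractor and the exception classes to catch as arguments, so it makes no assumption about how the library's exceptions relate to each other.

```python
from typing import Callable, Iterable


def extract_many(
    urls: Iterable[str],
    extract_fn: Callable[[str], str],
    expected_errors: tuple[type[Exception], ...] = (Exception,),
) -> tuple[dict[str, str], dict[str, str]]:
    """Return (results, failures), both keyed by URL (hypothetical helper)."""
    results: dict[str, str] = {}
    failures: dict[str, str] = {}
    for url in urls:
        try:
            results[url] = extract_fn(url)
        except expected_errors as exc:
            # Record the error class and message, then continue with the next URL.
            failures[url] = f"{type(exc).__name__}: {exc}"
    return results, failures


# Usage with the real extractor (requires mdfetch and network access):
#   from mdfetch import extract, FetchError, HTTPStatusError, EmptyContentError
#   ok, failed = extract_many(urls, extract, (FetchError, HTTPStatusError, EmptyContentError))
```

Passing an explicit tuple of exception classes keeps unexpected errors, such as programming mistakes, from being silently swallowed.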

Supported platforms

Platform       Domains
Medium         medium.com, *.medium.com
dev.to         dev.to
Substack       substack.com, *.substack.com
The New Stack  thenewstack.io
DZone          dzone.com
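
A caller that wants to filter URLs before calling the library can approximate the table above with a host check. This is only a sketch of one possible pre-check, not mdfetch's own matching logic: `SUPPORTED_DOMAINS` and `looks_supported` are hypothetical names, and the library may accept or reject hosts that this check treats differently (for example a `www.` prefix).

```python
from urllib.parse import urlparse

# Domains copied from the supported-platforms table.
# The boolean marks entries that also list a "*." wildcard form.
SUPPORTED_DOMAINS = {
    "medium.com": True,
    "dev.to": False,
    "substack.com": True,
    "thenewstack.io": False,
    "dzone.com": False,
}


def looks_supported(url: str) -> bool:
    """Return True if the URL's host matches an entry in SUPPORTED_DOMAINS."""
    host = (urlparse(url).hostname or "").lower()
    for domain, allow_subdomains in SUPPORTED_DOMAINS.items():
        if host == domain:
            return True
        # Match subdomains on a label boundary so "notmedium.com" is rejected.
        if allow_subdomains and host.endswith("." + domain):
            return True
    return False
```

A host check like this says nothing about whether the page is an article, so errors such as UnsupportedContentTypeError still need handling.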

Development

Requires uv.

make setup        # install dependencies
make test         # run unit tests
make integration  # run integration tests (requires network access)
make lint         # ruff check
make format       # ruff format
make build        # build wheel + sdist
make upgrade-deps # upgrade all dependencies
make clean        # remove build artifacts

Requirements

  • Python 3.12+
