Fetch web pages and convert them to Markdown

Project description

markdfetch

A lightweight Python library for fetching web pages and extracting content as Markdown, plain text, or structured links.

Features

Fetch web pages with a simple API
Convert HTML to Markdown
Extract plain text from web pages
Extract links with URL and anchor text
Exclude unwanted HTML tags before processing
Include only specific HTML tags before processing
Set custom request headers and timeouts
Resolve relative URLs automatically
Select content with CSS selectors
Optionally deduplicate links
Retry failed requests automatically
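
The feature list mentions automatic retries but not how they are scheduled. As an illustration only, the sketch below retries a callable with exponential backoff. The helper name retry, the default of three attempts, and the 0.5-second base delay are hypothetical choices, not markdfetch's documented behavior.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying on any exception with exponential backoff.

    Hypothetical helper: markdfetch's own retry policy is not documented here.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            sleep(base_delay * 2 ** attempt)  # waits 0.5 s, 1 s, 2 s, ...
    raise ValueError("attempts must be at least 1")
```

Taking sleep as a parameter lets the helper be exercised without real waiting, for example by passing a list's append method to record the delays.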

Installation

pip install markdfetch

Quick Start

import markdfetch

page = markdfetch.fetch("https://example.com")

print(page.markdown())

Fetch a Page

import markdfetch

page = markdfetch.fetch("https://example.com")

print(page.status_code)
print(page.url)

Convert HTML to Markdown

page = markdfetch.fetch("https://example.com")

markdown = page.markdown()

print(markdown)

Exclude HTML Tags

Remove unwanted sections before converting to Markdown.

page = markdfetch.fetch("https://example.com")

markdown = page.markdown(
    exclude=["nav", "footer"]
)

print(markdown)
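
Excluding tags can be done while parsing: count how many excluded elements are currently open and discard any text seen while that count is above zero. The sketch below applies this with Python's standard html.parser module. The names ExcludeFilter and strip_tags are hypothetical, the result is plain text rather than Markdown, and markdfetch may use a different parser internally.

```python
from html.parser import HTMLParser


class ExcludeFilter(HTMLParser):
    """Collect text, skipping the contents of excluded elements (hypothetical helper)."""

    def __init__(self, exclude):
        super().__init__()
        self.exclude = set(exclude)
        self.depth = 0  # number of excluded elements currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.exclude:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.exclude and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only when no excluded element is open.
        if self.depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_tags(html, exclude):
    parser = ExcludeFilter(exclude)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

html.parser reports tag names in lowercase, so the exclude list should be lowercase too. The counter also assumes each excluded tag has a closing tag, which holds for container elements such as nav and footer.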

Include Specific HTML Tags

Extract content only from selected tags.

page = markdfetch.fetch("https://example.com")

markdown = page.markdown(
    include=["article"]
)

print(markdown)

Combine Include and Exclude

Pass include and exclude together in the same call.

page = markdfetch.fetch("https://example.com")

markdown = page.markdown(
    include=["article"],
    exclude=["nav", "footer"]
)

print(markdown)
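
The README does not say how include and exclude interact when both are given. A plausible rule, assumed here rather than taken from markdfetch, is to keep text that sits inside an included element and outside every excluded one. The sketch tracks both conditions with two counters; IncludeExcludeFilter and filter_text are hypothetical names, and the output is plain text rather than Markdown.

```python
from html.parser import HTMLParser


class IncludeExcludeFilter(HTMLParser):
    """Keep text inside included tags but outside excluded tags (hypothetical helper)."""

    def __init__(self, include, exclude):
        super().__init__()
        self.include = set(include)
        self.exclude = set(exclude)
        self.inside_include = 0  # included elements currently open
        self.inside_exclude = 0  # excluded elements currently open
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.include:
            self.inside_include += 1
        if tag in self.exclude:
            self.inside_exclude += 1

    def handle_endtag(self, tag):
        if tag in self.include and self.inside_include:
            self.inside_include -= 1
        if tag in self.exclude and self.inside_exclude:
            self.inside_exclude -= 1

    def handle_data(self, data):
        text = data.strip()
        # With no include list, treat the whole document as included.
        included = self.inside_include > 0 or not self.include
        if text and included and self.inside_exclude == 0:
            self.parts.append(text)


def filter_text(html, include=(), exclude=()):
    parser = IncludeExcludeFilter(include, exclude)
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Under this rule, exclude wins over include, so a nav nested inside an article is still dropped.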

Extract Plain Text

page = markdfetch.fetch("https://example.com")

text = page.text()

print(text)

Extract Links

page = markdfetch.fetch("https://example.com")

links = page.links()

print(links)

Example return value:

[
    {
        "url": "https://example.com/about",
        "text": "About Us"
    },
    {
        "url": "https://example.com/contact",
        "text": "Contact"
    }
]
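
The list-of-dictionaries shape above can be produced by pairing each a element's href with its anchor text and resolving relative hrefs against the page URL with urllib.parse.urljoin. This sketch uses only the standard library; LinkCollector and extract_links are hypothetical names, not markdfetch internals.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather {"url", "text"} dicts for every <a href> (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # href of the <a> currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append({
                "url": urljoin(self.base_url, self._href),  # resolve relative URLs
                "text": " ".join("".join(self._text).split()),  # collapse whitespace
            })
            self._href = None


def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

Collapsing whitespace keeps anchor text that spans several source lines readable; absolute hrefs pass through urljoin unchanged.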

Skip Empty Links

page = markdfetch.fetch("https://example.com")

links = page.links(skip_empty=True)
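
The README does not define what makes a link empty. Assuming it means a link with blank anchor text (such as an icon-only link) or no URL, a post-filter over the same list of dictionaries could look like this; skip_empty_links is a hypothetical helper, not markdfetch's skip_empty implementation.

```python
def skip_empty_links(links):
    """Drop links whose anchor text or URL is blank (assumed meaning of "empty").

    Hypothetical helper: markdfetch's skip_empty option may use a different rule.
    """
    return [
        link for link in links
        if (link.get("text") or "").strip() and (link.get("url") or "").strip()
    ]
```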

Extract Content Using CSS Selectors

Target specific elements using CSS selectors.

page = markdfetch.fetch("https://example.com")

markdown = page.markdown(
    selector="article"
)

print(markdown)

You can use any valid CSS selector:

page.markdown(selector=".content")
page.markdown(selector="#main")
page.markdown(selector="article.post")
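
Each of the three selectors above is a single compound selector: an optional tag name plus any number of .class and #id parts, all of which must match the same element. The sketch below parses that subset and tests it against an element's tag name and attribute pairs (the form html.parser passes to handle_starttag). It is only an illustration with hypothetical names (parse_simple_selector, matches); it does not handle combinators such as "div p", attribute selectors, or pseudo-classes, which a full CSS selector engine would.

```python
import re


def parse_simple_selector(selector):
    """Split one compound selector like 'article.post' into tag, ids and classes.

    Only tag, .class and #id parts are recognized; other syntax is ignored.
    """
    tag, ids, classes = None, set(), set()
    for prefix, name in re.findall(r"([#.]?)([\w-]+)", selector):
        if prefix == "#":
            ids.add(name)
        elif prefix == ".":
            classes.add(name)
        else:
            tag = name.lower()
    return tag, ids, classes


def matches(tag, attrs, selector):
    """Return True if an element (tag name, list of attr pairs) matches the selector."""
    want_tag, want_ids, want_classes = parse_simple_selector(selector)
    attr_map = dict(attrs)
    if want_tag is not None and tag != want_tag:
        return False
    if want_ids and want_ids != {attr_map.get("id")}:
        return False
    element_classes = set((attr_map.get("class") or "").split())
    return want_classes <= element_classes  # every required class must be present
```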

Extract Text Using CSS Selectors

Extract plain text from specific sections of a page.

page = markdfetch.fetch("https://example.com")

text = page.text(
    selector=".content"
)

print(text)

Extract Unique Links

Remove duplicate URLs from the extracted links.

page = markdfetch.fetch("https://example.com")

links = page.links(
    unique=True
)

print(links)
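
Removing duplicate URLs while preserving the original order needs only a set of URLs already seen. A minimal sketch over the list-of-dictionaries shape shown earlier; unique_links is a hypothetical name, and whether markdfetch keeps the first or the last duplicate is not documented (this version keeps the first).

```python
def unique_links(links):
    """Return links with duplicate URLs removed, keeping the first occurrence."""
    seen = set()
    result = []
    for link in links:
        if link["url"] not in seen:
            seen.add(link["url"])
            result.append(link)
    return result
```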

Roadmap

Planned features:

Async support via httpx
Proxy support
Metadata extraction

License

MIT License

Project details

Release history

This version

0.1.0

Jun 14, 2026

Download files

Download the file for your platform.

Source Distribution

markdfetch-0.1.0.tar.gz (4.0 kB)

Uploaded Jun 14, 2026 Source

Built Distribution

markdfetch-0.1.0-py3-none-any.whl (4.5 kB)

Uploaded Jun 14, 2026 Python 3

File details

Details for the file markdfetch-0.1.0.tar.gz.

File metadata

Download URL: markdfetch-0.1.0.tar.gz
Upload date: Jun 14, 2026
Size: 4.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for markdfetch-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cdc84afe23d55973656d266fb6dfca591eac49c64f912151d18e3e60f07225c8`
MD5	`9a888b964e151bbd72870bddd6f0c8d6`
BLAKE2b-256	`f651aa0e05fbf4ecbf41bf337756e01fa276f66d39a709e45df57ede89b903e5`

File details

Details for the file markdfetch-0.1.0-py3-none-any.whl.

File metadata

Download URL: markdfetch-0.1.0-py3-none-any.whl
Upload date: Jun 14, 2026
Size: 4.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for markdfetch-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`959f003027bb7a41205679724899cedbd296696c490d962aef00800a2b453f22`
MD5	`6080fe55acda81599c0dcb406575d5ba`
BLAKE2b-256	`ad13ae1a02d60cbf0311cc99b7c1a91bc8341040a521681924ecca5f387909bd`
