
http2md

A CLI tool to fetch web pages and convert them to Markdown using Playwright.

Installation

pip install http2md
http2md install

Usage

# Basic usage (converts to Markdown)
http2md https://example.com

# Write the Markdown output to a file
http2md https://example.com -o output.md

# Output raw HTML
http2md https://example.com --html

# Wait for a specific element before extracting
http2md https://spa-site.com --wait-for ".content"

# Increase timeout for slow sites (default: 30000ms)
http2md https://slow-site.com --timeout 60000

# Use specific wait strategy
http2md https://fast-site.com --wait-until load

CLI Options

usage: http2md [-h] [--html]
               [--wait-until {auto,load,domcontentloaded,networkidle,commit}]
               [--timeout TIMEOUT] [--wait-for WAIT_FOR] [-o OUT]
               [url]

Convert HTTP content to Markdown. Supports:
- Headings, lists, code blocks, tables
- Links (static and dynamic)
- Images (with alt text)
- Formatting (bold, italic, **strikethrough**)

positional arguments:
  url                   URL to process

options:
  -h, --help            show this help message and exit
  --html                Output raw HTML instead of Markdown
  --wait-until          Wait strategy (default: auto)
  --timeout TIMEOUT     Timeout in milliseconds (default: 30000)
  --wait-for WAIT_FOR   CSS selector to wait for before extracting content
  -o, --out OUT         Output file path

Wait Strategies

Strategy          Description
auto              Combined: tries networkidle, falls back on timeout (default)
load              Wait for the load event
domcontentloaded  Wait for the DOM to be ready
networkidle       Wait for no network activity (500 ms quiet period)
commit            Return immediately after response headers
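The fallback behavior of the auto strategy can be sketched as follows. This is a hypothetical illustration, not the actual implementation in http2md; the real code may structure the retry differently:

```python
def fetch_auto(url, timeout=30000):
    """Sketch of the "auto" strategy: wait for networkidle, but fall
    back to whatever has rendered if the network never settles."""
    # Playwright is imported lazily so the sketch stays importable
    # even where the browser dependency is absent.
    from playwright.sync_api import sync_playwright, TimeoutError as PWTimeout

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        try:
            # First attempt: wait until the network has been quiet.
            page.goto(url, wait_until="networkidle", timeout=timeout)
        except PWTimeout:
            # The network never went idle (ads, long-polling, websockets);
            # the DOM is usually complete anyway, so proceed with it.
            pass
        html = page.content()
        browser.close()
        return html
```

This mirrors a common Playwright pattern: networkidle gives the best results on JavaScript-heavy pages, but some sites keep connections open indefinitely, so catching the timeout and using the current DOM avoids failing on them.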

Python API

You can also use http2md directly from Python:

from http2md.crawler import fetch_html
from markdownify import markdownify as md

# Fetch raw HTML
html = fetch_html("https://example.com")

# Convert to Markdown
markdown = md(html)
print(markdown)

# With options
html = fetch_html(
    "https://spa-site.com",
    wait_until="networkidle",  # or "auto", "load", "domcontentloaded"
    timeout=60000,             # 60 seconds
    wait_for=".content"        # CSS selector to wait for
)

Site Crawling

Crawl entire websites to a specified depth:

# Crawl site to depth 2, save to ./docs/
http2md https://docs.example.com --depth 2 --outdir ./docs

# Only crawl /api/* pages
http2md https://docs.example.com --depth 3 --include "/api/*"

# Exclude images and static files
http2md https://site.com --depth 2 --exclude "*.png" --exclude "*.css"

# Quiet mode (no progress output)
http2md https://site.com --depth 1 --outdir ./out -q

Crawling Options

Option             Description
--depth N          Crawl depth (0 = single page, 1 = links from that page, etc.)
--outdir DIR       Output directory for crawled pages
--include PATTERN  Include URLs matching glob pattern (repeatable)
--exclude PATTERN  Exclude URLs matching glob pattern (repeatable)
--no-same-domain   Allow following links to other domains
--tqdm             Use tqdm progress bar
-q, --quiet        Suppress progress output
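One plausible way the repeatable --include/--exclude patterns could be applied is shown below. The function name and exact matching rules are illustrative, not taken from http2md's source; the real filter may differ:

```python
from fnmatch import fnmatch

def url_allowed(url, include=(), exclude=()):
    """Sketch of glob-based URL filtering: a URL must match no
    exclude pattern and, if any include patterns are given, at
    least one of them."""
    if any(fnmatch(url, pat) for pat in exclude):
        return False
    if include and not any(fnmatch(url, pat) for pat in include):
        return False
    return True
```

For example, url_allowed("https://docs.example.com/api/v1", include=["*/api/*"]) passes, while url_allowed("https://site.com/logo.png", exclude=["*.png"]) does not.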

Advanced Link Extraction

http2md automatically handles Single Page Applications (SPAs) and dynamic content:

  1. JavaScript Execution: It executes JavaScript to render the page fully.
  2. Auto-Scrolling: It automatically attempts to scroll to the bottom of the page to trigger lazy-loading of content.
  3. Dynamic Links: It extracts links from the rendered DOM (using page.evaluate), not just the static HTML. This ensures links generated by JavaScript are found.

Note: Sites using non-standard navigation (e.g., onclick on div elements instead of <a> tags) may still have limited crawlability.
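The dynamic-link step above can be sketched as two parts: a JavaScript expression evaluated in the rendered page, and a normalization pass over the hrefs it returns. Both the expression and the helper below are hypothetical; http2md's actual page.evaluate call and link handling may differ:

```python
from urllib.parse import urljoin, urldefrag

# A JavaScript expression of the kind one might pass to page.evaluate
# to collect hrefs from the rendered DOM (illustrative, not http2md's):
COLLECT_LINKS_JS = "Array.from(document.querySelectorAll('a[href]')).map(a => a.href)"

def normalize_links(base_url, hrefs):
    """Resolve hrefs against the page URL, drop fragments and
    non-HTTP schemes, and deduplicate while preserving order."""
    seen, out = set(), []
    for href in hrefs:
        url, _frag = urldefrag(urljoin(base_url, href))
        if url.startswith("http") and url not in seen:
            seen.add(url)
            out.append(url)
    return out
```

Because the expression runs against the live DOM rather than the static HTML, anchors injected by client-side routers are picked up; the normalization then keeps the crawl frontier free of mailto: links and fragment-only duplicates.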

Python API for Crawling

from http2md.crawler_site import crawl_site

def on_progress(url, status, current, total, html=None, markdown=None):
    print(f"[{current}/{total}] {status}: {url}")
    if html:
        print(f"  Downloaded {len(html)} bytes")

results = crawl_site(
    "https://docs.example.com",
    depth=2,
    outdir="./output",
    callback=on_progress,
    include=["*/api/*"],
    exclude=["*.png"]
)

Using tqdm for Progress

from http2md.crawler_site import crawl_site
from tqdm import tqdm

pbar = tqdm(unit="pages")

def tqdm_callback(url, status, current, total, html=None, markdown=None):
    pbar.total = total
    if status == "fetching":
        pbar.set_description(f"Fetching {url[:50]}")
    elif status == "done" or status.startswith("skipped"):
        pbar.update(1)
    pbar.refresh()

crawl_site(
    "https://docs.example.com",
    depth=2,
    callback=tqdm_callback
)
pbar.close()
