HTML to Markdown converter with Requests or Playwright backend

pg2md — Page to Markdown

Convert any webpage to clean Markdown. Choose between the fast Requests backend for static pages and the full-browser Playwright backend for JavaScript-rendered pages.

Features

Two backends: Pg2MdRequests (fast) or Pg2MdPlaywright (JavaScript support)
Browser reuse: Playwright instances share a single browser
Proxy support: HTTP/HTTPS proxies with authentication
Custom headers & cookies: Full control over requests
Clean output: Images and links are left out unless `with_image` or `with_link` is enabled
Context manager: Automatic cleanup with the `with` statement

Installation

pip install pg2md

# For the Playwright backend (the quotes stop shells such as zsh from expanding the brackets):
pip install "pg2md[playwright]"
playwright install chromium

Quick Start

from pg2md import Pg2MdRequests, Pg2MdPlaywright

# Simple usage with Requests
pg = Pg2MdRequests()
markdown = pg.run("https://example.com")
print(markdown)

# Playwright for JS-heavy sites
pg = Pg2MdPlaywright()
markdown = pg.run("https://spa-example.com")
pg.close()

Usage

Basic Conversion

from pg2md import Pg2MdRequests

pg = Pg2MdRequests(with_image=False, with_link=False)
md = pg.run("https://news.ycombinator.com")

With Proxy

from pg2md import Pg2MdRequests, Pg2MdPlaywright

# Format: http://user:password@host:port
# Or: host:port:user:password
proxy = "http://user:pass@proxy.example.com:8080"

# Requests
pg = Pg2MdRequests()
md = pg.run("https://example.com", proxy=proxy)

# Playwright
pg = Pg2MdPlaywright()
md = pg.run("https://example.com", proxy=proxy)
pg.close()
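
The comments above mention two proxy formats. If the same proxy string also has to be passed to a tool that only understands the URL form, a small converter can help. The sketch below is a hypothetical helper (`normalize_proxy` is not part of pg2md) and assumes the `host:port:user:password` form uses `http` as its scheme; how pg2md parses that form internally is not stated here.

```python
from urllib.parse import quote


def normalize_proxy(proxy: str) -> str:
    """Return a proxy string in URL form (hypothetical helper, not part of pg2md)."""
    if "://" in proxy:
        # Already a URL such as http://user:password@host:port
        return proxy
    # Split at most three times so a password containing ":" stays in one piece
    parts = proxy.split(":", 3)
    if len(parts) == 2:
        host, port = parts
        return f"http://{host}:{port}"
    if len(parts) == 4:
        host, port, user, password = parts
        # Percent-encode credentials so characters like "@" stay unambiguous
        return f"http://{quote(user, safe='')}:{quote(password, safe='')}@{host}:{port}"
    raise ValueError(f"Unrecognized proxy format: {proxy!r}")


print(normalize_proxy("proxy.example.com:8080:user:pass"))
# http://user:pass@proxy.example.com:8080
```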

Custom Headers & User-Agent

from pg2md import Pg2MdRequests

pg = Pg2MdRequests()
md = pg.run(
    "https://api.example.com/data",
    headers={
        "X-API-Key": "secret123",
        "Accept": "application/json",
    },
    user_agent="MyBot/1.0",
)

With Cookies

from pg2md import Pg2MdRequests

pg = Pg2MdRequests()
md = pg.run(
    "https://example.com/dashboard",
    cookies={
        "session": "abc123",
        "auth_token": "xyz789",
    },
)

Save to File

from pg2md import Pg2MdRequests

pg = Pg2MdRequests()
pg.save("output.md", "https://example.com")

# With options
pg.save(
    "article.md",
    "https://blog.example.com/post",
    proxy="http://user:pass@host:port",
    user_agent="MyBot/1.0",
)

Context Manager

from pg2md import Pg2MdPlaywright

with Pg2MdPlaywright() as pg:
    md1 = pg.run("https://site1.com")
    md2 = pg.run("https://site2.com")
    # Browser closed automatically
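
For readers unfamiliar with how a `with` block triggers cleanup, here is a generic sketch of the context manager protocol. `ManagedConverter` is a hypothetical stand-in and not pg2md's actual implementation; it only illustrates that `close()` runs when the block exits, including when the block raises.

```python
class ManagedConverter:
    """Hypothetical stand-in for a converter that owns a resource."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the block raises
        self.close()
        return False  # do not swallow exceptions


with ManagedConverter() as pg:
    assert not pg.closed  # still open inside the block

print(pg.closed)  # True
```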

Multiple Instances

from pg2md import Pg2MdPlaywright

# Both share the same browser (efficient)
pg1 = Pg2MdPlaywright()
pg2 = Pg2MdPlaywright()

md1 = pg1.run("https://site1.com")
md2 = pg2.run("https://site2.com")

Pg2MdPlaywright.close_all()  # Close shared browser
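
One common way to let several instances share a resource is to store it on the class rather than on each instance. The sketch below illustrates that pattern with a placeholder object instead of a real browser. `SharedBrowserClient` is hypothetical, and whether pg2md uses exactly this mechanism is an assumption.

```python
class SharedBrowserClient:
    """Hypothetical sketch of instances sharing one class-level resource."""

    _shared = None  # stands in for the shared browser

    def __init__(self):
        if SharedBrowserClient._shared is None:
            # Create lazily on first use; later instances reuse it
            SharedBrowserClient._shared = object()
        self.browser = SharedBrowserClient._shared

    @classmethod
    def close_all(cls):
        # Drop the shared resource so the next instance starts a fresh one
        cls._shared = None


a = SharedBrowserClient()
b = SharedBrowserClient()
print(a.browser is b.browser)  # True
SharedBrowserClient.close_all()
```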

API Reference

Pg2MdRequests

Pg2MdRequests(with_image=False, with_link=False)

Parameter	Type	Default	Description
`with_image`	bool	False	Include images in output
`with_link`	bool	False	Include links in output

Pg2MdPlaywright

Pg2MdPlaywright(
    browser=None,       # Custom Browser instance
    headless=True,      # Headless mode
    with_image=False,
    with_link=False,
)

Methods

`run(url, proxy=None, headers=None, cookies=None, user_agent=None, timeout=30)`

Fetch URL and convert to Markdown.

Returns: str (Markdown)

`fetch(url, proxy=None, headers=None, cookies=None, user_agent=None, timeout=30)`

Fetch HTML only.

Returns: str (HTML)

`convert(html)`

Convert HTML to Markdown.

Returns: str (Markdown)

`save(filepath, url, **kwargs)`

Fetch, convert, and save to file.

`close()`

Close browser (Playwright only).

`close_all()` (classmethod, Playwright only)

Close all shared browsers.
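
The method list suggests that `run` amounts to `fetch` followed by `convert`, although that equivalence is an assumption rather than something stated here. Calling the two steps separately can be useful when the raw HTML should be kept as well. A minimal sketch, where `run_and_keep_html` is a hypothetical helper and `pg` is any object exposing `fetch` and `convert` as documented above:

```python
from pathlib import Path


def run_and_keep_html(pg, url, html_path, **kwargs):
    """Fetch once, store the raw HTML, and return the Markdown.

    `pg` is any object with fetch(url, **kwargs) and convert(html) methods.
    """
    html = pg.fetch(url, **kwargs)
    # Keep the original page so it can be re-converted without another request
    Path(html_path).write_text(html, encoding="utf-8")
    return pg.convert(html)
```

With a real instance this would be called as `run_and_keep_html(Pg2MdRequests(), "https://example.com", "page.html")`.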

When to Use Which Backend?

Use Requests	Use Playwright
Static HTML pages	SPA / JavaScript apps
Speed matters	Need rendered content
Simple scraping	Bypass anti-bot (sometimes)
Low memory	Modern web apps
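
When it is unclear in advance which column a site falls into, one option is to try the fast backend first and switch to the browser only if the result looks empty. The sketch below is one possible heuristic, not a feature of pg2md: `convert_with_fallback` is a hypothetical helper, the 200-character threshold is arbitrary, and the broad `except` is used because the exceptions pg2md raises are not documented here.

```python
def convert_with_fallback(url, fast, full, min_chars=200):
    """Try the fast backend first; use the full browser if output looks empty.

    `fast` and `full` are objects with a run(url) method. The default
    threshold is an arbitrary illustrative value.
    """
    try:
        md = fast.run(url)
    except Exception:
        # Treat any failure of the fast backend as "no usable output"
        md = ""
    if len(md.strip()) >= min_chars:
        return md
    return full.run(url)
```

In practice `fast` would be a `Pg2MdRequests` instance and `full` a `Pg2MdPlaywright` instance.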

Examples

Scrape Multiple URLs

from pg2md import Pg2MdRequests

urls = [
    "https://blog.example.com/post1",
    "https://blog.example.com/post2",
    "https://blog.example.com/post3",
]

pg = Pg2MdRequests(with_image=False, with_link=False)

for i, url in enumerate(urls, start=1):
    pg.save(f"post_{i}.md", url)
    print(f"Saved: {url}")
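
Numbered file names work for a fixed list, but names derived from the URL are easier to trace back to their source. The following is a small sketch of that idea. `filename_for` is a hypothetical helper, not part of pg2md, and it ignores query strings.

```python
import re
from urllib.parse import urlparse


def filename_for(url: str) -> str:
    """Build a filesystem-friendly .md name from a URL (hypothetical helper)."""
    parsed = urlparse(url)
    raw = f"{parsed.netloc}{parsed.path}".strip("/")
    # Collapse anything that is not a letter, digit, dot, or hyphen into "_"
    slug = re.sub(r"[^A-Za-z0-9.-]+", "_", raw).strip("_")
    return f"{slug or 'index'}.md"


print(filename_for("https://blog.example.com/post1"))  # blog.example.com_post1.md
```

The result can be passed as the first argument to `pg.save`.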

Batch with Proxies

from pg2md import Pg2MdRequests

urls = ["https://site1.com", "https://site2.com", "https://site3.com"]
proxies = [
    "http://user1:pass1@proxy1:8080",
    "http://user2:pass2@proxy2:8080",
]

pg = Pg2MdRequests()

for i, url in enumerate(urls):
    proxy = proxies[i % len(proxies)]
    md = pg.run(url, proxy=proxy)
    print(f"[{i+1}] {len(md)} chars")
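
The modulo indexing above rotates through the proxies in round-robin order, and it raises `ZeroDivisionError` if the proxy list is empty. The same rotation can be written with `itertools.cycle` and an explicit check. `pair_with_proxies` below is a hypothetical helper shown only as a sketch.

```python
from itertools import cycle


def pair_with_proxies(urls, proxies):
    """Assign proxies to URLs in round-robin order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    # zip stops when urls is exhausted, so the endless cycle is safe here
    return list(zip(urls, cycle(proxies)))


pairs = pair_with_proxies(["u1", "u2", "u3"], ["p1", "p2"])
print(pairs)  # [('u1', 'p1'), ('u2', 'p2'), ('u3', 'p1')]
```

Each `(url, proxy)` pair can then be passed to `pg.run(url, proxy=proxy)`.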

Extract Article Content

from pg2md import Pg2MdPlaywright

with Pg2MdPlaywright() as pg:
    md = pg.run(
        "https://medium.com/some-article",
        user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    )
    
    # Save the Markdown; an explicit encoding avoids platform-dependent defaults
    with open("article.md", "w", encoding="utf-8") as f:
        f.write(md)

License

MIT

Project details

Release history

1.0.2 (this version): Mar 16, 2026
1.0.1: Mar 16, 2026

Download files

Source Distribution

pg2md-1.0.2.tar.gz (8.1 kB)

Uploaded Mar 16, 2026 Source

Built Distribution

pg2md-1.0.2-py3-none-any.whl (8.0 kB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file pg2md-1.0.2.tar.gz.

File metadata

Download URL: pg2md-1.0.2.tar.gz
Upload date: Mar 16, 2026
Size: 8.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pg2md-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`71e945ce574acfd9f8dd9efc466e34e0019261511f5a27a4496c9619f530cdc0`
MD5	`b729c459c409858d3424a6bdf46e3afb`
BLAKE2b-256	`b4879e2945b15979ec90d5527b6393b2eca851ac59d33e6b7ad6498b27cc9bdf`

File details

Details for the file pg2md-1.0.2-py3-none-any.whl.

File metadata

Download URL: pg2md-1.0.2-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 8.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for pg2md-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`72a84d2b0819940586462051ccd85d08d7caffcea06fe55bb89e71d961ea42b5`
MD5	`9da6a3d42d12b9db31b6bd4f87a7db99`
BLAKE2b-256	`301cddb49279aa2cd4f0164c9790398917f2334dac2be3569048382de4410364`
