
pulldown

Pull down web pages as clean Markdown for LLM agents.

  • HTTP-first with browser-like defaults
  • Optional Chromium rendering for JS-heavy pages
  • Four detail levels: minimal, readable, full, raw
  • Brotli-compressed responses decoded out of the box in core installs
  • Concurrent batch fetching with fetch_many()
  • Bounded site crawling with robots.txt support and per-domain politeness
  • Validator-based caching (ETag / Last-Modified) with atomic writes
  • SSRF guards: private/loopback/metadata addresses blocked by default
  • Response size caps and transient-error retries
  • CLI, Python API, and MCP server

Install

pip install pulldown                 # core
pip install 'pulldown[render]'       # + Playwright (Chromium rendering)
pip install 'pulldown[mcp]'          # + MCP server
pip install 'pulldown[all]'          # everything

Core installs include Brotli support, so br-compressed HTML is decoded before minimal, readable, full, or raw processing.

Core installs also include lxml_html_clean, the package that recent lxml versions split their HTML cleaner into, avoiding the ImportError some agent sandboxes hit on older releases.

For rendered pages, also run playwright install chromium once.

Quick Start

CLI

pulldown get https://example.com
pulldown get https://example.com --detail minimal
pulldown get https://example.com --render --scroll 3
pulldown crawl https://docs.example.com --max-pages 20 --delay-ms 200
pulldown bench https://example.com --runs 5
pulldown cache stats

Python

import asyncio
from pulldown import fetch, fetch_many, crawl, Detail, PageCache

async def main():
    # Single fetch
    result = await fetch("https://example.com", detail=Detail.readable)
    print(result.title, result.content)

    # Batch fetch with caching
    cache = PageCache(ttl=3600)
    results = await fetch_many(
        ["https://a.com", "https://b.com"],
        concurrency=5,
        cache=cache,
        retries=2,
    )

    # Crawl a docs site
    crawl_result = await crawl(
        "https://docs.example.com/",
        max_pages=50,
        max_depth=2,
        respect_robots=True,
        per_domain_delay_ms=200,
    )
    markdown = crawl_result.to_markdown()

asyncio.run(main())
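
The retries parameter above implies conventional transient-error handling. A generic jittered-backoff sketch (an assumed helper, not pulldown's internals):

```python
import asyncio
import random

async def with_retries(op, retries: int = 2, base_delay: float = 0.5):
    """Run a zero-argument async operation, retrying failures with
    jittered exponential backoff; the last failure is re-raised."""
    for attempt in range(retries + 1):
        try:
            return await op()
        except Exception:
            if attempt == retries:
                raise
            # Exponential backoff with jitter to avoid thundering herds.
            await asyncio.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```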

MCP

Add to your client config (e.g. Claude Desktop):

{
  "mcpServers": {
    "pulldown": {
      "command": "python",
      "args": ["-m", "pulldown.mcp_server"],
      "env": {
        "PULLDOWN_CACHE_DIR": "~/.cache/pulldown"
      }
    }
  }
}

Environment variables:

| Variable | Default | Purpose |
| --- | --- | --- |
| MCP_TRANSPORT | stdio | stdio or http |
| MCP_HOST | 127.0.0.1 | Bind address for HTTP transport |
| MCP_PORT | 8080 | Port for HTTP transport |
| PULLDOWN_CACHE_DIR | unset | Enable caching to this directory |
| PULLDOWN_CACHE_TTL | 3600 | Cache TTL in seconds |
| PULLDOWN_ALLOW_PRIVATE | 0 | Set to 1 to allow private addresses |
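
The defaults in the table can be resolved with a small helper; a sketch assuming exactly the variable names and defaults listed:

```python
import os

def mcp_config(env=os.environ) -> dict:
    """Resolve MCP server settings from the environment, falling back to
    the documented defaults when a variable is unset."""
    return {
        "transport": env.get("MCP_TRANSPORT", "stdio"),
        "host": env.get("MCP_HOST", "127.0.0.1"),
        "port": int(env.get("MCP_PORT", "8080")),
        "cache_dir": env.get("PULLDOWN_CACHE_DIR"),  # None means caching off
        "cache_ttl": int(env.get("PULLDOWN_CACHE_TTL", "3600")),
        "allow_private": env.get("PULLDOWN_ALLOW_PRIVATE", "0") == "1",
    }
```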

Detail Levels

| Level | Output | Best for |
| --- | --- | --- |
| minimal | Title + plain text | Lowest-token summarisation |
| readable (default) | Clean Markdown with links | RAG, reading, structured landing pages |
| full | Full-page Markdown including chrome | Pages without a clear article body |
| raw | Untouched HTML | Custom parsing downstream |
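
As a rough illustration of what the minimal level's title-plus-text shape implies, here is a stdlib-only extraction sketch (a stand-in for the idea, not pulldown's extractor):

```python
from html.parser import HTMLParser

class MinimalExtractor(HTMLParser):
    """Collect the page title and visible body text, skipping script/style."""
    def __init__(self):
        super().__init__()
        self.title = ""
        self.parts = []
        self._mode = None  # "title", "skip", or None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._mode = "title"
        elif tag in ("script", "style"):
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("title", "script", "style"):
            self._mode = None

    def handle_data(self, data):
        if self._mode == "skip":
            return
        if self._mode == "title":
            self.title += data
        elif data.strip():
            self.parts.append(data.strip())

def minimal(html: str) -> tuple[str, str]:
    """Return (title, plain text) for an HTML document."""
    p = MinimalExtractor()
    p.feed(html)
    return p.title.strip(), " ".join(p.parts)
```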

Security

pulldown refuses to fetch URLs that resolve to private, loopback, link-local, or cloud-metadata addresses by default. This prevents LLM-driven SSRF into internal services (e.g., AWS metadata at 169.254.169.254, Redis on localhost:6379). Override with allow_private_addresses=True if you understand the risk.

Responses larger than 10 MiB are rejected by default (configurable via the max_bytes parameter).

Only http and https schemes are accepted; file:, ftp:, etc. are rejected.
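
Taken together, these checks amount to scheme filtering plus resolved-address filtering. A simplified sketch of such a guard (not pulldown's implementation):

```python
import ipaddress
import socket
from urllib.parse import urlsplit

def check_url_safety(url: str, allow_private: bool = False) -> None:
    """Raise ValueError for non-http(s) schemes or hosts that resolve to
    private, loopback, link-local, or reserved addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        raise ValueError(f"scheme not allowed: {parts.scheme!r}")
    if not parts.hostname:
        raise ValueError("URL has no host")
    # Check every resolved address: a public hostname can be pinned to a
    # private IP, so validating the name alone is not enough.
    for info in socket.getaddrinfo(parts.hostname, None):
        addr = ipaddress.ip_address(info[4][0])
        if allow_private:
            continue
        if (addr.is_private or addr.is_loopback
                or addr.is_link_local or addr.is_reserved):
            # Covers e.g. the 169.254.169.254 metadata endpoint (link-local)
            # and localhost services such as Redis on 127.0.0.1:6379.
            raise ValueError(f"blocked address for {parts.hostname}: {addr}")
```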

License

MIT
