
Export O'Reilly books as high-quality PDF via headless Chrome

Project description

📚 oreilly2pdf


Download any book from O'Reilly Learning as a single, high-quality PDF.

All images, cross-chapter links, table of contents, and index entries just work — exactly as you'd expect from a real book.

⚠️ Requires an active O'Reilly Learning subscription.


⚡ Quick Start

pip install oreilly2pdf
oreilly2pdf 9781098150952 --cookie-file cookies.json

That's it. You'll get a 9781098150952.pdf with all chapters merged into one file.


🔧 Installation

From PyPI

pip install oreilly2pdf

From source

git clone https://github.com/cruzlorite/oreilly2pdf.git
cd oreilly2pdf
pip install .

Requirements

  • Python 3.10+
  • Google Chrome (or Chromium)
  • ChromeDriver — installed automatically by Selenium 4.20+
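Before installing, you can check the first two requirements with a short script. This is a hypothetical helper, not part of oreilly2pdf. The executable names it searches for are common Linux/Windows names and are an assumption; on macOS, Chrome usually lives in /Applications and is not on PATH, so it may report "not found" even when Chrome is installed.

```python
import shutil
import sys

# Executable names commonly used for Chrome/Chromium on PATH.
# This list is an assumption; adjust it for your system.
CHROME_CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def python_ok(version_info=sys.version_info) -> bool:
    """Return True if the interpreter meets the Python 3.10+ requirement."""
    return tuple(version_info[:2]) >= (3, 10)


def find_chrome(candidates=CHROME_CANDIDATES):
    """Return the path of the first Chrome/Chromium executable on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print("Python 3.10+:", "yes" if python_ok() else "no")
    print("Chrome/Chromium:", find_chrome() or "not found on PATH")
```

ChromeDriver is not checked, since Selenium is expected to fetch it on first use.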

🍪 Getting Your Cookies

You need to provide your O'Reilly session cookies so the tool can access your account. There are three easy ways to get them:

Way 1 — DevTools Console (fastest)

  1. Log in to learning.oreilly.com in Chrome.
  2. Open DevTools (F12) → go to the Console tab.
  3. Paste this and press Enter:
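copy(JSON.stringify(Object.fromEntries(document.cookie.split('; ').map(c => { const i = c.indexOf('='); return [c.slice(0, i), c.slice(i + 1)]; }))))
  4. Your cookies are now in the clipboard. Save them to a file:
pbpaste > cookies.json                          # macOS
xclip -selection clipboard -o > cookies.json    # Linux (X11)

Note: document.cookie cannot see cookies flagged HttpOnly, so this method may miss some of them. If the export fails, use Way 2 or Way 3.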

Way 2 — Cookie-Editor extension

  1. Install Cookie-Editor in your browser.
  2. Go to learning.oreilly.com and log in.
  3. Click the Cookie-Editor icon → Export → JSON.
  4. Paste into cookies.json and reformat as {"name": "value"} pairs.
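The reformatting in step 4 can be scripted. A minimal sketch, assuming Cookie-Editor's JSON export is a list of objects that each carry at least "name" and "value" keys (other fields such as "domain" are ignored). The function names and the file names in the usage note are hypothetical.

```python
import json


def to_name_value(exported: list) -> dict:
    """Flatten a Cookie-Editor style export (list of cookie objects) into {"name": "value"}."""
    # Assumes every entry has "name" and "value"; later duplicates overwrite earlier ones.
    return {cookie["name"]: cookie["value"] for cookie in exported}


def convert_file(src: str, dst: str) -> None:
    """Read a Cookie-Editor JSON export from src and write the flat mapping to dst."""
    with open(src, encoding="utf-8") as f:
        exported = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(to_name_value(exported), f, indent=2)
```

For example, convert_file("cookie-editor.json", "cookies.json") turns the raw export into the format shown in Way 3.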

Way 3 — Manual

  1. Open DevTools (F12) → Application tab → Cookies → https://learning.oreilly.com.
  2. Create a cookies.json with the relevant cookie values:
{
  "BrowserCookie": "...",
  "orm-jwt": "...",
  "orm-rt": "...",
  "groot_sessionid": "..."
}

Note: The most important cookies are typically orm-jwt and groot_sessionid. If export fails, try adding more cookies from your browser.
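To catch an incomplete file before starting a long export, you could check that cookies.json parses as a flat name/value object and contains the two cookies named above. This is a hypothetical pre-check, not something oreilly2pdf is documented to do, and a cookie that is present may still have expired.

```python
import json

# The cookies the note above calls most important.
KEY_COOKIES = ("orm-jwt", "groot_sessionid")


def missing_key_cookies(cookies: dict) -> list:
    """Return the key cookie names that are absent or have an empty value."""
    return [name for name in KEY_COOKIES if not cookies.get(name)]


def load_cookie_json(path: str) -> dict:
    """Load cookies.json and ensure it is a flat {"name": "value"} mapping of strings."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in data.items()
    ):
        raise ValueError(f'{path} must be a JSON object of "name": "value" strings')
    return data
```

Calling missing_key_cookies(load_cookie_json("cookies.json")) returns an empty list when both cookies are present.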


📖 Finding the Book ID

Open any book on O'Reilly and look at the URL. The book ID is the numeric segment after the title, usually the book's ISBN:

https://learning.oreilly.com/library/view/book-title/9781098150952/
                                                     ^^^^^^^^^^^^^
                                                        book_id
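If you are copying IDs from many URLs, the segment can be extracted programmatically. A sketch, assuming every book URL follows the /library/view/<title>/<book_id>/ pattern shown above and that the ID is all digits; the function name is hypothetical, and IDs containing other characters would need a looser pattern.

```python
import re
from urllib.parse import urlparse

# Matches .../library/view/<title-slug>/<book_id>...
# Assumption: the ID is the all-digit segment right after the title slug.
_BOOK_ID = re.compile(r"/library/view/[^/]+/(\d+)")


def book_id_from_url(url: str) -> str:
    """Extract the book ID from an O'Reilly Learning book or chapter URL."""
    # urlparse drops the query string and #fragment, so only the path is searched.
    match = _BOOK_ID.search(urlparse(url).path)
    if match is None:
        raise ValueError(f"no book ID found in {url!r}")
    return match.group(1)
```

This also works on chapter URLs, since only the segment after the title is captured.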

🚀 Usage

# Basic usage
oreilly2pdf <book_id> --cookie-file cookies.json

# Custom output filename
oreilly2pdf 9781098150952 --cookie-file cookies.json -o my_book.pdf

# Inline cookies instead of a file
oreilly2pdf 9781098150952 --cookies "orm-jwt=eyJ...; groot_sessionid=xyz"

# Keep individual chapter PDFs alongside the merged output
oreilly2pdf 9781098150952 --cookie-file cookies.json --keep-chapters
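Each invocation exports one book, so a small wrapper can loop over several IDs. A sketch that shells out to the CLI using only the options documented here; the helper names are hypothetical, and running the books one after another (rather than in parallel) is a conservative choice, not a documented requirement.

```python
from __future__ import annotations

import subprocess


def build_command(book_id: str, cookie_file: str, output: str | None = None,
                  keep_chapters: bool = False) -> list[str]:
    """Assemble an oreilly2pdf command line from the documented options."""
    cmd = ["oreilly2pdf", book_id, "--cookie-file", cookie_file]
    if output:
        cmd += ["-o", output]
    if keep_chapters:
        cmd.append("--keep-chapters")
    return cmd


def export_many(book_ids, cookie_file: str = "cookies.json") -> dict[str, int]:
    """Export several books sequentially and return {book_id: exit_code}."""
    results = {}
    for book_id in book_ids:
        # Each run is an independent process; a failed book does not stop the loop.
        completed = subprocess.run(build_command(book_id, cookie_file))
        results[book_id] = completed.returncode
    return results
```

A non-zero exit code in the returned dictionary marks a book worth retrying, for example with fresh cookies.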

All Options

Option               Description
book_id              O'Reilly book identifier (ISBN); required
--cookie-file FILE   Path to a cookies file (JSON or plain text)
--cookies STRING     Inline cookies (key=value; key2=value2)
-o, --output FILE    Output path (default: <book_id>.pdf)
--keep-chapters      Save individual chapter PDFs too
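The two cookie inputs above could be normalized into one dictionary along these lines. This is a sketch, not oreilly2pdf's parser: the plain-text file format is not specified here, so it is assumed to use the same key=value; key2=value2 syntax as --cookies, with newlines also accepted as separators. Function names are hypothetical.

```python
import json
import re


def parse_cookie_string(raw: str) -> dict[str, str]:
    """Parse 'key=value; key2=value2' (the --cookies format) into a dict."""
    cookies = {}
    # Assumption: ';' and newlines both separate cookies.
    for part in re.split(r"[;\n]", raw):
        # partition splits on the first '=' only, so values may contain '='.
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name.strip()] = value.strip()
    return cookies


def parse_cookie_text(text: str) -> dict[str, str]:
    """Accept either a JSON object or a plain 'key=value; ...' string (assumed format)."""
    stripped = text.strip()
    if stripped.startswith("{"):
        return {str(k): str(v) for k, v in json.loads(stripped).items()}
    return parse_cookie_string(stripped)
```

Splitting on the first = only matters for cookie values that themselves contain =, such as padded base64.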

✨ Features

📄 Full book            Everything: cover, TOC, all chapters, appendices, and index
🖼️ Images               Lazy-loaded and dynamic images fully resolved
Cross-chapter links     "See Section 4.3" actually jumps to Section 4.3
🧹 Clean output         No navigation bars, cookie banners, or popups
🎨 Faithful rendering   Math, code blocks, tables, and figures rendered by Chrome itself
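How oreilly2pdf rewrites cross-chapter links is not documented here. One plausible first step, assuming each chapter is a separate .html file whose name appears in the link target, is to map an href to the chapter it points into and the anchor inside that chapter. The function name and chapter file names are hypothetical, and placing a matching destination in the merged PDF is not covered.

```python
from __future__ import annotations

import posixpath
from urllib.parse import urlparse


def resolve_link(href: str, chapter_files: list[str]) -> tuple[int, str] | None:
    """Map an href like 'ch04.html#sec_4_3' to (chapter_index, anchor).

    Returns None for external links and for same-page links with no file name.
    """
    parsed = urlparse(href)
    filename = posixpath.basename(parsed.path)
    if filename not in chapter_files:
        return None
    return chapter_files.index(filename), parsed.fragment
```

Given the chapter order from the table of contents, the returned index says which chapter's pages hold the target, and the fragment names the anchor within it.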

🔍 How It Works

  1. Fetches the book's table of contents from the O'Reilly API.
  2. Opens each chapter in headless Chrome with your session cookies.
  3. Waits for all images (including lazy-loaded ones) to fully render.
  4. Strips the O'Reilly UI — keeps only the book content.
  5. Prints each chapter to PDF via Chrome DevTools Protocol.
  6. Merges everything into a single PDF and rewrites cross-chapter links so they work as clickable in-document jumps.
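Steps 2, 3, and 5 can be sketched with Selenium, assuming the session cookies are a flat name/value dict. The helper names, the image-readiness check, and the print options are illustrative choices, not oreilly2pdf's code; step 4 (removing the O'Reilly UI) and the merge in step 6 are left out.

```python
from __future__ import annotations

import base64
import time

# Switch native lazy-loading images to eager, then report whether every <img>
# has finished loading. Script-driven lazy loaders would need extra handling.
IMAGES_READY_JS = """
document.querySelectorAll('img[loading="lazy"]').forEach(img => { img.loading = 'eager'; });
return Array.from(document.images).every(img => img.complete && img.naturalWidth > 0);
"""


def make_driver(cookies: dict[str, str], domain: str = "learning.oreilly.com"):
    """Start headless Chrome and attach the session cookies (requires selenium)."""
    from selenium import webdriver  # imported lazily so the helpers below work without it

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    # Selenium only accepts cookies for the domain that is currently loaded.
    driver.get(f"https://{domain}/")
    for name, value in cookies.items():
        driver.add_cookie({"name": name, "value": value})
    return driver


def wait_for_images(driver, timeout: float = 30.0, poll: float = 0.5) -> bool:
    """Poll until all images report loaded; return False if the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if driver.execute_script(IMAGES_READY_JS):
            return True
        time.sleep(poll)
    return False


def print_chapter(driver, url: str) -> bytes:
    """Load one chapter, wait for images, and print it to PDF via CDP."""
    driver.get(url)
    wait_for_images(driver)  # proceeds even if some images never load
    result = driver.execute_cdp_cmd("Page.printToPDF", {"printBackground": True})
    # Page.printToPDF returns the PDF as base64 in the "data" field.
    return base64.b64decode(result["data"])
```

Writing the returned bytes of each chapter to its own file gives roughly what --keep-chapters keeps alongside the merged PDF.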

🙏 Acknowledgements

Inspired by oreilly-epub-downloader by @tctibbs.

📄 License

MIT



Download files

Download the file for your platform.

Source Distribution

oreilly2pdf-0.1.1.tar.gz (15.8 kB)

Uploaded Source

Built Distribution


oreilly2pdf-0.1.1-py3-none-any.whl (14.7 kB)

Uploaded Python 3

File details

Details for the file oreilly2pdf-0.1.1.tar.gz.

File metadata

  • Download URL: oreilly2pdf-0.1.1.tar.gz
  • Upload date:
  • Size: 15.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oreilly2pdf-0.1.1.tar.gz
Algorithm Hash digest
SHA256 8d682a108e7be9d4beb24776c3b90bb16b8a00aa8da3dcd3c2d2b0ae3e0b4d43
MD5 d9ec3e92771f0652ef693417848041a2
BLAKE2b-256 bfce24fae53dcf02a44540a7034b0caea82fc8e953ad31f912e93ebe294c1c1a


Provenance

The following attestation bundles were made for oreilly2pdf-0.1.1.tar.gz:

Publisher: publish.yml on cruzlorite/oreilly2pdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file oreilly2pdf-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: oreilly2pdf-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 14.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for oreilly2pdf-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e97e7350a104a15dbf95c5bdd0868cf0ec3d57f8ac312fc27df5a028c4892812
MD5 4bec162f9518f8f78c2586a8ac5a0203
BLAKE2b-256 6220e01d9e09832d067198ff683fe956001970888ce2131529c1904deba584b6


Provenance

The following attestation bundles were made for oreilly2pdf-0.1.1-py3-none-any.whl:

Publisher: publish.yml on cruzlorite/oreilly2pdf

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
