Export O'Reilly books as high-quality PDF via headless Chrome
oreilly2pdf
Export O'Reilly Learning books as high-quality PDFs with working images, table of contents, and cross-chapter hyperlinks.
Features
- Full book export — cover, chapters, appendices, index, and all front/back matter.
- High-fidelity rendering — uses headless Chrome to capture the exact same layout you see in the browser, including mathematical equations, code blocks, tables, and figures.
- Images — lazy-loaded and dynamically-rendered images are fully resolved before printing.
- Cross-chapter links — internal references (e.g., "see Section 4.3", bibliography citations, index entries) are converted into clickable PDF links that jump to the correct page.
- Clean output — O'Reilly's navigation UI, cookie banners, sidebar menus, and overlays are stripped, leaving only the book content.
Prerequisites
- Python 3.10+
- Google Chrome (or Chromium) installed
- ChromeDriver matching your Chrome version — installed automatically by Selenium 4.20+
- A valid O'Reilly Learning subscription
Installation
From PyPI
pip install oreilly2pdf
From source
git clone https://github.com/cruzlorite/oreilly2pdf.git
cd oreilly2pdf
pip install .
Usage
# Using a cookies file (recommended)
oreilly2pdf 9781098150952 --cookie-file cookies.json
# Using inline cookies
oreilly2pdf 9781098150952 --cookies "BrowserCookie=abc123; logged_in=1; ..."
# Custom output path
oreilly2pdf 9781098150952 --cookie-file cookies.json -o my_book.pdf
# Keep individual chapter PDFs
oreilly2pdf 9781098150952 --cookie-file cookies.json --keep-chapters
Options
| Flag | Description |
|---|---|
| book_id | The O'Reilly book identifier (ISBN). |
| --cookies | Session cookies as key=value pairs separated by semicolons. |
| --cookie-file | Path to a cookies file (JSON or plain text). |
| -o, --output | Output PDF file path (default: <book_id>.pdf). |
| --keep-chapters | Keep individual chapter PDFs in a directory alongside the output. |
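For readers who want to wrap or script the tool, the options above can be mirrored with `argparse`. This is only a sketch of an equivalent parser, not the project's actual source; the function names `build_parser` and `resolve_output` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags listed in the options table (illustrative only)."""
    parser = argparse.ArgumentParser(prog="oreilly2pdf")
    parser.add_argument("book_id", help="The O'Reilly book identifier (ISBN).")
    parser.add_argument(
        "--cookies",
        help="Session cookies as key=value pairs separated by semicolons.",
    )
    parser.add_argument(
        "--cookie-file",
        help="Path to a cookies file (JSON or plain text).",
    )
    parser.add_argument(
        "-o", "--output",
        help="Output PDF file path (default: <book_id>.pdf).",
    )
    parser.add_argument(
        "--keep-chapters",
        action="store_true",
        help="Keep individual chapter PDFs alongside the output.",
    )
    return parser


def resolve_output(args: argparse.Namespace) -> str:
    """Fall back to <book_id>.pdf when -o/--output is not given."""
    return args.output or f"{args.book_id}.pdf"
```

Calling `build_parser().parse_args(["9781098150952", "--cookie-file", "cookies.json"])` and passing the result to `resolve_output` yields the documented default file name.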
Getting Your Cookies
oreilly2pdf needs your O'Reilly session cookies to authenticate. Here's how to get them:
Option 1 — JSON file (recommended)
- Log in to learning.oreilly.com in Chrome.
- Open DevTools (F12) → Application tab → Cookies → https://learning.oreilly.com.
- Create a JSON file with the cookie name/value pairs:
{
"BrowserCookie": "your_value_here",
"logged_in": "1",
"orm-jwt": "your_jwt_token",
"orm-rt": "your_refresh_token",
"groot_sessionid": "your_session_id"
}
The exact cookies needed may vary, but orm-jwt and groot_sessionid are typically the most important. If export fails with authentication errors, try adding more cookies from your browser.
Tip (browser console): You can also grab all cookies at once by running this in the DevTools Console while on learning.oreilly.com:
JSON.stringify(Object.fromEntries(document.cookie.split('; ').map(c => c.split('='))))
Copy the output and save it as cookies.json.
Tip (extension): The Cookie-Editor browser extension can export all cookies as JSON with one click. Export as JSON, keep only the learning.oreilly.com entries, and reformat as {"name": "value"} pairs.
- Save as cookies.json and pass it with --cookie-file cookies.json.
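The reformatting step in the Cookie-Editor tip can be automated. The sketch below assumes the export is a JSON array of objects that each carry `name`, `value`, and `domain` keys; that layout is an assumption about the extension's output, and `to_name_value_pairs` is a hypothetical helper, not part of oreilly2pdf.

```python
import json


def to_name_value_pairs(exported: list[dict], host: str = "learning.oreilly.com") -> dict[str, str]:
    """Keep cookies that apply to `host` and flatten them to {"name": "value"}."""
    pairs: dict[str, str] = {}
    for cookie in exported:
        # A leading dot (".oreilly.com") marks a domain-wide cookie.
        domain = cookie.get("domain", "").lstrip(".")
        # Keep the cookie if its domain is the host itself or a parent of it.
        if domain and (host == domain or host.endswith("." + domain)):
            pairs[cookie["name"]] = cookie["value"]
    return pairs


def convert_file(src: str, dst: str) -> None:
    """Read an exported cookie list from `src` and write name/value JSON to `dst`."""
    with open(src, encoding="utf-8") as fh:
        exported = json.load(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(to_name_value_pairs(exported), fh, indent=2)
```

For example, `convert_file("export.json", "cookies.json")` would produce a file in the shape shown earlier, ready for `--cookie-file`.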
Option 2 — Plain text
Copy cookies as a semicolon-separated string:
oreilly2pdf 9781098150952 --cookies "BrowserCookie=abc; orm-jwt=eyJ...; groot_sessionid=xyz"
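To illustrate the plain-text format, here is a minimal sketch of how a semicolon-separated cookie string could be split into name/value pairs. The helper name `parse_cookie_string` is hypothetical and the tool's own parsing may differ; the one detail worth noting is splitting on the first `=` only, since cookie values can themselves contain `=`.

```python
def parse_cookie_string(raw: str) -> dict[str, str]:
    """Turn "a=1; b=2" into {"a": "1", "b": "2"}."""
    cookies: dict[str, str] = {}
    for part in raw.split(";"):
        part = part.strip()
        # Skip empty fragments and anything that is not a key=value pair.
        if not part or "=" not in part:
            continue
        # Split once so that "=" characters inside the value are preserved.
        name, value = part.split("=", 1)
        cookies[name.strip()] = value.strip()
    return cookies
```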
Finding the Book ID
The book ID is the ISBN that appears in the O'Reilly URL:
https://learning.oreilly.com/library/view/book-title/9781098150952/
                                                     ^^^^^^^^^^^^^
                                                     This is the book_id
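If you have many URLs, the ID can be pulled out programmatically. This sketch assumes the URL follows the `/library/view/<title>/<book_id>/` layout shown above; `extract_book_id` is a hypothetical helper and not part of the package.

```python
from urllib.parse import urlparse


def extract_book_id(url: str) -> str:
    """Return the path segment that follows /library/view/<title>/."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 4 or segments[:2] != ["library", "view"]:
        raise ValueError(f"not an O'Reilly book URL: {url}")
    return segments[3]


book_id = extract_book_id(
    "https://learning.oreilly.com/library/view/book-title/9781098150952/"
)
```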
How It Works
- Fetches the book spine from the O'Reilly API to get an ordered list of all content files (cover, chapters, appendices, index, etc.).
- Renders each chapter in headless Chrome with your session cookies.
- Waits for all images to fully load (handles lazy-loading, viewport-triggered loading, and dynamic image injection).
- Cleans the page — removes the O'Reilly reading UI (header, sidebar, navigation, cookie banners, overlays) and keeps only the article content.
- Creates PDF named destinations for every element with an id attribute, enabling cross-chapter link resolution.
- Prints each chapter to PDF using the Chrome DevTools Protocol.
- Merges all chapter PDFs into a single file and rewrites internal URI links as PDF GoTo links, so cross-chapter references, index entries, and bibliography citations all work as clickable links.
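The link-resolution idea in the last step can be sketched in plain Python. This is not the project's implementation: `ChapterInfo`, `build_destination_index`, and `resolve_link` are hypothetical names, and the sketch assumes each chapter's page count and the page of each `id` within that chapter are already known. It shows only how a link URI could be mapped to a page in the merged file.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Optional
from urllib.parse import urlparse


@dataclass
class ChapterInfo:
    filename: str      # spine entry, e.g. "ch04.html" (illustrative)
    page_count: int    # pages in this chapter's PDF
    anchors: dict[str, int] = field(default_factory=dict)  # id -> 0-based page in chapter


def build_destination_index(chapters: list[ChapterInfo]) -> dict[tuple[str, str], int]:
    """Map (filename, id) to a 0-based page number in the merged PDF."""
    index: dict[tuple[str, str], int] = {}
    offset = 0
    for chapter in chapters:
        # A link without a fragment points at the chapter's first page.
        index[(chapter.filename, "")] = offset
        for anchor_id, local_page in chapter.anchors.items():
            index[(chapter.filename, anchor_id)] = offset + local_page
        offset += chapter.page_count
    return index


def resolve_link(uri: str, index: dict[tuple[str, str], int]) -> Optional[int]:
    """Return the merged-PDF page for an internal link, or None if external."""
    parsed = urlparse(uri)
    filename = posixpath.basename(parsed.path)
    return index.get((filename, parsed.fragment))


chapters = [
    ChapterInfo("cover.html", 1),
    ChapterInfo("ch01.html", 10, {"sec-1-2": 3}),
    ChapterInfo("ch04.html", 20, {"sec-4-3": 7}),
]
index = build_destination_index(chapters)
page = resolve_link(
    "https://learning.oreilly.com/library/view/book-title/9781098150952/ch04.html#sec-4-3",
    index,
)
```

A link that resolves to a page number would become a GoTo action targeting that page, while a `None` result would be left as an ordinary URI link.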
Acknowledgements
Inspired by oreilly-epub-downloader by @tctibbs.
License