Turn long PDFs into chunked NotebookLM workflows with Studio outputs
notebooklm-chunker
Uploading one large PDF to NotebookLM usually gives weak Studio outputs. Reports, slide decks, quizzes, and similar artifacts stay short and generic because they are generated from one oversized context.
notebooklm-chunker solves that by splitting a long document into smaller,
heading-aware chunks, uploading each chunk as a separate NotebookLM source,
and then running the Studio outputs you choose. The result is closer to an
interactive learning kit than a single uploaded PDF.
Demo
This repository ships with a full demo built around the freely downloadable InfoQ mini-book Domain-Driven Design Quickly.
The demo workflow splits the book into 5 chunks, then generates 5 reports and 5 slide decks from those chunks.
Command:
nblm run --config ./examples/workflows/ddd-quickly-demo.toml
Generated NotebookLM:
Requirements
- Python 3.12+
- pip
This project automates NotebookLM through notebooklm-py, an unofficial community library.
For local development and contribution flow, see DEVELOPMENT.md.
Installation
From PyPI:
pip install "notebooklm-chunker[full]"
python -m playwright install chromium
nblm login
From a local checkout:
python -m pip install "/ABS/PATH/notebooklm-chunker[full]"
python -m playwright install chromium
nblm login
If you already have valid NotebookLM auth state, you can skip nblm login.
To clear local notebooklm-py auth state later:
nblm logout
Quick Start
Create a workflow file:
nblm init
This writes ./nblm.toml. Edit it with your document path, notebook title, and
the Studio outputs you want. If you want a ready-made config, you can also copy
one of the example workflow files from the GitHub repo into nblm.toml.
Run the whole flow:
nblm run --config ./nblm.toml
Continue later from the saved run state:
nblm resume --config ./nblm.toml
Repo demo example: run the bundled multi-chunk DDD workflow:
nblm run --config ./examples/workflows/ddd-quickly-demo.toml
Repo demo example: resume that workflow later after quotas reset:
nblm resume --config ./examples/workflows/ddd-quickly-demo.toml
Example: after a previous run, add per-chunk quizzes later without
re-uploading the chunks:
nblm studios --config ./quiz.toml
Check auth, config, Playwright, and PDF parser readiness:
nblm doctor --config ./nblm.toml
Show the installed CLI version:
nblm --version
source.path lives in the config file, so you do not need to pass the input
document as a CLI argument.
Run State And Resume
nblm run always starts a fresh run and writes a state file next to the chunk
output:
./output/chunks/.nblm-run-state.json
That file tracks every chunk separately:
- whether its NotebookLM source upload is still pending, uploaded, or failed
- whether each Studio job for that chunk is pending, completed, or failed
- the source_id, task_id, artifact_id, output path, and last error when available
Example shape:
{
  "chunks": {
    "c001-intro.md": {
      "source": {
        "status": "uploaded",
        "source_id": "src-c001-intro"
      },
      "studios": {
        "report": {
          "status": "completed",
          "artifact_id": "art-report-1"
        },
        "slide_deck": {
          "status": "pending",
          "task_id": "art-slide-deck-1"
        }
      }
    }
  }
}
This is why nblm resume can continue hours or days later, after quotas reset: it does not guess what happened; it reads the saved job state and continues only the unfinished source or Studio jobs.
If you want to inspect progress manually, open .nblm-run-state.json.
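Because the state file is plain JSON, a short script can summarize progress without opening it by hand. A minimal sketch, assuming only the state shape shown in the example above:

```python
import json
from collections import Counter
from pathlib import Path

def summarize_run_state(path: str) -> dict:
    """Count source and Studio job statuses in a .nblm-run-state.json file."""
    state = json.loads(Path(path).read_text())
    sources: Counter = Counter()
    studios: Counter = Counter()
    for chunk in state["chunks"].values():
        sources[chunk["source"]["status"]] += 1
        for job in chunk.get("studios", {}).values():
            studios[job["status"]] += 1
    return {"sources": dict(sources), "studios": dict(studios)}
```

For the example state above, this would report one uploaded source, one completed Studio job, and one pending Studio job.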
Source uploads and per-chunk Studio jobs run as separate queues. That means new source uploads can keep moving while earlier reports, slide decks, or other Studio jobs are still running.
Quota blocks are tracked per Studio type. If report hits a daily quota
limit, slide_deck, quiz, or other Studio types can still continue until
they hit their own limits.
Add More Studios Later
This is a useful workflow once your sources are already uploaded.
If you ran nblm run first and later decide you also want per-chunk quizzes,
flashcards, reports, or slide decks, you can run nblm studios with a new
workflow config. nblm reads .nblm-run-state.json, reuses the saved source
ID for each chunk, and generates one new Studio output per chunk without
uploading the chunk files again.
Example:
nblm studios --config ./quiz.toml
For per_chunk = true, that means one output per uploaded chunk, not one
output from the whole notebook context. If the run state exists, notebook-level
Studio jobs also stay scoped to the source IDs from that saved run instead of
widening to unrelated sources already in the notebook.
Resume After Quotas
NotebookLM usage limits and quotas depend on your plan. Google documents those limits here:
That matters for long books. If your quota fills up in the middle of a run, you can stop, wait for the quota window to reset, and then run:
nblm resume --config ./nblm.toml
Because nblm persists source and Studio job state separately, it can continue
from where it left off instead of redoing the whole notebook. NotebookLM's help
page also notes that daily quotas reset after 24 hours.
When nblm sees a quota-exhausted error during Studio creation, it records an
estimated retry time in .nblm-run-state.json, grouped by Studio type, reports
that time, and stops retrying the blocked Studio type. Later, nblm resume
checks those saved timestamps and warns you before retrying too early.
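The per-type retry check can be approximated like this. The `quota_blocks` mapping and ISO-8601 timestamp format are assumptions for illustration, not the actual state-file schema:

```python
from datetime import datetime, timezone

def blocked_studio_types(quota_blocks: dict[str, str], now: datetime) -> list[str]:
    """Return Studio types whose estimated retry time is still in the future.

    quota_blocks is a hypothetical mapping of Studio type to an ISO-8601
    retry timestamp; the real .nblm-run-state.json field names may differ.
    """
    return [
        studio for studio, retry_at in quota_blocks.items()
        if datetime.fromisoformat(retry_at) > now
    ]
```

A blocked `report` queue would show up here while `quiz` or `slide_deck`, with earlier or absent timestamps, would not.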
Output Files
chunking.output_dir is the working folder for a run:
- *.md: the current chunk files that get uploaded as NotebookLM sources
- manifest.json: the current chunk list
- .nblm-run-state.json: saved source and Studio progress for nblm resume
Treat one chunking.output_dir as one NotebookLM book/workflow. If you want to
run another book, or the same book as a separate NotebookLM run, give it a
different output folder so it gets its own chunks, manifest, and run state.
Studio downloads go into each Studio output_dir, for example
./output/reports and ./output/slides, but only when
runtime.download_outputs = true.
If you edit chunk files and then run resume, nblm continues from the saved
state for whatever is still pending. If you want a completely new notebook run,
use nblm run.
If nblm prepare or a fresh nblm run targets a non-empty chunk output
folder, nblm asks before overwriting the chunk files and run state there.
Use --yes when you want to skip that confirmation.
Workflow File
This is the practical full workflow shape:
[source]
path = "./your-document.pdf"
# PDF only. Inclusive physical PDF page ranges to skip (1-based).
# These are file pages, not the page numbers printed inside the book.
# skip_ranges = ["1-8", "399-420", "512"]
[notebook]
title = "Interactive Learning Notebook"
# id = "nb_..."
[chunking]
# `{source_stem}` expands from `source.path`. Example: `book.pdf` -> `book`.
output_dir = "./output/{source_stem}/chunks"
target_pages = 3.0
min_pages = 2.5
max_pages = 4.0
words_per_page = 500
[runtime]
max_parallel_chunks = 3
max_parallel_heavy_studios = 1
studio_wait_timeout_seconds = 7200
studio_create_retries = 5
studio_create_backoff_seconds = 5.0
studio_rate_limit_cooldown_seconds = 30.0
rename_remote_titles = false
download_outputs = true
[studios.report]
enabled = true
per_chunk = true
max_parallel = 3
output_dir = "./output/{source_stem}/reports"
language = "en"
format = "study-guide"
prompt = """
Write a study-guide style report for this chunk.
Explain the main ideas, terminology, and design tradeoffs.
"""
[studios.slide_deck]
enabled = true
per_chunk = true
max_parallel = 3
output_dir = "./output/{source_stem}/slides"
language = "en"
format = "detailed"
length = "default"
download_format = "pdf"
prompt = """
Build a teaching deck for this chunk.
Keep the section order and make each slide carry one clear idea.
"""
Studio Parameters
Common fields:
| Field | Meaning |
|---|---|
| enabled | Turn the Studio on or off. |
| per_chunk | Generate one output per chunk instead of one output for the whole notebook. |
| max_parallel | Override generic concurrency for this Studio type. |
| prompt | Extra instructions for NotebookLM. Use TOML multiline strings for anything non-trivial. |
| output_path | Single output file. Best for notebook-level generation. |
| output_dir | Output directory for per_chunk = true. |
| language | Output language when supported. |
Per-Studio options:
| Studio | Extra fields | Defaults |
|---|---|---|
| audio | format, length | deep-dive, long |
| video | format, style | explainer, whiteboard |
| report | format | study-guide |
| slide_deck | format, length, download_format | detailed, default, pdf |
| quiz | quantity, difficulty, download_format | more, hard, json |
| flashcards | quantity, difficulty, download_format | more, hard, markdown |
| infographic | orientation, detail | portrait, detailed |
| data_table | language, prompt | en, built-in comparison prompt |
| mind_map | output_path | JSON output path |
Notes:
- For report, format = "custom" sends prompt as the main custom report prompt.
- For built-in report formats, prompt is appended as extra instructions.
- mind_map currently has no custom prompt surface in notebooklm-py.
Technical Notes
Heading-Aware Chunking
- chunks start and end on heading boundaries when possible
- chunk size targets target_pages while trying to stay inside min_pages and max_pages
- local chunk filenames come from the first or nearest heading, including leading numbers
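The sizing rules can be sketched roughly as follows. This is a simplified illustration of heading-boundary packing under a word budget (`target_pages * words_per_page`), not the actual chunker; it ignores the `min_pages`/`max_pages` edge handling:

```python
import re

def chunk_by_headings(markdown: str, target_pages: float = 3.0,
                      words_per_page: int = 500) -> list[str]:
    """Split Markdown at heading boundaries, closing a chunk near the word budget."""
    target_words = int(target_pages * words_per_page)
    # Split before every ATX heading so each section starts on a heading line.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: list[str] = []
    current: list[str] = []
    words = 0
    for section in sections:
        if not section.strip():
            continue
        section_words = len(section.split())
        if current and words + section_words > target_words:
            chunks.append("".join(current))
            current, words = [], 0
        current.append(section)
        words += section_words
    if current:
        chunks.append("".join(current))
    return chunks
```

With the defaults above, a chunk closes once adding the next heading's section would push it past roughly 1500 words (3 pages at 500 words per page).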
PDF Cleanup
- skip_ranges lets you remove contents, foreword, references, appendix, or index pages
- skip_ranges uses physical PDF page numbers, not the page numbers printed inside the book
- ranges are inclusive, for example: ["1-8", "399-420", "512"]
- if front matter is still present, increase the range and rerun until the first kept page is correct
Parallelism And Quotas
- max_parallel_chunks controls how many source uploads run at once
- per-chunk Studio jobs run on their own queues after each source upload finishes
- max_parallel_heavy_studios is the generic fallback for heavier Studio types such as audio, video, slide_deck, and infographic
- studios.<name>.max_parallel overrides that fallback per Studio type
- good starting point for long books: max_parallel_chunks = 3
- values like 5 can hit NotebookLM quota or rate-limit errors faster
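The upload cap can be illustrated with a semaphore pattern; this is a generic concurrency sketch, not the project's actual code:

```python
import asyncio

async def upload_all(chunks: list[str], max_parallel_chunks: int = 3) -> list[str]:
    """Run uploads concurrently, but never more than max_parallel_chunks at once."""
    sem = asyncio.Semaphore(max_parallel_chunks)

    async def upload(chunk: str) -> str:
        async with sem:
            # Stand-in for the real NotebookLM source upload call.
            await asyncio.sleep(0)
            return chunk

    return list(await asyncio.gather(*(upload(c) for c in chunks)))
```

Raising the semaphore limit raises throughput, which is also why a value like 5 runs into NotebookLM rate limits sooner than 3.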
Retry And Backoff
- failed NotebookLM CREATE_ARTIFACT calls retry automatically
- quota or rate-limit errors trigger a shared cooldown before more Studio create requests are sent
- quota exhaustion is tracked per Studio type, so a blocked report queue does not automatically block quiz or slide_deck
- tune this with runtime.studio_create_retries, runtime.studio_create_backoff_seconds, and runtime.studio_rate_limit_cooldown_seconds
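One possible shape for the retry delays, assuming a simple exponential schedule built from `studio_create_retries` and `studio_create_backoff_seconds`; the library's real policy may differ:

```python
def backoff_schedule(retries: int = 5, backoff_seconds: float = 5.0) -> list[float]:
    """Illustrative exponential backoff: wait backoff_seconds * 2**attempt each retry."""
    return [backoff_seconds * (2 ** attempt) for attempt in range(retries)]
```

With the config defaults above (5 retries, 5.0 seconds) this would wait 5, 10, 20, 40, then 80 seconds between attempts.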
Optional Local Downloads
- runtime.download_outputs = true keeps local report, slide, quiz, and other Studio files
- runtime.download_outputs = false records completion in .nblm-run-state.json without downloading local artifacts
- resume uses saved Studio state, not local artifact files, to decide what is already complete
Optional NotebookLM Renaming
- by default, NotebookLM keeps its own auto-generated source and artifact titles
- set runtime.rename_remote_titles = true if you want NotebookLM titles to follow chunk headings
- tradeoff: the related Studio type becomes more serialized so renames stay correct
Examples
Start from the general end-to-end workflows first. Use the partial prepare
examples only when you want to inspect chunking before any live NotebookLM run.
DDD Quickly Demo
nblm run --config ./examples/workflows/ddd-quickly-demo.toml
Full Learning Kit
nblm run --config ./examples/workflows/learning-kit.toml
Per-Chunk Report + Slide Deck
nblm run --config ./examples/workflows/per-chunk-report-and-slides.toml
Single-Studio Workflows
NotebookLM Studio is NotebookLM's built-in generation layer: Audio Overview, Video Overview, Report, Slide Deck, Quiz, Flashcards, Infographic, Data Table, and Mind Map.
Single-Studio end-to-end examples live under:
./examples/workflows/studios/
Run one of them with:
nblm run --config ./examples/workflows/studios/audio.toml
Chunking Only
PDF:
nblm prepare --config ./examples/workflows/pdf.toml
Markdown:
nblm prepare --config ./examples/workflows/markdown.toml
Commands
nblm --help:
usage: nblm [-h] [--version]
{login,logout,doctor,init,prepare,upload,studios,run,resume} ...
Split long documents into NotebookLM-ready chunks and optionally generate
Studio outputs.
positional arguments:
{login,logout,doctor,init,prepare,upload,studios,run,resume}
login Run `notebooklm login` for notebooklm-py authentication.
logout Clear notebooklm-py local authentication state from disk.
doctor Check config discovery, auth, Playwright, PDF parser, and notebooklm CLI readiness.
init Write a workflow config file with chunking and Studio settings.
prepare Parse a document and export Markdown chunks.
upload Upload existing chunks to NotebookLM.
studios Generate enabled Studio outputs for an existing notebook or a saved run state.
run Prepare a document, create a fresh notebook run, then generate enabled Studio outputs.
resume Continue a previous run from `.nblm-run-state.json` and finish pending uploads or Studio jobs.
options:
-h, --help show this help message and exit
--version show program's version number and exit