Turn long PDFs into chunked NotebookLM workflows with Studio outputs
notebooklm-chunker
Uploading one large PDF to NotebookLM usually gives weak Studio outputs. Reports, slide decks, quizzes, and similar artifacts stay short and generic because they are generated from one oversized context.
notebooklm-chunker solves that by splitting a long document into smaller,
heading-aware chunks, uploading each chunk as a separate NotebookLM source,
and then running the Studio outputs you choose. The result is closer to an
interactive learning kit than a single uploaded PDF.
Demo
This repository ships with a full demo built around the freely downloadable InfoQ mini-book Domain-Driven Design Quickly.
The demo workflow splits the book into 5 chunks, then generates 5 reports and 5 slide decks from those chunks.
Command:
nblm run --config ./examples/workflows/ddd-quickly-demo.toml
Generated NotebookLM:
Requirements
- Python 3.12+
- pip
This project automates NotebookLM through notebooklm-py, an unofficial community library.
For local development and contribution flow, see DEVELOPMENT.md.
Installation
From PyPI:
pip install "notebooklm-chunker[full]"
python -m playwright install chromium
nblm login
From a local checkout:
python -m pip install "/ABS/PATH/notebooklm-chunker[full]"
python -m playwright install chromium
nblm login
If you already have valid NotebookLM auth state, you can skip nblm login.
To clear local notebooklm-py auth state later:
nblm logout
Quick Start
Create a workflow file:
nblm init
This writes ./nblm.toml. Edit it with your document path, notebook title, and
the Studio outputs you want. If you want a ready-made config, you can also copy
one of the example workflow files from the GitHub repo into nblm.toml.
Run the whole flow:
nblm run --config ./nblm.toml
Continue later from the saved run state:
nblm resume --config ./nblm.toml
Repo demo example: run the bundled multi-chunk DDD workflow:
nblm run --config ./examples/workflows/ddd-quickly-demo.toml
Repo demo example: resume that workflow later after quotas reset:
nblm resume --config ./examples/workflows/ddd-quickly-demo.toml
Example: after a previous run, add per-chunk quizzes later without
re-uploading the chunks:
nblm studios --config ./quiz.toml
Check auth, config, Playwright, and PDF parser readiness:
nblm doctor --config ./nblm.toml
Show the installed CLI version:
nblm --version
source.path lives in the config file, so you do not need to pass the input
document as a CLI argument.
Run State And Resume
nblm run always starts a fresh run and writes a state file next to the chunk
output:
./output/chunks/.nblm-run-state.json
That file tracks every chunk separately:
- whether its NotebookLM source upload is still pending, uploaded, or failed
- whether each Studio job for that chunk is pending, completed, or failed
- the source_id, task_id, artifact_id, output path, and last error when available
Example shape:
{
  "chunks": {
    "c001-intro.md": {
      "source": {
        "status": "uploaded",
        "source_id": "src-c001-intro"
      },
      "studios": {
        "report": {
          "status": "completed",
          "artifact_id": "art-report-1"
        },
        "slide_deck": {
          "status": "pending",
          "task_id": "art-slide-deck-1"
        }
      }
    }
  }
}
This is why nblm resume can continue hours or days later, after quotas reset: it does not guess what happened; it reads the saved job state and continues only the unfinished source or Studio jobs.
If you want to inspect progress manually, open .nblm-run-state.json.
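Because the state file is plain JSON, a short script can summarize progress without opening it by hand. A minimal sketch, assuming only the state shape shown in the example above:

```python
import json
from collections import Counter
from pathlib import Path

def summarize_run_state(path: str) -> dict:
    """Count source and Studio job statuses in a .nblm-run-state.json file."""
    state = json.loads(Path(path).read_text())
    sources: Counter = Counter()
    studios: Counter = Counter()
    for chunk in state["chunks"].values():
        sources[chunk["source"]["status"]] += 1
        for job in chunk.get("studios", {}).values():
            studios[job["status"]] += 1
    return {"sources": dict(sources), "studios": dict(studios)}
```

For the example state above, this would report one uploaded source, one completed Studio job, and one pending Studio job.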
Source uploads and per-chunk Studio jobs run as separate queues. That means new source uploads can keep moving while earlier reports, slide decks, or other Studio jobs are still running.
Quota blocks are tracked per Studio type. If report hits a daily quota
limit, slide_deck, quiz, or other Studio types can still continue until
they hit their own limits.
Add More Studios Later
This is a useful workflow once your sources are already uploaded.
If you ran nblm run first and later decide you also want per-chunk quizzes,
flashcards, reports, or slide decks, you can run nblm studios with a new
workflow config. nblm reads .nblm-run-state.json, reuses the saved source
ID for each chunk, and generates one new Studio output per chunk without
uploading the chunk files again.
Example:
nblm studios --config ./quiz.toml
For per_chunk = true, that means one output per uploaded chunk, not one
output from the whole notebook context. If the run state exists, notebook-level
Studio jobs also stay scoped to the source IDs from that saved run instead of
widening to unrelated sources already in the notebook.
Resume After Quotas
NotebookLM usage limits and quotas depend on your plan. Google documents those limits here:
That matters for long books. If your quota fills up in the middle of a run, you can stop, wait for the quota window to reset, and then run:
nblm resume --config ./nblm.toml
Because nblm persists source and Studio job state separately, it can continue
from where it left off instead of redoing the whole notebook. NotebookLM's help
page also notes that daily quotas reset after 24 hours.
When nblm sees a quota-exhausted error during Studio creation, it records an
estimated retry time in .nblm-run-state.json, grouped by Studio type, reports
that time, and stops retrying the blocked Studio type. Later, nblm resume
checks those saved timestamps and warns you before retrying too early.
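The per-type retry check can be approximated like this. The `quota_blocks` mapping and ISO-8601 timestamp format are assumptions for illustration, not the actual state-file schema:

```python
from datetime import datetime, timezone

def blocked_studio_types(quota_blocks: dict[str, str], now: datetime) -> list[str]:
    """Return Studio types whose estimated retry time is still in the future.

    quota_blocks is a hypothetical mapping of Studio type to an ISO-8601
    retry timestamp; the real .nblm-run-state.json field names may differ.
    """
    return [
        studio for studio, retry_at in quota_blocks.items()
        if datetime.fromisoformat(retry_at) > now
    ]
```

A blocked `report` queue would show up here while `quiz` or `slide_deck`, with earlier or absent timestamps, would not.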
Output Files
chunking.output_dir is the working folder for a run:
- *.md: the current chunk files that get uploaded as NotebookLM sources
- manifest.json: the current chunk list
- .nblm-run-state.json: saved source and Studio progress for nblm resume
Treat one chunking.output_dir as one NotebookLM book/workflow. If you want to
run another book, or the same book as a separate NotebookLM run, give it a
different output folder so it gets its own chunks, manifest, and run state.
Studio downloads go into each Studio output_dir, for example
./output/reports and ./output/slides, but only when
runtime.download_outputs = true.
If you edit chunk files and then run resume, nblm continues from the saved
state for whatever is still pending. If you want a completely new notebook run,
use nblm run.
If nblm prepare or a fresh nblm run targets a non-empty chunk output
folder, nblm asks before overwriting the chunk files and run state there.
Use --yes when you want to skip that confirmation.
Workflow File
This is the practical full workflow shape:
[source]
path = "./your-document.pdf"
# PDF only. Inclusive physical PDF page ranges to skip (1-based).
# These are file pages, not the page numbers printed inside the book.
# skip_ranges = ["1-8", "399-420", "512"]
[notebook]
title = "Interactive Learning Notebook"
# id = "nb_..."
[chunking]
# `{source_stem}` expands from `source.path`. Example: `book.pdf` -> `book`.
output_dir = "./output/{source_stem}/chunks"
target_pages = 3.0
min_pages = 2.5
max_pages = 4.0
words_per_page = 500
[runtime]
max_parallel_chunks = 3
max_parallel_heavy_studios = 1
studio_wait_timeout_seconds = 7200
studio_create_retries = 5
studio_create_backoff_seconds = 5.0
studio_rate_limit_cooldown_seconds = 30.0
rename_remote_titles = false
download_outputs = true
[studios.report]
enabled = true
per_chunk = true
max_parallel = 3
output_dir = "./output/{source_stem}/reports"
language = "en"
format = "study-guide"
prompt = """
Write a study-guide style report for this chunk.
Explain the main ideas, terminology, and design tradeoffs.
"""
[studios.slide_deck]
enabled = true
per_chunk = true
max_parallel = 3
output_dir = "./output/{source_stem}/slides"
language = "en"
format = "detailed"
length = "default"
download_format = "pdf"
prompt = """
Build a teaching deck for this chunk.
Keep the section order and make each slide carry one clear idea.
"""
Studio Parameters
Common fields:
| Field | Meaning |
|---|---|
| enabled | Turn the Studio on or off. |
| per_chunk | Generate one output per chunk instead of one output for the whole notebook. |
| max_parallel | Override generic concurrency for this Studio type. |
| prompt | Extra instructions for NotebookLM. Use TOML multiline strings for anything non-trivial. |
| output_path | Single output file. Best for notebook-level generation. |
| output_dir | Output directory for per_chunk = true. |
| language | Output language when supported. |
Per-Studio options:
| Studio | Extra fields | Defaults |
|---|---|---|
| audio | format, length | deep-dive, long |
| video | format, style | explainer, whiteboard |
| report | format | study-guide |
| slide_deck | format, length, download_format | detailed, default, pdf |
| quiz | quantity, difficulty, download_format | more, hard, json |
| flashcards | quantity, difficulty, download_format | more, hard, markdown |
| infographic | orientation, detail | portrait, detailed |
| data_table | language, prompt | en, built-in comparison prompt |
| mind_map | output_path | JSON output path |
Notes:
- For report, format = "custom" sends prompt as the main custom report prompt.
- For built-in report formats, prompt is appended as extra instructions.
- mind_map currently has no custom prompt surface in notebooklm-py.
Technical Notes
Heading-Aware Chunking
- chunks start and end on heading boundaries when possible
- chunk size targets target_pages while trying to stay inside min_pages and max_pages
- local chunk filenames come from the first or nearest heading, including leading numbers
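The sizing rules can be sketched roughly as follows. This is a simplified illustration of heading-boundary packing under a word budget (`target_pages * words_per_page`), not the actual chunker; it ignores the `min_pages`/`max_pages` edge handling:

```python
import re

def chunk_by_headings(markdown: str, target_pages: float = 3.0,
                      words_per_page: int = 500) -> list[str]:
    """Split Markdown at heading boundaries, closing a chunk near the word budget."""
    target_words = int(target_pages * words_per_page)
    # Split before every ATX heading so each section starts on a heading line.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks: list[str] = []
    current: list[str] = []
    words = 0
    for section in sections:
        if not section.strip():
            continue
        section_words = len(section.split())
        if current and words + section_words > target_words:
            chunks.append("".join(current))
            current, words = [], 0
        current.append(section)
        words += section_words
    if current:
        chunks.append("".join(current))
    return chunks
```

With the defaults above, a chunk closes once adding the next heading's section would push it past roughly 1500 words (3 pages at 500 words per page).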
PDF Cleanup
- skip_ranges lets you remove contents, foreword, references, appendix, or index pages
- skip_ranges uses physical PDF page numbers, not the page numbers printed inside the book
- ranges are inclusive, for example: ["1-8", "399-420", "512"]
- if front matter is still present, increase the range and rerun until the first kept page is correct
Parallelism And Quotas
- max_parallel_chunks controls how many source uploads run at once
- per-chunk Studio jobs run on their own queues after each source upload finishes
- max_parallel_heavy_studios is the generic fallback for heavier Studio types such as audio, video, slide_deck, and infographic
- studios.<name>.max_parallel overrides that fallback per Studio type
- good starting point for long books: max_parallel_chunks = 3
- values like 5 can hit NotebookLM quota or rate-limit errors faster
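The upload cap can be illustrated with a semaphore pattern; this is a generic concurrency sketch, not the project's actual code:

```python
import asyncio

async def upload_all(chunks: list[str], max_parallel_chunks: int = 3) -> list[str]:
    """Run uploads concurrently, but never more than max_parallel_chunks at once."""
    sem = asyncio.Semaphore(max_parallel_chunks)

    async def upload(chunk: str) -> str:
        async with sem:
            # Stand-in for the real NotebookLM source upload call.
            await asyncio.sleep(0)
            return chunk

    return list(await asyncio.gather(*(upload(c) for c in chunks)))
```

Raising the semaphore limit raises throughput, which is also why a value like 5 runs into NotebookLM rate limits sooner than 3.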
Retry And Backoff
- failed NotebookLM CREATE_ARTIFACT calls retry automatically
- quota or rate-limit errors trigger a shared cooldown before more Studio create requests are sent
- quota exhaustion is tracked per Studio type, so a blocked report queue does not automatically block quiz or slide_deck
- tune this with runtime.studio_create_retries, runtime.studio_create_backoff_seconds, and runtime.studio_rate_limit_cooldown_seconds
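One possible shape for the retry delays, assuming a simple exponential schedule built from `studio_create_retries` and `studio_create_backoff_seconds`; the library's real policy may differ:

```python
def backoff_schedule(retries: int = 5, backoff_seconds: float = 5.0) -> list[float]:
    """Illustrative exponential backoff: wait backoff_seconds * 2**attempt each retry."""
    return [backoff_seconds * (2 ** attempt) for attempt in range(retries)]
```

With the config defaults above (5 retries, 5.0 seconds) this would wait 5, 10, 20, 40, then 80 seconds between attempts.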
Optional Local Downloads
- runtime.download_outputs = true keeps local report, slide, quiz, and other Studio files
- runtime.download_outputs = false records completion in .nblm-run-state.json without downloading local artifacts
- resume uses saved Studio state, not local artifact files, to decide what is already complete
Optional NotebookLM Renaming
- by default, NotebookLM keeps its own auto-generated source and artifact titles
- set runtime.rename_remote_titles = true if you want NotebookLM titles to follow chunk headings
- tradeoff: the related Studio type becomes more serialized so renames stay correct
Examples
Start from the general end-to-end workflows first. Use the partial prepare
examples only when you want to inspect chunking before any live NotebookLM run.
DDD Quickly Demo
nblm run --config ./examples/workflows/ddd-quickly-demo.toml
Full Learning Kit
nblm run --config ./examples/workflows/learning-kit.toml
Per-Chunk Report + Slide Deck
nblm run --config ./examples/workflows/per-chunk-report-and-slides.toml
Single-Studio Workflows
NotebookLM Studio is NotebookLM's built-in generation layer: Audio Overview, Video Overview, Report, Slide Deck, Quiz, Flashcards, Infographic, Data Table, and Mind Map.
Single-Studio end-to-end examples live under:
./examples/workflows/studios/
Run one of them with:
nblm run --config ./examples/workflows/studios/audio.toml
Chunking Only
PDF:
nblm prepare --config ./examples/workflows/pdf.toml
Markdown:
nblm prepare --config ./examples/workflows/markdown.toml
Commands
nblm --help:
usage: nblm [-h] [--version]
{login,logout,doctor,init,prepare,upload,studios,run,resume} ...
Split long documents into NotebookLM-ready chunks and optionally generate
Studio outputs.
positional arguments:
{login,logout,doctor,init,prepare,upload,studios,run,resume}
login Run `notebooklm login` for notebooklm-py authentication.
logout Clear notebooklm-py local authentication state from disk.
doctor Check config discovery, auth, Playwright, PDF parser, and notebooklm CLI readiness.
init Write a workflow config file with chunking and Studio settings.
prepare Parse a document and export Markdown chunks.
upload Upload existing chunks to NotebookLM.
studios Generate enabled Studio outputs for an existing notebook or a saved run state.
run Prepare a document, create a fresh notebook run, then generate enabled Studio outputs.
resume Continue a previous run from `.nblm-run-state.json` and finish pending uploads or Studio jobs.
options:
-h, --help show this help message and exit
--version show program's version number and exit