Turn long PDFs into chunked NotebookLM workflows with Studio outputs

notebooklm-chunker

Uploading one large PDF to NotebookLM usually gives weak Studio outputs. Reports, slide decks, quizzes, and similar artifacts stay short and generic because they are generated from one oversized context.

notebooklm-chunker solves that by splitting a long document into smaller, heading-aware chunks, uploading each chunk as a separate NotebookLM source, and then running the Studio outputs you choose. The result is closer to an interactive learning kit than a single uploaded PDF.

Demo

This repository ships with a full demo built around the freely downloadable InfoQ mini-book Domain-Driven Design Quickly.

Demo command:

nblm run --config ./examples/workflows/ddd-quickly-demo.toml

Demo files:

  • Workflow file: ./examples/workflows/ddd-quickly-demo.toml
  • Source PDF: ./examples/ddd-quickly.pdf

What you get:

  • A NotebookLM notebook populated with the chunked PDF sources and the configured Studio outputs (report and slide deck in this case)
  • Markdown chunks under ./examples/workflows/output/ddd-quickly/chunks
  • One report per chunk under ./examples/workflows/output/ddd-quickly/reports
  • One slide deck per chunk under ./examples/workflows/output/ddd-quickly/slides

Requirements

  • Python 3.12+
  • pip
  • A NotebookLM account
  • The same Python interpreter for install and Playwright setup

This project automates NotebookLM through notebooklm-py, which is an unofficial community library.

For local development and contribution flow, see DEVELOPMENT.md.

Installation

From PyPI:

pip install "notebooklm-chunker[full]"
python -m playwright install chromium
nblm doctor
nblm login

From a local checkout:

python -m pip install "/ABS/PATH/notebooklm-chunker[full]"
python -m playwright install chromium
nblm doctor
nblm login

To clear local notebooklm-py auth state later:

nblm logout

Quick Start

Create a starter workflow:

nblm init

Check auth, config, Playwright, and PDF parser readiness:

nblm doctor --config ./nblm.toml

Run the whole flow:

nblm run --config ./nblm.toml

Continue later from the saved run state:

nblm resume --config ./nblm.toml

source.path lives in the config file, so you do not need to pass the input document as a CLI argument.

Run State And Resume

nblm run always starts a fresh run and writes a state file next to the chunk output:

./output/chunks/.nblm-run-state.json

That file tracks every chunk separately:

  • whether its NotebookLM source upload is still pending, uploaded, or failed
  • whether each Studio job for that chunk is pending, completed, or failed
  • the source_id, task_id, artifact_id, output path, and last error when available

Example shape:

{
  "chunks": {
    "001-intro.md": {
      "source": {
        "status": "uploaded",
        "source_id": "src-001-intro"
      },
      "studios": {
        "report": {
          "status": "completed",
          "artifact_id": "art-report-1"
        },
        "slide_deck": {
          "status": "pending",
          "task_id": "art-slide-deck-1"
        }
      }
    }
  }
}

This is why nblm resume can continue hours or days later, after quotas reset: it does not guess what happened; it reads the saved job state and continues only the unfinished source or Studio jobs.

If you want to inspect progress manually, open .nblm-run-state.json.
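For a quick summary instead of reading the raw JSON, a small script can tally job statuses. This is a minimal sketch, assuming the state-file shape shown in the example above; `summarize_run_state` is not part of the nblm CLI.

```python
import json
from collections import Counter

def summarize_run_state(path):
    """Tally source-upload and Studio-job statuses from a .nblm-run-state.json file.

    Assumes the shape shown above: {"chunks": {name: {"source": {...}, "studios": {...}}}}.
    """
    with open(path) as f:
        state = json.load(f)
    sources, studios = Counter(), Counter()
    for chunk in state.get("chunks", {}).values():
        sources[chunk["source"]["status"]] += 1
        for job in chunk.get("studios", {}).values():
            studios[job["status"]] += 1
    return dict(sources), dict(studios)
```

Running it against the example state above would report one uploaded source, one completed Studio job, and one pending Studio job.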

Source uploads and per-chunk Studio jobs run as separate queues. That means new source uploads can keep moving while earlier reports, slide decks, or other Studio jobs are still running.

Resume After Quotas

NotebookLM usage limits and quotas depend on your plan; Google documents the current limits in its NotebookLM help pages.
That matters for long books. If your quota fills up in the middle of a run, you can stop, wait for the quota window to reset, and then run:

nblm resume --config ./nblm.toml

Because nblm persists source and Studio job state separately, it can continue from where it left off instead of redoing the whole notebook. NotebookLM's help page also notes that daily quotas reset after 24 hours.

Workflow File

A full, practical workflow file looks like this:

[source]
path = "./your-document.pdf"
# PDF only. Inclusive page ranges to skip.
# skip_ranges = ["1-8", "399-420", "512"]

[notebook]
title = "Interactive Learning Notebook"
# id = "nb_..."

[chunking]
output_dir = "./output/chunks"
target_pages = 3.0
min_pages = 2.5
max_pages = 4.0
words_per_page = 500

[runtime]
max_parallel_chunks = 3
max_parallel_heavy_studios = 1
studio_wait_timeout_seconds = 7200
studio_create_retries = 5
studio_create_backoff_seconds = 5.0
studio_rate_limit_cooldown_seconds = 30.0
rename_remote_titles = false

[studios.report]
enabled = true
per_chunk = true
max_parallel = 3
output_dir = "./output/reports"
language = "en"
format = "study-guide"
prompt = """
Write a study-guide style report for this chunk.
Explain the main ideas, terminology, and design tradeoffs.
"""

[studios.slide_deck]
enabled = true
per_chunk = true
max_parallel = 3
output_dir = "./output/slides"
language = "en"
format = "detailed"
length = "default"
download_format = "pdf"
prompt = """
Build a teaching deck for this chunk.
Keep the section order and make each slide carry one clear idea.
"""

Studio Parameters

Common fields:

| Field | Meaning |
| --- | --- |
| enabled | Turn the Studio on or off. |
| per_chunk | Generate one output per chunk instead of one output for the whole notebook. |
| max_parallel | Override generic concurrency for this Studio type. |
| prompt | Extra instructions for NotebookLM. Use TOML multiline strings for anything non-trivial. |
| output_path | Single output file. Best for notebook-level generation. |
| output_dir | Output directory for per_chunk = true. |
| language | Output language when supported. |

Per-Studio options:

| Studio | Extra fields | Defaults |
| --- | --- | --- |
| audio | format, length | deep-dive, long |
| video | format, style | explainer, whiteboard |
| report | format | study-guide |
| slide_deck | format, length, download_format | detailed, default, pdf |
| quiz | quantity, difficulty, download_format | more, hard, json |
| flashcards | quantity, difficulty, download_format | more, hard, markdown |
| infographic | orientation, detail | portrait, detailed |
| data_table | language, prompt | en, built-in comparison prompt |
| mind_map | output_path | JSON output path |

Notes:

  • For report, format = "custom" sends prompt as the main custom report prompt.
  • For built-in report formats, prompt is appended as extra instructions.
  • mind_map currently has no custom prompt surface in notebooklm-py.

Technical Notes

Heading-Aware Chunking

  • chunks start and end on heading boundaries when possible
  • chunk size targets target_pages while trying to stay inside min_pages and max_pages
  • local chunk filenames come from the first or nearest heading, including leading numbers
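The heading-boundary idea can be sketched in a few lines. This is a simplified greedy illustration, not the package's actual implementation; it splits Markdown at heading lines and closes a chunk once it reaches a target word count.

```python
import re

# Matches ATX-style Markdown headings at the start of a line.
HEADING = re.compile(r"^#{1,6}\s+.+", re.MULTILINE)

def split_on_headings(markdown, target_words=1500):
    """Greedy split: accumulate heading-delimited sections, closing a chunk
    once the running word count reaches the target."""
    starts = [0] + [m.start() for m in HEADING.finditer(markdown) if m.start() != 0]
    sections = [markdown[a:b] for a, b in zip(starts, starts[1:] + [len(markdown)])]
    chunks, current, words = [], [], 0
    for section in sections:
        current.append(section)
        words += len(section.split())
        if words >= target_words:
            chunks.append("".join(current))
            current, words = [], 0
    if current:
        chunks.append("".join(current))
    return chunks
```

Because chunk boundaries always coincide with heading positions, each chunk (after the first) opens with a heading, which is what makes the heading-derived filenames possible.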

PDF Cleanup

  • skip_ranges lets you remove contents, foreword, references, appendix, or index pages
  • ranges are inclusive, for example: ["1-8", "399-420", "512"]
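Expanding those inclusive range strings into page numbers is straightforward; a minimal sketch (the helper name is illustrative):

```python
def parse_skip_ranges(ranges):
    """Expand inclusive range strings like "1-8" or "512" into a set of page numbers."""
    pages = set()
    for item in ranges:
        if "-" in item:
            start, end = item.split("-")
            pages.update(range(int(start), int(end) + 1))  # inclusive on both ends
        else:
            pages.add(int(item))
    return pages
```

For example, `parse_skip_ranges(["1-3", "7"])` yields the set `{1, 2, 3, 7}`.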

Parallelism And Quotas

  • max_parallel_chunks controls how many source uploads run at once
  • per-chunk Studio jobs run on their own queues after each source upload finishes
  • max_parallel_heavy_studios is the generic fallback for heavier Studio types such as audio, video, slide_deck, and infographic
  • studios.<name>.max_parallel overrides that fallback per Studio type
  • good starting point for long books: max_parallel_chunks = 3
  • values like 5 can hit NotebookLM quota or rate-limit errors faster
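Bounding concurrency this way is the classic semaphore pattern. A minimal asyncio sketch of how `max_parallel_chunks` could cap simultaneous uploads; `upload_source` here is a hypothetical stand-in, not the real NotebookLM call:

```python
import asyncio

async def upload_source(chunk):
    # Hypothetical stand-in for a real NotebookLM source upload.
    await asyncio.sleep(0.01)
    return f"src-{chunk}"

async def upload_all(chunks, max_parallel_chunks=3):
    """Run uploads concurrently, but never more than max_parallel_chunks at once."""
    sem = asyncio.Semaphore(max_parallel_chunks)

    async def bounded(chunk):
        async with sem:
            return await upload_source(chunk)

    return await asyncio.gather(*(bounded(c) for c in chunks))
```

`asyncio.gather` preserves input order, so the returned source IDs line up with the chunk list even though uploads overlap.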

Retry And Backoff

  • failed NotebookLM CREATE_ARTIFACT calls retry automatically
  • quota or rate-limit errors trigger a shared cooldown before more Studio create requests are sent
  • tune this with:
    • runtime.studio_create_retries
    • runtime.studio_create_backoff_seconds
    • runtime.studio_rate_limit_cooldown_seconds
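The retry shape those settings describe can be sketched as a simple loop with growing backoff. This is an illustrative sketch, not the package's internal code, and it uses linear backoff as one plausible policy:

```python
import time

def create_with_retry(create, retries=5, backoff_seconds=5.0):
    """Call a Studio-create function, retrying on failure with linear backoff.

    `create` is any zero-argument callable; RuntimeError stands in for a
    quota or rate-limit error here.
    """
    for attempt in range(retries):
        try:
            return create()
        except RuntimeError:
            if attempt == retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(backoff_seconds * (attempt + 1))
```

With `retries = 5`, a call that fails twice and then succeeds completes on the third attempt after two backoff sleeps.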

Optional NotebookLM Renaming

  • by default, NotebookLM keeps its own auto-generated source and artifact titles
  • set runtime.rename_remote_titles = true if you want NotebookLM titles to follow chunk headings
  • tradeoff: the related Studio type becomes more serialized so renames stay correct

Examples

Start from the general end-to-end workflows first. Use the partial prepare examples only when you want to inspect chunking before any live NotebookLM run.

DDD Quickly Demo

nblm run --config ./examples/workflows/ddd-quickly-demo.toml

Full Learning Kit

nblm run --config ./examples/workflows/learning-kit.toml

Per-Chunk Report + Slide Deck

nblm run --config ./examples/workflows/per-chunk-report-and-slides.toml

Single-Studio Workflows

NotebookLM Studio is NotebookLM's built-in generation layer: Audio Overview, Video Overview, Report, Slide Deck, Quiz, Flashcards, Infographic, Data Table, and Mind Map.

Single-Studio end-to-end examples live under:

./examples/workflows/studios/

Run one of them with:

nblm run --config ./examples/workflows/studios/audio.toml

Chunking Only

PDF:

nblm prepare --config ./examples/workflows/pdf.toml

Markdown:

nblm prepare --config ./examples/workflows/markdown.toml

Commands

nblm --help:

usage: nblm [-h]
            {login,logout,doctor,init,prepare,upload,studios,run,resume} ...

Split long documents into NotebookLM-ready chunks and optionally generate
Studio outputs.

positional arguments:
  {login,logout,doctor,init,prepare,upload,studios,run,resume}
    login               Run `notebooklm login` for notebooklm-py authentication.
    logout              Clear notebooklm-py local authentication state from disk.
    doctor              Check config discovery, auth, Playwright, PDF parser, and notebooklm CLI readiness.
    init                Write a workflow config file with chunking and Studio settings.
    prepare             Parse a document and export Markdown chunks.
    upload              Upload existing chunks to NotebookLM.
    studios             Generate enabled Studio outputs for an existing notebook.
    run                 Prepare a document, create a fresh notebook run, then generate enabled Studio outputs.
    resume              Continue a previous run from `.nblm-run-state.json` and finish pending uploads or Studio jobs.
