Summarize YouTube videos

Project description

Overview of PodcastSummary: Automated Podcast Summarization Using Ollama

The PodcastSummary class represents a powerful tool for automatically generating structured summaries of podcasts using local LLM inference via Ollama. This text breaks down the architecture and workflow of this AI-powered summarization pipeline.

Core Architecture

At its heart, the class implements a multi-stage process:

Extract podcast content from YouTube
Chunk the transcript into manageable pieces
Summarize each chunk independently
Generate structured report sections (Introduction, Body, Conclusion)
Compile a final polished summary as a PDF

The system chunks the transcript into manageable pieces as most LLMs do not have a context window large enough to handle a full podcast. A typical podcast can be well over 100k tokens, often exceeding the context limits of most open source models. This chunking strategy allows the system to process lengthy content that would otherwise be impossible to summarize in a single pass.

The system leverages checkpointing to ensure progress is preserved between runs, making it resilient to interruptions. The checkpoint also allows for the modification of prompts or changing of models to easily compare different approaches.

Key Components

Initialization and Configuration

The constructor sets up the working environment with sensible defaults:

Creates a uniquely named results directory
Configures the LLM parameters (using gpt-oss:20b by default)
Sets up file paths for artifacts
Validates YouTube API credentials

A flexible config() method allows runtime customization of model parameters:

def config(self,
          model_name = None, 
          temperature = None,
          num_cxt = None,
          raw_text_chunk_size = None,
          text_chunk_overlay_size = None):
    # Validates and applies configuration changes

These configuration parameters control important aspects of the system:

model_name: The name of an Ollama model to use (e.g., 'llama3.3:latest'). The model must already be pulled and available on the system.
temperature: Controls the amount of creativity or randomness in the model's responses. This is typically a real number between 0.0 (more deterministic) and 1.0 (more creative), though some models may allow temperature values above 1.
num_cxt: Establishes the size of the context window for the LLM, measured in tokens. This cannot exceed the model's predefined context limit, but making it smaller can save memory on systems that are constrained by available RAM/VRAM. Adjusting this parameter allows for balancing between processing capacity and resource utilization.
raw_text_chunk_size: Measured in bytes (approximately four bytes per token), this determines the size of each transcript chunk processed independently.
text_chunk_overlay_size: Also measured in bytes, this is the number of bytes at the end of each chunk that is overlapped with the beginning of the next chunk. This preserves contextual meaning that may be lost by abruptly cutting off the text at an arbitrary point.

Calculating Maximum Summary Response Size

Before processing text chunks, we need to calculate the max_summary_response_size (in bytes) to ensure our summaries fit within the model's context window.

This value represents the maximum allowable size for each individual chunk summary and is calculated as follows:

Convert the context window size (num_ctx) from tokens to bytes by multiplying by 4 bytes per token
Allocate 60% of the total context window for chunk summaries
Divide this allocated space by the number of chunks to be processed

Formula:

$$\text{maxSummaryResponseSize} = \frac {(\text{numCtx} \times 4) \times 60%} {\text{numberOfChunks}} $$

This calculation ensures the total size of all summaries won't exceed the available context window. The remaining 40% of the context window is reserved for:

Introduction (10%)
Conclusion (10%)
Instruction prompts to the LLM (10%)
Safety cushion (10%)

By maintaining this balance, we can efficiently aggregate all summaries in the final processing step while staying within context limits.

Content Extraction

The system extracts podcast content using YouTube APIs:

@checkpoint     
def _get_title_and_transcript(self):
    """ Pulls the details of the video from youtube. 
    This includes the video title, transcript text and a thumbnail."""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    thumbnail_file_path = f"{self.working_dir}/{THUMBNAIL_FILE}"
    self.youtube_client.get_transcript(self.video_id, transcript_file_path)
    self.youtube_client.download_thumbnail(self.video_id, thumbnail_file_path)

To use this feature, you'll need a YouTube API key, which can be obtained from the Google Developers Console: https://developers.google.com/youtube/v3/getting-started

Transcript Processing

Lengthy transcripts are intelligently chunked to fit within model context windows:

def _chunk_transcript(self):
    """Simplifies the call to chunk_text because we already know all the parameters"""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    return self.youtube_client.chunk_text(transcript_file_path, self.raw_text_chunk_size, self.text_chunk_overlay_size)

Prompt Engineering and transcript processing

Large Language Models are extremely sensitive to instructions, context, and formatting. The PodcastSummary pipeline uses specialized system and instruction prompts to guide the model through a multi-stage summarization process. Rather than one giant prompt, the system progressively refines and transforms the podcast transcript into structured, publishable content.

This pipeline demonstrates three advanced prompt engineering techniques:

1. Role Assignment

Each stage explicitly tells the model what kind of expert it should act as:

“AI research assistant”
“professional summarizer”
“professional writer”

Assigning a role consistently improves factual tone, avoids hallucinations, and keeps the writing style stable across chunks even when text sources differ.

2. Instructional Scope

Each prompt is scoped to a specific phase:

summarize one chunk
integrate multiple chunks
create introduction
create conclusion
merge the final report

Each instruction tells the model exactly what this step is responsible for and what not to do, preventing drift between iterative calls.

3. Hierarchical Prompting

The system uses a layered prompt architecture:

Level	Purpose
System Prompts	tone, persona, format
Instruction Prompts	step-by-step task guidance
Text Context	the actual transcript
Output	the final structured result

This architecture lets the system build understanding in stages—chunks become summaries, summaries become sections, and sections become a polished final report.

Example: Chunk Summarization Prompt

SUMMARIZE_CHUNK_PROMPT = (
  "As a professional summarizer, create a detailed summary..."
  "Use the == Title == ..."
  "not exceeding {max_summary_response_size} bytes..."
)

Why it works:

size control
objective tone
avoids hallucination
stable formatting
consistent across all chunks

Example: Report Section Prompt

CREATE_REPORT_BODY_PROMPT = (
  "Integrate the provided ==SubContext== into a unified body..."
  "Do NOT include introduction or conclusion..."
  "Organize the material into ### topic headers..."
)

This prompt explicitly forces:

topic hierarchy
structured headings
de-duplication of ideas
academic tone
logical flow

Final Stage Polishing

FINAL_REPORT_SYSTEM_PROMPT = (
  "You are a professional writer..."
  "You apply APA formatting..."
)

This stage focuses on refinement:

voice alignment
academic writing
consistent formatting
polished publication-ready document

Why Prompt Engineering Matters in This Project

Traditional summarization fails because:

podcasts exceed context limits
long-form dialogue lacks structure
topic shifts occur frequently
transcripts contain noise

This system applies:

segmentation
scoped prompting
role consistency
hierarchical refinement
model-driven section synthesis

The result is a factual, well-structured, academically styled, coherent, and professionally formatted document.

All running locally.

What Users Learn About Prompt Engineering

Readers will understand that:

prompts = programmatic instructions
prompts define model behavior
LLM pipelines need staged logic
rewriting prompts changes output
models follow explicit constraints
professional results require multiple refinement phases

This README teaches the fundamentals of:

prompt structure
multi-stage prompting
context control
LLM pipeline design

Technical Implementation Details

Several design patterns and technical approaches stand out:

Checkpoint decorators for resilience and resume capabilities:

@checkpoint
def _summarize_chunk(self, context: str, max_summary_response_size: int, chunk_index: int) -> str:
    # Function can resume from previous runs if interrupted

Smart resource management to avoid context window limitations:

max_summary_response_size = (self.num_cxt * 2)/len(chunks)

Timing utilities for performance monitoring:

def _elapsed_time(self, start_time, end_time = None):
    # Calculates and formats execution time

End-to-End Workflow

The entire process is orchestrated through a single method:

def create_summary_report(self):
    # 1. Get transcript and metadata
    self._get_title_and_transcript()
    
    # 2. Chunk the transcript
    chunks = self._chunk_transcript()
    
    # 3. Summarize each chunk
    self._summarize_chunks(chunks, max_summary_response_size)
    
    # 4. Concatenate summaries
    concatenated_content = self._read_and_concatenate_summaries()
    
    # 5. Generate structured sections
    introduction_text = self._introduction_text(concatenated_content)
    main_body_text = self._main_body_text(concatenated_content)
    conclusion_text = self._conclusion_text(concatenated_content)
    
    # 6. Create final report
    final_report_text = self._final_report_text(draft_report)
    
    # 7. Convert to PDF
    self._markdown_to_pdf(final_report_text)

Conclusion

This codebase demonstrates a straightforward approach to content summarization using locally-run LLMs. By breaking down a lengthy podcast into manageable chunks, summarizing each independently, and then recombining them into a cohesive document, it overcomes context window limitations while maintaining semantic coherence.

The implementation shows thoughtful design with error handling, progress tracking, and performance monitoring. The checkpoint system ensures that long-running processes can be resumed if interrupted, making this suitable for processing lengthy content like Lex Fridman's often multi-hour podcast episodes.

Project details

Release history Release notifications | RSS feed

0.1.6

May 18, 2026

0.1.5

May 18, 2026

This version

0.1.4

May 18, 2026

0.1.3

Dec 9, 2025

0.1.0

Dec 9, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sumtube-0.1.4.tar.gz (19.9 kB view details)

Uploaded May 18, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sumtube-0.1.4-py3-none-any.whl (18.0 kB view details)

Uploaded May 18, 2026 Python 3

File details

Details for the file sumtube-0.1.4.tar.gz.

File metadata

Download URL: sumtube-0.1.4.tar.gz
Upload date: May 18, 2026
Size: 19.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sumtube-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`0fecc149158a3736e4c4eee5d57259bd1eb466de9f36978b07ab31200e8cfbb8`
MD5	`a74ea157f72f27b8f70f16634b30b445`
BLAKE2b-256	`67df467e50312ebb836b9352423b9c80676d73e347bf9764d90fecd9777cb22d`

See more details on using hashes here.

File details

Details for the file sumtube-0.1.4-py3-none-any.whl.

File metadata

Download URL: sumtube-0.1.4-py3-none-any.whl
Upload date: May 18, 2026
Size: 18.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sumtube-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2085859aed165f9a1598a1394a20d84e0b4087afcbb346ae4ac8f0a3b1be0e3c`
MD5	`d00d3cf3730d1719730dfa645709ab29`
BLAKE2b-256	`06ad095397c1cfb32a81300a307d05b846f443624eb53a8ccce2457ebfdd2350`

See more details on using hashes here.

sumtube 0.1.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Project description

Overview of PodcastSummary: Automated Podcast Summarization Using Ollama

Core Architecture

Key Components

Initialization and Configuration

Calculating Maximum Summary Response Size

Content Extraction

Transcript Processing

Prompt Engineering and transcript processing

1. Role Assignment

2. Instructional Scope

3. Hierarchical Prompting

Example: Chunk Summarization Prompt

Example: Report Section Prompt

Final Stage Polishing

Why Prompt Engineering Matters in This Project

What Users Learn About Prompt Engineering

Technical Implementation Details

End-to-End Workflow

Conclusion

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes