Skip to main content

Summarize YouTube videos

Project description

Overview of PodcastSummary: Automated Podcast Summarization Using Ollama

The PodcastSummary class represents a powerful tool for automatically generating structured summaries of podcasts using local LLM inference via Ollama. This text breaks down the architecture and workflow of this AI-powered summarization pipeline.

Core Architecture

At its heart, the class implements a multi-stage process:

  1. Extract podcast content from YouTube
  2. Chunk the transcript into manageable pieces
  3. Summarize each chunk independently
  4. Generate structured report sections (Introduction, Body, Conclusion)
  5. Compile a final polished summary as a PDF

The system chunks the transcript into manageable pieces as most LLMs do not have a context window large enough to handle a full podcast. A typical podcast can be well over 100k tokens, often exceeding the context limits of most open source models. This chunking strategy allows the system to process lengthy content that would otherwise be impossible to summarize in a single pass.

The system leverages checkpointing to ensure progress is preserved between runs, making it resilient to interruptions. The checkpoint also allows for the modification of prompts or changing of models to easily compare different approaches.

Key Components

Initialization and Configuration

The constructor sets up the working environment with sensible defaults:

  • Creates a uniquely named results directory
  • Configures the LLM parameters (using gpt-oss:20b by default)
  • Sets up file paths for artifacts
  • Validates YouTube API credentials

A flexible config() method allows runtime customization of model parameters:

def config(self,
          model_name = None, 
          temperature = None,
          num_cxt = None,
          raw_text_chunk_size = None,
          text_chunk_overlay_size = None):
    # Validates and applies configuration changes

These configuration parameters control important aspects of the system:

  • model_name: The name of an Ollama model to use (e.g., 'llama3.3:latest'). The model must already be pulled and available on the system.

  • temperature: Controls the amount of creativity or randomness in the model's responses. This is typically a real number between 0.0 (more deterministic) and 1.0 (more creative), though some models may allow temperature values above 1.

  • num_cxt: Establishes the size of the context window for the LLM, measured in tokens. This cannot exceed the model's predefined context limit, but making it smaller can save memory on systems that are constrained by available RAM/VRAM. Adjusting this parameter allows for balancing between processing capacity and resource utilization.

  • raw_text_chunk_size: Measured in bytes (approximately four bytes per token), this determines the size of each transcript chunk processed independently.

  • text_chunk_overlay_size: Also measured in bytes, this is the number of bytes at the end of each chunk that is overlapped with the beginning of the next chunk. This preserves contextual meaning that may be lost by abruptly cutting off the text at an arbitrary point.

Calculating Maximum Summary Response Size

Before processing text chunks, we need to calculate the max_summary_response_size (in bytes) to ensure our summaries fit within the model's context window.

This value represents the maximum allowable size for each individual chunk summary and is calculated as follows:

  1. Convert the context window size (num_ctx) from tokens to bytes by multiplying by 4 bytes per token
  2. Allocate 60% of the total context window for chunk summaries
  3. Divide this allocated space by the number of chunks to be processed

Formula:

$$\text{maxSummaryResponseSize} = \frac {(\text{numCtx} \times 4) \times 60%} {\text{numberOfChunks}} $$

This calculation ensures the total size of all summaries won't exceed the available context window. The remaining 40% of the context window is reserved for:

  • Introduction (10%)
  • Conclusion (10%)
  • Instruction prompts to the LLM (10%)
  • Safety cushion (10%)

By maintaining this balance, we can efficiently aggregate all summaries in the final processing step while staying within context limits.

Content Extraction

The system extracts podcast content using YouTube APIs:

@checkpoint     
def _get_title_and_transcript(self):
    """ Pulls the details of the video from youtube. 
    This includes the video title, transcript text and a thumbnail."""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    thumbnail_file_path = f"{self.working_dir}/{THUMBNAIL_FILE}"
    self.youtube_client.get_transcript(self.video_id, transcript_file_path)
    self.youtube_client.download_thumbnail(self.video_id, thumbnail_file_path)

To use this feature, you'll need a YouTube API key, which can be obtained from the Google Developers Console: https://developers.google.com/youtube/v3/getting-started

Transcript Processing

Lengthy transcripts are intelligently chunked to fit within model context windows:

def _chunk_transcript(self):
    """Simplifies the call to chunk_text because we already know all the parameters"""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    return self.youtube_client.chunk_text(transcript_file_path, self.raw_text_chunk_size, self.text_chunk_overlay_size)

Prompt Engineering and transcript processing

Large Language Models are extremely sensitive to instructions, context, and formatting. The PodcastSummary pipeline uses specialized system and instruction prompts to guide the model through a multi-stage summarization process. Rather than one giant prompt, the system progressively refines and transforms the podcast transcript into structured, publishable content.

This pipeline demonstrates three advanced prompt engineering techniques:

1. Role Assignment

Each stage explicitly tells the model what kind of expert it should act as:

  • “AI research assistant”
  • “professional summarizer”
  • “professional writer”

Assigning a role consistently improves factual tone, avoids hallucinations, and keeps the writing style stable across chunks even when text sources differ.

2. Instructional Scope

Each prompt is scoped to a specific phase:

  • summarize one chunk
  • integrate multiple chunks
  • create introduction
  • create conclusion
  • merge the final report

Each instruction tells the model exactly what this step is responsible for and what not to do, preventing drift between iterative calls.

3. Hierarchical Prompting

The system uses a layered prompt architecture:

Level Purpose
System Prompts tone, persona, format
Instruction Prompts step-by-step task guidance
Text Context the actual transcript
Output the final structured result

This architecture lets the system build understanding in stages—chunks become summaries, summaries become sections, and sections become a polished final report.


Example: Chunk Summarization Prompt

SUMMARIZE_CHUNK_PROMPT = (
  "As a professional summarizer, create a detailed summary..."
  "Use the == Title == ..."
  "not exceeding {max_summary_response_size} bytes..."
)

Why it works:

  • size control
  • objective tone
  • avoids hallucination
  • stable formatting
  • consistent across all chunks

Example: Report Section Prompt

CREATE_REPORT_BODY_PROMPT = (
  "Integrate the provided ==SubContext== into a unified body..."
  "Do NOT include introduction or conclusion..."
  "Organize the material into ### topic headers..."
)

This prompt explicitly forces:

  • topic hierarchy
  • structured headings
  • de-duplication of ideas
  • academic tone
  • logical flow

Final Stage Polishing

FINAL_REPORT_SYSTEM_PROMPT = (
  "You are a professional writer..."
  "You apply APA formatting..."
)

This stage focuses on refinement:

  • voice alignment
  • academic writing
  • consistent formatting
  • polished publication-ready document

Why Prompt Engineering Matters in This Project

Traditional summarization fails because:

  • podcasts exceed context limits
  • long-form dialogue lacks structure
  • topic shifts occur frequently
  • transcripts contain noise

This system applies:

  • segmentation
  • scoped prompting
  • role consistency
  • hierarchical refinement
  • model-driven section synthesis

The result is a factual, well-structured, academically styled, coherent, and professionally formatted document.

All running locally.


What Users Learn About Prompt Engineering

Readers will understand that:

  • prompts = programmatic instructions
  • prompts define model behavior
  • LLM pipelines need staged logic
  • rewriting prompts changes output
  • models follow explicit constraints
  • professional results require multiple refinement phases

This README teaches the fundamentals of:

  • prompt structure
  • multi-stage prompting
  • context control
  • LLM pipeline design

Technical Implementation Details

Several design patterns and technical approaches stand out:

  1. Checkpoint decorators for resilience and resume capabilities:
@checkpoint
def _summarize_chunk(self, context: str, max_summary_response_size: int, chunk_index: int) -> str:
    # Function can resume from previous runs if interrupted
  1. Smart resource management to avoid context window limitations:
max_summary_response_size = (self.num_cxt * 2)/len(chunks)
  1. Timing utilities for performance monitoring:
def _elapsed_time(self, start_time, end_time = None):
    # Calculates and formats execution time

End-to-End Workflow

The entire process is orchestrated through a single method:

def create_summary_report(self):
    # 1. Get transcript and metadata
    self._get_title_and_transcript()
    
    # 2. Chunk the transcript
    chunks = self._chunk_transcript()
    
    # 3. Summarize each chunk
    self._summarize_chunks(chunks, max_summary_response_size)
    
    # 4. Concatenate summaries
    concatenated_content = self._read_and_concatenate_summaries()
    
    # 5. Generate structured sections
    introduction_text = self._introduction_text(concatenated_content)
    main_body_text = self._main_body_text(concatenated_content)
    conclusion_text = self._conclusion_text(concatenated_content)
    
    # 6. Create final report
    final_report_text = self._final_report_text(draft_report)
    
    # 7. Convert to PDF
    self._markdown_to_pdf(final_report_text)

Conclusion

This codebase demonstrates a straightforward approach to content summarization using locally-run LLMs. By breaking down a lengthy podcast into manageable chunks, summarizing each independently, and then recombining them into a cohesive document, it overcomes context window limitations while maintaining semantic coherence.

The implementation shows thoughtful design with error handling, progress tracking, and performance monitoring. The checkpoint system ensures that long-running processes can be resumed if interrupted, making this suitable for processing lengthy content like Lex Fridman's often multi-hour podcast episodes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sumtube-0.1.4.tar.gz (19.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sumtube-0.1.4-py3-none-any.whl (18.0 kB view details)

Uploaded Python 3

File details

Details for the file sumtube-0.1.4.tar.gz.

File metadata

  • Download URL: sumtube-0.1.4.tar.gz
  • Upload date:
  • Size: 19.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sumtube-0.1.4.tar.gz
Algorithm Hash digest
SHA256 0fecc149158a3736e4c4eee5d57259bd1eb466de9f36978b07ab31200e8cfbb8
MD5 a74ea157f72f27b8f70f16634b30b445
BLAKE2b-256 67df467e50312ebb836b9352423b9c80676d73e347bf9764d90fecd9777cb22d

See more details on using hashes here.

File details

Details for the file sumtube-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: sumtube-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 18.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sumtube-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 2085859aed165f9a1598a1394a20d84e0b4087afcbb346ae4ac8f0a3b1be0e3c
MD5 d00d3cf3730d1719730dfa645709ab29
BLAKE2b-256 06ad095397c1cfb32a81300a307d05b846f443624eb53a8ccce2457ebfdd2350

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page