Skip to main content

Summarize YouTube videos

Project description

Overview of PodcastSummary: Automated Podcast Summarization Using Ollama

The PodcastSummary class represents a powerful tool for automatically generating structured summaries of podcasts using local LLM inference via Ollama. This text breaks down the architecture and workflow of this AI-powered summarization pipeline.

Core Architecture

At its heart, the class implements a multi-stage process:

  1. Extract podcast content from YouTube
  2. Chunk the transcript into manageable pieces
  3. Summarize each chunk independently
  4. Generate structured report sections (Introduction, Body, Conclusion)
  5. Compile a final polished summary as a PDF

The system chunks the transcript into manageable pieces as most LLMs do not have a context window large enough to handle a full podcast. A typical podcast can be well over 100k tokens, often exceeding the context limits of most open source models. This chunking strategy allows the system to process lengthy content that would otherwise be impossible to summarize in a single pass.

The system leverages checkpointing to ensure progress is preserved between runs, making it resilient to interruptions. The checkpoint also allows for the modification of prompts or changing of models to easily compare different approaches.

Key Components

Initialization and Configuration

The constructor sets up the working environment with sensible defaults:

  • Creates a uniquely named results directory
  • Configures the LLM parameters (using gpt-oss:20b by default)
  • Sets up file paths for artifacts
  • Validates YouTube API credentials

A flexible config() method allows runtime customization of model parameters:

def config(self,
          model_name = None, 
          temperature = None,
          num_cxt = None,
          raw_text_chunk_size = None,
          text_chunk_overlay_size = None):
    # Validates and applies configuration changes

These configuration parameters control important aspects of the system:

  • model_name: The name of an Ollama model to use (e.g., 'llama3.3:latest'). The model must already be pulled and available on the system.

  • temperature: Controls the amount of creativity or randomness in the model's responses. This is typically a real number between 0.0 (more deterministic) and 1.0 (more creative), though some models may allow temperature values above 1.

  • num_cxt: Establishes the size of the context window for the LLM, measured in tokens. This cannot exceed the model's predefined context limit, but making it smaller can save memory on systems that are constrained by available RAM/VRAM. Adjusting this parameter allows for balancing between processing capacity and resource utilization.

  • raw_text_chunk_size: Measured in bytes (approximately four bytes per token), this determines the size of each transcript chunk processed independently.

  • text_chunk_overlay_size: Also measured in bytes, this is the number of bytes at the end of each chunk that is overlapped with the beginning of the next chunk. This preserves contextual meaning that may be lost by abruptly cutting off the text at an arbitrary point.

Calculating Maximum Summary Response Size

Before processing text chunks, we need to calculate the max_summary_response_size (in bytes) to ensure our summaries fit within the model's context window.

This value represents the maximum allowable size for each individual chunk summary and is calculated as follows:

  1. Convert the context window size (num_ctx) from tokens to bytes by multiplying by 4 bytes per token
  2. Allocate 60% of the total context window for chunk summaries
  3. Divide this allocated space by the number of chunks to be processed

Formula:

$$\text{maxSummaryResponseSize} = \frac {(\text{numCtx} \times 4) \times 60%} {\text{numberOfChunks}} $$

This calculation ensures the total size of all summaries won't exceed the available context window. The remaining 40% of the context window is reserved for:

  • Introduction (10%)
  • Conclusion (10%)
  • Instruction prompts to the LLM (10%)
  • Safety cushion (10%)

By maintaining this balance, we can efficiently aggregate all summaries in the final processing step while staying within context limits.

Content Extraction

The system extracts podcast content using YouTube APIs:

@checkpoint     
def _get_title_and_transcript(self):
    """ Pulls the details of the video from youtube. 
    This includes the video title, transcript text and a thumbnail."""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    thumbnail_file_path = f"{self.working_dir}/{THUMBNAIL_FILE}"
    self.youtube_client.get_transcript(self.video_id, transcript_file_path)
    self.youtube_client.download_thumbnail(self.video_id, thumbnail_file_path)

To use this feature, you'll need a YouTube API key, which can be obtained from the Google Developers Console: https://developers.google.com/youtube/v3/getting-started

Transcript Processing

Lengthy transcripts are intelligently chunked to fit within model context windows:

def _chunk_transcript(self):
    """Simplifies the call to chunk_text because we already know all the parameters"""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    return self.youtube_client.chunk_text(transcript_file_path, self.raw_text_chunk_size, self.text_chunk_overlay_size)

Prompt Engineering and transcript processing

Large Language Models are extremely sensitive to instructions, context, and formatting. The PodcastSummary pipeline uses specialized system and instruction prompts to guide the model through a multi-stage summarization process. Rather than one giant prompt, the system progressively refines and transforms the podcast transcript into structured, publishable content.

This pipeline demonstrates three advanced prompt engineering techniques:

1. Role Assignment

Each stage explicitly tells the model what kind of expert it should act as:

  • “AI research assistant”
  • “professional summarizer”
  • “professional writer”

Assigning a role consistently improves factual tone, avoids hallucinations, and keeps the writing style stable across chunks even when text sources differ.

2. Instructional Scope

Each prompt is scoped to a specific phase:

  • summarize one chunk
  • integrate multiple chunks
  • create introduction
  • create conclusion
  • merge the final report

Each instruction tells the model exactly what this step is responsible for and what not to do, preventing drift between iterative calls.

3. Hierarchical Prompting

The system uses a layered prompt architecture:

Level Purpose
System Prompts tone, persona, format
Instruction Prompts step-by-step task guidance
Text Context the actual transcript
Output the final structured result

This architecture lets the system build understanding in stages—chunks become summaries, summaries become sections, and sections become a polished final report.


Example: Chunk Summarization Prompt

SUMMARIZE_CHUNK_PROMPT = (
  "As a professional summarizer, create a detailed summary..."
  "Use the == Title == ..."
  "not exceeding {max_summary_response_size} bytes..."
)

Why it works:

  • size control
  • objective tone
  • avoids hallucination
  • stable formatting
  • consistent across all chunks

Example: Report Section Prompt

CREATE_REPORT_BODY_PROMPT = (
  "Integrate the provided ==SubContext== into a unified body..."
  "Do NOT include introduction or conclusion..."
  "Organize the material into ### topic headers..."
)

This prompt explicitly forces:

  • topic hierarchy
  • structured headings
  • de-duplication of ideas
  • academic tone
  • logical flow

Final Stage Polishing

FINAL_REPORT_SYSTEM_PROMPT = (
  "You are a professional writer..."
  "You apply APA formatting..."
)

This stage focuses on refinement:

  • voice alignment
  • academic writing
  • consistent formatting
  • polished publication-ready document

Why Prompt Engineering Matters in This Project

Traditional summarization fails because:

  • podcasts exceed context limits
  • long-form dialogue lacks structure
  • topic shifts occur frequently
  • transcripts contain noise

This system applies:

  • segmentation
  • scoped prompting
  • role consistency
  • hierarchical refinement
  • model-driven section synthesis

The result is a factual, well-structured, academically styled, coherent, and professionally formatted document.

All running locally.


What Users Learn About Prompt Engineering

Readers will understand that:

  • prompts = programmatic instructions
  • prompts define model behavior
  • LLM pipelines need staged logic
  • rewriting prompts changes output
  • models follow explicit constraints
  • professional results require multiple refinement phases

This README teaches the fundamentals of:

  • prompt structure
  • multi-stage prompting
  • context control
  • LLM pipeline design

Technical Implementation Details

Several design patterns and technical approaches stand out:

  1. Checkpoint decorators for resilience and resume capabilities:
@checkpoint
def _summarize_chunk(self, context: str, max_summary_response_size: int, chunk_index: int) -> str:
    # Function can resume from previous runs if interrupted
  1. Smart resource management to avoid context window limitations:
max_summary_response_size = (self.num_cxt * 2)/len(chunks)
  1. Timing utilities for performance monitoring:
def _elapsed_time(self, start_time, end_time = None):
    # Calculates and formats execution time

End-to-End Workflow

The entire process is orchestrated through a single method:

def create_summary_report(self):
    # 1. Get transcript and metadata
    self._get_title_and_transcript()
    
    # 2. Chunk the transcript
    chunks = self._chunk_transcript()
    
    # 3. Summarize each chunk
    self._summarize_chunks(chunks, max_summary_response_size)
    
    # 4. Concatenate summaries
    concatenated_content = self._read_and_concatenate_summaries()
    
    # 5. Generate structured sections
    introduction_text = self._introduction_text(concatenated_content)
    main_body_text = self._main_body_text(concatenated_content)
    conclusion_text = self._conclusion_text(concatenated_content)
    
    # 6. Create final report
    final_report_text = self._final_report_text(draft_report)
    
    # 7. Convert to PDF
    self._markdown_to_pdf(final_report_text)

Conclusion

This codebase demonstrates a straightforward approach to content summarization using locally-run LLMs. By breaking down a lengthy podcast into manageable chunks, summarizing each independently, and then recombining them into a cohesive document, it overcomes context window limitations while maintaining semantic coherence.

The implementation shows thoughtful design with error handling, progress tracking, and performance monitoring. The checkpoint system ensures that long-running processes can be resumed if interrupted, making this suitable for processing lengthy content like Lex Fridman's often multi-hour podcast episodes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sumtube-0.1.0.tar.gz (19.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sumtube-0.1.0-py3-none-any.whl (17.7 kB view details)

Uploaded Python 3

File details

Details for the file sumtube-0.1.0.tar.gz.

File metadata

  • Download URL: sumtube-0.1.0.tar.gz
  • Upload date:
  • Size: 19.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sumtube-0.1.0.tar.gz
Algorithm Hash digest
SHA256 12030db64d78e11db9a7527f8146749c389963349aa94e8c8359da5f8c35d388
MD5 b7c273a0092b1254b2a4ca3702d35c75
BLAKE2b-256 4ed30602a1b953a047204695d9737b3d04cf29b2a1ad43511ebab014c7aa977c

See more details on using hashes here.

File details

Details for the file sumtube-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: sumtube-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 17.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for sumtube-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 065c70a01985d5ab1591fa53cea59a5bb6f781b77938507ff773ef418aaea459
MD5 aa9ef1239c5f8b647d05f027513c7419
BLAKE2b-256 301de4f178fa2323f2d06bf7d4e408828b48c4d0e1738c9bba8f72259417501a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page