Summarize YouTube videos
Project description
Overview of PodcastSummary: Automated Podcast Summarization Using Ollama
The PodcastSummary class is a tool for automatically generating structured summaries of podcasts using local LLM inference via Ollama. This document breaks down the architecture and workflow of the summarization pipeline.
Core Architecture
At its heart, the class implements a multi-stage process:
- Extract podcast content from YouTube
- Chunk the transcript into manageable pieces
- Summarize each chunk independently
- Generate structured report sections (Introduction, Body, Conclusion)
- Compile a final polished summary as a PDF
The system chunks the transcript into manageable pieces because most open-source LLMs do not have a context window large enough to handle a full podcast; a typical episode can easily exceed 100k tokens. This chunking strategy allows the system to process lengthy content that would otherwise be impossible to summarize in a single pass, as sketched below.
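As an illustration of the strategy (a minimal sketch, not the package's actual chunk_text implementation), overlapped chunking looks roughly like this:

```python
def chunk_with_overlap(text: str, chunk_size: int, overlap: int) -> list[str]:
    """Split text into fixed-size chunks, repeating the last `overlap`
    characters of each chunk at the start of the next (sketch; the real
    parameters are measured in bytes)."""
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final chunk reaches the end of the text
    return chunks
```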
The system leverages checkpointing to ensure progress is preserved between runs, making it resilient to interruptions. Checkpoints also make it easy to modify prompts or swap models and compare different approaches without recomputing earlier stages.
Key Components
Initialization and Configuration
The constructor sets up the working environment with sensible defaults:
- Creates a uniquely named results directory
- Configures the LLM parameters (using gpt-oss:20b by default)
- Sets up file paths for artifacts
- Validates YouTube API credentials
A flexible config() method allows runtime customization of model parameters:
```python
def config(self,
           model_name = None,
           temperature = None,
           num_cxt = None,
           raw_text_chunk_size = None,
           text_chunk_overlay_size = None):
    # Validates and applies configuration changes
    ...
```
These configuration parameters control important aspects of the system:
- model_name: The name of an Ollama model to use (e.g., 'llama3.3:latest'). The model must already be pulled and available on the system.
- temperature: Controls the amount of creativity or randomness in the model's responses. This is typically a real number between 0.0 (more deterministic) and 1.0 (more creative), though some models allow values above 1.
- num_cxt: The size of the LLM's context window, measured in tokens. It cannot exceed the model's predefined context limit, but a smaller value saves memory on systems constrained by available RAM/VRAM, trading processing capacity for lower resource use.
- raw_text_chunk_size: Measured in bytes (approximately four bytes per token), this determines the size of each transcript chunk processed independently.
- text_chunk_overlay_size: Also measured in bytes, this is the number of bytes from the end of each chunk that are repeated at the beginning of the next chunk. This overlap preserves contextual meaning that would otherwise be lost by cutting the text at an arbitrary point.
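For example, a run could be customized like this (the values below are illustrative, not recommended defaults, and `summary` is assumed to be an existing PodcastSummary instance):

```python
# Illustrative values; `summary` is assumed to be a PodcastSummary instance.
summary.config(
    model_name='llama3.3:latest',    # must already be pulled with `ollama pull`
    temperature=0.3,                 # lean toward deterministic output
    num_cxt=8192,                    # context window in tokens
    raw_text_chunk_size=16000,       # chunk size in bytes (~4,000 tokens)
    text_chunk_overlay_size=1000,    # bytes repeated between adjacent chunks
)
```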
Calculating Maximum Summary Response Size
Before processing text chunks, we need to calculate the max_summary_response_size (in bytes) to ensure our summaries fit within the model's context window.
This value represents the maximum allowable size for each individual chunk summary and is calculated as follows:
- Convert the context window size (num_cxt) from tokens to bytes by multiplying by 4 bytes per token
- Allocate 60% of the total context window for chunk summaries
- Divide this allocated space by the number of chunks to be processed
Formula:
$$\text{maxSummaryResponseSize} = \frac{(\text{numCtx} \times 4) \times 0.6}{\text{numberOfChunks}}$$
This calculation ensures the total size of all summaries won't exceed the available context window. The remaining 40% of the context window is reserved for:
- Introduction (10%)
- Conclusion (10%)
- Instruction prompts to the LLM (10%)
- Safety cushion (10%)
By maintaining this balance, we can efficiently aggregate all summaries in the final processing step while staying within context limits.
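A quick worked example, assuming an 8,192-token context window and a transcript split into 10 chunks:

```python
num_cxt = 8192                         # context window in tokens (assumed)
number_of_chunks = 10                  # chunks for this transcript (assumed)

context_bytes = num_cxt * 4            # 32,768 bytes at ~4 bytes per token
summary_budget = context_bytes * 0.6   # 19,660.8 bytes reserved for summaries
max_summary_response_size = summary_budget / number_of_chunks
print(max_summary_response_size)       # ~1,966 bytes per chunk summary
```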
Content Extraction
The system extracts podcast content using YouTube APIs:
```python
@checkpoint
def _get_title_and_transcript(self):
    """Pulls the details of the video from youtube.
    This includes the video title, transcript text and a thumbnail."""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    thumbnail_file_path = f"{self.working_dir}/{THUMBNAIL_FILE}"
    self.youtube_client.get_transcript(self.video_id, transcript_file_path)
    self.youtube_client.download_thumbnail(self.video_id, thumbnail_file_path)
```
To use this feature, you'll need a YouTube API key, which can be obtained from the Google Developers Console: https://developers.google.com/youtube/v3/getting-started
Transcript Processing
Lengthy transcripts are intelligently chunked to fit within model context windows:
```python
def _chunk_transcript(self):
    """Simplifies the call to chunk_text because we already know all the parameters"""
    transcript_file_path = f"{self.working_dir}/{TRANSCRIPT_FILE}"
    return self.youtube_client.chunk_text(transcript_file_path,
                                          self.raw_text_chunk_size,
                                          self.text_chunk_overlay_size)
```
Prompt Engineering and Transcript Processing
Large Language Models are extremely sensitive to instructions, context, and formatting. The PodcastSummary pipeline uses specialized system and instruction prompts to guide the model through a multi-stage summarization process. Rather than one giant prompt, the system progressively refines and transforms the podcast transcript into structured, publishable content.
This pipeline demonstrates three advanced prompt engineering techniques:
1. Role Assignment
Each stage explicitly tells the model what kind of expert it should act as:
- “AI research assistant”
- “professional summarizer”
- “professional writer”
Assigning a role consistently improves factual tone, helps reduce hallucinations, and keeps the writing style stable across chunks even when the source text varies.
2. Instructional Scope
Each prompt is scoped to a specific phase:
- summarize one chunk
- integrate multiple chunks
- create introduction
- create conclusion
- merge the final report
Each instruction tells the model exactly what this step is responsible for and what not to do, preventing drift between iterative calls.
3. Hierarchical Prompting
The system uses a layered prompt architecture:
| Level | Purpose |
|---|---|
| System Prompts | tone, persona, format |
| Instruction Prompts | step-by-step task guidance |
| Text Context | the actual transcript |
| Output | the final structured result |
This architecture lets the system build understanding in stages—chunks become summaries, summaries become sections, and sections become a polished final report.
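A minimal sketch of how these layers combine into a single Ollama call (the prompt strings and option values here are illustrative, not the project's actual prompts):

```python
import ollama

# Layered prompting: the system prompt sets persona and format, the
# instruction prompt scopes this step, and the transcript chunk is the
# text context. (Illustrative strings, not the project's real prompts.)
system_prompt = "You are a professional summarizer. Use an objective, factual tone."
instruction = "Summarize the transcript chunk below. Do not add information."
chunk = "== Title ==\n..."  # one transcript chunk

response = ollama.chat(
    model='gpt-oss:20b',
    messages=[
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': f"{instruction}\n\n{chunk}"},
    ],
    options={'temperature': 0.3, 'num_ctx': 8192},  # Ollama's option is num_ctx
)
print(response['message']['content'])
```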
Example: Chunk Summarization Prompt
```python
SUMMARIZE_CHUNK_PROMPT = (
    "As a professional summarizer, create a detailed summary..."
    "Use the == Title == ..."
    "not exceeding {max_summary_response_size} bytes..."
)
```
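The {max_summary_response_size} placeholder is filled in at call time, e.g.:

```python
# Fill the per-chunk size budget into the prompt template before each call.
prompt = SUMMARIZE_CHUNK_PROMPT.format(
    max_summary_response_size=int(max_summary_response_size)
)
```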
Why it works:
- size control
- objective tone
- avoids hallucination
- stable formatting
- consistent across all chunks
Example: Report Section Prompt
```python
CREATE_REPORT_BODY_PROMPT = (
    "Integrate the provided ==SubContext== into a unified body..."
    "Do NOT include introduction or conclusion..."
    "Organize the material into ### topic headers..."
)
```
This prompt explicitly forces:
- topic hierarchy
- structured headings
- de-duplication of ideas
- academic tone
- logical flow
Final Stage Polishing
```python
FINAL_REPORT_SYSTEM_PROMPT = (
    "You are a professional writer..."
    "You apply APA formatting..."
)
```
This stage focuses on refinement:
- voice alignment
- academic writing
- consistent formatting
- polished publication-ready document
Why Prompt Engineering Matters in This Project
Traditional summarization fails because:
- podcasts exceed context limits
- long-form dialogue lacks structure
- topic shifts occur frequently
- transcripts contain noise
This system applies:
- segmentation
- scoped prompting
- role consistency
- hierarchical refinement
- model-driven section synthesis
The result is a factual, well-structured, academically styled, coherent, and professionally formatted document.
All running locally.
What Users Learn About Prompt Engineering
Readers will understand that:
- prompts = programmatic instructions
- prompts define model behavior
- LLM pipelines need staged logic
- rewriting prompts changes output
- models follow explicit constraints
- professional results require multiple refinement phases
This README teaches the fundamentals of:
- prompt structure
- multi-stage prompting
- context control
- LLM pipeline design
Technical Implementation Details
Several design patterns and technical approaches stand out:
- Checkpoint decorators for resilience and resume capabilities:

```python
@checkpoint
def _summarize_chunk(self, context: str, max_summary_response_size: int, chunk_index: int) -> str:
    # Function can resume from previous runs if interrupted
    ...
```
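The decorator itself is not shown in this overview; a minimal sketch of the pattern, assuming each stage's result is cached as a file under self.working_dir, might look like:

```python
import functools
import os

def checkpoint(func):
    """Cache a method's string result to disk so an interrupted run can
    resume without repeating completed stages (sketch, not the real code)."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        # Key the checkpoint on the method name, plus the chunk index for
        # per-chunk calls (assumes chunk_index is passed as a keyword).
        suffix = f"_{kwargs['chunk_index']}" if 'chunk_index' in kwargs else ""
        path = os.path.join(self.working_dir, f"{func.__name__}{suffix}.checkpoint")
        if os.path.exists(path):
            with open(path, encoding='utf-8') as f:
                return f.read()  # resume from the saved result
        result = func(self, *args, **kwargs)
        with open(path, 'w', encoding='utf-8') as f:
            # Methods that return None are recorded as an empty checkpoint.
            f.write(result if result is not None else "")
        return result
    return wrapper
```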
- Smart resource management to avoid context window limitations:

```python
max_summary_response_size = (self.num_cxt * 2) / len(chunks)
```
- Timing utilities for performance monitoring:

```python
def _elapsed_time(self, start_time, end_time=None):
    # Calculates and formats execution time
    ...
```
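One way such a helper could be implemented (a sketch, assuming timestamps from time.time()):

```python
import time

def _elapsed_time(self, start_time, end_time=None):
    """Format elapsed wall-clock time as H:MM:SS (illustrative sketch)."""
    end_time = end_time if end_time is not None else time.time()
    hours, remainder = divmod(int(end_time - start_time), 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"
```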
End-to-End Workflow
The entire process is orchestrated through a single method:
```python
def create_summary_report(self):
    # 1. Get transcript and metadata
    self._get_title_and_transcript()
    # 2. Chunk the transcript
    chunks = self._chunk_transcript()
    # 3. Summarize each chunk within a per-chunk size budget
    max_summary_response_size = (self.num_cxt * 2) / len(chunks)
    self._summarize_chunks(chunks, max_summary_response_size)
    # 4. Concatenate summaries
    concatenated_content = self._read_and_concatenate_summaries()
    # 5. Generate structured sections
    introduction_text = self._introduction_text(concatenated_content)
    main_body_text = self._main_body_text(concatenated_content)
    conclusion_text = self._conclusion_text(concatenated_content)
    # 6. Create final report from the assembled draft
    draft_report = f"{introduction_text}\n\n{main_body_text}\n\n{conclusion_text}"
    final_report_text = self._final_report_text(draft_report)
    # 7. Convert to PDF
    self._markdown_to_pdf(final_report_text)
```
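End to end, usage might look like the following; the constructor arguments shown here are assumptions for illustration rather than the package's documented signature:

```python
# Hypothetical usage; the constructor arguments are assumed for illustration.
summary = PodcastSummary(video_id='<VIDEO_ID>', youtube_api_key='<YOUR_API_KEY>')
summary.config(model_name='gpt-oss:20b', num_cxt=8192)
summary.create_summary_report()  # writes the final PDF to the results directory
```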
Conclusion
This codebase demonstrates a straightforward approach to content summarization using locally-run LLMs. By breaking down a lengthy podcast into manageable chunks, summarizing each independently, and then recombining them into a cohesive document, it overcomes context window limitations while maintaining semantic coherence.
The implementation shows thoughtful design with error handling, progress tracking, and performance monitoring. The checkpoint system ensures that long-running processes can be resumed if interrupted, making this suitable for processing lengthy content like Lex Fridman's often multi-hour podcast episodes.
Project details
Download files

Download the file for your platform.

Source Distribution

sumtube-0.1.0.tar.gz

Built Distribution

sumtube-0.1.0-py3-none-any.whl
File details
Details for the file sumtube-0.1.0.tar.gz.
File metadata
- Download URL: sumtube-0.1.0.tar.gz
- Upload date:
- Size: 19.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12030db64d78e11db9a7527f8146749c389963349aa94e8c8359da5f8c35d388 |
| MD5 | b7c273a0092b1254b2a4ca3702d35c75 |
| BLAKE2b-256 | 4ed30602a1b953a047204695d9737b3d04cf29b2a1ad43511ebab014c7aa977c |
File details
Details for the file sumtube-0.1.0-py3-none-any.whl.
File metadata
- Download URL: sumtube-0.1.0-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 065c70a01985d5ab1591fa53cea59a5bb6f781b77938507ff773ef418aaea459 |
| MD5 | aa9ef1239c5f8b647d05f027513c7419 |
| BLAKE2b-256 | 301de4f178fa2323f2d06bf7d4e408828b48c4d0e1738c9bba8f72259417501a |