A CLI tool to download, transcribe, and summarize YouTube videos.

These details have not been verified by PyPI

Project links

Homepage

Project description

mdlsum (Media Downloader & Summarizer)

mdlsum is a Python package designed to download, transcribe, and summarize media. Currently, it supports YouTube videos, with plans to expand to podcasts and other media formats. This tool aims to provide basic yet useful summaries, with future iterations planned to enhance its utility for personal and family use.

Features

Download YouTube Videos: Fetches the best audio quality available from YouTube videos.
Transcribe Audio: Utilizes Whisper to convert audio into text, including timestamps.
Summarize Transcriptions: Uses a language model to generate concise summaries formatted as a table of contents with timestamps.

Installation

To install mdlsum-cli, run:

pip install mdlsum-cli

Usage

Set Environment Variable:

Ensure that the OPENAI_API_KEY is set in your environment:

export OPENAI_API_KEY=your_openai_api_key_here

Running the Application

To use the application, simply provide a YouTube URL:

mdlsum "https://www.youtube.com/watch?v=example"

Technical Details

Overview

The mdlsum package combines several powerful tools to achieve its functionality:

yt-dlp: A versatile downloader used to fetch YouTube videos.
ffmpeg: Converts downloaded video audio into the desired format.
whisper.cpp: A lightweight and efficient implementation of OpenAI’s Whisper model, used for transcribing audio.
OpenAI’s GPT-3.5-turbo: Provides the summarization capabilities, transforming transcriptions into concise summaries.
Typer: Facilitates the creation of the command-line interface (CLI).

How it works

Downloading Videos

The process begins with the download.py module, where yt-dlp downloads the audio from a YouTube video. The downloaded audio is then converted to a 16-bit WAV file using ffmpeg. This step ensures that the audio is in a suitable format for transcription.

Transcribing Audio

Next, the transcribe.py module takes over. It uses whisper.cpp to transcribe the audio into text, ensuring that timestamps are included. This transcription process involves the following steps:

Checking and downloading the Whisper model if not already present.
Running the transcription using the Whisper model to generate a text file with timestamps.

Summarizing Transcriptions

The final step is handled by the summarize.py module. This module sends the transcribed text to OpenAI’s API to generate a summary. The API call includes instructions to format the summary as a table of contents with timestamps. This step ensures that the summary is easy to navigate and understand.

Technologies Used

yt-dlp: for downloading YouTube videos and other media
ffmpeg: for audio conversion
whisper.cpp: for transcription locally on your PC
OpenAI API: for summarization
Typer: for building the CLI

Building the Project

Development Process

The project was developed incrementally, starting with setting up the basic CLI structure using Typer. Each major feature (downloading, transcribing, summarizing) was implemented and tested separately before being integrated into the final application. The development process involved:

Setting up a virtual environment for dependency management.
Implementing and testing each feature in isolation.
Integrating the features into a cohesive CLI tool.
Packaging the project for distribution.

Acknowledgements

This project wouldn't have been possible without the incredible work of the following individuals and organizations:

OpenAI & Anthropic for the Whisper model and their language model APIs
Georgi Gerganov for his incredible work on whisper.cpp https://github.com/ggerganov/whisper.cpp
yt-dlp https://github.com/yt-dlp/yt-dlp
Typer https://github.com/tiangolo/typer

License

See LICENSE.

Future Plans

Expand support to include podcasts and other media formats
Improve summary quality and customization options
Add advanced features like specifying models and timestamps
Build a library of podcasts and YouTube video summaries that is searchable

This project is a work in progress, and I look forward to iterating on it to make it even more useful. Thank you for checking it out!

SidRT

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.2.1

Jul 26, 2024

0.2.0

Jul 22, 2024

0.1.3

Jul 17, 2024

0.1.2

Jul 17, 2024

0.1.1

Jul 16, 2024

0.1.0

Jul 16, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mdlsum-cli-0.2.1.tar.gz (9.6 kB view details)

Uploaded Jul 26, 2024 Source

Built Distribution

mdlsum_cli-0.2.1-py3-none-any.whl (9.0 kB view details)

Uploaded Jul 26, 2024 Python 3

File details

Details for the file mdlsum-cli-0.2.1.tar.gz.

File metadata

Download URL: mdlsum-cli-0.2.1.tar.gz
Upload date: Jul 26, 2024
Size: 9.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.6

File hashes

Hashes for mdlsum-cli-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`d36b5585dc502e25b6efee7cee02c77a1772ca6bb88e8f01372dbf33201dec9d`
MD5	`8112db52326a732d229d72f0abee756b`
BLAKE2b-256	`1cad87f35587fb14f7a449f3bc3208ca7dc01c9343936da960b50163acd117ec`

See more details on using hashes here.

File details

Details for the file mdlsum_cli-0.2.1-py3-none-any.whl.

File metadata

Download URL: mdlsum_cli-0.2.1-py3-none-any.whl
Upload date: Jul 26, 2024
Size: 9.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.6

File hashes

Hashes for mdlsum_cli-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ba97853d7a3c127f56f7d875d4777c9bc4862b70dfee034bf695e1392d35fb14`
MD5	`52b15f9e51f4eff03928005940714a3f`
BLAKE2b-256	`e7266a41ceb73c9575b090cca952b43e4bb33318a0ac0496d7192153d84a6a16`