Transcribe any YouTube video into a structured Markdown document
yt2doc
yt2doc transcribes online videos and audio into readable Markdown documents.
Supported video/audio sources:
- YouTube
- Apple Podcast
yt2doc is meant to work fully locally, without invoking any external API. The OpenAI SDK dependency is required solely to interact with a local LLM server such as Ollama.
Check out some examples generated by yt2doc.
Why
There are many existing projects that transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not generate line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you one huge block of text, without any line breaks or topic segmentation. This project aims to transcribe videos with that post-processing.
Installation
Prerequisites
ffmpeg is required to run yt2doc.
If you are on macOS:
brew install ffmpeg
If you are on Debian/Ubuntu:
sudo apt install ffmpeg
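You can verify that ffmpeg is available with:
ffmpeg -version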
Install yt2doc
Install with pipx:
pipx install yt2doc
Or install with uv:
uv tool install yt2doc
Usage
Show the help message:
yt2doc --help
Transcribe a video from YouTube or Twitter
To transcribe a video (on YouTube or Twitter) into a document:
yt2doc --video <video-url>
To save your transcription:
yt2doc --video <video-url> -o some_dir/transcription.md
Transcribe a YouTube playlist
To transcribe all videos from a YouTube playlist:
yt2doc --playlist <playlist-url> -o some_dir
Chapter unchaptered videos
(An LLM server, e.g. Ollama, is required.) If the video is not chaptered, you can chapter it and add a heading to each chapter:
yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>
Among smaller models, gemma2:9b, llama3.1:8b, and qwen2.5:7b work reasonably well.
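For example, assuming you have already pulled gemma2:9b into a local Ollama server (ollama pull gemma2:9b), the invocation could look like:
yt2doc --video <video-url> --segment-unchaptered --llm-model gemma2:9b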
By default, yt2doc talks to Ollama at http://localhost:11434/v1 to segment the text by topic. You can point yt2doc at Ollama on a different address or port, at a different (OpenAI-compatible) LLM server (e.g. vLLM, mistral.rs), or even at OpenAI itself:
yt2doc --video <video-url> --segment-unchaptered --llm-server <llm-server-url> --llm-api-key <llm-server-api-key> --llm-model <model-name>
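For instance, to target a vLLM server on its default port (an illustrative setup; substitute your own server URL, API key, and model name):
yt2doc --video <video-url> --segment-unchaptered --llm-server http://localhost:8000/v1 --llm-api-key <llm-server-api-key> --llm-model <model-name>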
Transcribe Apple Podcast
To transcribe a podcast episode on Apple Podcast:
yt2doc --audio <apple-podcast-episode-url> --segment-unchaptered --llm-model <model-name>
Whisper configuration
By default, yt2doc uses faster-whisper as its transcription backend. You can run yt2doc with different faster-whisper configurations (model size, device, compute type, etc.):
yt2doc --video <video-url> --whisper-model <model-name> --whisper-device <cpu|cuda|auto> --whisper-compute-type <compute_type>
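For example, a common configuration for an NVIDIA GPU is the large-v3 model with float16 compute (these values are only illustrative; adjust them to your hardware):
yt2doc --video <video-url> --whisper-model large-v3 --whisper-device cuda --whisper-compute-type float16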
For the meaning and choices of --whisper-model, --whisper-device and --whisper-compute-type, please refer to this comment in the faster-whisper repository.
If you are running yt2doc on Apple Silicon, whisper.cpp gives much faster performance as it supports the Apple GPU. (Somewhat hacky) support for whisper.cpp has been implemented:
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>
See https://github.com/shun-liang/yt2doc/issues/15 for more info on whisper.cpp integration.
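As an illustration, assuming whisper.cpp has been built under ~/whisper.cpp and a ggml model has been downloaded into its models directory (both paths are hypothetical), the command might look like:
yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable ~/whisper.cpp/main --whisper-cpp-model ~/whisper.cpp/models/ggml-large-v3.bin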
Text segmentation configuration
yt2doc uses Segment Any Text (SaT) to segment the transcript into sentences and paragraphs. You can change the SaT model:
yt2doc --video <video-url> --sat-model <sat-model>
List of available SaT models here.
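For example, to switch to the smaller sat-3l-sm model (an illustrative choice from that list; pick whichever best fits your language and hardware):
yt2doc --video <video-url> --sat-model sat-3l-sm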
TODOs
- Tests and evaluation
- Better Whisper prompting strategy (currently it depends heavily on the title and description of the video).
- Better support for non-English languages
File details
Details for the file yt2doc-0.2.1.tar.gz
File metadata
- Download URL: yt2doc-0.2.1.tar.gz
- Upload date:
- Size: 17.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | b444dcbd6107dc56223aed2183fffe01eb5486371fae1dd967e9a6396e8c6377
MD5 | 8fbba009426fc8d0d095dec9214e87e3
BLAKE2b-256 | c3ddd0d214baf16c1696d491f02749d69239719c2c6b5271ddbf5fbeb0647514
File details
Details for the file yt2doc-0.2.1-py3-none-any.whl
File metadata
- Download URL: yt2doc-0.2.1-py3-none-any.whl
- Upload date:
- Size: 21.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.20
File hashes
Algorithm | Hash digest
---|---
SHA256 | f1e7a93997281a868962b1f76b463807826eee24427d224cae52cc84f59d5082
MD5 | 1320941388b9ad6c3b76fc889b528468
BLAKE2b-256 | c23ff1716191faebcb090bca0f7e3162e41df239729a7d53ce31ee3bec5b8349