
Transcribe any YouTube video into a structured Markdown document


yt2doc


yt2doc transcribes online videos into readable Markdown documents.

Supported video sources:

  • YouTube
  • Apple Podcasts
  • Twitter

yt2doc is designed to work fully locally, without calling any external API. The OpenAI SDK dependency is used solely to communicate with Ollama.

Check out some examples generated by yt2doc.

Why

Many existing projects transcribe YouTube videos with Whisper and its variants, but most of them aim to generate subtitles, and I had not found one that prioritises readability. Whisper does not insert line breaks in its transcription, so transcribing a 20-minute video without any post-processing gives you one huge block of text with no line breaks or topic segmentation. This project transcribes videos and applies that post-processing.

Installation

Prerequisites

ffmpeg is required to run yt2doc.

If you are running macOS:

brew install ffmpeg

If you are on Debian/Ubuntu:

sudo apt install ffmpeg
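If you drive yt2doc from a script, it can help to confirm that ffmpeg is actually on your PATH before starting a long transcription. The following is a minimal sketch using only the Python standard library; the helper names ffmpeg_available and ffmpeg_version are hypothetical and not part of yt2doc.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    path = shutil.which("ffmpeg")
    if path is None:
        return None
    result = subprocess.run([path, "-version"], capture_output=True, text=True, check=False)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found:", ffmpeg_version())
    else:
        print("ffmpeg not found; install it before running yt2doc")
```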

Install yt2doc

Install with pipx:

pipx install yt2doc

Or install with uv:

uv tool install yt2doc

Usage

Show help:

yt2doc --help

Transcribe a video from YouTube or Twitter

To transcribe a video (on YouTube or Twitter) into a document:

yt2doc --video <video-url>

To save the transcription to a file:

yt2doc --video <video-url> -o some_dir/transcription.md

Transcribe a YouTube playlist

To transcribe all videos from a YouTube playlist:

yt2doc --playlist <playlist-url> -o some_dir
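If you want to call the commands above from Python (for example, to process a list of URLs), one option is to build the argument list yourself and pass it to subprocess. This is a minimal sketch that assumes yt2doc is installed and on PATH; it uses only the --video, --playlist and -o flags shown above, and the helper names and example URL are hypothetical.

```python
import shlex
import subprocess
from typing import List, Optional


def build_yt2doc_command(
    video: Optional[str] = None,
    playlist: Optional[str] = None,
    output: Optional[str] = None,
) -> List[str]:
    """Build a yt2doc argument list for a single video or a playlist.

    Exactly one of `video` or `playlist` must be given. For a video, `output`
    is a Markdown file path; for a playlist, it is a directory.
    """
    if (video is None) == (playlist is None):
        raise ValueError("pass exactly one of video= or playlist=")
    cmd = ["yt2doc"]
    if video is not None:
        cmd += ["--video", video]
    else:
        cmd += ["--playlist", playlist]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_yt2doc(cmd: List[str]) -> int:
    """Run yt2doc and return its exit code (requires yt2doc on PATH)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Hypothetical placeholder URL; replace VIDEO_ID with a real video ID.
    cmd = build_yt2doc_command(
        video="https://www.youtube.com/watch?v=VIDEO_ID",
        output="some_dir/transcription.md",
    )
    print(shlex.join(cmd))  # show the equivalent shell command
```

Passing a list (rather than a single shell string) to subprocess avoids having to quote URLs that contain characters such as ? and &.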

Chapter unchaptered videos

(Ollama required) If the video has no chapters, yt2doc can split it into chapters and add a heading to each one:

yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>

Among smaller models, gemma2:9b, llama3.1:8b and qwen2.5:7b work reasonably well.
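Since --llm-model must name a model that your local Ollama server has already pulled, a quick pre-flight check can save a failed run. The sketch below queries Ollama's model-listing endpoint (GET /api/tags on the default port 11434) with the standard library only; it assumes a default local Ollama install, and the helper names are hypothetical, not part of yt2doc.

```python
import json
import urllib.request
from typing import List

OLLAMA_TAGS_URL = "http://localhost:11434/api/tags"  # assumed default local Ollama endpoint


def parse_local_models(payload: str) -> List[str]:
    """Extract model names from an Ollama /api/tags JSON response."""
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", [])]


def has_model(names: List[str], wanted: str) -> bool:
    """Check for `wanted`, treating an untagged name as `<name>:latest`."""
    if ":" not in wanted:
        wanted += ":latest"
    return wanted in names


def fetch_local_models(url: str = OLLAMA_TAGS_URL) -> List[str]:
    """Ask a running Ollama server which models it has pulled."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_local_models(resp.read().decode("utf-8"))


if __name__ == "__main__":
    try:
        models = fetch_local_models()
    except OSError as exc:  # URLError and timeouts are OSError subclasses
        print("Ollama does not appear to be running:", exc)
    else:
        print("gemma2:9b available:", has_model(models, "gemma2:9b"))
```

If the model is missing, pulling it with Ollama first (for example, ollama pull gemma2:9b) should make it available to yt2doc.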

Transcribe an Apple Podcasts episode

To transcribe a podcast episode on Apple Podcasts:

yt2doc --audio <apple-podcast-episode-url> --segment-unchaptered --llm-model <model-name>

Whisper configuration

By default, yt2doc uses faster-whisper as its transcription backend. You can run yt2doc with different faster-whisper configurations (model size, device, compute type, etc.):

yt2doc --video <video-url> --whisper-model <model-name> --whisper-device <cpu|cuda|auto> --whisper-compute-type <compute_type>
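When scripting these options, a small helper can validate the device against the three choices listed above and fill in a compute type when none is given. This is a sketch, not yt2doc's own logic: the device-to-compute-type defaults (float16 on CUDA, int8 on CPU) are an assumption modelled on the examples in the faster-whisper README, and the whisper_args name is hypothetical.

```python
from typing import List, Optional

WHISPER_DEVICES = {"cpu", "cuda", "auto"}  # choices listed in the yt2doc usage above


def whisper_args(model: str, device: str = "auto", compute_type: Optional[str] = None) -> List[str]:
    """Build the faster-whisper related yt2doc flags."""
    if device not in WHISPER_DEVICES:
        raise ValueError(f"device must be one of {sorted(WHISPER_DEVICES)}, got {device!r}")
    if compute_type is None:
        # Assumed heuristic: float16 on GPU, int8 on CPU; for "auto",
        # omit the flag and let yt2doc use its default.
        compute_type = {"cuda": "float16", "cpu": "int8"}.get(device)
    args = ["--whisper-model", model, "--whisper-device", device]
    if compute_type is not None:
        args += ["--whisper-compute-type", compute_type]
    return args


if __name__ == "__main__":
    # Hypothetical placeholder URL.
    print(["yt2doc", "--video", "https://www.youtube.com/watch?v=VIDEO_ID"] + whisper_args("small", "cpu"))
```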

For the meaning of --whisper-model, --whisper-device and --whisper-compute-type and their accepted values, refer to this comment in the faster-whisper repository.

If you are running yt2doc on Apple Silicon, whisper.cpp is much faster because it can use the Apple GPU. Experimental (and somewhat hacky) support for whisper.cpp is available:

yt2doc --video <video-url> --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable> --whisper-cpp-model <path-to-whisper-cpp-model>

See https://github.com/shun-liang/yt2doc/issues/15 for more info on whisper.cpp integration.
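Because the whisper.cpp backend takes two file paths, a typo in either one only shows up once transcription starts. A sketch of an early check is below; it uses only the standard library and the three flags from the command above, and the whisper_cpp_args helper is hypothetical rather than part of yt2doc.

```python
import os
from pathlib import Path
from typing import List


def whisper_cpp_args(executable: str, model: str) -> List[str]:
    """Build the whisper.cpp flags, failing early if either path is unusable."""
    exe = Path(executable).expanduser()
    mdl = Path(model).expanduser()
    # The executable must exist and carry the execute permission.
    if not (exe.is_file() and os.access(exe, os.X_OK)):
        raise FileNotFoundError(f"whisper.cpp executable not found or not executable: {exe}")
    # The model only needs to be an existing file.
    if not mdl.is_file():
        raise FileNotFoundError(f"whisper.cpp model file not found: {mdl}")
    return [
        "--whisper-backend", "whisper_cpp",
        "--whisper-cpp-executable", str(exe),
        "--whisper-cpp-model", str(mdl),
    ]
```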

Text segmentation configuration

yt2doc uses Segment Any Text (SaT) to segment the transcript into sentences and paragraphs. You can change the SaT model:

yt2doc --video <video-url> --sat-model <sat-model>

See the list of available SaT models here.

TODOs

  • Tests and evaluation
  • Better Whisper prompting strategy (it currently relies heavily on the title and description of the video)
  • Better support for non-English languages



Download files


Source Distribution

yt2doc-0.2.0.tar.gz (16.9 kB)

Uploaded Source

Built Distribution

yt2doc-0.2.0-py3-none-any.whl (21.5 kB)

Uploaded Python 3

File details

Details for the file yt2doc-0.2.0.tar.gz.

File metadata

  • Download URL: yt2doc-0.2.0.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for yt2doc-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f6296ba889255ad64e82ce3ca93185fc47f257cb25e20700b445e940f5f59ded
MD5 53ad5e34867378a53ed12acc6f6f9525
BLAKE2b-256 7c581b1ea31fba9080ed852a2db48e3e1c47228885bf6db1bb8a9b507f173495


File details

Details for the file yt2doc-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: yt2doc-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 21.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.20

File hashes

Hashes for yt2doc-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 98aaccf31f2b2e2920f9fd713875d5bf336a96cfdb7c66d36d2991a2c84943f1
MD5 ab7f231b323b28a34a9928f30665d9d4
BLAKE2b-256 9f47c1b2d69fd2f4e6930ed860b2834f5502209dcfc0af9cc61536a6bf8839e2
