Skip to main content

Transcribe any YouTube video into a structural Markdown document

Project description

yt2doc

yt2doc transcribes videos online into structural documents in Markdown format.

Supported video sources:

  • YouTube
  • Twitter

yt2doc is meant to work fully locally, without invoking any external API. The OpenAI SDK dependency is required solely to interact with Ollama.

Installation

Install with pipx:

pipx install yt2doc

Or install with uv:

uv tool install yt2doc

Usage

Get helping information:

yt2doc --help

To transcribe a video (on YouTube or Twitter) into a document:

yt2doc --video <video-url>

To save your transcription:

yt2doc --video <video-url> -o some_dir/transcription.md

To transcribe all videos in a Youtube playlist:

yt2doc --playlist <playlist-url> -o some_dir

(Ollama Required) If the video is not chaptered, you can chapter it and add headings to each chapter:

yt2doc --video <video-url> --segment-unchaptered --llm-model <model-name>

Among smaller size models, qwen 2.5 7b seems works best.

For MacOS devices running Apple Silicon, (a hacky) support for whisper.cpp is supported:

yt2doc --video --whisper-backend whisper_cpp --whisper-cpp-executable <path-to-whisper-cpp-executable>  --whisper-cpp-model <path-to-whisper-cpp-model>

yt2doc uses Segment Any Text (SaT) to segment the transcript into sentences and paragraphs. You can change the SaT model:

yt2doc --video <video-url> --sat-model <sat-model>

List of available SaT models here.

TODOs

  • Tests and evaluation
  • CICD
  • Better whisper prompting strategy (right now hugely depend on the title and the description of the video).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yt2doc-0.1.2.tar.gz (15.1 kB view details)

Uploaded Source

Built Distribution

yt2doc-0.1.2-py3-none-any.whl (20.2 kB view details)

Uploaded Python 3

File details

Details for the file yt2doc-0.1.2.tar.gz.

File metadata

  • Download URL: yt2doc-0.1.2.tar.gz
  • Upload date:
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.20

File hashes

Hashes for yt2doc-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b20e146cb1f37066bb156ed8934eeda1b6c77dcca974bb4c06c86f87690629ce
MD5 1b3fd62faad0dbf0997d4d834de0d6cd
BLAKE2b-256 fb3acb9884edcdf7ef537539c2902d714d9f9194c6aaf676f8a380a1cb570f8d

See more details on using hashes here.

File details

Details for the file yt2doc-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: yt2doc-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 20.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.4.20

File hashes

Hashes for yt2doc-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f1382ace6e1e1500438de1efde460b5791b83aa98884887f259b6b0360b837f3
MD5 f41c9fe3d33abd2773cd756d98aaf5af
BLAKE2b-256 f4f0504f20459d6a3caa3bed7fd82dd5031dc7a149cfc1904bd04ba8d46aaedf

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page