Skip to main content

Convert YouTube videos into structured markdown instruction documents

Project description

yt-instruct

Convert YouTube videos into structured markdown instruction documents.

Downloads audio via yt-dlp, transcribes with Mistral's voxtral API, then generates a clean how-to document using Claude.

Quick Start

# Run with uvx (no install needed)
uvx --from . yt-instruct https://www.youtube.com/watch?v=<id>

# Or install
pip install -e .
yt-instruct https://www.youtube.com/watch?v=<id>

Requirements

  • ffmpegbrew install ffmpeg or apt install ffmpeg
  • MISTRAL_API_KEYconsole.mistral.ai
  • ANTHROPIC_API_KEY — for default backend
  • NVIDIA_API_KEY — only for --backend nvidia

Usage

yt-instruct [OPTIONS] URL [URL...]
yt-instruct [OPTIONS] --url-file urls.txt
yt-instruct [OPTIONS] --transcript-file transcript.txt --title "Name"
yt-instruct [OPTIONS] --audio-file recording.mp3 --title "Name"

Options:
  --output-dir PATH              Output directory [default: .]
  --keep                         Keep intermediate audio + transcript files
  --merge                        Merge all videos into one document
  --resume                       Skip already-generated outputs; reuse cached transcripts
  --content-type [tutorial|lecture|ib|auto]
                                 Prompt style [default: auto]
  --backend [anthropic|llm|nvidia]
                                 LLM backend [default: anthropic]
  --model TEXT                   Model name [default: claude-sonnet-4-6]
  --prompt-file PATH             Custom system prompt (overrides built-in)
  --language LANG                Output language (e.g. 'French'). Defaults to English.
  --transcript-file PATH         Use existing transcript; skips download and transcription
  --audio-file PATH              Use existing audio file; skips download, transcribes directly
  --title TEXT                   Video title for --transcript-file or --audio-file
  --draft                        Set draft: true in the output frontmatter [default: false]
  --mistral-model TEXT           [default: voxtral-mini-latest]
  --audio-format [mp3|m4a]       [default: mp3]
  --version                      Show version and exit

Output Frontmatter

Every generated file includes YAML frontmatter:

---
title: "Video Title"
url: https://youtu.be/...
description: "YouTube video description"
date: 2026-04-12
draft: false
---

Use --draft to set draft: true (useful for Hugo, Jekyll, or similar static site generators). Merged documents (--merge) do not include frontmatter.

Content Types

Type Use for
auto Let the LLM detect (default)
tutorial How-to / step-by-step videos
lecture Tech talks, academic presentations
ib IB student subject videos

Custom Prompts

Override the built-in prompt with your own file. Template variables: {title}, {channel}, {content_type}, {duration}

yt-instruct <url> --prompt-file my_prompt.md

Using the llm backend

pip install llm llm-anthropic
llm keys set anthropic
yt-instruct <url> --backend llm --model claude-sonnet-4-6

Using the nvidia backend

NVIDIA_API_KEY=... yt-instruct <url> --backend nvidia --model moonshotai/kimi-k2-instruct

Batch Processing

# Multiple URLs
yt-instruct url1 url2 url3 --output-dir ./docs

# Playlist (automatically expanded)
yt-instruct https://www.youtube.com/playlist?list=<id> --output-dir ./docs

# From file
cat urls.txt | yt-instruct --url-file /dev/stdin

# Merge all into one doc
yt-instruct url1 url2 --merge --output-dir ./docs

Skip Steps — Use Existing Files

--audio-file and --transcript-file resolve relative to --output-dir if the file isn't found at the given path. This lets you reference files already in the output directory without typing the full path:

# Start from an existing transcript (skips download + transcription)
yt-instruct --transcript-file transcript.txt --title "My Video" --output-dir ./docs

# File not found locally? Looked up in ./docs automatically
yt-instruct --transcript-file my_transcript.txt --output-dir ./docs

# Start from an existing audio file (skips download, still transcribes)
yt-instruct --audio-file recording.mp3 --output-dir ./docs

Resume an Interrupted Run

Use --keep to save transcripts alongside output files, then --resume to continue from where a previous run stopped:

# First run (interrupted partway through)
yt-instruct --url-file urls.txt --keep --output-dir ./docs

# Resume — skips videos with existing output; reuses cached transcripts
yt-instruct --url-file urls.txt --resume --output-dir ./docs

--resume checks at two levels per video:

  1. Output .md already exists → skip entirely
  2. Cached *_transcript.txt exists (saved by --keep) → skip download and transcription, regenerate only

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

yt_instruct-1.0.0.tar.gz (14.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

yt_instruct-1.0.0-py3-none-any.whl (18.1 kB view details)

Uploaded Python 3

File details

Details for the file yt_instruct-1.0.0.tar.gz.

File metadata

  • Download URL: yt_instruct-1.0.0.tar.gz
  • Upload date:
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yt_instruct-1.0.0.tar.gz
Algorithm Hash digest
SHA256 bb1f0cdcabaee1ab85f46d7800348a8f2e02d094950f77e6c8e46f6e656bbfdd
MD5 8d99a3bed34370785fe2be5ce54a07fe
BLAKE2b-256 88acb472b6b42a2f2d43850f40127d37db960c7583b96537ef381b5d37abfcff

See more details on using hashes here.

Provenance

The following attestation bundles were made for yt_instruct-1.0.0.tar.gz:

Publisher: publish.yml on divyavanmahajan/yt-instruct

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yt_instruct-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: yt_instruct-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 18.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yt_instruct-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e784fe988bac839718d80f00bad4de99c81c581e93091d315552c268b9ce7603
MD5 5a98feb7dbc0de5493368a8826e44792
BLAKE2b-256 8cdfc76bd5c3a060ccc14f2f5b79d397059727043060c57121c9dc3151fee477

See more details on using hashes here.

Provenance

The following attestation bundles were made for yt_instruct-1.0.0-py3-none-any.whl:

Publisher: publish.yml on divyavanmahajan/yt-instruct

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page