yt-instruct

Convert YouTube videos into structured markdown instruction documents.

Downloads audio with yt-dlp, transcribes it with Mistral's Voxtral API, then generates a clean how-to document with an LLM (Claude by default).
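The three stages map onto the --audio-file and --transcript-file shortcuts described under Usage. Here is a minimal Python sketch of that mapping. The function name is hypothetical, and so is the rule that a transcript takes precedence over an audio file when both are given.

```python
def stages_to_run(transcript_file=None, audio_file=None):
    """Return the pipeline stages that still need to run for one video.

    Hypothetical illustration, not yt-instruct's internal API.
    """
    if transcript_file:
        # An existing transcript skips both download and transcription
        return ["generate"]
    if audio_file:
        # An existing audio file skips only the download
        return ["transcribe", "generate"]
    # A plain URL runs the full pipeline
    return ["download", "transcribe", "generate"]
```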

Quick Start

# Run with uvx from a source checkout (no install needed)
uvx --from . yt-instruct https://www.youtube.com/watch?v=<id>

# Or install in editable mode from the checkout
pip install -e .
yt-instruct https://www.youtube.com/watch?v=<id>

Requirements

  • ffmpeg: install with brew install ffmpeg or apt install ffmpeg
  • MISTRAL_API_KEY: from console.mistral.ai
  • ANTHROPIC_API_KEY: for the default anthropic backend
  • NVIDIA_API_KEY: only for --backend nvidia
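A pre-flight check along these lines can catch a missing key before a long batch starts. This is a hypothetical Python sketch, not part of yt-instruct. It assumes that ffmpeg and MISTRAL_API_KEY are only needed when audio is transcribed, and that the llm backend manages its own key through `llm keys set`.

```python
import os
import shutil

# Hypothetical mapping from --backend value to the environment variables
# listed under Requirements; the llm backend stores its key via `llm keys set`.
BACKEND_ENV_VARS = {
    "anthropic": ["ANTHROPIC_API_KEY"],
    "nvidia": ["NVIDIA_API_KEY"],
    "llm": [],
}


def missing_requirements(backend, needs_transcription=True, env=None, which=shutil.which):
    """Return the names of unmet requirements for one run (empty list if all present)."""
    env = os.environ if env is None else env
    if backend not in BACKEND_ENV_VARS:
        raise ValueError(f"unknown backend: {backend!r}")
    missing = []
    if needs_transcription:
        # Assumed: ffmpeg and Mistral are only used when audio is processed
        if which("ffmpeg") is None:
            missing.append("ffmpeg")
        if not env.get("MISTRAL_API_KEY"):
            missing.append("MISTRAL_API_KEY")
    for var in BACKEND_ENV_VARS[backend]:
        if not env.get(var):
            missing.append(var)
    return missing
```

The `env` and `which` parameters default to the real environment and `shutil.which`, so `missing_requirements("anthropic")` checks the current machine, while tests can pass stand-ins.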

Usage

yt-instruct [OPTIONS] URL [URL...]
yt-instruct [OPTIONS] --url-file urls.txt
yt-instruct [OPTIONS] --transcript-file transcript.txt --title "Name"
yt-instruct [OPTIONS] --audio-file recording.mp3 --title "Name"

Options:
  --output-dir PATH              Output directory [default: .]
  --keep                         Keep intermediate audio + transcript files
  --merge                        Merge all videos into one document
  --resume                       Skip already-generated outputs; reuse cached transcripts
  --content-type [tutorial|lecture|ib|auto]
                                 Prompt style [default: auto]
  --backend [anthropic|llm|nvidia]
                                 LLM backend [default: anthropic]
  --model TEXT                   Model name [default: claude-sonnet-4-6]
  --prompt-file PATH             Custom system prompt (overrides built-in)
  --language LANG                Output language (e.g. 'French'). Defaults to English.
  --transcript-file PATH         Use existing transcript; skips download and transcription
  --audio-file PATH              Use existing audio file; skips download, transcribes directly
  --title TEXT                   Video title for --transcript-file or --audio-file
  --draft                        Set draft: true in the output frontmatter [default: false]
  --mistral-model TEXT           Mistral transcription model [default: voxtral-mini-latest]
  --audio-format [mp3|m4a]       Format of the extracted audio [default: mp3]
  --version                      Show version and exit

Output Frontmatter

Every generated file includes YAML frontmatter:

---
title: "Video Title"
url: https://youtu.be/...
description: "YouTube video description"
date: 2026-04-12
draft: false
---

Use --draft to set draft: true (useful for Hugo, Jekyll, or similar static site generators). Merged documents (--merge) do not include frontmatter.
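If you produce compatible frontmatter in your own tooling, this minimal sketch reproduces the fields above. `build_frontmatter` is a hypothetical helper, not yt-instruct's code. It relies on JSON strings also being valid YAML double-quoted scalars, so quotes in a title are escaped safely.

```python
import datetime
import json


def build_frontmatter(title, url, description, date, draft=False):
    """Render a YAML frontmatter block with the fields shown above."""
    lines = [
        "---",
        # json.dumps quotes and escapes the value in a YAML-compatible way
        f"title: {json.dumps(title, ensure_ascii=False)}",
        # Left unquoted to match the example; YouTube URLs contain no ': ' or ' #'
        f"url: {url}",
        f"description: {json.dumps(description, ensure_ascii=False)}",
        f"date: {date.isoformat()}",
        f"draft: {'true' if draft else 'false'}",
        "---",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_frontmatter("Video Title", "https://youtu.be/...",
                            "YouTube video description", datetime.date(2026, 4, 12)))
```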

Content Types

Type       Use for
auto       Let the LLM detect the type (default)
tutorial   How-to / step-by-step videos
lecture    Tech talks, academic presentations
ib         IB student subject videos

Custom Prompts

Override the built-in prompt with your own file. The prompt file can use these template variables: {title}, {channel}, {content_type}, {duration}.

yt-instruct <url> --prompt-file my_prompt.md
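This README does not say how placeholders are substituted. One way to fill them is plain string replacement of the four documented names, sketched below in Python. `render_prompt` is hypothetical. If the tool instead uses `str.format`, literal braces in your prompt would need doubling (`{{` and `}}`), so test a custom prompt before relying on it.

```python
# The four placeholders documented for --prompt-file
TEMPLATE_VARIABLES = ("title", "channel", "content_type", "duration")


def render_prompt(template, **values):
    """Fill the documented placeholders, leaving any other braces untouched.

    str.replace is used instead of str.format so that literal braces
    elsewhere in the prompt (e.g. JSON examples) do not raise errors.
    """
    for name in TEMPLATE_VARIABLES:
        if name in values:
            template = template.replace("{" + name + "}", str(values[name]))
    return template
```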

Using the llm backend

pip install llm llm-anthropic
llm keys set anthropic
yt-instruct <url> --backend llm --model claude-sonnet-4-6

Using the nvidia backend

NVIDIA_API_KEY=... yt-instruct <url> --backend nvidia --model moonshotai/kimi-k2-instruct

Batch Processing

# Multiple URLs
yt-instruct url1 url2 url3 --output-dir ./docs

# Playlist (automatically expanded)
yt-instruct https://www.youtube.com/playlist?list=<id> --output-dir ./docs

# From a file
yt-instruct --url-file urls.txt --output-dir ./docs

# Or from stdin
cat urls.txt | yt-instruct --url-file /dev/stdin

# Merge all into one doc
yt-instruct url1 url2 --merge --output-dir ./docs
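Playlist expansion and URL files both reduce to a flat list of video URLs. This minimal sketch assumes one URL per line in the URL file. `read_url_file`, `is_playlist`, and `collect_urls` are hypothetical helpers, and the playlist expander is passed in so the sketch runs offline.

```python
from urllib.parse import parse_qs, urlparse


def read_url_file(text):
    """Return one URL per non-blank line (assumed --url-file format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def is_playlist(url):
    """True for URLs shaped like https://www.youtube.com/playlist?list=<id>."""
    parsed = urlparse(url)
    return parsed.path == "/playlist" and "list" in parse_qs(parsed.query)


def collect_urls(urls, expand_playlist):
    """Flatten playlists into video URLs, keeping the original order.

    expand_playlist is a caller-supplied function mapping a playlist URL
    to the list of its video URLs.
    """
    videos = []
    for url in urls:
        if is_playlist(url):
            videos.extend(expand_playlist(url))
        else:
            videos.append(url)
    return videos
```

In a real run the expander could be built on yt-dlp's Python API, for example by calling `YoutubeDL({"extract_flat": True}).extract_info(url, download=False)` and reading video IDs from the returned `entries`. This is an assumption about one possible approach, not a description of yt-instruct's code.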

Skip Steps — Use Existing Files

--audio-file and --transcript-file resolve relative to --output-dir if the file isn't found at the given path. This lets you reference files already in the output directory without typing the full path:

# Start from an existing transcript (skips download + transcription)
yt-instruct --transcript-file transcript.txt --title "My Video" --output-dir ./docs

# File not found locally? Looked up in ./docs automatically
yt-instruct --transcript-file my_transcript.txt --output-dir ./docs

# Start from an existing audio file (skips download, still transcribes)
yt-instruct --audio-file recording.mp3 --output-dir ./docs
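The lookup rule for --audio-file and --transcript-file can be sketched in Python. `resolve_input` is a hypothetical name and the error message is illustrative.

```python
from pathlib import Path


def resolve_input(path, output_dir):
    """Return `path` if it exists, else the same relative path inside `output_dir`."""
    candidate = Path(path)
    if candidate.exists():
        return candidate
    # Fall back to resolving relative to the output directory
    fallback = Path(output_dir) / candidate
    if fallback.exists():
        return fallback
    raise FileNotFoundError(f"{path} not found (also looked in {output_dir})")
```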

Resume an Interrupted Run

Use --keep to save transcripts alongside output files, then --resume to continue from where a previous run stopped:

# First run (interrupted partway through)
yt-instruct --url-file urls.txt --keep --output-dir ./docs

# Resume — skips videos with existing output; reuses cached transcripts
yt-instruct --url-file urls.txt --resume --output-dir ./docs

--resume checks at two levels per video:

  1. Output .md already exists → skip entirely
  2. Cached *_transcript.txt exists (saved by --keep) → skip download and transcription, regenerate only
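The two checks can be sketched in Python. The file-naming scheme (`<stem>.md` and `<stem>_transcript.txt` in the output directory) is an assumption based on the names above. This README does not document how yt-instruct derives the stem from a video title.

```python
from enum import Enum
from pathlib import Path


class ResumeAction(Enum):
    SKIP = "skip"              # output .md exists
    REGENERATE = "regenerate"  # cached transcript exists; only the LLM step runs
    FULL = "full"              # download, transcribe, and generate


def resume_action(output_dir, stem):
    """Decide what --resume should do for one video (hypothetical naming)."""
    out = Path(output_dir)
    # Level 1: the finished document is already there
    if (out / f"{stem}.md").exists():
        return ResumeAction.SKIP
    # Level 2: a transcript saved by --keep lets us skip download + transcription
    if (out / f"{stem}_transcript.txt").exists():
        return ResumeAction.REGENERATE
    return ResumeAction.FULL
```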


Download files


Source Distribution

yt_instruct-1.1.0.tar.gz (18.3 kB)

Uploaded Source

Built Distribution


yt_instruct-1.1.0-py3-none-any.whl (18.2 kB)

Uploaded Python 3

File details

Details for the file yt_instruct-1.1.0.tar.gz.

File metadata

  • Download URL: yt_instruct-1.1.0.tar.gz
  • Upload date:
  • Size: 18.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yt_instruct-1.1.0.tar.gz
Algorithm    Hash digest
SHA256       b4b5ad864ba2d3260b9d1a32aa264a3b1b39d80dc1fba2883e43a89ab7ef2821
MD5          1ca0bc5bbd63d00de9bb412af6b11df9
BLAKE2b-256  aa7eadc531c8cd3405b1453d2a017282f5d7f11e1687a23d5fa266ca12a9421b

See more details on using hashes here.
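To check a downloaded file against a published SHA256 digest, here is a minimal Python sketch using only `hashlib`. The helper names are illustrative and not part of yt-instruct.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, with the sdist digest listed above:
# verify_sha256("yt_instruct-1.1.0.tar.gz",
#               "b4b5ad864ba2d3260b9d1a32aa264a3b1b39d80dc1fba2883e43a89ab7ef2821")
```

pip can also enforce digests itself through hash-checking mode, using `--hash=sha256:...` entries in a requirements file.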

Provenance

The following attestation bundles were made for yt_instruct-1.1.0.tar.gz:

Publisher: publish.yml on divyavanmahajan/yt-instruct

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yt_instruct-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: yt_instruct-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 18.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yt_instruct-1.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       de8c0b528d698aa09a67a279f521a7e13438599b55cf521e4aea5e743259bb65
MD5          2e76368fa46b5edbf280f95881830ca1
BLAKE2b-256  99e5195de166fa7f05987ea5b2927c26df13690e05f1078e6dfc9f8ed9b18eef

See more details on using hashes here.

Provenance

The following attestation bundles were made for yt_instruct-1.1.0-py3-none-any.whl:

Publisher: publish.yml on divyavanmahajan/yt-instruct

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
