yt-instruct

Convert YouTube videos into structured markdown instruction documents.

Downloads audio via yt-dlp, transcribes it with Mistral's Voxtral API, then generates a clean how-to document using Claude (the default backend).

Quick Start

# Run with uvx (no install needed)
uvx --from . yt-instruct https://www.youtube.com/watch?v=<id>

# Or install
pip install -e .
yt-instruct https://www.youtube.com/watch?v=<id>

Requirements

  • ffmpeg: brew install ffmpeg or apt install ffmpeg
  • MISTRAL_API_KEY: available from console.mistral.ai
  • ANTHROPIC_API_KEY: for the default backend
  • NVIDIA_API_KEY: only for --backend nvidia

Usage

yt-instruct [OPTIONS] URL [URL...]
yt-instruct [OPTIONS] --url-file urls.txt
yt-instruct [OPTIONS] --transcript-file transcript.txt --title "Name"
yt-instruct [OPTIONS] --audio-file recording.mp3 --title "Name"

Options:
  --output-dir PATH              Output directory [default: .]
  --keep                         Keep intermediate audio + transcript files
  --merge                        Merge all videos into one document
  --resume                       Skip already-generated outputs; reuse cached transcripts
  --no-generate                  Stop after transcription; skip LLM generation
  --content-type [tutorial|lecture|ib|auto]
                                 Prompt style [default: auto]
  --backend [anthropic|llm|nvidia]
                                 LLM backend [default: anthropic]
  --model TEXT                   Model name [default: claude-sonnet-4-6]
  --prompt-file PATH             Custom system prompt (overrides built-in)
  --language LANG                Output language (e.g. 'French'). Defaults to English.
  --transcript-file PATH         Use existing transcript; skips download and transcription
  --audio-file PATH              Use existing audio file; skips download, transcribes directly
  --title TEXT                   Video title for --transcript-file or --audio-file
  --draft                        Set draft: true in the output frontmatter [default: false]
  --mistral-model TEXT           [default: voxtral-mini-latest]
  --audio-format [mp3|m4a]       [default: mp3]
  --version                      Show version and exit

Output Frontmatter

Every generated file includes YAML frontmatter:

---
title: "Video Title"
url: https://youtu.be/...
description: "YouTube video description"
date: 2026-04-12
draft: false
---

Use --draft to set draft: true (useful for Hugo, Jekyll, or similar static site generators). Merged documents (--merge) do not include frontmatter.
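
To check what a static site generator will receive, the frontmatter block above can be reproduced with a few lines of Python. This is only a sketch of the documented five-field layout; the function name `build_frontmatter` and its parameters are hypothetical and are not part of yt-instruct.

```python
import json
from datetime import date


def build_frontmatter(title: str, url: str, description: str,
                      day: date, draft: bool = False) -> str:
    """Render the five documented fields as a YAML frontmatter block."""
    lines = [
        "---",
        # A JSON string literal is double-quoted and escaped, which YAML
        # parsers also accept as a double-quoted scalar.
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"url: {url}",
        f"description: {json.dumps(description, ensure_ascii=False)}",
        f"date: {day.isoformat()}",
        # YAML booleans are lowercase, unlike Python's True/False.
        f"draft: {'true' if draft else 'false'}",
        "---",
    ]
    return "\n".join(lines) + "\n"
```

Quoting the title and description this way keeps colons and quotation marks inside video titles from breaking the YAML.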

Content Types

Type        Use for
auto        Let the LLM detect (default)
tutorial    How-to / step-by-step videos
lecture     Tech talks, academic presentations
ib          IB student subject videos

Custom Prompts

Override the built-in prompt with your own file. Template variables: {title}, {channel}, {content_type}, {duration}

yt-instruct <url> --prompt-file my_prompt.md
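
When drafting a prompt file it can help to preview how the four template variables expand. The sketch below is an assumption about the substitution behaviour, not the tool's actual code: it replaces only the four documented placeholders in a single pass and leaves every other brace alone. `render_prompt` is a hypothetical helper, and the duration format in the usage example is made up.

```python
import re

# Only the four placeholders documented above are recognised.
_PLACEHOLDER = re.compile(r"\{(title|channel|content_type|duration)\}")


def render_prompt(template: str, *, title: str, channel: str,
                  content_type: str, duration: str) -> str:
    """Fill the documented placeholders; leave any other braces untouched."""
    values = {
        "title": title,
        "channel": channel,
        "content_type": content_type,
        "duration": duration,
    }
    # One pass over the template, so braces inside a substituted value
    # are never expanded a second time.
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


if __name__ == "__main__":
    text = "Write a {content_type} guide for '{title}' by {channel} ({duration})."
    print(render_prompt(text, title="Intro", channel="Example Channel",
                        content_type="tutorial", duration="12:34"))
```

A regular expression is used instead of `str.format` so that a prompt containing literal braces, such as a JSON example, does not raise an error in this preview.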

Using the llm backend

pip install llm llm-anthropic
llm keys set anthropic
yt-instruct <url> --backend llm --model claude-sonnet-4-6

Using the nvidia backend

NVIDIA_API_KEY=... yt-instruct <url> --backend nvidia --model moonshotai/kimi-k2-instruct

Batch Processing

# Multiple URLs
yt-instruct url1 url2 url3 --output-dir ./docs

# Playlist (automatically expanded)
yt-instruct https://www.youtube.com/playlist?list=<id> --output-dir ./docs

# From stdin (a file path can also be passed to --url-file directly)
cat urls.txt | yt-instruct --url-file /dev/stdin

# Merge all into one doc
yt-instruct url1 url2 --merge --output-dir ./docs
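
For larger batches, the URL file and the command line can be prepared from a script. The sketch below assumes the URL file holds one URL per line, which the text above does not state explicitly. It uses only flags listed under Usage. Both helper names are hypothetical.

```python
from pathlib import Path


def write_url_file(urls: list[str], path: Path) -> Path:
    """Write one URL per line, dropping blanks and duplicates, keeping order."""
    unique: dict[str, None] = {}
    for url in urls:
        url = url.strip()
        if url:
            unique.setdefault(url, None)  # dicts preserve insertion order
    path.write_text("\n".join(unique) + "\n", encoding="utf-8")
    return path


def batch_command(url_file: Path, output_dir: Path, merge: bool = False) -> list[str]:
    """Build the argument list for a batch run (pass it to subprocess.run)."""
    argv = ["yt-instruct", "--url-file", str(url_file),
            "--output-dir", str(output_dir)]
    if merge:
        argv.append("--merge")
    return argv
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.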

Skip Steps — Use Existing Files

--audio-file and --transcript-file resolve relative to --output-dir if the file isn't found at the given path. This lets you reference files already in the output directory without typing the full path:

# Start from an existing transcript (skips download + transcription)
yt-instruct --transcript-file transcript.txt --title "My Video" --output-dir ./docs

# File not found locally? Looked up in ./docs automatically
yt-instruct --transcript-file my_transcript.txt --output-dir ./docs

# Start from an existing audio file (skips download, still transcribes)
yt-instruct --audio-file recording.mp3 --output-dir ./docs
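
The fallback lookup described above can be pictured as follows. This is a sketch of the documented behaviour, not the tool's source: `resolve_input` is a hypothetical name, and joining the whole relative path onto the output directory (rather than just the file name) is an assumption.

```python
from pathlib import Path


def resolve_input(path_arg: str, output_dir: str = ".") -> Path:
    """Return path_arg if it exists, else the same path under output_dir."""
    given = Path(path_arg)
    if given.exists():
        return given
    # Fall back to the output directory. For an absolute path_arg this join
    # yields the same absolute path again, so no false match is possible.
    fallback = Path(output_dir) / given
    if fallback.exists():
        return fallback
    raise FileNotFoundError(
        f"{path_arg!r} not found at the given path or under {output_dir!r}"
    )
```

Checking the given path first means an explicit path always takes precedence over a file of the same name in the output directory.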

Resume an Interrupted Run

Use --keep to save transcripts alongside output files, then --resume to continue from where a previous run stopped:

# First run (interrupted partway through)
yt-instruct --url-file urls.txt --keep --output-dir ./docs

# Resume — skips videos with existing output; reuses cached transcripts
yt-instruct --url-file urls.txt --resume --output-dir ./docs

--resume checks at two levels per video:

  1. Output .md already exists → skip entirely
  2. Cached *_transcript.txt exists (saved by --keep) → skip download and transcription, regenerate only
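
The two checks above amount to a small decision function per video. The sketch below illustrates that logic under one assumption: that the output file and the cached transcript share a common stem (`<stem>.md` and `<stem>_transcript.txt`). The stem convention and the names `Action` and `plan_resume` are hypothetical.

```python
from enum import Enum
from pathlib import Path


class Action(Enum):
    SKIP = "skip"              # output exists: nothing to do
    REGENERATE = "regenerate"  # transcript cached: only the LLM step runs
    FULL = "full"              # nothing cached: download, transcribe, generate


def plan_resume(output_dir: Path, stem: str) -> Action:
    """Decide how much work a video still needs on a --resume run."""
    # Level 1: a finished document means the video is skipped entirely.
    if (output_dir / f"{stem}.md").exists():
        return Action.SKIP
    # Level 2: a transcript saved by --keep avoids download and transcription.
    if (output_dir / f"{stem}_transcript.txt").exists():
        return Action.REGENERATE
    return Action.FULL
```

The order of the checks matters: the finished document is tested first, so a video that has both files is skipped rather than regenerated.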

Changelog

See CHANGELOG.md for release history.
