yt-instruct
Convert YouTube videos into structured markdown instruction documents.
Downloads audio via yt-dlp, transcribes it with Mistral's Voxtral API, then generates a clean how-to document using Claude.
Quick Start
# Run with uvx (no install needed)
uvx --from . yt-instruct "https://www.youtube.com/watch?v=<id>"
# Or install
pip install -e .
yt-instruct "https://www.youtube.com/watch?v=<id>"
Requirements
- ffmpeg: brew install ffmpeg or apt install ffmpeg
- MISTRAL_API_KEY: from console.mistral.ai
- ANTHROPIC_API_KEY: for the default backend
- NVIDIA_API_KEY: only for --backend nvidia
Usage
yt-instruct [OPTIONS] URL [URL...]
yt-instruct [OPTIONS] --url-file urls.txt
yt-instruct [OPTIONS] --transcript-file transcript.txt --title "Name"
yt-instruct [OPTIONS] --audio-file recording.mp3 --title "Name"
Options:
--output-dir PATH Output directory [default: .]
--keep Keep intermediate audio + transcript files
--merge Merge all videos into one document
--resume Skip already-generated outputs; reuse cached transcripts
--no-generate Stop after transcription; skip LLM generation
--content-type [tutorial|lecture|ib|auto]
Prompt style [default: auto]
--backend [anthropic|llm|nvidia]
LLM backend [default: anthropic]
--model TEXT Model name [default: claude-sonnet-4-6]
--prompt-file PATH Custom system prompt (overrides built-in)
--language LANG Output language (e.g. 'French'). Defaults to English.
--transcript-file PATH Use existing transcript; skips download and transcription
--audio-file PATH Use existing audio file; skips download, transcribes directly
--title TEXT Video title for --transcript-file or --audio-file
--draft Set draft: true in the output frontmatter [default: false]
--mistral-model TEXT [default: voxtral-mini-latest]
--audio-format [mp3|m4a] [default: mp3]
--version Show version and exit
Output Frontmatter
Every generated file includes YAML frontmatter:
---
title: "Video Title"
url: https://youtu.be/...
description: "YouTube video description"
date: 2026-04-12
draft: false
---
Use --draft to set draft: true (useful for Hugo, Jekyll, or similar static site generators).
Merged documents (--merge) do not include frontmatter.
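For post-processing generated files, the frontmatter can be split from the body with a few lines of Python. This is a minimal sketch and not part of yt-instruct: the `split_frontmatter` helper is hypothetical, and it handles only flat `key: value` lines like the ones shown above (a real YAML parser is the safer choice for anything more complex).

```python
def split_frontmatter(text):
    """Split a generated document into (metadata dict, body string).

    Hypothetical helper: handles only flat `key: value` lines.
    """
    lines = text.splitlines()
    stripped = [line.strip() for line in lines]
    if not stripped or stripped[0] != "---":
        return {}, text  # e.g. merged documents, which have no frontmatter
    try:
        end = stripped.index("---", 1)
    except ValueError:
        return {}, text  # opening delimiter without a closing one
    meta = {}
    for line in lines[1:end]:
        # Split on the first colon only, so URL values stay intact.
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body


sample = """---
title: "Video Title"
url: https://youtu.be/...
description: "YouTube video description"
date: 2026-04-12
draft: false
---
# Steps
"""
meta, body = split_frontmatter(sample)
print(meta["title"], meta["draft"])
```

All values come back as strings here, so `draft` is the text `false` rather than a boolean.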
Content Types
| Type | Use for |
|---|---|
| auto | Let the LLM detect (default) |
| tutorial | How-to / step-by-step videos |
| lecture | Tech talks, academic presentations |
| ib | IB student subject videos |
Custom Prompts
Override the built-in prompt with your own file. Template variables:
{title}, {channel}, {content_type}, {duration}
yt-instruct <url> --prompt-file my_prompt.md
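To illustrate how the template variables might be filled in, here is a rough sketch. It assumes `str.format`-style substitution, which this README does not confirm; the prompt text and the values are made up for the example, and only the four placeholder names come from the list above.

```python
# Hypothetical prompt file content; only the four placeholder names
# are taken from the README.
prompt_template = (
    "You are writing a {content_type} guide.\n"
    "Source video: {title} by {channel} ({duration}).\n"
    "Produce numbered steps in markdown."
)

# Made-up values for illustration.
values = {
    "title": "Sharpening a Chisel",
    "channel": "Example Workshop",
    "content_type": "tutorial",
    "duration": "12:34",
}

# Assumption: substitution behaves like str.format; the real tool may differ.
rendered = prompt_template.format(**values)
print(rendered)
```

If substitution does work this way, literal braces in a custom prompt would need to be doubled (`{{` and `}}`); it is worth testing with a small prompt file first.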
Using the llm backend
pip install llm llm-anthropic
llm keys set anthropic
yt-instruct <url> --backend llm --model claude-sonnet-4-6
Using the nvidia backend
NVIDIA_API_KEY=... yt-instruct <url> --backend nvidia --model moonshotai/kimi-k2-instruct
Batch Processing
# Multiple URLs
yt-instruct url1 url2 url3 --output-dir ./docs
# Playlist (automatically expanded)
yt-instruct "https://www.youtube.com/playlist?list=<id>" --output-dir ./docs
# From file
cat urls.txt | yt-instruct --url-file /dev/stdin
# Merge all into one doc
yt-instruct url1 url2 --merge --output-dir ./docs
Skip Steps — Use Existing Files
--audio-file and --transcript-file resolve relative to --output-dir if the file isn't found at the given path. This lets you reference files already in the output directory without typing the full path:
# Start from an existing transcript (skips download + transcription)
yt-instruct --transcript-file transcript.txt --title "My Video" --output-dir ./docs
# File not found locally? Looked up in ./docs automatically
yt-instruct --transcript-file my_transcript.txt --output-dir ./docs
# Start from an existing audio file (skips download, still transcribes)
yt-instruct --audio-file recording.mp3 --output-dir ./docs
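The fallback lookup described above can be pictured as the following sketch. The `resolve_input` function is hypothetical and only mirrors the documented behavior: try the path as given, then try it relative to the output directory.

```python
from pathlib import Path
import tempfile


def resolve_input(path, output_dir):
    """Return the path as given if it exists, else look in output_dir.

    Hypothetical helper mirroring the documented fallback, not the
    tool's actual code.
    """
    candidate = Path(path)
    if candidate.exists():
        return candidate
    fallback = Path(output_dir) / candidate
    if fallback.exists():
        return fallback
    raise FileNotFoundError(f"{path} not found (also tried {fallback})")


# Demo in a throwaway directory standing in for ./docs.
with tempfile.TemporaryDirectory() as docs:
    (Path(docs) / "my_transcript.txt").write_text("hello")
    found = resolve_input("my_transcript.txt", docs)
    print(found.name)
```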
Resume an Interrupted Run
Use --keep to save transcripts alongside output files, then --resume to continue from where a previous run stopped:
# First run (interrupted partway through)
yt-instruct --url-file urls.txt --keep --output-dir ./docs
# Resume — skips videos with existing output; reuses cached transcripts
yt-instruct --url-file urls.txt --resume --output-dir ./docs
--resume checks at two levels per video:
- Output .md already exists → skip entirely
- Cached *_transcript.txt exists (saved by --keep) → skip download and transcription, regenerate only
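The two-level check can be summarized in a short sketch. The `resume_action` function and the `<stem>.md` / `<stem>_transcript.txt` naming are assumptions for illustration; this README only states that the output is a .md file and that the cached transcript matches *_transcript.txt.

```python
from pathlib import Path
import tempfile


def resume_action(output_dir, stem):
    """Decide what a resumed run would do for one video.

    Hypothetical helper; `stem` stands in for whatever base name the
    tool derives for a video.
    """
    out = Path(output_dir)
    if (out / f"{stem}.md").exists():
        return "skip"  # output already generated
    if (out / f"{stem}_transcript.txt").exists():
        return "regenerate"  # reuse transcript, run only the LLM step
    return "full"  # download, transcribe, and generate


# Demo in a throwaway directory standing in for ./docs.
with tempfile.TemporaryDirectory() as docs:
    Path(docs, "video_a.md").write_text("# done")
    Path(docs, "video_b_transcript.txt").write_text("transcript text")
    actions = {s: resume_action(docs, s) for s in ("video_a", "video_b", "video_c")}
    print(actions)
```

Checking the output file first means a finished video is never regenerated, even when its transcript is still cached.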
Changelog
See CHANGELOG.md for release history.