CLI tool for automatic YouTube video dubbing with voice cloning (Apple Silicon)

yt-dbl

Dub any YouTube video into another language — with the original speaker's voice

yt-dbl dub "https://www.youtube.com/watch?v=VIDEO_ID" -t ru

[!WARNING] Early stage — not yet stable for long videos (30+ min)

[!WARNING] Apple Silicon only (M1–M4), tested on M4 Pro (48 GB)

One command: download, transcribe, translate (Claude), clone each speaker's voice (Qwen3-TTS), and mix with the original background. All ML inference runs locally on your Mac's GPU via MLX.

Why yt-dbl

  • Human-quality voice cloning
    Qwen3-TTS per speaker, not a generic synth. Multiple speakers are diarized and voiced separately
  • LLM translation
    Claude handles idioms, context, and produces TTS-friendly text — not word-for-word machine translation
  • Background preserved
    BS-RoFormer separates vocals from music/sfx. Sidechain ducking mixes them back naturally
  • Production audio chain
    Loudnorm (-16 LUFS), de-essing, pitch-preserving speed-up, equal-power crossfade
  • Checkpoint & resume
    Every step saves state. Interrupted? yt-dbl resume continues where it stopped
  • Private
    Everything local except the Claude API call

Supported languages

TTS (synthesis): Russian, English, German, French, Spanish, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Hindi, Turkish, Dutch, Polish, Ukrainian

ASR (recognition): auto-detected via Unicode scripts (Latin, Cyrillic, Arabic, Devanagari, CJK, etc.)
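The script-based detection can be sketched roughly as follows. This is an illustrative approximation, not yt-dbl's actual implementation: it picks the dominant Unicode script of a text sample and maps it to a language hint.

```python
import unicodedata

# Hypothetical script → language-hint map (illustrative only).
SCRIPT_HINTS = {
    "CYRILLIC": "ru", "ARABIC": "ar", "DEVANAGARI": "hi",
    "HANGUL": "ko", "HIRAGANA": "ja", "KATAKANA": "ja", "CJK": "zh",
}

def detect_script_hint(text: str, default: str = "en") -> str:
    """Guess a language from the dominant Unicode script of `text`."""
    counts: dict[str, int] = {}
    for ch in text:
        if not ch.isalpha():
            continue
        # Unicode character names start with the script name, e.g.
        # "CYRILLIC CAPITAL LETTER PE" or "HIRAGANA LETTER A".
        name = unicodedata.name(ch, "")
        script = name.split()[0] if name else ""
        counts[script] = counts.get(script, 0) + 1
    if not counts:
        return default
    dominant = max(counts, key=counts.get)
    return SCRIPT_HINTS.get(dominant, default)
```

Latin-script text falls through to the default here; a real implementation would need a proper classifier to tell Latin-script languages apart.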

Requirements

  • macOS with Apple Silicon (M1–M4) — MLX needs Metal
  • Python >= 3.12
  • FFmpeg — audio extraction, postprocessing, final assembly
  • yt-dlp — video download
  • Anthropic API key — translation via Claude

Installation

1. Install system dependencies

brew install ffmpeg yt-dlp

Optional: brew install ffmpeg-full for pitch-preserving speed-up via rubberband. Without it, yt-dbl falls back to ffmpeg's atempo filter (works fine, just without pitch correction)
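As an aside, a single atempo instance only accepts factors in the 0.5–2.0 range (in older ffmpeg builds), so speed-up code typically chains filters for factors outside that range. A small sketch of that classic trick (illustrative, not yt-dbl's code; yt-dbl's own factor is capped at 2.0 anyway):

```python
def atempo_chain(factor: float) -> str:
    """Build an ffmpeg atempo filter chain for an arbitrary speed factor."""
    parts = []
    # Split factors > 2.0 into repeated atempo=2.0 stages.
    while factor > 2.0:
        parts.append("atempo=2.0")
        factor /= 2.0
    # Split factors < 0.5 into repeated atempo=0.5 stages.
    while factor < 0.5:
        parts.append("atempo=0.5")
        factor /= 0.5
    parts.append(f"atempo={factor:.6g}")
    return ",".join(parts)
```

The resulting string is what you would pass to ffmpeg's -filter:a option.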

2. Install yt-dbl

# From PyPI
uv tool install --prerelease=allow yt-dbl

# Or with pipx
pipx install yt-dbl

--prerelease=allow is needed because mlx-audio depends on a pre-release version of transformers

If yt-dbl is not found, run uv tool update-shell && source ~/.zshrc

From source
git clone git@github.com:brolnickij/yt-dbl.git && cd yt-dbl
uv sync

Use uv run yt-dbl instead of yt-dbl when running from source

3. Set up the API key

echo 'export YT_DBL_ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.zshrc
source ~/.zshrc

Or use a .env file:

YT_DBL_ANTHROPIC_API_KEY=sk-ant-...

4. Pre-download models (optional)

Models (~8.2 GB) download automatically on first run, or fetch them ahead of time:

yt-dbl models download

Configuration

Priority: CLI args > env vars (YT_DBL_ prefix) > .env file > defaults

cp .env.example .env
| Env variable | Default | Description |
|---|---|---|
| `YT_DBL_ANTHROPIC_API_KEY` | required | Anthropic API key |
| `YT_DBL_TARGET_LANGUAGE` | `ru` | Target language (ISO 639-1) |
| `YT_DBL_OUTPUT_FORMAT` | `mp4` | `mp4` / `mkv` |
| `YT_DBL_SUBTITLE_MODE` | `softsub` | `softsub` / `hardsub` / `none` |
| `YT_DBL_BACKGROUND_VOLUME` | `0.15` | Background volume during speech (0.0–1.0) |
| `YT_DBL_MAX_SPEED_FACTOR` | `1.4` | Max TTS speed-up to fit timing (1.0–2.0) |
| `YT_DBL_MAX_LOADED_MODELS` | `0` (auto) | Max models in memory (0 = auto by RAM) |
| `YT_DBL_WORK_DIR` | `dubbed` | Output directory |

See .env.example for all 33 parameters
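The priority order above can be sketched as a simple layered lookup. This is an illustrative sketch, not yt-dbl's actual config code; the setting names are taken from the table above.

```python
import os

# Built-in defaults (last layer); subset of the documented settings.
DEFAULTS = {"target_language": "ru", "output_format": "mp4"}

def parse_dotenv(path: str) -> dict[str, str]:
    """Read KEY=VALUE lines from a .env file, ignoring comments."""
    values: dict[str, str] = {}
    try:
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, val = line.partition("=")
                    values[key.strip()] = val.strip()
    except FileNotFoundError:
        pass
    return values

def resolve(key: str, cli_value=None, dotenv_path: str = ".env"):
    """Resolve a setting: CLI arg > env var > .env file > default."""
    env_key = f"YT_DBL_{key.upper()}"
    if cli_value is not None:
        return cli_value                    # 1. CLI argument
    if env_key in os.environ:
        return os.environ[env_key]          # 2. environment variable
    dotenv = parse_dotenv(dotenv_path)
    if env_key in dotenv:
        return dotenv[env_key]              # 3. .env file
    return DEFAULTS.get(key)                # 4. built-in default
```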

Quick start

yt-dbl dub "https://www.youtube.com/watch?v=VIDEO_ID"           # dub to Russian (default)
yt-dbl dub "https://youtu.be/VIDEO_ID" -t es                    # dub to Spanish
yt-dbl dub "https://youtu.be/VIDEO_ID" -o ./out                 # custom output dir
yt-dbl dub "https://youtu.be/VIDEO_ID" --from-step translate    # re-run from a specific step
yt-dbl resume VIDEO_ID                                          # resume after interrupt
yt-dbl status VIDEO_ID                                          # check job progress

Commands

dub — dub a video

yt-dbl dub <URL> [options]
| Option | Description | Default |
|---|---|---|
| `-t, --target-language` | Target language | `ru` |
| `-o, --output-dir` | Output directory | `./dubbed` |
| `--bg-volume` | Background volume (0.0–1.0) | `0.15` |
| `--max-speed` | Max TTS speed-up (1.0–2.0) | `1.4` |
| `--max-models` | Max models in memory | auto |
| `--from-step` | Start from: `download` / `separate` / `transcribe` / `translate` / `synthesize` / `assemble` | — |
| `--no-subs` | Disable subtitles | `false` |
| `--sub-mode` | `softsub` / `hardsub` / `none` | `softsub` |
| `--format` | `mp4` / `mkv` | `mp4` |

resume — pick up where it stopped

yt-dbl resume <video_id> [--max-models N] [-o DIR]

status — check job progress

yt-dbl status <video_id>

models list / models download

yt-dbl models list        # show models, download status, size
yt-dbl models download    # pre-download all models

How it works

┌─────────────────────────────────────────────────────────────────────────────────┐
│                                YouTube URL                                      │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│  1. DOWNLOAD                                                                    │
│                                                                                 │
│  yt-dlp downloads the video, ffmpeg extracts the audio track                    │
│  Output: video.mp4, audio.wav (48 kHz, mono)                                    │
└─────────────────────────────────────┬───────────────────────────────────────────┘
                                      │
                                      ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│  2. SEPARATE                                                                    │
│                                                                                 │
│  BS-RoFormer splits audio into vocals and background (ONNX + CoreML)            │
│  Output: vocals.wav, background.wav                                             │
└───────────────────────────┬────────────────────────────────────────────┬────────┘
                            │                                            │
                       vocals.wav                                  background.wav
                            │                                            │
                            ▼                                            │
┌──────────────────────────────────────────────────────┐                 │
│  3. TRANSCRIBE                                       │                 │
│                                                      │                 │
│  VibeVoice-ASR (MLX, ~5.7 GB)                        │                 │
│    → speech segments + speaker diarization           │                 │
│  Qwen3-ForcedAligner (MLX, ~600 MB)                  │                 │
│    → word-level timestamps                           │                 │
│  + language auto-detection via Unicode scripts       │                 │
│                                                      │                 │
│  Output: segments.json                               │                 │
└──────────────────────────┬───────────────────────────┘                 │
                           │                                             │
                           ▼                                             │
┌──────────────────────────────────────────────────────┐                 │
│  4. TRANSLATE                                        │                 │
│                                                      │                 │
│  Claude API (single-pass, all segments at once)      │                 │
│  TTS-friendly output: short phrases, spelled-out     │                 │
│  numbers, no special characters                      │                 │
│                                                      │                 │
│  Output: translations.json, subtitles.srt            │                 │
└──────────────────────────┬───────────────────────────┘                 │
                           │                                             │
                           ▼                                             │
┌──────────────────────────────────────────────────────┐                 │
│  5. SYNTHESIZE                                       │                 │
│                                                      │                 │
│  Qwen3-TTS (MLX, ~1.7 GB) — voice cloning            │                 │
│  using a voice reference for each speaker            │                 │
│  Postprocessing (parallel, ThreadPool):              │                 │
│    • speed-up (rubberband or atempo)                 │                 │
│    • loudnorm (-16 LUFS, 2-pass)                     │                 │
│    • de-essing                                       │                 │
│                                                      │                 │
│  Output: segment_0000.wav, segment_0001.wav ...      │                 │
└──────────────────────────┬───────────────────────────┘                 │
                           │                                             │
                           ▼                                             ▼
┌─────────────────────────────────────────────────────────────────────────────────┐
│  6. ASSEMBLE                                                                    │
│                                                                                 │
│  Speech track (crossfade 50 ms, equal-power) + background (sidechain ducking)   │
│  + video (copy) + subtitles (softsub / hardsub / none)                          │
│  All in a single ffmpeg call                                                    │
│                                                                                 │
│  Output: result.mp4                                                             │
└──────────────────────────────────────────┬──────────────────────────────────────┘
                                           │
                                           ▼
                                 ┌───────────────────┐
                                 │    result.mp4     │
                                 └───────────────────┘
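The equal-power crossfade used when stitching speech segments can be sketched in a few lines. This is illustrative (yt-dbl does its mixing inside a single ffmpeg call); the point is that cos/sin gains satisfy gain_out² + gain_in² = 1 at every instant, so perceived loudness stays constant through the fade.

```python
import math

def equal_power_crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two mono sample lists with an equal-power crossfade of `overlap` samples."""
    head, tail_a = a[:-overlap], a[-overlap:]
    fade_b, tail_b = b[:overlap], b[overlap:]
    mixed = []
    for i in range(overlap):
        t = i / max(overlap - 1, 1)        # 0 → 1 across the overlap
        g_out = math.cos(t * math.pi / 2)  # fades segment a out
        g_in = math.sin(t * math.pi / 2)   # fades segment b in
        mixed.append(tail_a[i] * g_out + fade_b[i] * g_in)
    return head + mixed + tail_b
```

At 48 kHz, the 50 ms crossfade mentioned above corresponds to an overlap of 2400 samples.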

Memory management

LRU model manager — auto-selects how many models to keep loaded based on RAM:

RAM              Models     Batch (separation)
─────────────    ───────    ──────────────────
<= 16 GB         1          1
17–31 GB         2          2
32–47 GB         3          4
48+ GB           3          8

ASR (~5.7 GB) is unloaded before loading the Aligner to avoid holding both in memory
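The table above boils down to a small mapping. A minimal sketch (the function name is illustrative, not yt-dbl's actual API):

```python
def auto_limits(ram_gb: float) -> tuple[int, int]:
    """Map available RAM to (max loaded models, separation batch size)."""
    if ram_gb <= 16:
        return 1, 1
    if ram_gb < 32:
        return 2, 2
    if ram_gb < 48:
        return 3, 4
    return 3, 8
```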

Output directory structure

dubbed/
└── <video_id>/
    ├── state.json                  ← pipeline checkpoint (JSON)
    ├── 01_download/
    │   ├── video.mp4               ← original video
    │   └── audio.wav               ← extracted audio track (48 kHz, mono)
    ├── 02_separate/
    │   ├── vocals.wav              ← isolated vocals
    │   └── background.wav          ← background music/noise
    ├── 03_transcribe/
    │   └── segments.json           ← segments, speakers, words with timestamps
    ├── 04_translate/
    │   ├── translations.json       ← translated texts
    │   └── subtitles.srt           ← subtitles (SRT)
    ├── 05_synthesize/
    │   ├── ref_SPEAKER_00.wav      ← speaker voice reference
    │   ├── segment_0000.wav        ← final segments (after postprocessing)
    │   ├── segment_0001.wav
    │   └── synth_meta.json         ← synthesis metadata
    ├── 06_assemble/
    │   └── speech.wav              ← assembled speech track
    └── result.mp4                  ← final output (in job dir root)

Models

| Model | Size | Task |
|---|---|---|
| VibeVoice-ASR | ~5.7 GB | ASR + speaker diarization |
| Qwen3-ForcedAligner | ~600 MB | Word-level alignment |
| Qwen3-TTS | ~1.7 GB | TTS + voice cloning |
| MelBand-RoFormer (BS-RoFormer) | ~200 MB | Vocal/background separation |
| Claude Sonnet 4.5 | — | Translation (API) |

All local models run on MLX (Metal GPU), total ~8.2 GB

Development

just check    # lint + format + typecheck + tests
just test     # fast tests (parallel, coverage)
just test-e2e # E2E (needs ffmpeg + network)
just fix      # auto-fix lint
just format   # auto-format

License

MIT
