Synthesize a timestamp-synced speech track from a subtitle file and mux it into video

srt2speech

Turn a subtitle file into a timestamp-synced speech track and mux it into a video.

Give it a video + an .srt (or .vtt/.ass); it synthesizes audio where each subtitle is spoken at its timestamp, then optionally muxes the track back in with ffmpeg. Useful for restoring lost audio, rough translation dubs, narrating silent videos, or adding audio description by reading only the descriptive/SDH cues.

It does the subtitle→audio part well and nothing else: no translation and no speech-to-subtitle transcription, so bring an already-final subtitle file.
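To make "each subtitle is spoken at its timestamp" concrete, here is a minimal sketch, assuming SRT/VTT-style timing lines with an hours field and mono clips held as plain float lists. `parse_srt` and `place` are hypothetical helpers for illustration, not srt2speech's API; the real tool also handles .ass and does its own time-fitting before placement.

```python
import re

# Matches timing lines such as "00:00:12,400 --> 00:00:15,000" (SRT uses ",",
# WebVTT uses "."). Cues without an hours field are not handled in this sketch.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, end_ms, text) for each cue in an SRT document."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue text, joined into one line.
                body = " ".join(ln.strip() for ln in lines[i + 1 :] if ln.strip())
                cues.append((to_ms(*g[:4]), to_ms(*g[4:]), body))
                break
    return cues


def place(cues, clips, sample_rate: int, total_ms: int) -> list[float]:
    """Mix each synthesized clip into a silent track starting at its cue's start time."""
    track = [0.0] * (total_ms * sample_rate // 1000)
    for (start_ms, _end_ms, _text), clip in zip(cues, clips):
        offset = start_ms * sample_rate // 1000
        for j, sample in enumerate(clip):
            if offset + j < len(track):
                track[offset + j] += sample  # overlapping speech simply sums
    return track
```

A real implementation would use NumPy arrays or an audio library rather than Python lists, but the offset arithmetic (milliseconds × sample rate ÷ 1000) is the same.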

Requirements

- Python ≥ 3.11, uv
- ffmpeg / ffprobe on PATH
- A TTS backend:
  - piper: a local gopipertts server (free, default; set SRT2SPEECH_PIPER_URL if not on http://localhost:8080)
  - openai: gpt-4o-mini-tts (set OPENAI_API_KEY)
  - elevenlabs: eleven_multilingual_v2 (set ELEVENLABS_API_KEY)
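The backend requirements above map to a simple preflight check. This is a minimal sketch: the environment variable names and the piper default URL come from the list above, but `check_backend` is a hypothetical helper, not part of srt2speech.

```python
import os
from collections.abc import Mapping

# Which API key each paid backend needs (names taken from the requirements list).
REQUIRED_KEY = {"openai": "OPENAI_API_KEY", "elevenlabs": "ELEVENLABS_API_KEY"}


def check_backend(backend: str, env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the settings a backend needs, or raise if a required key is missing."""
    if backend == "piper":
        # The local piper server is free; only its URL is configurable.
        return {"url": env.get("SRT2SPEECH_PIPER_URL", "http://localhost:8080")}
    if backend in REQUIRED_KEY:
        name = REQUIRED_KEY[backend]
        if not env.get(name):
            raise RuntimeError(f"{backend} backend needs {name} to be set")
        return {"api_key": env[name]}
    raise ValueError(f"unknown backend: {backend}")
```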

Install

Run it straight from PyPI with no install — uvx fetches it on first use:

```sh
uvx srt2speech --help
```

Or install it as a persistent tool (then just call srt2speech):

```sh
uv tool install srt2speech
```

Usage

```sh
# generate a synced track with the local piper backend, sized to the video
uvx srt2speech generate subs.srt --video clip.mp4 -o track.wav

# generate + mux into the video in one step
uvx srt2speech run clip.mp4 subs.srt -o dubbed.mp4

# emit one audio file per segment + a manifest, instead of a merged track
uvx srt2speech generate subs.srt --chunks ./chunks

# raw, per-cue synthesis (no time-fitting)
uvx srt2speech generate subs.srt --chunks ./chunks --chunk-by cue --chunk-audio raw

# chunks with ending verification: transcribe each chunk, re-synthesize dropped endings
OPENAI_API_KEY=... uvx srt2speech generate subs.srt --backend openai \
    --chunks ./chunks --verify-endings

# surgically re-synthesize chunks 3 and 17 into an existing chunks dir
uvx srt2speech generate subs.srt --chunks ./chunks --only 3,17

# paid backend with delivery guidance
OPENAI_API_KEY=... uvx srt2speech generate subs.srt \
    --backend openai --voice coral --instructions "calm documentary narration" -o track.wav

# audio description: only descriptive/SDH cues, mixed over the existing audio
uvx srt2speech run movie.mkv subs.srt --mode descriptive --mux-mode mix -o described.mkv

# mux an existing track yourself
uvx srt2speech mux clip.mp4 track.wav -o dubbed.mp4

# list a backend's voices
uvx srt2speech voices --backend openai
```
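Muxing boils down to an ffmpeg invocation. The sketch below builds a plausible argv for replacing the audio or mixing speech over it (as with --mux-mode mix); it is an assumption for illustration, not necessarily the command srt2speech actually runs, and `mux_command` is a hypothetical name.

```python
import shlex


def mux_command(video: str, track: str, out: str, mix: bool = False) -> list[str]:
    """Build an ffmpeg argv that puts `track` into `video` (illustrative only)."""
    cmd = ["ffmpeg", "-y", "-i", video, "-i", track]
    if mix:
        # Overlay the speech on the video's existing audio; keep the original's length.
        cmd += [
            "-filter_complex", "[0:a][1:a]amix=inputs=2:duration=first[aout]",
            "-map", "0:v", "-map", "[aout]",
        ]
    else:
        # Replace the audio: video streams from input 0, speech from input 1.
        cmd += ["-map", "0:v", "-map", "1:a"]
    cmd += ["-c:v", "copy", out]  # copy video untouched; audio is re-encoded for the container
    return cmd


print(shlex.join(mux_command("clip.mp4", "track.wav", "dubbed.mp4")))
```

Note that ffmpeg's amix filter rescales input levels by default, so a real mix mode may need to adjust volumes to keep the narration audible.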

Docker Compose

Runs a local piper server plus an on-demand CLI; no host Python or ffmpeg needed. Put your video and subtitles in ./data (mounted at /data); pulled voices are cached in ./voices.

```sh
# 1. start the piper TTS server (preloads the default voice)
docker compose up -d gopipertts

# 2. run the CLI against files in ./data
docker compose run --rm srt2speech run /data/clip.mp4 /data/subs.srt -o /data/dubbed.mp4

# 3. tear down when done
docker compose down
```

For the OpenAI backend, put OPENAI_API_KEY=sk-... in a .env file (gitignored) — Compose loads it automatically and passes it through to the CLI container.

Sync strategies (`--strategy`)

Speech rarely fits a cue's window exactly. The fit engine offers:

- hybrid (default): first try to fit speech into the cue window plus the silent gap before the next cue; only if it still runs long, speed it up, capped by --max-speedup (default 1.15).
- overflow: never speed up; let speech run into the following silence (best quality, but can drift out of sync).
- precise: fit the exact cue window, speeding up to the --max-speedup cap.

Modes (`--mode`)

all (default) · descriptive (SDH/audio-description only) · dialogue (drop sound cues).
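How a cue is classified as descriptive is not documented here. The sketch below assumes the common SDH convention of wrapping sound descriptions in [brackets] or (parentheses); srt2speech's real classifier may differ, and `select` is a hypothetical helper.

```python
import re

# Assumption: SDH/description text is wrapped in [brackets] or (parentheses),
# e.g. "[door slams]".
SOUND = re.compile(r"\[[^\]]*\]|\([^)]*\)")


def select(text: str, mode: str = "all") -> str:
    """Return the part of a cue to speak under a given --mode, or '' to skip it."""
    if mode == "all":
        return text
    if mode == "descriptive":
        # Keep only the bracketed descriptions, without their brackets.
        return " ".join(m.strip("[]()").strip() for m in SOUND.findall(text))
    if mode == "dialogue":
        # Drop the bracketed descriptions and collapse leftover whitespace.
        return " ".join(SOUND.sub(" ", text).split())
    raise ValueError(f"unknown mode: {mode}")
```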

Chunked output (`--chunks`)

Instead of one merged track, generate --chunks DIR writes each piece of speech to its own .wav and a manifest.json mapping every file back to its timing and text — handy for re-importing into a video editor. In this mode -o/--video are ignored.

- --chunk-by: segment (default) writes merged, sentence-sized units (better prosody); cue writes one file per raw subtitle entry.
- --chunk-audio: fitted (default) time-shapes each chunk to its window per --strategy; raw keeps the natural synthesis with no time-stretching.
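The exact rule behind "sentence-sized units" is not spelled out above. As a hypothetical sketch, consecutive cues can be merged until one ends a sentence, as long as the gap between them stays small; `segments` and its `max_gap_ms` parameter are illustrative assumptions.

```python
SENTENCE_END = (".", "!", "?", "…")


def segments(cues: list[tuple[int, int, str]], max_gap_ms: int = 1000) -> list[tuple[int, int, str]]:
    """Merge consecutive (start_ms, end_ms, text) cues into sentence-sized units."""
    merged: list[tuple[int, int, str]] = []
    for start, end, text in cues:
        if (
            merged
            and not merged[-1][2].rstrip().endswith(SENTENCE_END)  # previous sentence unfinished
            and start - merged[-1][1] <= max_gap_ms                 # and the pause is short
        ):
            prev_start, _, prev_text = merged[-1]
            merged[-1] = (prev_start, end, f"{prev_text} {text}")
        else:
            merged.append((start, end, text))
    return merged
```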

Files are named <index>_<start_ms>ms.wav (e.g. 0003_0012400ms.wav). The manifest records start_ms/end_ms (the cue/segment window), audio_ms (rendered length), and text per chunk.
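The naming scheme and manifest fields can be reproduced in a few lines, for example to spot chunks whose rendered audio overruns its window before importing them into an editor. The zero-padding widths are inferred from the single example above; the `index` and `file` keys and all three helpers are assumptions for this sketch.

```python
def chunk_filename(index: int, start_ms: int) -> str:
    """<index>_<start_ms>ms.wav, with padding widths inferred from 0003_0012400ms.wav."""
    return f"{index:04d}_{start_ms:07d}ms.wav"


def manifest_entry(index: int, start_ms: int, end_ms: int, audio_ms: int, text: str) -> dict:
    # start_ms/end_ms/audio_ms/text are the documented fields; "index" and "file" are assumed.
    return {"index": index, "file": chunk_filename(index, start_ms),
            "start_ms": start_ms, "end_ms": end_ms, "audio_ms": audio_ms, "text": text}


def overruns(entries: list[dict]) -> list[str]:
    """Files whose rendered audio is longer than their cue/segment window."""
    return [e["file"] for e in entries if e["audio_ms"] > e["end_ms"] - e["start_ms"]]
```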

--only 3,17 re-synthesizes just those chunk indexes into an existing chunks dir, overwriting their audio and refreshing their manifest entries. Use it for surgical fixes after editing a cue's text, without paying to re-render everything else.
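Conceptually, --only parses an index list and swaps the re-rendered entries into the existing manifest while leaving every other entry alone. A minimal sketch, assuming the manifest is a list of dicts carrying an `index` key; `parse_only` and `refresh` are hypothetical helpers.

```python
def parse_only(spec: str) -> set[int]:
    """Turn an --only value such as "3,17" into {3, 17}."""
    return {int(part) for part in spec.split(",") if part.strip()}


def refresh(manifest: list[dict], fresh: list[dict]) -> list[dict]:
    """Swap in re-rendered entries by index and keep every other entry untouched."""
    by_index = {e["index"]: e for e in fresh}
    return [by_index.get(e["index"], e) for e in manifest]
```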

Ending verification (`--verify-endings`)

Some TTS models (observed extensively with gpt-4o-mini-tts) occasionally drop a short trailing sentence during synthesis: the audio just ends early, so duration checks pass and the loss goes unnoticed. --verify-endings (chunks mode only) closes that hole. After synthesis it transcribes each chunk with OpenAI whisper-1 (so OPENAI_API_KEY is required regardless of the TTS backend) and checks that the chunk's last sentence was actually spoken. It compares content words, so differences in hyphenation, numerals, and currency wording don't trigger false failures. Chunks that lost their ending are re-synthesized and re-checked, for up to --verify-rounds (default 3) rounds.
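The core of the check is "what fraction of the last sentence's content words appear in the transcript". The sketch below assumes a tiny stopword list and a naive sentence splitter, and it only handles hyphenation; it does not normalize numerals or currency the way the description says srt2speech does. All names are hypothetical.

```python
import re

# Assumed stopword list; the real tool's definition of "content words" may differ.
STOPWORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at",
             "is", "it", "was", "for", "with", "that", "this", "so"}


def content_words(text: str) -> list[str]:
    # Splitting on anything that is not a letter or digit turns "well-known" into
    # "well" and "known", so hyphenated and unhyphenated spellings compare equal.
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def last_sentence(text: str) -> str:
    parts = [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]
    return parts[-1] if parts else ""


def ending_heard(script: str, transcript: str, thresh: float = 0.5) -> bool:
    """True if at least `thresh` of the last sentence's content words appear in the transcript."""
    wanted = content_words(last_sentence(script))
    if not wanted:
        return True
    heard = set(content_words(transcript))
    return sum(w in heard for w in wanted) / len(wanted) >= thresh
```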

Each pass writes a verify.json verdict ({ok, checked, failed[]}) next to the manifest. If a chunk still fails after all rounds, the command exits non-zero and names the affected cues. The drop is stochastic but text-dependent, so rewrite the cue (fold the short trailing sentence into the prior one) and re-run with --only <index> --verify-endings. --verify-thresh (default 0.5) sets the fraction of the last sentence's content words that must be heard.
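The retry loop can be sketched as an initial check followed by up to `rounds` re-synthesize-and-recheck passes, writing verify.json after each pass. This reading of "rounds", the meaning of `checked` as a count, and the injected `check`/`resynth` callables are all assumptions; `verify_chunks` is not srt2speech's API.

```python
import json
from collections.abc import Callable
from pathlib import Path


def write_verdict(out_dir: Path, checked: int, failed: list[int]) -> None:
    # Overwritten on every pass, so the file always reflects the latest pass.
    verdict = {"ok": not failed, "checked": checked, "failed": failed}
    (out_dir / "verify.json").write_text(json.dumps(verdict))


def verify_chunks(indexes: list[int], check: Callable[[int], bool],
                  resynth: Callable[[int], None], out_dir: Path, rounds: int = 3) -> list[int]:
    """Return the chunk indexes that still fail after all retry rounds."""
    failed = [i for i in indexes if not check(i)]
    write_verdict(out_dir, len(indexes), failed)
    for _ in range(rounds):
        if not failed:
            break
        for i in failed:
            resynth(i)  # re-render only the chunks that lost their ending
        retried = failed
        failed = [i for i in retried if not check(i)]
        write_verdict(out_dir, len(retried), failed)
    return failed  # non-empty: the caller should exit non-zero and name these cues
```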

Development

From a clone of the repo:

```sh
uv sync
uv run srt2speech --help
uv run pytest
uv run ruff check
```

Project details

Maintainers

nbr23

Release history

- 1.6.0 (this version): Jul 2, 2026
- 1.5.0: Jun 28, 2026
- 1.3.0: Jun 27, 2026
- 1.2.0: Jun 26, 2026
- 1.1.0: Jun 26, 2026
- 1.0.0: Jun 26, 2026

Download files

Source Distribution

srt2speech-1.6.0.tar.gz (21.4 kB)

Uploaded Jul 2, 2026 Source

Built Distribution

srt2speech-1.6.0-py3-none-any.whl (29.9 kB)

Uploaded Jul 2, 2026 Python 3

File details

Details for the file srt2speech-1.6.0.tar.gz.

File metadata

Download URL: srt2speech-1.6.0.tar.gz
Upload date: Jul 2, 2026
Size: 21.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for srt2speech-1.6.0.tar.gz
Algorithm	Hash digest
SHA256	`835a7473d3717f9cfbd5a579187dcfc79c53962a628473b0fdb42664190d6b89`
MD5	`7453786c169c411c3ff012414ba26878`
BLAKE2b-256	`5e679ae89de8b1c5e12892fff15036af3a057213a4b3b157de688b5a1a61fa98`

Provenance

The following attestation bundles were made for srt2speech-1.6.0.tar.gz:

Publisher: publish.yml on nbr23/srt2speech

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: srt2speech-1.6.0.tar.gz
- Subject digest: 835a7473d3717f9cfbd5a579187dcfc79c53962a628473b0fdb42664190d6b89
- Sigstore transparency entry: 2043678817
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: nbr23/srt2speech@ad2ca4747fd3c262642e2c6a2125bcd7aaf100ea
- Branch / Tag: refs/heads/master
- Owner: https://github.com/nbr23
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ad2ca4747fd3c262642e2c6a2125bcd7aaf100ea
- Trigger Event: push

File details

Details for the file srt2speech-1.6.0-py3-none-any.whl.

File metadata

Download URL: srt2speech-1.6.0-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 29.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for srt2speech-1.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b0c032cd332dc3b655a852749989ffbaa946f05691f62947bbb1417aaa3ad54d`
MD5	`2e628d7dc8b172f413eb8064d2cd5633`
BLAKE2b-256	`7532a580149bc1b6627ced935a212eeb442b6417b11d381e931036aef1a676f8`

Provenance

The following attestation bundles were made for srt2speech-1.6.0-py3-none-any.whl:

Publisher: publish.yml on nbr23/srt2speech

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: srt2speech-1.6.0-py3-none-any.whl
- Subject digest: b0c032cd332dc3b655a852749989ffbaa946f05691f62947bbb1417aaa3ad54d
- Sigstore transparency entry: 2043678825
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: nbr23/srt2speech@ad2ca4747fd3c262642e2c6a2125bcd7aaf100ea
- Branch / Tag: refs/heads/master
- Owner: https://github.com/nbr23
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@ad2ca4747fd3c262642e2c6a2125bcd7aaf100ea
- Trigger Event: push
