Audio Subtitler

Convert audio files to subtitles (VTT, SRT) using Faster-Whisper.

Features

🚀 Full Faster-Whisper support - All features and parameters from faster-whisper
📝 Multiple formats - VTT (WebVTT) and SRT subtitle output
🎯 Smart auto-detection - Automatically detects format from file extension
🌍 Multi-language - Supports 100+ languages with auto-detection
⚡ GPU acceleration - CUDA support for faster transcription
🎙️ Voice Activity Detection - Automatically removes silence
💻 Simple APIs - Easy-to-use CLI and Python API
🐳 Docker GPU support - Ready for serverless deployment

Installation

pip install audio-subtitler

Optional dependencies:

pip install "audio-subtitler[runpod]"  # For RunPod serverless (quoted so shells such as zsh do not expand the brackets)
pip install "audio-subtitler[dev]"     # For development

Quick Start

CLI

# Auto-detect format from file extension (recommended)
audiosubtitler input.mp3 -o output.vtt
audiosubtitler input.mp3 -o output.srt

# Specify options
audiosubtitler input.mp3 -o output.vtt --model large-v3 --language en --device cuda

# Output to stdout
audiosubtitler input.mp3 --format srt > output.srt

# Use the shorter audiosub alias
audiosub input.mp3 -o output.vtt

# Hint punctuation style (helps large-v3 and others output more periods, commas, etc.)
audiosubtitler input.mp3 -o output.vtt --initial-prompt "Hello. How are you? Thanks."

Python API

from src import AudioSubtitler

# Initialize
converter = AudioSubtitler(
    model_size_or_path="base",
    device="cpu",
    compute_type="int8"
)

# Transcribe (returns subtitle string directly)
vtt = converter.transcribe("audio.mp3", format="vtt", language="en")
print(vtt)  # "WEBVTT\n\n00:00:00.000 --> ..."

srt = converter.transcribe("audio.mp3", format="srt")

# Better punctuation: pass a short punctuated phrase as a hint
vtt = converter.transcribe("audio.mp3", format="vtt", initial_prompt="Hello. How are you? Thanks.")
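
Because transcribe returns the subtitle text as a plain string, saving it is an ordinary file write. The sketch below is an illustration rather than part of the package: `transcribe_to_file` and `EXTENSION_FORMATS` are hypothetical names, and the extension-to-format mapping is an assumption modelled on the CLI's auto-detection. It works with any object that exposes a compatible transcribe method.

```python
from pathlib import Path

# Hypothetical mapping, modelled on the CLI's extension-based auto-detection.
EXTENSION_FORMATS = {".vtt": "vtt", ".srt": "srt"}


def transcribe_to_file(converter, audio_path, output_path, **kwargs):
    """Transcribe audio_path and write the subtitles to output_path.

    converter is any object exposing transcribe(audio, format=..., **kwargs)
    that returns a string, such as the AudioSubtitler instance created above.
    Returns the format name that was inferred from the output extension.
    """
    suffix = Path(output_path).suffix.lower()
    if suffix not in EXTENSION_FORMATS:
        raise ValueError(f"Cannot infer subtitle format from {suffix!r}")
    fmt = EXTENSION_FORMATS[suffix]
    text = converter.transcribe(str(audio_path), format=fmt, **kwargs)
    Path(output_path).write_text(text, encoding="utf-8")
    return fmt
```

With the converter from the example above, a call might look like `transcribe_to_file(converter, "audio.mp3", "audio.srt", language="en")`.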

Getting better punctuation (for subtitles)

Larger Whisper models (e.g. large-v3) transcribe words more accurately but often produce longer segments with less punctuation. The following parameters affect punctuation and segment boundaries:

Parameter	Default	Effect
`--initial-prompt`	`"Hello. How are you? Thanks, bye."`	Hints at the punctuation style; pass `''` to disable.
`--vad-silence-duration-ms`	`500`	Minimum silence (ms) for a segment boundary. Lower values (e.g. `300` or `400`) give more segment breaks and therefore more punctuation.
`--patience`	`1.0`	Beam search patience. Lower values (e.g. `0` or `0.5`) give more segment boundaries and therefore more punctuation.

Example for more punctuation:

audiosubtitler input.mp3 -o output.vtt --vad-silence-duration-ms 400 --patience 0.5
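
The same tuning can presumably be applied through the Python API, since transcribe forwards its keyword arguments to faster-whisper. The sketch below only builds those arguments. `vad_parameters` (with `min_silence_duration_ms`), `patience` and `initial_prompt` are faster-whisper transcribe parameters, but the idea that `--vad-silence-duration-ms` maps onto `min_silence_duration_ms` is an assumption, and `punctuation_kwargs` is a hypothetical helper. It leaves `vad_filter` alone because the feature list suggests the package already enables voice activity detection.

```python
def punctuation_kwargs(silence_ms=400, patience=0.5, initial_prompt=None):
    """Build transcribe() keyword arguments that favour shorter segments."""
    kwargs = {
        # Assumed Python-side equivalent of --vad-silence-duration-ms.
        "vad_parameters": {"min_silence_duration_ms": silence_ms},
        # Equivalent of --patience.
        "patience": patience,
    }
    if initial_prompt is not None:
        # Equivalent of --initial-prompt; omitted to keep the package default.
        kwargs["initial_prompt"] = initial_prompt
    return kwargs


kwargs = punctuation_kwargs()
# With a converter created as in the Python API section:
# vtt = converter.transcribe("audio.mp3", format="vtt", **kwargs)
```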

Post-process with a punctuation restoration model
For English, you can run the transcript through a dedicated punctuation model (e.g. rpunct or speechbox) and then regenerate VTT/SRT from the punctuated text if your tool supports it.

API Reference

AudioSubtitler

Constructor: AudioSubtitler(**kwargs)

Accepts all faster-whisper WhisperModel parameters:

model_size_or_path: Model name (tiny, base, small, medium, large, large-v3) or path
device: "cpu", "cuda", or "auto"
compute_type: "int8", "int8_float16", "int16", "float16", "float32"
cpu_threads, num_workers, download_root, local_files_only, etc.

Method: transcribe(audio, format="vtt", **kwargs)

Parameters:

audio: File path (str), file object (BinaryIO), or numpy array
format: "vtt" (default), "srt", or "json"
**kwargs: All faster-whisper transcribe parameters (language, beam_size, vad_parameters, word_timestamps, etc.)

Returns: str, the subtitle content (a VTT, SRT, or JSON string, depending on format).

Docker (GPU only)

docker-compose -f docker-compose-gpu.yml up

RunPod serverless

Input (the job's input object):

{
  "audio": "<base64_encoded_audio>",
  "format": "vtt"
}

audio: required, base64-encoded audio bytes
format: optional, "vtt" (default), "srt", or "json"
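
A client has to base64-encode the audio before submitting it. The following is a minimal sketch of building that input object with the standard library; `build_job_input` and `ALLOWED_FORMATS` are hypothetical names, and the validation is illustrative rather than a copy of what the handler does.

```python
import base64
import json

ALLOWED_FORMATS = ("vtt", "srt", "json")


def build_job_input(audio_bytes, fmt="vtt"):
    """Return the job input dictionary for raw audio bytes."""
    if not audio_bytes:
        raise ValueError("audio is required")
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {ALLOWED_FORMATS}")
    return {
        # base64 output is ASCII, so it can be embedded in JSON as a string.
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "format": fmt,
    }


job_input = build_job_input(b"placeholder audio bytes", "srt")
# RunPod request bodies nest the job input under an "input" key.
request_body = json.dumps({"input": job_input})
```

In practice the bytes would come from reading the audio file in binary mode, for example `Path("audio.mp3").read_bytes()`.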

Output: the handler returns the subtitle string directly (no wrapper). RunPod puts it in the job result’s output field, so the response body looks like:

{
  "delayTime": 1119,
  "executionTime": 499,
  "id": "...",
  "output": "WEBVTT\n\n00:00:00.000 --> 00:00:00.280\nHello\n\n...",
  "status": "COMPLETED",
  "workerId": "..."
}

Use response["output"] to get the VTT/SRT/JSON string.

Errors: the handler raises exceptions (e.g. no audio, invalid base64, transcription failure). RunPod surfaces these in its error response.
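
On the client side, the result can be unpacked with a small helper. This is a sketch under assumptions: `extract_subtitles` is a hypothetical name, and the `"error"` key read for failed jobs is an assumption about RunPod's error response, which this README does not show.

```python
def extract_subtitles(response):
    """Return the subtitle string from a decoded RunPod job result."""
    status = response.get("status")
    if status != "COMPLETED":
        # Assumption: failed jobs carry their message under an "error" key.
        detail = response.get("error")
        raise RuntimeError(f"Job not completed (status={status!r}): {detail}")
    output = response.get("output")
    if not isinstance(output, str):
        raise TypeError("expected the handler's subtitle string in 'output'")
    return output
```

Checking the status first avoids treating a queued or failed job as an empty transcript.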

Output Examples

VTT:

WEBVTT

00:00:00.000 --> 00:00:03.500
Hello, this is a test transcription.

00:00:03.500 --> 00:00:07.200
The audio is converted to text with timestamps.

SRT:

1
00:00:00,000 --> 00:00:03,500
Hello, this is a test transcription.

2
00:00:03,500 --> 00:00:07,200
The audio is converted to text with timestamps.
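
The two formats differ mainly in the header, the cue numbering, and the decimal separator in timestamps. The sketch below shows one way to render both layouts from `(start, end, text)` tuples in seconds. It is not the package's implementation, and `format_timestamp` and `render_subtitles` are hypothetical names.

```python
def format_timestamp(seconds, fmt="vtt"):
    """Format seconds as HH:MM:SS.mmm (VTT) or HH:MM:SS,mmm (SRT)."""
    total_ms = round(seconds * 1000)  # integer milliseconds avoid float drift
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def render_subtitles(segments, fmt="vtt"):
    """Render (start, end, text) tuples as a VTT or SRT string."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
        # SRT cues are numbered from 1; VTT cues need no identifier.
        lines = [str(index), timing, text] if fmt == "srt" else [timing, text]
        blocks.append("\n".join(lines))
    body = "\n\n".join(blocks) + "\n"
    return "WEBVTT\n\n" + body if fmt == "vtt" else body


segments = [
    (0.0, 3.5, "Hello, this is a test transcription."),
    (3.5, 7.2, "The audio is converted to text with timestamps."),
]
print(render_subtitles(segments, "srt"))
```

Given these two segments, the function reproduces the cue layout of the examples above for both formats.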

Environment Variables

Variable	Default	Description
`WHISPER_MODEL`	`base`	Model size
`WHISPER_DEVICE`	`cpu`	cpu, cuda, auto
`WHISPER_COMPUTE_TYPE`	`int8`	Compute type
`WHISPER_BEAM_SIZE`	`5`	Beam size
`WHISPER_VAD_SILENCE_MS`	`500`	Min silence (ms) for segment boundaries (RunPod). Lower = more punctuation.
`WHISPER_PATIENCE`	`1.0`	Beam search patience (RunPod). Lower = more segment boundaries.
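
How the package reads these variables is not documented here. As a rough sketch, a deployment script could resolve them with the defaults from the table as shown below; `load_settings`, `DEFAULTS` and the returned key names are hypothetical.

```python
import os

# Defaults copied from the table above (environment values are strings).
DEFAULTS = {
    "WHISPER_MODEL": "base",
    "WHISPER_DEVICE": "cpu",
    "WHISPER_COMPUTE_TYPE": "int8",
    "WHISPER_BEAM_SIZE": "5",
    "WHISPER_VAD_SILENCE_MS": "500",
    "WHISPER_PATIENCE": "1.0",
}


def load_settings(env=None):
    """Resolve settings from a mapping (os.environ unless one is given)."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "model_size_or_path": raw["WHISPER_MODEL"],
        "device": raw["WHISPER_DEVICE"],
        "compute_type": raw["WHISPER_COMPUTE_TYPE"],
        "beam_size": int(raw["WHISPER_BEAM_SIZE"]),
        "vad_silence_ms": int(raw["WHISPER_VAD_SILENCE_MS"]),
        "patience": float(raw["WHISPER_PATIENCE"]),
    }
```

Accepting the mapping as a parameter keeps the function easy to test without touching the real environment.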

License

MIT License - see LICENSE file for details.

Release history

Version	Released
`0.1.16` (this version)	Mar 8, 2026
`0.1.15`	Mar 8, 2026
`0.1.14`	Mar 7, 2026
`0.1.13`	Mar 4, 2026
`0.1.12`	Jan 11, 2026
`0.1.11`	Jan 10, 2026
`0.1.10`	Jan 9, 2026
`0.1.9`	Jan 9, 2026
`0.1.8`	Jan 9, 2026
`0.1.7`	Jan 9, 2026
`0.1.6`	Jan 9, 2026
`0.1.5`	Jan 9, 2026
`0.1.4`	Jan 9, 2026
`0.1.3`	Jan 9, 2026
`0.1.2`	Nov 1, 2025
`0.1.1`	Nov 1, 2025
`0.1.0`	Nov 1, 2025
