Audio Subtitler
Convert audio files to subtitles (VTT, SRT) using Faster-Whisper.
Features
- 🚀 Full Faster-Whisper support - All features and parameters from faster-whisper
- 📝 Multiple formats - VTT (WebVTT) and SRT subtitle output
- 🎯 Smart auto-detection - Automatically detects format from file extension
- 🌍 Multi-language - Supports 100+ languages with auto-detection
- ⚡ GPU acceleration - CUDA support for faster transcription
- 🎙️ Voice Activity Detection - Automatically removes silence
- 💻 Simple APIs - Easy-to-use CLI and Python API
- 🐳 Docker GPU support - Ready for serverless deployment
Installation
pip install audio-subtitler
Optional dependencies:
pip install audio-subtitler[runpod] # For RunPod serverless
pip install audio-subtitler[dev] # For development
Quick Start
CLI
# Auto-detect format from file extension (recommended)
audiosubtitler input.mp3 -o output.vtt
audiosubtitler input.mp3 -o output.srt
# Specify options
audiosubtitler input.mp3 -o output.vtt --model large-v3 --language en --device cuda
# Output to stdout
audiosubtitler input.mp3 --format srt > output.srt
# Use shorter command
audiosub input.mp3 -o output.vtt
# Hint punctuation style (helps large-v3 and others output more periods, commas, etc.)
audiosubtitler input.mp3 -o output.vtt --initial-prompt "Hello. How are you? Thanks."
Python API
from src import AudioSubtitler
# Initialize
converter = AudioSubtitler(
model_size_or_path="base",
device="cpu",
compute_type="int8"
)
# Transcribe (returns subtitle string directly)
vtt = converter.transcribe("audio.mp3", format="vtt", language="en")
print(vtt) # "WEBVTT\n\n00:00:00.000 --> ..."
srt = converter.transcribe("audio.mp3", format="srt")
# Better punctuation: pass a short punctuated phrase as a hint
vtt = converter.transcribe("audio.mp3", format="vtt", initial_prompt="Hello. How are you? Thanks.")
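Because transcribe returns the subtitle content as a string, writing it to disk is left to the caller. The following is a minimal sketch of one way to do that, picking the format from the output file's extension much as the CLI does. The helpers detect_format and save_subtitles are hypothetical names used for illustration and are not part of the package; converter is assumed to be any object with the transcribe(audio, format=..., **kwargs) method described in this README.

```python
from pathlib import Path

# Hypothetical mapping, not taken from audio-subtitler itself.
SUPPORTED_FORMATS = {".vtt": "vtt", ".srt": "srt"}


def detect_format(output_path: str) -> str:
    """Infer the subtitle format from the output file extension."""
    suffix = Path(output_path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Cannot infer subtitle format from {output_path!r}")
    return SUPPORTED_FORMATS[suffix]


def save_subtitles(converter, audio_path: str, output_path: str, **kwargs) -> str:
    """Transcribe audio_path and write the subtitles to output_path.

    Returns the format that was used. Extra keyword arguments are passed
    through to converter.transcribe unchanged.
    """
    fmt = detect_format(output_path)
    # transcribe() is expected to return the subtitle text as a str.
    text = converter.transcribe(audio_path, format=fmt, **kwargs)
    Path(output_path).write_text(text, encoding="utf-8")
    return fmt
```

With the converter created above, a call might look like save_subtitles(converter, "audio.mp3", "output.srt", language="en").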
Getting better punctuation (for subtitles)
Larger Whisper models (e.g. large-v3) are more accurate on words but often output longer segments with less punctuation. Parameters that affect punctuation and segment boundaries:
| Parameter | Default | Effect |
|---|---|---|
| --initial-prompt | "Hello. How are you? Thanks, bye." | Hints punctuation style; use '' to disable. |
| --vad-silence-duration-ms | 500 | Lower = more segment breaks (e.g. 300 or 400) = more punctuation. |
| --patience | 1.0 | Lower = more segment boundaries (e.g. 0 or 0.5) = more punctuation. |
Example for more punctuation:
audiosubtitler input.mp3 -o output.vtt --vad-silence-duration-ms 400 --patience 0.5
- Post-process with a punctuation restoration model
For English, you can run the transcript through a dedicated punctuation model (e.g. rpunct or speechbox) and then regenerate VTT/SRT from the punctuated text if your tool supports it.
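The same tuning can probably be applied from the Python API, since transcribe forwards faster-whisper's transcribe parameters. The sketch below only builds the keyword arguments. It assumes that the CLI flag --vad-silence-duration-ms corresponds to faster-whisper's min_silence_duration_ms VAD option, which this README does not state, and punctuation_kwargs is a hypothetical helper name.

```python
def punctuation_kwargs(
    silence_ms: int = 400,
    patience: float = 0.5,
    initial_prompt: str = "Hello. How are you? Thanks.",
) -> dict:
    """Build transcribe() keyword arguments that favour shorter segments."""
    return {
        # A short, punctuated phrase hints at the desired punctuation style.
        "initial_prompt": initial_prompt,
        # Beam search patience, a faster-whisper transcribe parameter.
        "patience": patience,
        # Assumption: this is the option behind --vad-silence-duration-ms.
        # In faster-whisper it only takes effect when the VAD filter is on.
        "vad_parameters": {"min_silence_duration_ms": silence_ms},
    }
```

It could then be used as converter.transcribe("audio.mp3", format="vtt", **punctuation_kwargs()).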
API Reference
AudioSubtitler
Constructor: AudioSubtitler(**kwargs)
Accepts all faster-whisper WhisperModel parameters:
- model_size_or_path: Model name (tiny, base, small, medium, large, large-v3) or path
- device: "cpu", "cuda", or "auto"
- compute_type: "int8", "int8_float16", "int16", "float16", "float32"
- cpu_threads, num_workers, download_root, local_files_only, etc.
Method: transcribe(audio, format="vtt", **kwargs)
Parameters:
- audio: File path (str), file object (BinaryIO), or numpy array
- format: "vtt" or "srt" (default: "vtt")
- **kwargs: All faster-whisper transcribe parameters
  - language, beam_size, vad_parameters, word_timestamps, etc.
Returns: str — The subtitle content (VTT, SRT, or JSON string depending on format).
Docker (GPU only)
docker-compose -f docker-compose-gpu.yml up
RunPod serverless
Input (in the job input):
{
"audio": "<base64_encoded_audio>",
"format": "vtt"
}
- audio: required, base64-encoded audio bytes
- format: optional, "vtt" (default), "srt", or "json"
Output: the handler returns the subtitle string directly (no wrapper). RunPod puts it in the job result’s output field, so the response body looks like:
{
"delayTime": 1119,
"executionTime": 499,
"id": "...",
"output": "WEBVTT\n\n00:00:00.000 --> 00:00:00.280\nHello\n\n...",
"status": "COMPLETED",
"workerId": "..."
}
Use response["output"] to get the VTT/SRT/JSON string.
Errors: the handler raises exceptions (e.g. no audio, invalid base64, transcription failure). RunPod surfaces these in its error response.
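As a rough illustration of the request and response shapes described above, the sketch below builds the job input from raw audio bytes and reads the subtitle string back out of a finished job. The function names are hypothetical, and wrapping the job input under a top-level "input" key follows the usual RunPod serverless request convention rather than anything stated in this README. No network call is made; sending the body to an endpoint is left out.

```python
import base64

VALID_FORMATS = ("vtt", "srt", "json")


def build_request_body(audio_bytes: bytes, fmt: str = "vtt") -> dict:
    """Return a request body whose "input" matches the handler's job input."""
    if fmt not in VALID_FORMATS:
        raise ValueError(f"format must be one of {VALID_FORMATS}, got {fmt!r}")
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {"input": {"audio": encoded, "format": fmt}}


def extract_subtitles(response: dict) -> str:
    """Pull the subtitle string out of a completed job response."""
    status = response.get("status")
    if status != "COMPLETED":
        raise RuntimeError(f"Job did not complete (status: {status!r})")
    # The handler returns the subtitle string directly, so "output" is a str.
    return response["output"]
```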
Output Examples
VTT:
WEBVTT

00:00:00.000 --> 00:00:03.500
Hello, this is a test transcription.

00:00:03.500 --> 00:00:07.200
The audio is converted to text with timestamps.
SRT:
1
00:00:00,000 --> 00:00:03,500
Hello, this is a test transcription.

2
00:00:03,500 --> 00:00:07,200
The audio is converted to text with timestamps.
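The two formats differ mainly in the header, the cue numbering, and the decimal separator in timestamps. The sketch below shows how timed segments could be rendered into each format. It is an illustration only, not the package's actual implementation, and it assumes segments arrive as (start_seconds, end_seconds, text) tuples; the function names are hypothetical.

```python
def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Format seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def to_vtt(segments) -> str:
    """Render segments as WebVTT: a header, then cues separated by blank lines."""
    cues = [
        f"{format_timestamp(start)} --> {format_timestamp(end)}\n{text.strip()}"
        for start, end, text in segments
    ]
    return "WEBVTT\n\n" + "\n\n".join(cues) + "\n"


def to_srt(segments) -> str:
    """Render segments as SRT: numbered cues with a comma before milliseconds."""
    cues = [
        f"{index}\n{format_timestamp(start, ',')} --> "
        f"{format_timestamp(end, ',')}\n{text.strip()}"
        for index, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n\n".join(cues) + "\n"
```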
Environment Variables
| Variable | Default | Description |
|---|---|---|
| WHISPER_MODEL | base | Model size |
| WHISPER_DEVICE | cpu | cpu, cuda, auto |
| WHISPER_COMPUTE_TYPE | int8 | Compute type |
| WHISPER_BEAM_SIZE | 5 | Beam size |
| WHISPER_VAD_SILENCE_MS | 500 | Min silence (ms) for segment boundaries (RunPod). Lower = more punctuation. |
| WHISPER_PATIENCE | 1.0 | Beam search patience (RunPod). Lower = more segment boundaries. |
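For reference, the variables and defaults in the table could be read into typed settings roughly as follows. This is a sketch under assumptions: the function name, the returned key names, and the parsing are hypothetical and may not match how the package or its RunPod handler actually reads the environment.

```python
import os

# Defaults copied from the table above; values are strings, as in os.environ.
DEFAULTS = {
    "WHISPER_MODEL": "base",
    "WHISPER_DEVICE": "cpu",
    "WHISPER_COMPUTE_TYPE": "int8",
    "WHISPER_BEAM_SIZE": "5",
    "WHISPER_VAD_SILENCE_MS": "500",
    "WHISPER_PATIENCE": "1.0",
}


def load_settings(env=None) -> dict:
    """Read the WHISPER_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    raw = {name: env.get(name, default) for name, default in DEFAULTS.items()}
    return {
        "model_size_or_path": raw["WHISPER_MODEL"],
        "device": raw["WHISPER_DEVICE"],
        "compute_type": raw["WHISPER_COMPUTE_TYPE"],
        "beam_size": int(raw["WHISPER_BEAM_SIZE"]),
        "vad_silence_ms": int(raw["WHISPER_VAD_SILENCE_MS"]),
        "patience": float(raw["WHISPER_PATIENCE"]),
    }
```

Passing a plain dict as env, for example load_settings({"WHISPER_DEVICE": "cuda"}), makes the behaviour easy to check without touching the real environment.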
License
MIT License - see LICENSE file for details.