Audio Subtitler

Convert audio files to subtitles (VTT, SRT) using Faster-Whisper.

Features

  • 🚀 Full Faster-Whisper support - All features and parameters from faster-whisper
  • 📝 Multiple formats - VTT (WebVTT) and SRT subtitle output
  • 🎯 Smart auto-detection - Automatically detects format from file extension
  • 🌍 Multi-language - Supports 100+ languages with auto-detection
  • GPU acceleration - CUDA support for faster transcription
  • 🎙️ Voice Activity Detection - Automatically removes silence
  • 💻 Simple APIs - Easy-to-use CLI and Python API
  • 🐳 Docker GPU support - Ready for serverless deployment

Installation

pip install audio-subtitler

Optional dependencies:

pip install audio-subtitler[runpod]  # For RunPod serverless
pip install audio-subtitler[dev]     # For development

Quick Start

CLI

# Auto-detect format from file extension (recommended)
audiosubtitler input.mp3 -o output.vtt
audiosubtitler input.mp3 -o output.srt

# Specify options
audiosubtitler input.mp3 -o output.vtt --model large-v3 --language en --device cuda

# Output to stdout
audiosubtitler input.mp3 --format srt > output.srt

# Use shorter command
audiosub input.mp3 -o output.vtt

# Hint punctuation style (helps large-v3 and others output more periods, commas, etc.)
audiosubtitler input.mp3 -o output.vtt --initial-prompt "Hello. How are you? Thanks."
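
The format auto-detection used by the first two commands can be pictured as a lookup on the output file's extension. The sketch below illustrates that idea only; it is not the package's code, and the `detect_format` name, the mapping table, and the fallback to VTT are assumptions made for the example.

```python
from pathlib import Path

# Hypothetical mapping; this README only documents .vtt and .srt outputs.
EXTENSION_TO_FORMAT = {".vtt": "vtt", ".srt": "srt"}


def detect_format(output_path, default="vtt"):
    """Guess the subtitle format from the output file extension.

    Falls back to `default` when the extension is unknown or missing
    (the fallback is an assumption, not documented behaviour).
    """
    suffix = Path(output_path).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, default)


if __name__ == "__main__":
    for name in ("output.vtt", "talk.SRT", "notes.txt"):
        print(name, "->", detect_format(name))
```

Lower-casing the suffix keeps the lookup case-insensitive, so `talk.SRT` and `talk.srt` are treated the same way.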

Python API

from src import AudioSubtitler

# Initialize
converter = AudioSubtitler(
    model_size_or_path="base",
    device="cpu",
    compute_type="int8"
)

# Transcribe (returns subtitle string directly)
vtt = converter.transcribe("audio.mp3", format="vtt", language="en")
print(vtt)  # "WEBVTT\n\n00:00:00.000 --> ..."

srt = converter.transcribe("audio.mp3", format="srt")

# Better punctuation: pass a short punctuated phrase as a hint
vtt = converter.transcribe("audio.mp3", format="vtt", initial_prompt="Hello. How are you? Thanks.")

Getting better punctuation (for subtitles)

Larger Whisper models (e.g. large-v3) are more accurate on words but often output longer segments with less punctuation. The following parameters affect punctuation and segment boundaries:

Parameter                  | Default                            | Effect
--initial-prompt           | "Hello. How are you? Thanks, bye." | Hints punctuation style; use '' to disable.
--vad-silence-duration-ms  | 500                                | Lower = more segment breaks (e.g. 300 or 400) = more punctuation.
--patience                 | 1.0                                | Lower = more segment boundaries (e.g. 0 or 0.5) = more punctuation.

Example for more punctuation:

audiosubtitler input.mp3 -o output.vtt --vad-silence-duration-ms 400 --patience 0.5

Another option is to post-process with a punctuation restoration model. For English, you can run the transcript through a dedicated punctuation model (e.g. rpunct or speechbox) and then regenerate the VTT/SRT output from the punctuated text, if your tooling supports it.
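
The `--vad-silence-duration-ms` and `--patience` flags could plausibly be expressed from the Python API as keyword arguments, since `transcribe` is documented as forwarding faster-whisper's transcribe parameters. The sketch below assumes the flags correspond to faster-whisper's `vad_filter`, `vad_parameters` (`min_silence_duration_ms`), `patience`, and `initial_prompt` arguments. The helper name `punctuation_kwargs` is hypothetical, and the exact mapping inside audio-subtitler is not confirmed here.

```python
def punctuation_kwargs(vad_silence_ms=500, patience=1.0, initial_prompt=None):
    """Build keyword arguments intended to be passed to transcribe().

    The keys are faster-whisper transcribe parameters; that the CLI flags
    map onto them one-to-one is an assumption.
    """
    kwargs = {
        "vad_filter": True,
        "vad_parameters": {"min_silence_duration_ms": vad_silence_ms},
        "patience": patience,
    }
    if initial_prompt:  # an empty string or None means "no hint"
        kwargs["initial_prompt"] = initial_prompt
    return kwargs


if __name__ == "__main__":
    kwargs = punctuation_kwargs(vad_silence_ms=400, patience=0.5)
    print(kwargs)
    # Assumed usage with the converter from the Quick Start:
    # vtt = converter.transcribe("audio.mp3", format="vtt", **kwargs)
```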

API Reference

AudioSubtitler

Constructor: AudioSubtitler(**kwargs)

Accepts all faster-whisper WhisperModel parameters:

  • model_size_or_path: Model name (tiny, base, small, medium, large, large-v3) or path
  • device: "cpu", "cuda", or "auto"
  • compute_type: "int8", "int8_float16", "int16", "float16", "float32"
  • cpu_threads, num_workers, download_root, local_files_only, etc.

Method: transcribe(audio, format="vtt", **kwargs)

Parameters:

  • audio: File path (str), file object (BinaryIO), or numpy array
  • format: "vtt" or "srt" (default: "vtt")
  • **kwargs: All faster-whisper transcribe parameters
    • language, beam_size, vad_parameters, word_timestamps, etc.

Returns: str — The subtitle content (VTT, SRT, or JSON string depending on format).

Docker (GPU only)

docker-compose -f docker-compose-gpu.yml up

RunPod serverless

Input (in the job input):

{
  "audio": "<base64_encoded_audio>",
  "format": "vtt"
}

  • audio: required, base64-encoded audio bytes
  • format: optional, "vtt" (default), "srt", or "json"
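
A job input of this shape can be assembled with the standard library alone. The following is a minimal sketch; the `build_job_input` helper is hypothetical, and wrapping the result under an `"input"` key reflects how RunPod serverless requests are usually structured, which should be checked against the RunPod documentation for your endpoint.

```python
import base64
import json


def build_job_input(audio_bytes, fmt="vtt"):
    """Return the job input dict with base64-encoded audio."""
    if fmt not in ("vtt", "srt", "json"):
        raise ValueError(f"unsupported format: {fmt!r}")
    return {
        "audio": base64.b64encode(audio_bytes).decode("ascii"),
        "format": fmt,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for the contents of a real audio file,
    # e.g. the result of open("audio.mp3", "rb").read().
    job_input = build_job_input(b"placeholder audio bytes", fmt="srt")
    # Assumed request body shape for a RunPod serverless endpoint.
    print(json.dumps({"input": job_input}))
```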

Output: the handler returns the subtitle string directly (no wrapper). RunPod puts it in the job result’s output field, so the response body looks like:

{
  "delayTime": 1119,
  "executionTime": 499,
  "id": "...",
  "output": "WEBVTT\n\n00:00:00.000 --> 00:00:00.280\nHello\n\n...",
  "status": "COMPLETED",
  "workerId": "..."
}

Use response["output"] to get the VTT/SRT/JSON string.

Errors: the handler raises exceptions (e.g. no audio, invalid base64, transcription failure). RunPod surfaces these in its error response.
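
On the client side, reading the result comes down to checking the job status before taking `output`. A sketch under the assumption that the response has already been parsed into a dict; the `extract_subtitles` name and the `"error"` key read on failure are illustrative guesses, not confirmed details of RunPod's error payload.

```python
def extract_subtitles(response):
    """Return the subtitle string from a RunPod job result dict.

    Raises RuntimeError when the job did not complete. The "error" key
    read here is an assumption about the failure payload.
    """
    status = response.get("status")
    if status != "COMPLETED":
        detail = response.get("error", "no error detail provided")
        raise RuntimeError(f"job ended with status {status!r}: {detail}")
    return response["output"]


if __name__ == "__main__":
    sample = {
        "id": "...",
        "status": "COMPLETED",
        "output": "WEBVTT\n\n00:00:00.000 --> 00:00:00.280\nHello\n",
    }
    print(extract_subtitles(sample))
```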

Output Examples

VTT:

WEBVTT

00:00:00.000 --> 00:00:03.500
Hello, this is a test transcription.

00:00:03.500 --> 00:00:07.200
The audio is converted to text with timestamps.

SRT:

1
00:00:00,000 --> 00:00:03,500
Hello, this is a test transcription.

2
00:00:03,500 --> 00:00:07,200
The audio is converted to text with timestamps.
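
The two formats differ mainly in the `WEBVTT` header, the numbered cues in SRT, and the decimal separator in timestamps (`.` for VTT, `,` for SRT). The sketch below renders a list of `(start, end, text)` tuples in either format to make those differences concrete; the function names are hypothetical and this is not the package's implementation.

```python
def format_timestamp(seconds, decimal_marker="."):
    """Format a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{ms:03d}"


def render(segments, fmt="vtt"):
    """Render (start, end, text) tuples as a VTT or SRT string."""
    marker = "," if fmt == "srt" else "."
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, marker)} --> {format_timestamp(end, marker)}"
        # SRT cues carry a running number; VTT cues here do not.
        lines = [str(index), timing, text] if fmt == "srt" else [timing, text]
        blocks.append("\n".join(lines))
    body = "\n\n".join(blocks) + "\n"
    return "WEBVTT\n\n" + body if fmt == "vtt" else body


if __name__ == "__main__":
    segments = [
        (0.0, 3.5, "Hello, this is a test transcription."),
        (3.5, 7.2, "The audio is converted to text with timestamps."),
    ]
    print(render(segments, "vtt"))
    print(render(segments, "srt"))
```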

Environment Variables

Variable                | Default | Description
WHISPER_MODEL           | base    | Model size
WHISPER_DEVICE          | cpu     | cpu, cuda, auto
WHISPER_COMPUTE_TYPE    | int8    | Compute type
WHISPER_BEAM_SIZE       | 5       | Beam size
WHISPER_VAD_SILENCE_MS  | 500     | Min silence (ms) for segment boundaries (RunPod). Lower = more punctuation.
WHISPER_PATIENCE        | 1.0     | Beam search patience (RunPod). Lower = more segment boundaries.
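
As an illustration of how these variables and defaults fit together, the sketch below reads them into a plain dict with the types the table implies. The `load_settings` helper and the dict keys are hypothetical; how the package itself parses these variables is not shown in this document.

```python
import os


def load_settings(env=None):
    """Read the WHISPER_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "model": env.get("WHISPER_MODEL", "base"),
        "device": env.get("WHISPER_DEVICE", "cpu"),
        "compute_type": env.get("WHISPER_COMPUTE_TYPE", "int8"),
        "beam_size": int(env.get("WHISPER_BEAM_SIZE", "5")),
        "vad_silence_ms": int(env.get("WHISPER_VAD_SILENCE_MS", "500")),
        "patience": float(env.get("WHISPER_PATIENCE", "1.0")),
    }


if __name__ == "__main__":
    # Passing a dict instead of os.environ makes the behaviour easy to try out.
    print(load_settings({"WHISPER_MODEL": "large-v3", "WHISPER_DEVICE": "cuda"}))
```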

License

MIT License - see LICENSE file for details.
