asub
Generate and translate subtitles from audio or video files — one by one or in folders — powered by faster-whisper and deep-translator.
Features
- Fast transcription — up to 4× faster than OpenAI Whisper with the same accuracy, using CTranslate2.
- Automatic language detection — or specify the source language manually.
- Folder batch processing — process every supported media file in a folder while loading the Whisper model only once.
- Translation — translate subtitles to 100+ languages via Google Translate (free, no API key).
- Multiple output formats — SRT and WebVTT.
- VAD filtering — Silero VAD removes silence and reduces hallucination.
- Model choice — from `tiny` (fast, less accurate) to `large-v3` (slow, most accurate).
- CPU & GPU — works on both, with int8 quantisation for low-memory setups.
- Packageable as a .exe — single-file Windows executable via PyInstaller.
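The two output formats differ mainly in their timestamp notation: SRT separates milliseconds with a comma, WebVTT with a dot. A minimal sketch of that convention (illustrative only, not asub's actual writer, whose internals may differ):

```python
def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Format a time offset as SRT ("HH:MM:SS,mmm") or WebVTT ("HH:MM:SS.mmm")."""
    if seconds < 0:
        raise ValueError("timestamp must be non-negative")
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)   # 3 600 000 ms per hour
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1_000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"
```

For example, `format_timestamp(3661.5)` yields `01:01:01,500` for SRT and `01:01:01.500` with `fmt="vtt"`.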
Installation
From source (recommended for development)
git clone https://github.com/simoneraffaelli/subtitle-generator.git
cd subtitle-generator
pip install -e ".[dev]"
From PyPI
pip install asub
Quick start
# Transcribe a video and generate subtitles (auto-detect language)
asub video.mp4
# Process every supported media file in a folder
asub recordings/
# Use a specific model and output format
asub video.mp4 -m large-v3 -f vtt
# Transcribe and translate to Italian
asub video.mp4 -t it
# Batch-process a folder and write all subtitles into one output directory
asub recordings/ -o subtitles/ -t de
# Specify source language, translate to German, verbose output
asub podcast.mp3 -l en -t de -v
# Use CPU with int8 quantisation
asub interview.wav --device cpu --compute-type int8
Folder input
When the input path points to a folder, asub switches to batch mode.
- Only the top level of the folder is scanned. Nested subfolders are not processed.
- Supported input extensions in batch mode are: `.aac`, `.aiff`, `.avi`, `.flac`, `.m4a`, `.m4v`, `.mkv`, `.mov`, `.mp3`, `.mp4`, `.mpeg`, `.mpg`, `.oga`, `.ogg`, `.opus`, `.wav`, `.webm`, `.wma`.
- Without `-o`/`--output`, subtitle files are written next to each media file.
- With `-o`/`--output`, the value is treated as an output directory, not a single subtitle file path.
- The Whisper model is loaded once and reused across the whole batch.
- If `-l`/`--language` is omitted, language detection happens per file. Mixed-language folders are supported, and translation uses each file's detected source language.
- If a file's detected language already matches `-t`/`--translate`, translation is skipped for that file.
- If one file fails, asub continues with the rest, then prints a summary. The process exits with code `1` if any file failed.
- If multiple input files would produce the same subtitle path (for example `clip.mp3` and `clip.wav`), asub stops before processing and asks you to resolve the naming collision.
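The collision check described above amounts to grouping inputs by the subtitle path they would produce. A sketch of the idea (not asub's actual code; the function name is made up for illustration):

```python
from collections import defaultdict
from pathlib import Path

def find_subtitle_collisions(files, out_ext=".srt"):
    """Group input media paths by the subtitle path they would produce;
    return only the groups where more than one source maps to the same output."""
    by_output = defaultdict(list)
    for f in files:
        p = Path(f)
        by_output[p.with_suffix(out_ext)].append(p)
    return {out: srcs for out, srcs in by_output.items() if len(srcs) > 1}
```

Here `find_subtitle_collisions(["clip.mp3", "clip.wav", "other.mp4"])` reports one collision group for `clip.srt`, which is the situation that makes asub stop before processing.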
CLI reference
usage: asub [-h] [-o OUTPUT] [-f {srt,vtt}] [-m MODEL] [--device {auto,cpu,cuda}]
[--compute-type TYPE] [-l LANG] [--no-vad] [-t LANG] [-v] [--version]
[--list-languages]
input
positional arguments:
input Path to an audio/video file, or a folder containing media files.
options:
-o, --output Output subtitle file path for a single input file, or an output directory when the input is a folder.
-f, --format Subtitle format: srt, vtt
-v, --verbose Increase verbosity (-v INFO, -vv DEBUG)
--version Show version and exit
--list-languages Print supported translation languages and exit
transcription:
-m, --model Whisper model size (default: medium)
--device auto | cpu | cuda (default: auto)
--compute-type Quantisation type (auto-selected if omitted)
-l, --language Source language code (auto-detected if omitted)
--no-vad Disable Voice Activity Detection
translation:
-t, --translate LANG Translate subtitles to this language code
Python API
from asub.transcriber import load_model, transcribe
from asub.translator import translate_segments
from asub.subtitle import write_subtitle_file, SubtitleFormat
# 1. Transcribe
model = load_model("medium", device="auto")
result = transcribe(model, "video.mp4")
# 2. Translate (optional)
translated = translate_segments(result.segments, source=result.language, target="it")
# 3. Write subtitle file
write_subtitle_file(translated, "video_it.srt")
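Because the API hands you plain segments between steps, you can post-process them before writing. A self-contained sketch of one common tweak, merging very short segments so subtitles stay readable (the `Segment` shape here is hypothetical; asub's real segment objects may have different fields):

```python
from dataclasses import dataclass

# Hypothetical segment shape for illustration; adapt to asub's actual objects.
@dataclass
class Segment:
    start: float
    end: float
    text: str

def merge_short_segments(segments, min_duration=1.0):
    """Fold each segment shorter than min_duration into its predecessor,
    so no subtitle flashes on screen too briefly to read."""
    merged = []
    for seg in segments:
        if merged and (seg.end - seg.start) < min_duration:
            prev = merged[-1]
            merged[-1] = Segment(prev.start, seg.end, f"{prev.text} {seg.text}".strip())
        else:
            merged.append(seg)
    return merged
```

For instance, a 0.5 s segment between two normal ones gets absorbed into the preceding subtitle, extending its end time.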
Building a Windows .exe
pip install ".[dev]"
pyinstaller asub.spec
The executable will be in dist/asub.exe.
Note: The .exe does not bundle Whisper model weights. Models are downloaded on first run and cached in the default Hugging Face cache directory.
Hugging Face token (optional)
On first run, Whisper model weights are downloaded from the Hugging Face Hub. Without authentication you may see this warning:
You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads
This is not an error — the download still works, just at lower rate limits. To silence the warning and get faster downloads:
- Create a free account at https://huggingface.co.
- Go to Settings → Access Tokens and generate a token.
- Set the token before running asub:
# Linux / macOS
export HF_TOKEN="hf_your_token_here"
# Windows PowerShell
$env:HF_TOKEN = "hf_your_token_here"
To make this permanent, add the variable to your shell profile or set it via System → Environment Variables on Windows.
Available models
| Model | Parameters | Relative speed | VRAM |
|---|---|---|---|
| `tiny` | 39 M | ~10× | ~1 GB |
| `base` | 74 M | ~7× | ~1 GB |
| `small` | 244 M | ~4× | ~2 GB |
| `medium` | 769 M | ~2× | ~5 GB |
| `large-v3` | 1550 M | 1× | ~10 GB |
| `turbo` | 809 M | ~8× | ~6 GB |
| `distil-large-v3` | 756 M | ~6× | ~6 GB |
Choosing the right model
Not every model is the best choice for every situation. Here's a breakdown to help you pick:
- `tiny` — Fastest model by far. Good for quick previews or testing your pipeline. Accuracy is noticeably lower, especially on non-English audio or noisy recordings. Use it when speed matters more than quality.
- `base` — A small step up from `tiny`. Slightly more accurate, still very fast. Suitable for clear speech in common languages.
- `small` — A solid mid-range option. Handles most languages well and runs comfortably on CPU. Good balance for everyday use when you don't have a GPU.
- `medium` — The default. Significantly more accurate than `small`, especially for accented speech, niche languages, and overlapping speakers. Slower on CPU, but a great choice with a GPU.
- `large-v3` — The most accurate model. Best for professional-quality subtitles, rare languages, or heavily accented audio. Requires a CUDA GPU with at least 10 GB VRAM for practical use.
- `turbo` — Near `large-v3` accuracy at roughly 8× the speed. This is the best "quality per second" option if you have a GPU with ≥6 GB VRAM.
- `distil-large-v3` — A distilled version of `large-v3`. Similar accuracy on English, slightly worse on other languages. Fast and memory-efficient. Best for English-heavy workloads on a GPU.
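The guidance above can be condensed into a simple picker based on the VRAM column of the model table. This function is illustrative only, not part of asub, and the thresholds just restate the table:

```python
# VRAM figures from the model table above (approximate, in GB).
MODEL_VRAM_GB = {
    "tiny": 1, "base": 1, "small": 2, "medium": 5,
    "large-v3": 10, "turbo": 6, "distil-large-v3": 6,
}

def pick_model(gpu_vram_gb):
    """Pick a reasonable default model for the available hardware.
    Pass None when no CUDA GPU is available."""
    if gpu_vram_gb is None:
        return "small"        # CPU only: good quality-to-speed ratio
    if gpu_vram_gb >= MODEL_VRAM_GB["large-v3"]:
        return "large-v3"     # enough VRAM for maximum accuracy
    if gpu_vram_gb >= MODEL_VRAM_GB["turbo"]:
        return "turbo"        # near large-v3 accuracy at ~8x the speed
    if gpu_vram_gb >= MODEL_VRAM_GB["medium"]:
        return "medium"       # the default, solid accuracy
    return "small"            # small GPUs: same pick as CPU
```

For example, `pick_model(None)` returns `"small"` and `pick_model(8)` returns `"turbo"`, matching the recommendations below.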
Recommended commands
Fastest result — use tiny when you just need a rough draft quickly:
asub video.mp4 -m tiny
Best result — use large-v3 (GPU required) for maximum accuracy:
asub video.mp4 -m large-v3
Best compromise — use turbo on GPU for near-best accuracy at high speed,
or small on CPU for a good quality-to-speed ratio:
# With a CUDA GPU (recommended)
asub video.mp4 -m turbo
# CPU only
asub video.mp4 -m small
Tip: The device and compute type are auto-detected. If you have a CUDA GPU, asub will use it with `float16` automatically. On CPU it falls back to `int8` quantisation.
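The auto-selection logic can be approximated with a small heuristic. This sketch uses the presence of `nvidia-smi` on the PATH as a stand-in for GPU detection; asub's real detection queries the CUDA runtime, not the PATH:

```python
import shutil

def pick_device_and_compute_type():
    """Heuristic sketch of device auto-selection: treat a visible
    nvidia-smi binary as a sign of a CUDA GPU."""
    if shutil.which("nvidia-smi"):
        return "cuda", "float16"   # GPU: full-speed half precision
    return "cpu", "int8"           # CPU: int8 quantisation saves memory
```

The same pair can always be forced explicitly with `--device` and `--compute-type`.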
Batch-mode notes
- Batch mode is sequential by design. This keeps GPU/CPU memory use stable and makes per-file progress easier to understand.
- In mixed-language folders, auto-detection may produce different source languages across files. If you need consistent source-language handling, pass `-l`/`--language` explicitly.
- Translation uses Google Translate through deep-translator, so large batches can still hit network or rate-limit issues. Failures are reported per file in the final summary.
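asub itself simply records per-file failures and moves on, but in your own scripts a generic retry wrapper can soften transient rate limits. A sketch (the function name is made up; nothing like it is part of asub's API):

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(), retrying with exponential backoff (1s, 2s, 4s, ...)
    on any exception; re-raise after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Wrap a translation call as `with_retries(lambda: translate_batch(...))` to absorb an occasional rate-limit error without failing the whole file.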
Upgrading dependencies
pip install --upgrade faster-whisper deep-translator
Contributing
- Fork the repo and create a feature branch.
- Install dev dependencies: `pip install -e ".[dev]"`
- Run tests: `python -m pytest`
- Lint: `ruff check src/ tests/`
- Open a pull request.
License
Acknowledgements
Built with the great help of Claude Opus 4.6 by Anthropic.
Download files
File details
Details for the file asub-1.0.1.tar.gz.
File metadata
- Download URL: asub-1.0.1.tar.gz
- Upload date:
- Size: 21.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `6e4cdba4d7a533c138453b0573881bd97070875484b7c96fe838ebe1acb88e91` |
| MD5 | `9165c472c6a55b56d15e551ccd4612f0` |
| BLAKE2b-256 | `a138de87f4b7f4c92866a285896209b96339a5cae23830aedb77afca74510c29` |
Provenance

The following attestation bundles were made for asub-1.0.1.tar.gz:

Publisher: python-publish.yml on simoneraffaelli/subtitle-generator

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asub-1.0.1.tar.gz
- Subject digest: 6e4cdba4d7a533c138453b0573881bd97070875484b7c96fe838ebe1acb88e91
- Sigstore transparency entry: 1354548897
- Permalink: simoneraffaelli/subtitle-generator@3a322642a21ada3b879797124448f691f7f8189b
- Branch / Tag: refs/tags/1.0.1
- Owner: https://github.com/simoneraffaelli
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3a322642a21ada3b879797124448f691f7f8189b
- Trigger Event: release
File details
Details for the file asub-1.0.1-py3-none-any.whl.
File metadata
- Download URL: asub-1.0.1-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `50af213ba4152f2bcc8953243c1692f9691ae66df89930dc7935fe09510f6e16` |
| MD5 | `a8667f8ed1b2f20bf2af7771929ae598` |
| BLAKE2b-256 | `1866151df74518e8d53ebd99bc019ea39c1d50387fc3f6d477339e290d69669b` |
Provenance

The following attestation bundles were made for asub-1.0.1-py3-none-any.whl:

Publisher: python-publish.yml on simoneraffaelli/subtitle-generator

Statement:

- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asub-1.0.1-py3-none-any.whl
- Subject digest: 50af213ba4152f2bcc8953243c1692f9691ae66df89930dc7935fe09510f6e16
- Sigstore transparency entry: 1354548990
- Permalink: simoneraffaelli/subtitle-generator@3a322642a21ada3b879797124448f691f7f8189b
- Branch / Tag: refs/tags/1.0.1
- Owner: https://github.com/simoneraffaelli
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@3a322642a21ada3b879797124448f691f7f8189b
- Trigger Event: release