Generate and translate subtitles from audio/video files using Whisper.

asub

Generate and translate subtitles from audio or video files — one by one or in folders — powered by faster-whisper and deep-translator.

Features

  • Fast transcription — up to 4× faster than OpenAI Whisper with the same accuracy, using CTranslate2.
  • Automatic language detection — or specify the source language manually.
  • Folder batch processing — process every supported media file in a folder while loading the Whisper model only once.
  • Translation — translate subtitles to 100+ languages via Google Translate (free, no API key).
  • Multiple output formats — SRT and WebVTT.
  • VAD filtering — Silero VAD removes silence and reduces hallucination.
  • Model choice — from tiny (fast, less accurate) to large-v3 (slow, most accurate).
  • CPU & GPU — works on both, with int8 quantisation for low-memory setups.
  • Packageable as .exe — single-file Windows executable via PyInstaller.
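
The two output formats differ mainly in how cue timings are written: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT uses a period and begins the file with a WEBVTT header line. The sketch below illustrates that difference only. The function names are hypothetical and are not taken from asub's own code.

```python
def format_timestamp(seconds: float, fmt: str = "srt") -> str:
    """Render an offset as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    sep = "," if fmt == "srt" else "."
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{millis:03d}"


def render_cue(index: int, start: float, end: float, text: str, fmt: str = "srt") -> str:
    """Render one subtitle cue; only SRT cues carry a numeric index line."""
    timing = f"{format_timestamp(start, fmt)} --> {format_timestamp(end, fmt)}"
    if fmt == "srt":
        return f"{index}\n{timing}\n{text}\n"
    return f"{timing}\n{text}\n"


def render_document(cues, fmt: str = "srt") -> str:
    """Join cues with blank lines; WebVTT files start with a WEBVTT header."""
    body = "\n".join(
        render_cue(i, start, end, text, fmt)
        for i, (start, end, text) in enumerate(cues, start=1)
    )
    return body if fmt == "srt" else "WEBVTT\n\n" + body
```

For example, render_document([(0.0, 2.25, "Hello")], "vtt") yields a WEBVTT header, a blank line, and one cue timed 00:00:00.000 --> 00:00:02.250.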

Installation

From source (recommended for development)

git clone https://github.com/simoneraffaelli/subtitle-generator.git
cd subtitle-generator
pip install -e ".[dev]"

From PyPI

pip install asub

Quick start

# Transcribe a video and generate subtitles (auto-detect language)
asub video.mp4

# Process every supported media file in a folder
asub recordings/

# Use a specific model and output format
asub video.mp4 -m large-v3 -f vtt

# Transcribe and translate to Italian
asub video.mp4 -t it

# Batch-process a folder and write all subtitles into one output directory
asub recordings/ -o subtitles/ -t de

# Specify source language, translate to German, verbose output
asub podcast.mp3 -l en -t de -v

# Use CPU with int8 quantisation
asub interview.wav --device cpu --compute-type int8

Folder input

When input points to a folder, asub switches to batch mode.

  • Only the top level of the folder is scanned. Nested subfolders are not processed.
  • Supported input extensions in batch mode are: .aac, .aiff, .avi, .flac, .m4a, .m4v, .mkv, .mov, .mp3, .mp4, .mpeg, .mpg, .oga, .ogg, .opus, .wav, .webm, .wma.
  • Without -o/--output, subtitle files are written next to each media file.
  • With -o/--output, the value is treated as an output directory, not a single subtitle file path.
  • The Whisper model is loaded once and reused across the whole batch.
  • If -l/--language is omitted, language detection happens per file. Mixed-language folders are supported, and translation uses each file's detected source language.
  • If a file's detected language already matches -t/--translate, translation is skipped for that file.
  • If one file fails, asub continues with the rest, then prints a summary. The process exits with code 1 if any file failed.
  • If multiple input files would produce the same subtitle path (for example clip.mp3 and clip.wav), asub stops before processing and asks you to resolve the naming collision.
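
The discovery and collision rules above can be sketched roughly as follows. This is an illustration, not asub's actual implementation: the plan_batch name, the "<stem>.<format>" output naming, and the case-insensitive extension match are all assumptions made for the example.

```python
from collections import defaultdict
from pathlib import Path

# Extensions listed above for batch mode.
SUPPORTED_EXTENSIONS = {
    ".aac", ".aiff", ".avi", ".flac", ".m4a", ".m4v", ".mkv", ".mov", ".mp3",
    ".mp4", ".mpeg", ".mpg", ".oga", ".ogg", ".opus", ".wav", ".webm", ".wma",
}


def plan_batch(folder, output_dir=None, fmt="srt"):
    """Map each top-level media file to a subtitle path; raise on collisions."""
    folder = Path(folder)
    # iterdir() does not recurse, so nested subfolders are ignored.
    media = sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
    targets = defaultdict(list)
    for path in media:
        # Without an output directory, write next to the media file.
        out_dir = Path(output_dir) if output_dir else path.parent
        # Assumed naming scheme: same stem, subtitle extension.
        targets[out_dir / f"{path.stem}.{fmt}"].append(path)
    collisions = {out: srcs for out, srcs in targets.items() if len(srcs) > 1}
    if collisions:
        details = "; ".join(
            f"{out.name} <- {', '.join(s.name for s in srcs)}"
            for out, srcs in collisions.items()
        )
        raise ValueError(f"Naming collision, resolve before processing: {details}")
    return {srcs[0]: out for out, srcs in targets.items()}
```

Checking for collisions before any transcription starts means a long batch cannot silently overwrite one file's subtitles with another's.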

CLI reference

usage: asub [-h] [-o OUTPUT] [-f {srt,vtt}] [-m MODEL] [--device {auto,cpu,cuda}]
            [--compute-type TYPE] [-l LANG] [--no-vad] [-t LANG] [-v] [--version]
            [--list-languages]
            input

positional arguments:
  input                 Path to an audio/video file, or a folder containing media files.

options:
  -o, --output          Output subtitle file path for a single input file, or an output directory when the input is a folder.
  -f, --format          Subtitle format: srt, vtt
  -v, --verbose         Increase verbosity (-v INFO, -vv DEBUG)
  --version             Show version and exit
  --list-languages      Print supported translation languages and exit

transcription:
  -m, --model           Whisper model size (default: medium)
  --device              auto | cpu | cuda (default: auto)
  --compute-type        Quantisation type (auto-selected if omitted)
  -l, --language        Source language code (auto-detected if omitted)
  --no-vad              Disable Voice Activity Detection

translation:
  -t, --translate LANG  Translate subtitles to this language code
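
A repeatable flag such as -v / -vv is commonly implemented with argparse's count action. The sketch below shows one way the verbosity mapping from the reference could work. It is not asub's actual parser, and the WARNING default when -v is absent is an assumption.

```python
import argparse
import logging


def build_parser() -> argparse.ArgumentParser:
    """Minimal parser covering only the positional input and -v/--verbose."""
    parser = argparse.ArgumentParser(prog="asub")
    parser.add_argument("input")
    # action="count" makes -v yield 1 and -vv yield 2.
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


def log_level(verbosity: int) -> int:
    """Map the -v count to a logging level (-v INFO, -vv DEBUG)."""
    if verbosity >= 2:
        return logging.DEBUG
    if verbosity == 1:
        return logging.INFO
    return logging.WARNING  # assumed default, not confirmed by the source


if __name__ == "__main__":
    args = build_parser().parse_args()
    logging.basicConfig(level=log_level(args.verbose))
```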

Python API

from asub.transcriber import load_model, transcribe
from asub.translator import translate_segments
from asub.subtitle import write_subtitle_file, SubtitleFormat

# 1. Transcribe
model = load_model("medium", device="auto")
result = transcribe(model, "video.mp4")

# 2. Translate (optional)
translated = translate_segments(result.segments, source=result.language, target="it")

# 3. Write subtitle file
write_subtitle_file(translated, "video_it.srt")

Building a Windows .exe

pip install ".[dev]"
pyinstaller asub.spec

The executable will be in dist/asub.exe.

Note: The .exe does not bundle Whisper model weights. Models are downloaded on first run and cached in the default Hugging Face cache directory.

Hugging Face token (optional)

On first run, Whisper model weights are downloaded from the Hugging Face Hub. Without authentication you may see this warning:

You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads

This is not an error — the download still works, just at lower rate limits. To silence the warning and get faster downloads:

  1. Create a free account at https://huggingface.co.
  2. Go to Settings → Access Tokens and generate a token.
  3. Set the token before running asub:
# Linux / macOS
export HF_TOKEN="hf_your_token_here"

# Windows PowerShell
$env:HF_TOKEN = "hf_your_token_here"

To make this permanent, add the variable to your shell profile or set it via System → Environment Variables on Windows.

Available models

Model            Parameters  Relative speed  VRAM
tiny             39 M        ~10×            ~1 GB
base             74 M        ~7×             ~1 GB
small            244 M       ~4×             ~2 GB
medium           769 M       ~2×             ~5 GB
large-v3         1550 M                      ~10 GB
turbo            809 M       ~8×             ~6 GB
distil-large-v3  756 M       ~6×             ~6 GB
Choosing the right model

No single model is the best choice for every situation. This breakdown should help you pick:

  • tiny — Fastest model by far. Good for quick previews or testing your pipeline. Accuracy is noticeably lower, especially on non-English audio or noisy recordings. Use it when speed matters more than quality.
  • base — A small step up from tiny. Slightly more accurate, still very fast. Suitable for clear speech in common languages.
  • small — A solid mid-range option. Handles most languages well and runs comfortably on CPU. Good balance for everyday use when you don't have a GPU.
  • medium — The default. Significantly more accurate than small, especially for accented speech, niche languages, and overlapping speakers. Slower on CPU, but a great choice with a GPU.
  • large-v3 — The most accurate model. Best for professional-quality subtitles, rare languages, or heavily accented audio. Requires a CUDA GPU with at least 10 GB VRAM for practical use.
  • turbo — Near large-v3 accuracy at roughly 8× the speed. This is the best "quality per second" option if you have a GPU with ≥6 GB VRAM.
  • distil-large-v3 — A distilled version of large-v3. Similar accuracy on English, slightly worse on other languages. Fast and memory-efficient. Best for English-heavy workloads on a GPU.
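
If you want to narrow the list by available GPU memory, a small lookup over the approximate VRAM figures from the table is enough. This is a hypothetical helper rather than part of asub, and the figures are the table's rough estimates, not measured limits.

```python
# Approximate VRAM needs in GB, copied from the "Available models" table.
APPROX_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large-v3": 10,
    "turbo": 6,
    "distil-large-v3": 6,
}


def models_that_fit(vram_gb: float) -> list[str]:
    """Return model names whose approximate VRAM need fits the given budget."""
    return [name for name, need in APPROX_VRAM_GB.items() if need <= vram_gb]
```

With a 6 GB budget, models_that_fit(6) returns every model except large-v3.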

Recommended commands

Fastest result — use tiny when you just need a rough draft quickly:

asub video.mp4 -m tiny

Best result — use large-v3 (GPU required) for maximum accuracy:

asub video.mp4 -m large-v3

Best compromise — use turbo on GPU for near-best accuracy at high speed, or small on CPU for a good quality-to-speed ratio:

# With a CUDA GPU (recommended)
asub video.mp4 -m turbo

# CPU only
asub video.mp4 -m small

Tip: The device and compute type are auto-detected. If you have a CUDA GPU, asub will use it with float16 automatically. On CPU it falls back to int8 quantisation.
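
The defaulting behaviour described in the tip can be sketched like this. The resolve_runtime name is hypothetical, and CUDA availability is passed in as a plain flag here, since how asub detects a GPU is not described in this document.

```python
def resolve_runtime(device="auto", compute_type=None, cuda_available=False):
    """Return (device, compute_type), filling in defaults as the tip describes."""
    if device == "auto":
        device = "cuda" if cuda_available else "cpu"
    if compute_type is None:
        # float16 on a CUDA GPU, int8 quantisation on CPU.
        compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type
```

Explicit values passed through --device or --compute-type win over the defaults, so resolve_runtime("cpu", "int8", cuda_available=True) still returns ("cpu", "int8").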

Batch-mode notes

  • Batch mode is sequential by design. This keeps GPU/CPU memory use stable and makes per-file progress easier to understand.
  • In mixed-language folders, auto-detection may produce different source languages across files. If you need consistent source-language handling, pass -l/--language explicitly.
  • Translation uses Google Translate through deep-translator, so large batches can still hit network or rate-limit issues. Failures are reported per file in the final summary.
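
The sequential, continue-on-failure behaviour can be illustrated with a short loop. This is a sketch under assumptions, not asub's code: run_batch and the process callback are hypothetical names, with process standing in for the per-file transcription and translation work.

```python
def run_batch(files, process):
    """Process files one at a time; return 1 if any failed, else 0."""
    files = list(files)
    failures = {}
    for index, path in enumerate(files, start=1):
        print(f"[{index}/{len(files)}] {path}")
        try:
            process(path)
        except Exception as exc:  # one bad file must not stop the batch
            failures[path] = str(exc)
    # Final summary, with failures reported per file.
    print(f"Done: {len(files) - len(failures)} succeeded, {len(failures)} failed")
    for path, reason in failures.items():
        print(f"  FAILED {path}: {reason}")
    return 1 if failures else 0
```

A caller would typically pass the return value to sys.exit, so that scripts wrapping the batch can detect partial failure.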

Upgrading dependencies

pip install --upgrade faster-whisper deep-translator

Contributing

  1. Fork the repo and create a feature branch.
  2. Install dev dependencies: pip install -e ".[dev]"
  3. Run tests: python -m pytest
  4. Lint: ruff check src/ tests/
  5. Open a pull request.

License

MIT

Acknowledgements

Built with the great help of Claude Opus 4.6 by Anthropic.
