
asub

Generate and translate subtitles from any audio or video file — powered by faster-whisper and deep-translator.

Features

  • Fast transcription — up to 4× faster than OpenAI Whisper with the same accuracy, using CTranslate2.
  • Automatic language detection — or specify the source language manually.
  • Translation — translate subtitles to 100+ languages via Google Translate (free, no API key).
  • Multiple output formats — SRT and WebVTT.
  • VAD filtering — Silero VAD removes silence and reduces hallucination.
  • Model choice — from tiny (fast, less accurate) to large-v3 (slow, most accurate).
  • CPU & GPU — works on both, with int8 quantisation for low-memory setups.
  • Packageable as .exe — single-file Windows executable via PyInstaller.
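The two output formats differ mainly in the timestamp separator (SRT uses a comma before milliseconds, WebVTT a dot) and in cue numbering. As an illustration of the SRT cue format the tool emits (a sketch only, not asub's internal code):

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Render one numbered SRT cue block: index, time range, then the text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 2.5, "Hello, world."))
```

A `.vtt` file uses the same cue structure but starts with a `WEBVTT` header and writes `00:00:02.500` instead of `00:00:02,500`.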

Installation

From source (recommended for development)

git clone https://github.com/simoneraffaelli/subtitle-generator.git
cd subtitle-generator
pip install -e ".[dev]"

From PyPI

pip install asub

Quick start

# Transcribe a video and generate subtitles (auto-detect language)
asub video.mp4

# Use a specific model and output format
asub video.mp4 -m large-v3 -f vtt

# Transcribe and translate to Italian
asub video.mp4 -t it

# Specify source language, translate to German, verbose output
asub podcast.mp3 -l en -t de -v

# Use CPU with int8 quantisation
asub interview.wav --device cpu --compute-type int8

CLI reference

usage: asub [-h] [-o OUTPUT] [-f {srt,vtt}] [-m MODEL] [--device {auto,cpu,cuda}]
            [--compute-type TYPE] [-l LANG] [--no-vad] [-t LANG] [-v] [--version]
            [--list-languages]
            input

positional arguments:
  input                 Path to an audio or video file.

options:
  -o, --output          Output subtitle file path (default: <input>.srt)
  -f, --format          Subtitle format: srt, vtt
  -v, --verbose         Increase verbosity (-v INFO, -vv DEBUG)
  --version             Show version and exit
  --list-languages      Print supported translation languages and exit

transcription:
  -m, --model           Whisper model size (default: medium)
  --device              auto | cpu | cuda (default: auto)
  --compute-type        Quantisation type (auto-selected if omitted)
  -l, --language        Source language code (auto-detected if omitted)
  --no-vad              Disable Voice Activity Detection

translation:
  -t, --translate LANG  Translate subtitles to this language code
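The options above map naturally onto Python's argparse. The following parser is an illustrative sketch mirroring the documented flags; asub's actual implementation may differ in details:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of a parser matching the documented CLI; not asub's actual code.
    p = argparse.ArgumentParser(prog="asub")
    p.add_argument("input", help="Path to an audio or video file")
    p.add_argument("-o", "--output", help="Output subtitle file path (default: <input>.srt)")
    p.add_argument("-f", "--format", choices=["srt", "vtt"], default="srt")
    p.add_argument("-m", "--model", default="medium", help="Whisper model size")
    p.add_argument("--device", choices=["auto", "cpu", "cuda"], default="auto")
    p.add_argument("--compute-type", dest="compute_type", help="Quantisation type")
    p.add_argument("-l", "--language", help="Source language code")
    p.add_argument("--no-vad", action="store_true", help="Disable VAD")
    p.add_argument("-t", "--translate", metavar="LANG", help="Target language code")
    p.add_argument("-v", "--verbose", action="count", default=0)
    return p

args = build_parser().parse_args(["video.mp4", "-m", "large-v3", "-f", "vtt", "-t", "it"])
print(args.model, args.format, args.translate)
```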

Python API

from asub.transcriber import load_model, transcribe
from asub.translator import translate_segments
from asub.subtitle import write_subtitle_file, SubtitleFormat

# 1. Transcribe
model = load_model("medium", device="auto")
result = transcribe(model, "video.mp4")

# 2. Translate (optional)
translated = translate_segments(result.segments, source=result.language, target="it")

# 3. Write subtitle file
write_subtitle_file(translated, "video_it.srt")

Building a Windows .exe

pip install ".[dev]"
pyinstaller asub.spec

The executable will be in dist/asub.exe.

Note: The .exe does not bundle Whisper model weights. Models are downloaded on first run and cached in the default Hugging Face cache directory.
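If you need to locate or pre-seed that cache (for example on an offline machine), the default Hugging Face cache path can be resolved roughly like this. This is a sketch of the resolution order used by huggingface_hub; check your installed version if the exact behaviour matters:

```python
import os
from pathlib import Path

def hf_cache_dir() -> Path:
    # Resolution order (sketch): HF_HUB_CACHE, then HF_HOME/hub,
    # then the default ~/.cache/huggingface/hub.
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = Path(os.environ.get("HF_HOME", Path.home() / ".cache" / "huggingface"))
    return hf_home / "hub"

print(hf_cache_dir())
```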

Hugging Face token (optional)

On first run, Whisper model weights are downloaded from the Hugging Face Hub. Without authentication you may see this warning:

You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads

This is not an error — the download still works, just at lower rate limits. To silence the warning and get faster downloads:

  1. Create a free account at https://huggingface.co.
  2. Go to Settings → Access Tokens and generate a token.
  3. Set the token before running asub:
# Linux / macOS
export HF_TOKEN="hf_your_token_here"

# Windows PowerShell
$env:HF_TOKEN = "hf_your_token_here"

To make this permanent, add the variable to your shell profile or set it via System → Environment Variables on Windows.

Available models

Model            Parameters  Relative speed  VRAM
tiny                 39 M        ~10×        ~1 GB
base                 74 M         ~7×        ~1 GB
small               244 M         ~4×        ~2 GB
medium              769 M         ~2×        ~5 GB
large-v3           1550 M          1×       ~10 GB
turbo               809 M         ~8×        ~6 GB
distil-large-v3     756 M         ~6×        ~6 GB

Speeds are relative to large-v3.

Choosing the right model

Not every model is the best choice for every situation. Here's a breakdown to help you pick:

  • tiny — Fastest model by far. Good for quick previews or testing your pipeline. Accuracy is noticeably lower, especially on non-English audio or noisy recordings. Use it when speed matters more than quality.
  • base — A small step up from tiny. Slightly more accurate, still very fast. Suitable for clear speech in common languages.
  • small — A solid mid-range option. Handles most languages well and runs comfortably on CPU. Good balance for everyday use when you don't have a GPU.
  • medium — The default. Significantly more accurate than small, especially for accented speech, niche languages, and overlapping speakers. Slower on CPU, but a great choice with a GPU.
  • large-v3 — The most accurate model. Best for professional-quality subtitles, rare languages, or heavily accented audio. Requires a CUDA GPU with at least 10 GB VRAM for practical use.
  • turbo — Near large-v3 accuracy at roughly 8× the speed. This is the best "quality per second" option if you have a GPU with ≥6 GB VRAM.
  • distil-large-v3 — A distilled version of large-v3. Similar accuracy on English, slightly worse on other languages. Fast and memory-efficient. Best for English-heavy workloads on a GPU.
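The guidance above can be condensed into a small helper that picks a model from your hardware. This is a hypothetical helper, not part of asub; the VRAM thresholds come from the table above:

```python
def pick_model(has_cuda: bool, vram_gb: float = 0.0, english_only: bool = False) -> str:
    """Hypothetical model picker following the recommendations above."""
    if not has_cuda:
        return "small"            # best quality/speed balance on CPU
    if vram_gb >= 10:
        return "large-v3"         # maximum accuracy
    if vram_gb >= 6:
        # English-heavy workloads can trade a little multilingual accuracy for memory.
        return "distil-large-v3" if english_only else "turbo"
    return "small"

print(pick_model(has_cuda=True, vram_gb=8))
```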

Recommended commands

Fastest result — use tiny when you just need a rough draft quickly:

asub video.mp4 -m tiny

Best result — use large-v3 (GPU required) for maximum accuracy:

asub video.mp4 -m large-v3

Best compromise — use turbo on GPU for near-best accuracy at high speed, or small on CPU for a good quality-to-speed ratio:

# With a CUDA GPU (recommended)
asub video.mp4 -m turbo

# CPU only
asub video.mp4 -m small

Tip: The device and compute type are auto-detected. If you have a CUDA GPU, asub will use it with float16 automatically. On CPU it falls back to int8 quantisation.
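That auto-detection behaviour looks roughly like this (a sketch of the logic described in the tip, not asub's actual code):

```python
def resolve_device(device: str, cuda_available: bool) -> tuple[str, str]:
    """Map --device 'auto' to a concrete device and pick a default compute type."""
    if device == "auto":
        device = "cuda" if cuda_available else "cpu"
    # float16 on GPU for speed; int8 quantisation on CPU for low memory use.
    compute_type = "float16" if device == "cuda" else "int8"
    return device, compute_type

print(resolve_device("auto", cuda_available=False))
```

Passing `--compute-type` explicitly overrides this default.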

Upgrading dependencies

pip install --upgrade faster-whisper deep-translator

Contributing

  1. Fork the repo and create a feature branch.
  2. Install dev dependencies: pip install -e ".[dev]"
  3. Run tests: python -m pytest
  4. Lint: ruff check src/ tests/
  5. Open a pull request.

License

MIT

Acknowledgements

Built with the great help of Claude Opus 4.6 by Anthropic.
