Generate and translate subtitles from audio/video files using Whisper.

asub

Generate and translate subtitles from audio or video files — one by one or in folders — powered by faster-whisper and deep-translator.

Features

Fast transcription — up to 4× faster than OpenAI Whisper with the same accuracy, using CTranslate2.
Automatic language detection — or specify the source language manually.
Folder batch processing — process every supported media file in a folder while loading the Whisper model only once.
Translation — translate subtitles to 100+ languages via Google Translate (free, no API key).
Speaker diarization — optional WhisperX-powered speaker labels such as [SPEAKER_00].
Multiple output formats — SRT and WebVTT.
VAD filtering — Silero VAD removes silence and reduces hallucination.
Model choice — from tiny (fast, less accurate) to large-v3 (slow, most accurate).
CPU & GPU — works on both, with int8 quantisation for low-memory setups.
Packageable as .exe — single-file Windows executable via PyInstaller.

Installation

From source (recommended for development)

git clone https://github.com/simoneraffaelli/subtitle-generator.git
cd subtitle-generator
pip install -e ".[dev]"

To enable speaker diarization during development:

pip install -e ".[dev,diarization]"

From PyPI

pip install asub

For speaker diarization:

pip install "asub[diarization]"

Quick start

# Transcribe a video and generate subtitles (auto-detect language)
asub video.mp4

# Process every supported media file in a folder
asub recordings/

# Use a specific model and output format
asub video.mp4 -m large-v3 -f vtt

# Transcribe and translate to Italian
asub video.mp4 -t it

# Transcribe with anonymous speaker labels
asub interview.wav --diarize --hf-token hf_your_token_here

# Improve speaker counting when you know there are exactly two speakers
asub interview.wav --diarize --hf-token hf_your_token_here --speakers 2

# Batch-process a folder and write all subtitles into one output directory
asub recordings/ -o subtitles/ -t de

# Specify source language, translate to German, verbose output
asub podcast.mp3 -l en -t de --verbose

# Use CPU with int8 quantisation
asub interview.wav --device cpu --compute-type int8

Folder input

When input points to a folder, asub switches to batch mode.

Only the top level of the folder is scanned. Nested subfolders are not processed.
Supported input extensions in batch mode are: .aac, .aiff, .avi, .flac, .m4a, .m4v, .mkv, .mov, .mp3, .mp4, .mpeg, .mpg, .oga, .ogg, .opus, .wav, .webm, .wma.
Without -o/--output, subtitle files are written next to each media file.
With -o/--output, the value is treated as an output directory, not a single subtitle file path.
The Whisper model is loaded once and reused across the whole batch.
If -l/--language is omitted, language detection happens per file. Mixed-language folders are supported, and translation uses each file's detected source language.
If a file's detected language already matches -t/--translate, translation is skipped for that file.
If one file fails, asub continues with the rest, then prints a summary. The process exits with code 1 if any file failed.
If multiple input files would produce the same subtitle path (for example clip.mp3 and clip.wav), asub stops before processing and asks you to resolve the naming collision.
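
The scanning and collision rules above can be sketched in a few lines of Python. This is an illustration only, not asub's actual code: the helper names `find_media_files` and `find_collisions` are hypothetical, matching extensions case-insensitively is an assumption, and so is the `<stem>.<format>` output naming.

```python
from collections import defaultdict
from pathlib import Path

# Extensions copied from the batch-mode list above.
SUPPORTED_EXTENSIONS = {
    ".aac", ".aiff", ".avi", ".flac", ".m4a", ".m4v", ".mkv", ".mov", ".mp3",
    ".mp4", ".mpeg", ".mpg", ".oga", ".ogg", ".opus", ".wav", ".webm", ".wma",
}


def find_media_files(folder: Path) -> list[Path]:
    """Return supported media files at the top level of folder (no recursion)."""
    return [
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    ]


def find_collisions(files: list[Path], fmt: str = "srt") -> dict[str, list[Path]]:
    """Group inputs that would write the same subtitle file name.

    Assumes the output name is '<stem>.<fmt>'; the real naming may differ.
    """
    by_output: dict[str, list[Path]] = defaultdict(list)
    for path in files:
        by_output[f"{path.stem}.{fmt}"].append(path)
    return {name: paths for name, paths in by_output.items() if len(paths) > 1}
```

Checking for collisions before any transcription starts means a naming problem surfaces immediately rather than after a long batch has already overwritten a file.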

CLI reference

usage: asub [-h] [-o OUTPUT] [-f {srt,vtt}] [-m MODEL] [--device {auto,cpu,cuda}]
            [--compute-type TYPE] [-l LANG] [--no-vad] [--diarize]
            [--hf-token HF_TOKEN] [--speakers N] [--min-speakers N]
            [--max-speakers N] [--diarization-batch-size N] [-t LANG]
            [-v] [--version] [--list-languages]
            input

positional arguments:
  input                 Path to an audio/video file, or a folder containing media files.

options:
  -o, --output          Output subtitle file path for a single input file, or an output directory when the input is a folder.
  -f, --format          Subtitle format: srt, vtt
  -v, -verbose, --verbose
                        Show dependency warnings/logs (-v INFO, -vv DEBUG).
                        Default output hides warning-level dependency chatter.
  --version             Show version and exit
  --list-languages      Print supported translation languages and exit

transcription:
  -m, --model           Whisper model size (default: medium)
  --device              auto | cpu | cuda (default: auto)
  --compute-type        Quantisation type (auto-selected if omitted)
  -l, --language        Source language code (auto-detected if omitted)
  --no-vad              Disable Voice Activity Detection

speaker diarization:
  --diarize             Detect anonymous speaker labels and prefix subtitle text.
  --hf-token TOKEN      Hugging Face token for pyannote models (or set HF_TOKEN).
  --speakers N          Known exact number of speakers.
  --min-speakers N      Minimum expected number of speakers.
  --max-speakers N      Maximum expected number of speakers.
  --diarization-batch-size N
                        WhisperX batch size for diarized transcription.

translation:
  -t, --translate LANG  Translate subtitles to this language code

Speaker diarization

Diarization is optional because it pulls in heavier ML dependencies. To use it, install the diarization extra (pip install "asub[diarization]"), accept the pyannote speaker-diarization model terms on Hugging Face, and pass --diarize.

Speaker labels are anonymous IDs, not real names. Output cues are prefixed in plain text for broad player compatibility:

[SPEAKER_00] Hello, thanks for joining.
[SPEAKER_01] Happy to be here.

If you know the speaker count, pass --speakers N. If you only know a range, use --min-speakers N and/or --max-speakers N. Diarization is not perfect, and overlapping speech is especially difficult.
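
To make the plain-text prefix concrete, here is a minimal sketch of how a single SRT cue with a speaker label could be rendered. The functions `format_srt_timestamp` and `format_cue` are hypothetical names for illustration and are not part of asub's documented API; only the `[SPEAKER_00]` prefix style comes from the text above.

```python
from typing import Optional


def format_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def format_cue(index: int, start: float, end: float, text: str,
               speaker: Optional[str] = None) -> str:
    """Render one SRT cue, prefixing the text with [speaker] when a label is given."""
    line = f"[{speaker}] {text}" if speaker else text
    timing = f"{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}"
    return f"{index}\n{timing}\n{line}\n"


print(format_cue(1, 0.0, 2.5, "Hello, thanks for joining.", "SPEAKER_00"))
```

Because the label is ordinary cue text rather than format-specific markup, any player that can show the subtitle can show the speaker.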

By default, asub hides third-party warning/log chatter from WhisperX, PyTorch, Hugging Face, and pyannote. Pass --verbose to show those messages while troubleshooting, or -vv for debug logging.
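
The -v / -vv behaviour is the usual argparse counting pattern. The sketch below shows one way to map the count to a logging level; it is an assumption about the approach, not asub's source. In particular, using ERROR as the default level (so that warnings stay hidden) is a guess based on the description above, and `verbosity_to_level` and `build_parser` are hypothetical names.

```python
import argparse
import logging


def verbosity_to_level(count: int) -> int:
    """Map the number of -v flags to a logging level for dependency loggers."""
    if count >= 2:
        return logging.DEBUG
    if count == 1:
        return logging.INFO
    # Assumed default: hide warning-level chatter, keep real errors visible.
    return logging.ERROR


def build_parser() -> argparse.ArgumentParser:
    """Build a tiny parser with a repeatable verbosity flag."""
    parser = argparse.ArgumentParser(prog="asub-sketch")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    return parser


args = build_parser().parse_args(["-vv"])
logging.getLogger("whisperx").setLevel(verbosity_to_level(args.verbose))
```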

Python API

from asub.transcriber import load_model, transcribe
from asub.translator import translate_segments
from asub.subtitle import write_subtitle_file, SubtitleFormat

# 1. Transcribe
model = load_model("medium", device="auto")
result = transcribe(model, "video.mp4")

# 2. Translate (optional)
translated = translate_segments(result.segments, source=result.language, target="it")

# 3. Write subtitle file
write_subtitle_file(translated, "video_it.srt")

Building a Windows .exe

pip install ".[dev]"
pyinstaller asub.spec

The executable will be in dist/asub.exe.

Note: The .exe does not bundle Whisper model weights. Models are downloaded on first run and cached in the default Hugging Face cache directory.

Hugging Face token (optional)

On first run, Whisper model weights are downloaded from the Hugging Face Hub. Without authentication you may see this warning:

You are sending unauthenticated requests to the HF Hub. Please set a HF_TOKEN to enable higher rate limits and faster downloads

This is not an error — the download still works, just at lower rate limits. To silence the warning and get faster downloads:

Create a free account at https://huggingface.co.
Go to Settings → Access Tokens and generate a token.
Set the token before running asub:

# Linux / macOS
export HF_TOKEN="hf_your_token_here"

# Windows PowerShell
$env:HF_TOKEN = "hf_your_token_here"

To make this permanent, add the variable to your shell profile or set it via System → Environment Variables on Windows.

For --diarize, a token is mandatory: pass --hf-token unless the HF_TOKEN environment variable is already set. You must also accept the pyannote speaker-diarization model terms in your Hugging Face account.
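
The token lookup can be pictured as follows. This is a sketch under assumptions: giving --hf-token precedence over HF_TOKEN is a guess, and `resolve_hf_token` and `check_diarization_token` are hypothetical helpers rather than functions exported by asub.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(cli_token: Optional[str],
                     env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty token: CLI value first, then HF_TOKEN."""
    for candidate in (cli_token, env.get("HF_TOKEN")):
        if candidate and candidate.strip():
            return candidate.strip()
    return None


def check_diarization_token(diarize: bool, token: Optional[str]) -> None:
    """Raise early when diarization is requested without any token."""
    if diarize and token is None:
        raise ValueError("--diarize needs --hf-token or the HF_TOKEN environment variable")
```

Passing the environment in as a parameter keeps the lookup easy to test without touching the real process environment.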

Available models

Model	Parameters	Relative speed	VRAM
`tiny`	39 M	~10×	~1 GB
`base`	74 M	~7×	~1 GB
`small`	244 M	~4×	~2 GB
`medium`	769 M	~2×	~5 GB
`large-v3`	1550 M	1×	~10 GB
`turbo`	809 M	~8×	~6 GB
`distil-large-v3`	756 M	~6×	~6 GB

Choosing the right model

No single model suits every situation. The breakdown below should help you pick:

tiny — Fastest model by far. Good for quick previews or testing your pipeline. Accuracy is noticeably lower, especially on non-English audio or noisy recordings. Use it when speed matters more than quality.
base — A small step up from tiny. Slightly more accurate, still very fast. Suitable for clear speech in common languages.
small — A solid mid-range option. Handles most languages well and runs comfortably on CPU. Good balance for everyday use when you don't have a GPU.
medium — The default. Significantly more accurate than small, especially for accented speech, niche languages, and overlapping speakers. Slower on CPU, but a great choice with a GPU.
large-v3 — The most accurate model. Best for professional-quality subtitles, rare languages, or heavily accented audio. Requires a CUDA GPU with at least 10 GB VRAM for practical use.
turbo — Near large-v3 accuracy at roughly 8× the speed. This is the best "quality per second" option if you have a GPU with ≥6 GB VRAM.
distil-large-v3 — A distilled version of large-v3. Similar accuracy on English, slightly worse on other languages. Fast and memory-efficient. Best for English-heavy workloads on a GPU.
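
As a rough aid, the table and descriptions above can be turned into a small lookup that suggests a model for a given amount of GPU memory. This is a hypothetical helper, not an asub feature. The VRAM figures are the approximate values from the table, the accuracy ordering is only a reading of the descriptions above, and distil-large-v3 is left out because its ranking depends on the language.

```python
# Approximate VRAM needs in GB, copied from the "Available models" table.
APPROX_VRAM_GB = {
    "tiny": 1, "base": 1, "small": 2, "medium": 5, "turbo": 6, "large-v3": 10,
}

# Assumed ordering from most to least accurate, based on the descriptions above.
ACCURACY_ORDER = ["large-v3", "turbo", "medium", "small", "base", "tiny"]


def suggest_model(vram_gb: float) -> str:
    """Return the most accurate model whose approximate VRAM need fits."""
    for name in ACCURACY_ORDER:
        if APPROX_VRAM_GB[name] <= vram_gb:
            return name
    # Nothing fits comfortably: fall back to the smallest model.
    return "tiny"


for vram in (12, 8, 4):
    print(vram, "GB ->", suggest_model(vram))
```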

Recommended commands

Fastest result — use tiny when you just need a rough draft quickly:

asub video.mp4 -m tiny

Best result — use large-v3 (GPU required) for maximum accuracy:

asub video.mp4 -m large-v3

Best compromise — use turbo on GPU for near-best accuracy at high speed, or small on CPU for a good quality-to-speed ratio:

# With a CUDA GPU (recommended)
asub video.mp4 -m turbo

# CPU only
asub video.mp4 -m small

Tip: The device and compute type are auto-detected. If you have a CUDA GPU, asub will use it with float16 automatically. On CPU it falls back to int8 quantisation.
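
The defaults described in the tip amount to a two-step decision, sketched below. The function names are hypothetical, and CUDA availability is passed in as a plain boolean so the example does not depend on any GPU library.

```python
def resolve_device(requested: str, cuda_available: bool) -> str:
    """Turn 'auto' into a concrete device; explicit choices pass through."""
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    return requested


def default_compute_type(device: str) -> str:
    """Pick the default described in the tip: float16 on CUDA, int8 on CPU."""
    return "float16" if device == "cuda" else "int8"


device = resolve_device("auto", cuda_available=False)
print(device, default_compute_type(device))
```

An explicit --compute-type would simply replace the value returned by `default_compute_type`.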

Batch-mode notes

Batch mode is sequential by design. This keeps GPU/CPU memory use stable and makes per-file progress easier to understand.
In mixed-language folders, auto-detection may produce different source languages across files. If you need consistent source-language handling, pass -l/--language explicitly.
Translation uses Google Translate through deep-translator, so large batches can still hit network or rate-limit issues. Failures are reported per file in the final summary.
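
The sequential, continue-on-failure behaviour can be sketched as a plain loop that collects errors and derives the exit code at the end. This is an illustration, not asub's implementation: `run_batch` is a hypothetical name, and `process` stands in for whatever transcribes and translates a single file.

```python
from pathlib import Path
from typing import Callable, Iterable


def run_batch(files: Iterable[Path], process: Callable[[Path], None]) -> int:
    """Process files one at a time; return 1 if any file failed, else 0."""
    failures: list[tuple[Path, str]] = []
    succeeded = 0
    for path in files:
        try:
            process(path)
            succeeded += 1
        except Exception as exc:  # keep going so one bad file does not stop the batch
            failures.append((path, str(exc)))

    print(f"{succeeded} succeeded, {len(failures)} failed")
    for path, reason in failures:
        print(f"  FAILED {path}: {reason}")
    return 1 if failures else 0
```

A caller would pass the returned value to `sys.exit`, which gives scripts and CI jobs a reliable signal that at least one file needs attention.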

Upgrading dependencies

pip install --upgrade faster-whisper deep-translator

With diarization enabled:

pip install --upgrade "asub[diarization]"

Contributing

Fork the repo and create a feature branch.
Install dev dependencies: pip install -e ".[dev]"
Run tests: python -m pytest
Lint: ruff check src/ tests/
Open a pull request.

License

MIT

Acknowledgements

Built with the great help of Claude Opus 4.6 by Anthropic.

Project details

Maintainers

simoneraffaelli

Release history

Version	Released
1.0.6 (this version)	Jul 2, 2026
1.0.5	Jul 2, 2026
1.0.4	Jul 2, 2026
1.0.3	Jul 2, 2026
1.0.2	Jun 29, 2026
1.0.1	Apr 21, 2026
1.0.0	Apr 15, 2026

Download files

Source Distribution

asub-1.0.6.tar.gz (30.9 kB)

Uploaded Jul 2, 2026 Source

Built Distribution

asub-1.0.6-py3-none-any.whl (22.8 kB)

Uploaded Jul 2, 2026 Python 3

File details

Details for the file asub-1.0.6.tar.gz.

File metadata

Download URL: asub-1.0.6.tar.gz
Upload date: Jul 2, 2026
Size: 30.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asub-1.0.6.tar.gz
Algorithm	Hash digest
SHA256	`0a76d4a70d0c8a2831880096a4b480b81759b5e1da076c8757d4e48df73c7550`
MD5	`b8b66aca416ebbee7b95f437ff6d39f3`
BLAKE2b-256	`d98ab3451f79a6d16ecd571dd3cb511513f9397ce6c134b4f602e1627e1a64e8`
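
A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The helper names below are hypothetical; the sketch reads the file in chunks so that large archives do not have to fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, the expected value would be the SHA256 row above, for example `matches_published_hash(Path("asub-1.0.6.tar.gz"), "0a76d4a70d0c8a2831880096a4b480b81759b5e1da076c8757d4e48df73c7550")`.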

Provenance

The following attestation bundles were made for asub-1.0.6.tar.gz:

Publisher: python-publish.yml on simoneraffaelli/subtitle-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asub-1.0.6.tar.gz
- Subject digest: 0a76d4a70d0c8a2831880096a4b480b81759b5e1da076c8757d4e48df73c7550
- Sigstore transparency entry: 2047921818
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: simoneraffaelli/subtitle-generator@0cf8a1b8d04b93dc3291fca2291805557ab8c48c
- Branch / Tag: refs/tags/1.0.6
- Owner: https://github.com/simoneraffaelli
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@0cf8a1b8d04b93dc3291fca2291805557ab8c48c
- Trigger Event: release

File details

Details for the file asub-1.0.6-py3-none-any.whl.

File metadata

Download URL: asub-1.0.6-py3-none-any.whl
Upload date: Jul 2, 2026
Size: 22.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asub-1.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e450048b95db835efc2a98d0e14043085800d850290ca7f4f11c33a52ef7757e`
MD5	`fa98567a77b8417a1bcbba60e60b7cf4`
BLAKE2b-256	`20cc7f2c0c100ce511982715cc3acf2fa69abd4e425787d2190b00c1ddbd9b6d`

Provenance

The following attestation bundles were made for asub-1.0.6-py3-none-any.whl:

Publisher: python-publish.yml on simoneraffaelli/subtitle-generator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asub-1.0.6-py3-none-any.whl
- Subject digest: e450048b95db835efc2a98d0e14043085800d850290ca7f4f11c33a52ef7757e
- Sigstore transparency entry: 2047921832
- Sigstore integration time: Jul 2, 2026
Source repository:
- Permalink: simoneraffaelli/subtitle-generator@0cf8a1b8d04b93dc3291fca2291805557ab8c48c
- Branch / Tag: refs/tags/1.0.6
- Owner: https://github.com/simoneraffaelli
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@0cf8a1b8d04b93dc3291fca2291805557ab8c48c
- Trigger Event: release
