Generate SRT subtitles from video/audio files using Whisper
subtutu
subtutu is a command-line tool that automatically generates SRT subtitle files from any video or audio file. It uses faster-whisper — a high-performance reimplementation of OpenAI Whisper — to transcribe spoken audio into accurate, timestamped subtitles up to 4x faster than the original Whisper on CPU.
No API key required. Everything runs locally on your machine.
subtutu lecture.mp4
# → lecture.srt
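The two steps described above, transcription and SRT writing, can be sketched in Python. This is not subtutu's source. It assumes faster-whisper's `WhisperModel.transcribe()`, which yields segments carrying `start`, `end` and `text`. The helper names `Segment`, `format_timestamp`, `segments_to_srt` and `transcribe_to_srt` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Segment(NamedTuple):
    """Hypothetical stand-in with the same start/end/text fields as faster-whisper segments."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable) -> str:
    """Number each segment from 1 and join the cues with blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(cues)


def transcribe_to_srt(media_path: str, model_size: str = "small", language: str = "en") -> str:
    """Transcribe a media file with faster-whisper and return SRT text."""
    # Imported here so the helpers above work without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="auto", compute_type="int8")
    # language=None asks faster-whisper to detect the spoken language itself.
    segments, _info = model.transcribe(media_path, language=None if language == "auto" else language)
    return segments_to_srt(segments)
```

SRT cues are numbered from 1, use a comma before the milliseconds, and are separated by blank lines, which is why `format_timestamp(3661.5)` gives `01:01:01,500`.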
Who is this for?
- Content creators who want subtitles for YouTube videos, reels, or podcasts
- Developers building subtitle pipelines
- Researchers transcribing interviews or recordings
- Anyone who needs fast, offline, accurate subtitles from a video file
Features
- Generates standard .srt subtitle files ready for use in any video editor or player
- Powered by faster-whisper (CTranslate2): up to 4x faster than openai-whisper on CPU, up to 12x on GPU
- Shows real-time transcription progress as segments are decoded
- Auto-selects the best model for your hardware (RAM + VRAM aware)
- Shows estimated processing time and accuracy for each model before starting
- Supports 99+ languages with automatic language detection
- Handles MP4, MOV, MKV, AVI, MP3, WAV, M4A, and any format ffmpeg can read
- Clear error messages for common problems (missing ffmpeg, no audio track, silent video, etc.)
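Real-time progress is possible because faster-whisper's `transcribe()` returns a lazy generator (segments are decoded as you iterate) together with an info object whose `duration` field gives the audio length. A minimal sketch, assuming progress is measured as the latest segment's end time over that duration; `with_progress` is a hypothetical name:

```python
import sys
from typing import Iterable, Iterator, TextIO


def with_progress(segments: Iterable, total_seconds: float, out: TextIO = sys.stderr) -> Iterator:
    """Yield segments unchanged while printing a percentage based on each segment's end time."""
    for seg in segments:
        if total_seconds > 0:
            pct = min(100.0, 100.0 * seg.end / total_seconds)
        else:
            pct = 100.0
        # "\r" rewrites the same terminal line; show the start of the decoded text.
        out.write(f"\r{pct:5.1f}%  {seg.text.strip()[:60]}")
        out.flush()
        yield seg
    out.write("\n")
```

Wrapping the generator, e.g. `for seg in with_progress(segments, info.duration): ...`, leaves the segments unchanged and only adds a status line on stderr.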
Requirements
- Python 3.9+
- ffmpeg — required for audio decoding
Install ffmpeg on macOS:
brew install ffmpeg
Install ffmpeg on Ubuntu/Debian:
sudo apt install ffmpeg
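Before doing any work, a tool can check that ffmpeg is on `PATH` and fail with an install hint, which is the kind of clear error message the features list mentions. A sketch using the standard library's `shutil.which`; `require_ffmpeg` and the hint table are hypothetical, and the `apt` hint only fits Debian-based Linux:

```python
import shutil
import sys
from typing import Callable, Optional

# Hypothetical install hints keyed by sys.platform.
INSTALL_HINTS = {"darwin": "brew install ffmpeg", "linux": "sudo apt install ffmpeg"}


def require_ffmpeg(which: Callable[[str], Optional[str]] = shutil.which) -> str:
    """Return the path to ffmpeg, or raise RuntimeError with an install hint."""
    path = which("ffmpeg")
    if path is None:
        hint = INSTALL_HINTS.get(sys.platform, "install ffmpeg with your package manager")
        raise RuntimeError(f"ffmpeg not found on PATH. Try: {hint}")
    return path
```

The `which` parameter exists only so the check can be tested without touching the real `PATH`.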
Installation
pip install subtutu
No separate PyTorch install needed — subtutu uses CTranslate2 for inference.
Usage
subtutu <video_or_audio_file> [options]
By default, the subtitle file is written to the same directory as the input file. If a .srt file with the same name already exists, subtutu does not overwrite it; it writes a numbered file instead (video_1.srt, video_2.srt, and so on).
subtutu video.mp4
# Output: video.srt
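The numbering rule above amounts to probing candidate names until one is free. A minimal sketch of the default case (output alongside the input); `next_available_srt_path` is a hypothetical helper, and how subtutu combines numbering with `--output` is not covered:

```python
from pathlib import Path


def next_available_srt_path(input_path: str) -> Path:
    """Return video.srt, or video_1.srt, video_2.srt, ... if earlier names are taken."""
    base = Path(input_path).with_suffix(".srt")
    if not base.exists():
        return base
    n = 1
    while True:
        candidate = base.with_name(f"{base.stem}_{n}{base.suffix}")
        if not candidate.exists():
            return candidate
        n += 1
```

Checking and then writing leaves a small race if two runs target the same name at once; opening the chosen path with mode `"x"` would make the final write fail instead of clobbering a file.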
Options
| Flag | Default | Description |
|---|---|---|
| --model | auto | Model: tiny, base, small, medium, large-v3, turbo, or auto to pick based on hardware |
| --language | en | Language code (e.g. en, fr, de, ja, zh). Use auto to detect automatically |
| --output | alongside input | Output .srt path or directory |
| --device | auto | Force compute device: cpu or cuda |
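The options table maps naturally onto Python's `argparse`. The parser below is an illustrative reconstruction from the documented flags and defaults, not subtutu's actual code; `build_parser` and `parse_args` are hypothetical names:

```python
import argparse
from typing import List, Optional

# Model names from the options table; "auto" picks one based on hardware.
MODEL_CHOICES = ["auto", "tiny", "base", "small", "medium", "large-v3", "turbo"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="subtutu",
        description="Generate SRT subtitles from video/audio files using Whisper",
    )
    parser.add_argument("input", help="video or audio file to transcribe")
    parser.add_argument("--model", default="auto", choices=MODEL_CHOICES,
                        help="Whisper model, or auto to pick based on hardware")
    parser.add_argument("--language", default="en",
                        help="language code such as en, fr, de, ja, zh; auto to detect")
    # None means "write next to the input file".
    parser.add_argument("--output", default=None, help="output .srt path or directory")
    parser.add_argument("--device", default="auto", choices=["auto", "cpu", "cuda"],
                        help="compute device")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)
```

Keeping `--language` free-form (no `choices`) avoids hard-coding every supported language code; `auto` is then translated to "detect" when the model is called.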
Examples
# Subtitle an English video (default)
subtutu interview.mp4
# Use a more accurate model
subtutu documentary.mp4 --model medium
# Auto-detect the spoken language
subtutu foreign_film.mp4 --language auto
# Subtitle a French video
subtutu podcast.mp3 --language fr
# Save the subtitle file to a specific location
subtutu recording.mov --output ~/Desktop/recording.srt
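To subtitle a whole folder, one option is to run the CLI once per file from a script. A sketch with hypothetical helpers that only uses the flags documented above. Passing an explicit `--model` is suggested because the interactive model prompt is described for `auto` mode, although whether an explicit model always skips that prompt is an assumption:

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(media: Path, model: Optional[str] = None, language: Optional[str] = None) -> List[str]:
    """Assemble one subtutu invocation using only the documented flags."""
    cmd = ["subtutu", str(media)]
    if model:
        cmd += ["--model", model]
    if language:
        cmd += ["--language", language]
    return cmd


def subtitle_folder(folder: str, pattern: str = "*.mp4", dry_run: bool = False,
                    model: Optional[str] = None, language: Optional[str] = None) -> List[List[str]]:
    """Run subtutu on every matching file in sorted order; with dry_run, only return the commands."""
    commands = [build_command(p, model, language) for p in sorted(Path(folder).glob(pattern))]
    if not dry_run:
        for cmd in commands:
            # check=True stops the batch at the first file that fails.
            subprocess.run(cmd, check=True)
    return commands
```

`subtitle_folder("videos", model="small", dry_run=True)` returns the commands without running them, which is a cheap way to review a batch first.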
Choosing a model
When --model auto is used (the default), subtutu checks your available RAM and GPU memory, then shows a table like this before loading anything:
Model Accuracy Est. time
──────────── ──────── ──────────
tiny 60% 1m 2s
base 75% 2m 5s
▶ small 85% 5m 33s
medium 93% 16m 40s
turbo 90% 4m 10s
large-v3 97% 33m 20s
Recommended: small
Press Enter to use 'small', or type a model name:
Press Enter to accept, or type a different model name to switch.
| Model | Size (int8) | CPU Speed | Accuracy |
|---|---|---|---|
| tiny | ~75 MB | ~120x real-time | 60% |
| base | ~145 MB | ~60x real-time | 75% |
| small | ~490 MB | ~24x real-time | 85% |
| medium | ~1.5 GB | ~8x real-time | 93% |
| turbo | ~810 MB | ~30x real-time | 90% |
| large-v3 | ~3 GB | ~4x real-time | 97% |
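The speed column suggests a back-of-the-envelope estimate: processing time is roughly the audio duration divided by the model's real-time factor. A sketch under that assumption; the numbers subtutu itself prints may include other overheads and need not match this formula, and the helper names are hypothetical:

```python
# Approximate CPU real-time factors from the model table (audio seconds processed per second).
SPEED_FACTORS = {"tiny": 120, "base": 60, "small": 24, "medium": 8, "turbo": 30, "large-v3": 4}


def estimate_seconds(audio_seconds: float, model: str) -> float:
    """Rough CPU processing time: audio duration divided by the model's real-time factor."""
    return audio_seconds / SPEED_FACTORS[model]


def format_duration(seconds: float) -> str:
    """Render seconds in the same 'Xm Ys' style as the estimate table."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m {secs}s"
```

For a one-hour recording this gives 2m 30s with small and 15m 0s with large-v3 on CPU.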
Models are downloaded on first use and cached in ~/.cache/huggingface/hub/.
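The README does not spell out how auto weighs RAM against VRAM. One plausible memory-only heuristic is to pick the most accurate model whose approximate int8 size, times a safety margin, fits in free memory. The margin, the fallback and `recommend_model` are all hypothetical; subtutu's real rule may also weigh speed (the sample above recommends small):

```python
from typing import Optional

# (name, approx. int8 size in MB) from the model table, ordered from least to most accurate.
MODEL_SIZES_MB = [
    ("tiny", 75), ("base", 145), ("small", 490),
    ("turbo", 810), ("medium", 1500), ("large-v3", 3000),
]
# Hypothetical safety margin: require twice the model's size to be free.
HEADROOM = 2.0


def recommend_model(free_ram_mb: float, free_vram_mb: Optional[float] = None) -> str:
    """Return the most accurate model whose size (times HEADROOM) fits the memory budget."""
    # Prefer GPU memory when a GPU is reported; otherwise fall back to system RAM.
    budget = free_vram_mb if free_vram_mb else free_ram_mb
    best = "tiny"  # fallback when even the smallest model looks too big
    for name, size_mb in MODEL_SIZES_MB:
        if size_mb * HEADROOM <= budget:
            best = name
    return best
```

Free-memory figures would come from the operating system or a library such as psutil; they are passed in as plain numbers here to keep the rule easy to test.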
Supported file formats
Any format that ffmpeg can decode, including:
mp4 mov mkv avi webm flv m4v mp3 wav m4a aac ogg flac wma
Troubleshooting
ffmpeg not found
Install ffmpeg — see Requirements above.
No speech detected
Try --language auto if the video is not in English. Check that the video actually has an audio track.
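To confirm that a file has an audio track at all, ffprobe (which ships with ffmpeg in the Homebrew and apt packages) can list just the audio streams. A sketch; the helper names are hypothetical and subtutu may run this check differently:

```python
import subprocess
from typing import List


def ffprobe_audio_cmd(path: str) -> List[str]:
    """Ask ffprobe for the index of each audio stream, one per line, with no headers."""
    return ["ffprobe", "-v", "error", "-select_streams", "a",
            "-show_entries", "stream=index", "-of", "csv=p=0", path]


def has_audio_stream(ffprobe_stdout: str) -> bool:
    """Any non-empty output line means at least one audio stream exists."""
    return any(line.strip() for line in ffprobe_stdout.splitlines())


def file_has_audio(path: str) -> bool:
    """Run ffprobe on the file (requires ffprobe on PATH)."""
    result = subprocess.run(ffprobe_audio_cmd(path), capture_output=True, text=True, check=True)
    return has_audio_stream(result.stdout)
```

An empty result means ffprobe found no audio stream, so there is nothing for Whisper to transcribe.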
Not enough memory to load the model
Switch to a smaller model: --model small or --model tiny.
Permission denied reading a file on macOS
Terminal may need Full Disk Access: System Settings > Privacy & Security > Full Disk Access.
License
MIT
Acknowledgements
Built on faster-whisper by SYSTRAN. Whisper models by OpenAI. Audio decoding by ffmpeg.
Download files
Source Distribution
Built Distribution
File details
Details for the file subtutu-0.1.0.tar.gz.
File metadata
- Download URL: subtutu-0.1.0.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a615af3a96bd84575f72e85d18f298956f3ef7582ff73b14062987b3c59084a3 |
| MD5 | 246c43fcf742cbce988d4500131129ca |
| BLAKE2b-256 | ed6284cc2933f63b40bd8a8729e1aab4a5bd171fa4c0523aecc6dd925bbc9e90 |
File details
Details for the file subtutu-0.1.0-py3-none-any.whl.
File metadata
- Download URL: subtutu-0.1.0-py3-none-any.whl
- Upload date:
- Size: 11.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.9.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d56b3200f93d24ea24ebaf84d8c56906a2c19416cd50ba113164301d4b02357 |
| MD5 | 16392e44f00e55e612ce7e74b1a94325 |
| BLAKE2b-256 | a41535ba6b1d2aea1295c346d86d9cdf48a84b5ed265eaa6b9c9717b2287dfaa |
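To check a downloaded file against the SHA256 digests listed above, hashing it in chunks with Python's `hashlib` is enough. `sha256_of_file` and `verify_download` are hypothetical helper names:

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding whitespace."""
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

For example, `verify_download("subtutu-0.1.0-py3-none-any.whl", "2d56b3200f93d24ea24ebaf84d8c56906a2c19416cd50ba113164301d4b02357")` should return `True` for an unmodified copy of the wheel.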