
Generate SRT subtitles from video/audio files using Whisper

subtutu

subtutu is a command-line tool that automatically generates SRT subtitle files from any video or audio file. It uses faster-whisper — a high-performance reimplementation of OpenAI Whisper — to transcribe spoken audio into accurate, timestamped subtitles up to 4x faster than the original Whisper on CPU.

No API key required. Everything runs locally on your machine.

subtutu lecture.mp4
# → lecture.srt

Who is this for?

  • Content creators who want subtitles for YouTube videos, reels, or podcasts
  • Developers building subtitle pipelines
  • Researchers transcribing interviews or recordings
  • Anyone who needs fast, offline, accurate subtitles from a video file

Features

  • Generates standard .srt subtitle files ready for use in any video editor or player
  • Powered by faster-whisper (CTranslate2): up to 4x faster than openai-whisper on CPU, up to 12x on GPU
  • Shows real-time transcription progress as segments are decoded
  • Auto-selects the best model for your hardware (RAM + VRAM aware)
  • Shows estimated processing time and accuracy for each model before starting
  • Supports 99+ languages with automatic language detection
  • Handles MP4, MOV, MKV, AVI, MP3, WAV, M4A, and any format ffmpeg can read
  • Clear error messages for common problems (missing ffmpeg, no audio track, silent video, etc.)
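
For readers unfamiliar with the .srt format mentioned above, here is a minimal sketch of how timestamped segments map onto SRT text. It is not subtutu's actual code; the function names `format_timestamp` and `to_srt` and the `(start, end, text)` tuple shape are assumptions made for illustration. Only the SRT layout itself (a counter, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, a blank line) is standard.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples.

    The tuple shape is a hypothetical stand-in for whatever the
    transcriber yields; adapt it to the real segment objects.
    """
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 2.5, "Hello"), (2.5, 4.0, "World")]))
```

Note that SRT uses a comma, not a period, before the milliseconds.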

Requirements

  • Python 3.9+
  • ffmpeg — required for audio decoding

Install ffmpeg on macOS:

brew install ffmpeg

Install ffmpeg on Ubuntu/Debian:

sudo apt install ffmpeg
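
If you call subtutu from a script, it can help to confirm that ffmpeg is on PATH first. The sketch below uses only the Python standard library; the helper name `ffmpeg_available` is hypothetical and is not part of subtutu.

```python
import shutil


def ffmpeg_available(executable: str = "ffmpeg") -> bool:
    """Return True if the given executable can be found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if ffmpeg_available():
        print("ffmpeg found")
    else:
        print("ffmpeg not found: install it before running subtutu")
```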

Installation

pip install subtutu

No separate PyTorch install needed — subtutu uses CTranslate2 for inference.


Usage

subtutu <video_or_audio_file> [options]

By default, the subtitle file is written to the same directory as the input file. If a .srt file with that name already exists, it is not overwritten; a numbered file is created instead (video_1.srt, video_2.srt, etc.).

subtutu video.mp4
# Output: video.srt
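
The numbered-file behavior described above can be sketched as follows. This is an illustration of the naming scheme only, not subtutu's implementation; the function name `next_free_srt_path` is an assumption.

```python
from pathlib import Path


def next_free_srt_path(media_path: Path) -> Path:
    """Return video.srt, or video_1.srt, video_2.srt, ... if taken.

    Mirrors the naming scheme described in the text; the real tool
    may differ in details.
    """
    candidate = media_path.with_suffix(".srt")
    counter = 1
    while candidate.exists():
        candidate = media_path.with_name(f"{media_path.stem}_{counter}.srt")
        counter += 1
    return candidate


if __name__ == "__main__":
    print(next_free_srt_path(Path("video.mp4")))
```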

Options

  Flag         Default           Description
  ──────────   ───────────────   ───────────────────────────────────────────────
  --model      auto              Model: tiny, base, small, medium, large-v3, turbo, or auto to pick based on hardware
  --language   en                Language code (e.g. en, fr, de, ja, zh). Use auto to detect automatically
  --output     alongside input   Output .srt path or directory
  --device     auto              Force compute device: cpu or cuda

Examples

# Subtitle an English video (default)
subtutu interview.mp4

# Use a more accurate model
subtutu documentary.mp4 --model medium

# Auto-detect the spoken language
subtutu foreign_film.mp4 --language auto

# Subtitle a French video
subtutu podcast.mp3 --language fr

# Save the subtitle file to a specific location
subtutu recording.mov --output ~/Desktop/recording.srt
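
For subtitle pipelines, the same commands can be driven from Python. The sketch below wraps the CLI with `subprocess` and uses only the flags documented in the Options table. The helper names `build_command` and `subtitle_all` are hypothetical. The sketch passes an explicit model on the assumption that this skips the interactive model prompt, which is not confirmed here.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_command(media: Path, model: str = "small", language: str = "en",
                  output: Optional[Path] = None) -> List[str]:
    """Assemble a subtutu command line from the documented flags."""
    cmd = ["subtutu", str(media), "--model", model, "--language", language]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


def subtitle_all(folder: Path, pattern: str = "*.mp4", **options) -> None:
    """Run subtutu on every matching file in a folder, one at a time."""
    for media in sorted(folder.glob(pattern)):
        # check=True raises CalledProcessError if subtutu exits non-zero.
        subprocess.run(build_command(media, **options), check=True)


if __name__ == "__main__":
    print(build_command(Path("interview.mp4"), language="fr"))
```

Building the argument list separately from running it keeps the command easy to log and inspect.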

Choosing a model

When --model auto is used (the default), subtutu checks your available RAM and GPU memory, then shows a table like this before loading anything:

  Model        Accuracy    Est. time
  ────────────   ────────   ──────────
   tiny            60%         1m 2s
   base            75%         2m 5s
▶  small           85%         5m 33s
   medium          93%        16m 40s
   turbo           90%         4m 10s
   large-v3        97%        33m 20s

  Recommended: small
  Press Enter to use 'small', or type a model name:

Press Enter to accept, or type a different model name to switch.
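
The exact rule subtutu uses to recommend a model is not documented here. As a rough illustration of memory-aware selection, the sketch below picks the most accurate model whose approximate int8 size fits in available memory with some headroom. The sizes and accuracy figures come from the model table in this section; the 2x headroom factor, the fallback to tiny, and the name `pick_model` are assumptions.

```python
# name: (approx. int8 size in MB, accuracy in %), from the model table
MODELS = {
    "tiny": (75, 60),
    "base": (145, 75),
    "small": (490, 85),
    "turbo": (810, 90),
    "medium": (1500, 93),
    "large-v3": (3000, 97),
}


def pick_model(available_mb: float, headroom: float = 2.0) -> str:
    """Pick the most accurate model that fits in memory.

    headroom is a hypothetical safety factor: a model is considered
    to fit only if size * headroom <= available memory.
    """
    fitting = [
        name for name, (size_mb, _) in MODELS.items()
        if size_mb * headroom <= available_mb
    ]
    if not fitting:
        return "tiny"  # assumed fallback: smallest model
    return max(fitting, key=lambda name: MODELS[name][1])


if __name__ == "__main__":
    for mb in (100, 1000, 2000, 8000):
        print(mb, "MB ->", pick_model(mb))
```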

  Model      Size (int8)   CPU speed         Accuracy
  ────────   ───────────   ───────────────   ────────
  tiny       ~75 MB        ~120x real-time   60%
  base       ~145 MB       ~60x real-time    75%
  small      ~490 MB       ~24x real-time    85%
  medium     ~1.5 GB       ~8x real-time     93%
  turbo      ~810 MB       ~30x real-time    90%
  large-v3   ~3 GB         ~4x real-time     97%
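
The "x real-time" figures give a simple way to estimate processing time: divide the media duration by the speed factor. The sketch below applies that arithmetic to the approximate CPU figures from the table. The function names are hypothetical, and real timings depend on your hardware.

```python
# Approximate CPU speed factors from the table (multiples of real time).
SPEED_FACTOR = {
    "tiny": 120, "base": 60, "small": 24,
    "medium": 8, "turbo": 30, "large-v3": 4,
}


def estimate_seconds(duration_s: float, model: str) -> float:
    """Estimated processing time: media duration / speed factor."""
    return duration_s / SPEED_FACTOR[model]


def format_duration(seconds: float) -> str:
    """Format seconds as 'Xm Ys', or 'Ys' when under a minute."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs}s" if minutes else f"{secs}s"


if __name__ == "__main__":
    two_hours = 2 * 60 * 60
    for name in SPEED_FACTOR:
        print(name, format_duration(estimate_seconds(two_hours, name)))
```

For example, a two-hour recording at ~24x real-time (small) works out to about five minutes.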

Models are downloaded on first use and cached in ~/.cache/huggingface/hub/.


Supported file formats

Any format that ffmpeg can decode, including:

mp4 mov mkv avi webm flv m4v mp3 wav m4a aac ogg flac wma


Troubleshooting

ffmpeg not found
  Install ffmpeg (see Requirements above).

No speech detected
  Try --language auto if the video is not in English. Check that the video actually has an audio track.

Not enough memory to load the model
  Switch to a smaller model: --model small or --model tiny.

Permission denied reading a file on macOS
  Terminal may need Full Disk Access: System Settings > Privacy & Security > Full Disk Access.


License

MIT


Acknowledgements

Built on faster-whisper by SYSTRAN. Whisper models by OpenAI. Audio decoding by ffmpeg.
