Transcribe media files to SRT subtitles.

Audio2Sub

Audio2Sub is a command-line tool that automatically transcribes audio from video or audio files and generates subtitles in the .srt format. It uses FFmpeg for media handling, Silero VAD for precise voice activity detection, and supports multiple transcription backends to convert speech to text.

Installation

Before installing, you must have FFmpeg installed and available in your system's PATH.
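A quick way to confirm the FFmpeg prerequisite is to look the executable up on PATH. The sketch below uses only the Python standard library; the helper name find_ffmpeg is hypothetical and is not part of Audio2Sub.

```python
import shutil
from typing import Optional


def find_ffmpeg(executable: str = "ffmpeg") -> Optional[str]:
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("FFmpeg was not found on PATH; install it before using audio2sub.")
    else:
        print(f"FFmpeg found at: {path}")
```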

You can install Audio2Sub using pip. The command below installs it together with faster_whisper, the default backend.

pip install audio2sub[faster_whisper]

To install with a different backend, see the table in the Backends section below.

Usage

Basic Example

audio2sub my_video.mp4 -o my_video.srt --lang en

This command will transcribe the audio from my_video.mp4 into English and save the subtitles to my_video.srt.
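For readers unfamiliar with the .srt format, the sketch below shows what one subtitle cue looks like: a sequence number, a time range written as HH:MM:SS,mmm --> HH:MM:SS,mmm, and the text. It only illustrates the file format and is not Audio2Sub's own code; the function names and the sample text are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, remainder = divmod(total_ms, 3_600_000)
    minutes, remainder = divmod(remainder, 60_000)
    secs, millis = divmod(remainder, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue: index line, time range line, then the subtitle text."""
    return f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    # Prints a single cue spanning 3.5 s to 6.0 s.
    print(srt_cue(1, 3.5, 6.0, "Hello, world."))
```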

Notes:

  • First-Time Use: The first time you run the program, it will download the necessary transcription models. This may take some time and require significant disk space.
  • CUDA: Whisper-based local models run significantly slower without CUDA. The program warns at startup if CUDA is not available. If your system has a compatible GPU, install the CUDA Toolkit first. If you are sure CUDA is installed correctly and still get the warning, you may need to manually reinstall a compatible PyTorch version. Reinstalling PyTorch may break other dependencies if you choose a different version from the one you currently have; in that case, reinstall those dependencies as directed by the warnings shown.
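If you want to check the CUDA situation yourself before running a long transcription, the sketch below asks PyTorch directly. It assumes the startup warning reflects what PyTorch reports, which the source does not confirm; the helper name cuda_status is hypothetical.

```python
import importlib.util


def cuda_status() -> str:
    """Describe whether PyTorch is installed and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch  # imported lazily so the check also works without PyTorch

    if torch.cuda.is_available():
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but CUDA is not available"


if __name__ == "__main__":
    print(cuda_status())
```

If this reports that CUDA is not available on a machine with a compatible GPU, the installed PyTorch build is a likely place to look first.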

Using a Different Transcriber

Use the -t or --transcriber flag to select a different backend.

audio2sub my_audio.wav -o my_audio.srt --lang en -t whisper --model medium

Each transcriber has its own options. To see them, use --help with the transcriber specified.

audio2sub -t faster_whisper --help

Backends

Audio2Sub supports the following transcription backends.

Backend Name    Description
faster_whisper  A faster reimplementation of Whisper using CTranslate2. See Faster Whisper. This is the default backend.
whisper         The original speech recognition model by OpenAI. See OpenAI Whisper.
gemini          Google's Gemini model via their API. Requires a GEMINI_API_KEY environment variable or --gemini-api-key argument.

Install support for a backend with pip install audio2sub[<backend>], then select the corresponding transcriber with the -t flag.
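When scripting Audio2Sub, for example to process several files, it can help to assemble the command line in one place. The sketch below builds an argument list using only the flags shown in this README (-o, --lang, -t, --gemini-api-key) and checks whether GEMINI_API_KEY is set. The helper names are hypothetical, and whether every flag combination is accepted by every backend is an assumption to verify with --help.

```python
import os
import shlex
from typing import List, Mapping, Optional


def build_audio2sub_command(
    media_path: str,
    srt_path: str,
    lang: str,
    transcriber: Optional[str] = None,
    gemini_api_key: Optional[str] = None,
) -> List[str]:
    """Assemble an audio2sub argument list from flags documented in the README."""
    command = ["audio2sub", media_path, "-o", srt_path, "--lang", lang]
    if transcriber is not None:
        command += ["-t", transcriber]
    if gemini_api_key is not None:
        command += ["--gemini-api-key", gemini_api_key]
    return command


def gemini_key_is_set(environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if GEMINI_API_KEY is present and non-empty."""
    return bool(environ.get("GEMINI_API_KEY"))


if __name__ == "__main__":
    command = build_audio2sub_command(
        "my_audio.wav", "my_audio.srt", "en", transcriber="gemini"
    )
    if not gemini_key_is_set():
        print("GEMINI_API_KEY is not set; pass --gemini-api-key instead.")
    # Shown rather than executed; pass the list to subprocess.run(command, check=True) to run it.
    print(shlex.join(command))
```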

Contributing

Contributions are welcome! Please open an issue or submit a pull request on the GitHub repository.
