# vid2cc-AI 🎙️🎬

AI-powered subtitle generator using Whisper and FFmpeg.

vid2cc-AI is a high-performance CLI tool designed to bridge the gap between raw video and accessible content. By leveraging OpenAI's Whisper models and FFmpeg's robust media handling, it automates the creation of perfectly synced `.srt` subtitles.
## Table of Contents
- 🚀 Key Features
- ⚙️ Installation
- 📖 How To Use
- ☁️ Run on Google Colab (with UI)
- 🧪 Testing
- 🗺️ Roadmap
- 🛠️ Tech Stack
- 📄 License
## 🚀 Key Features
- AI-Driven Transcription: Powered by OpenAI Whisper for industry-leading accuracy.
- Hardware Acceleration: Automatic CUDA detection for GPU-accelerated processing.
- Intelligent Pre-processing: FFmpeg-based audio extraction optimized for speech recognition (16kHz Mono).
- Professional Packaging: Fully installable via pip with a dedicated command-line entry point.
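The speech-optimized pre-processing described above (16 kHz, mono, no video stream) maps onto a plain FFmpeg invocation. The sketch below illustrates the idea with `subprocess`; the helper names are hypothetical and this is not the tool's actual internals:

```python
import subprocess

def build_extract_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an FFmpeg command that extracts speech-ready audio:
    no video stream (-vn), 16 kHz sample rate (-ar), mono (-ac 1)."""
    return [
        "ffmpeg", "-y",      # overwrite output without prompting
        "-i", video_path,    # input video
        "-vn",               # drop the video stream
        "-ar", "16000",      # resample to 16 kHz (Whisper's native rate)
        "-ac", "1",          # downmix to mono
        audio_path,
    ]

def extract_audio_sketch(video_path: str, audio_path: str) -> None:
    """Run the extraction; requires FFmpeg on the PATH."""
    subprocess.run(build_extract_cmd(video_path, audio_path), check=True)
```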
## ⚙️ Installation

### 1. Prerequisite: FFmpeg

This tool requires FFmpeg to be installed on your system. For step-by-step instructions covering Windows (Winget/Choco), macOS (Homebrew), and Linux (Apt/Dnf/Pacman), refer to the dedicated FFmpeg installation guide.
### 2. Install vid2cc-AI

```shell
pip install vid2cc-ai
```

Or install directly from source for development:

```shell
git clone https://github.com/0xdilshan/vid2cc-AI.git
cd vid2cc-AI
pip install -e .
```
## 📖 How To Use

Once installed, the `vid2cc` command is available globally in your terminal.

### Examples

For maximum accuracy with toggleable subtitles:

```shell
vid2cc example.mp4 --model large --embed
```
### 🛠️ Advanced Options

Fine-tune your output using the following flags:

| Flag | Description |
|---|---|
| `--model [size]` | Choose the Whisper model: `tiny`, `base`, `small`, `medium`, `large`, or `turbo`. |
| `--embed` | Soft subtitles: adds the SRT as a metadata track. Fast, and lets users toggle subtitles on/off in players like VLC. |
| `--hardcode` | Burn-in subtitles: permanently draws the subtitles onto the video. Essential for social media (Instagram/TikTok), where players don't support SRT files. |
| `--output-dir`, `-o` | Set the output directory: creates the destination directory if it doesn't exist and saves all generated files (SRT, audio, and video) there. |
| `--translate`, `-t` | Translate to English: automatically translates transcriptions from any supported language into English. |
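Since the CLI is built on Argparse (see Tech Stack), the flag set above can be declared roughly as follows. This is an illustrative sketch, not the project's actual parser:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI mirroring the documented vid2cc flags."""
    parser = argparse.ArgumentParser(
        prog="vid2cc", description="AI-powered subtitle generator")
    parser.add_argument("videos", nargs="+", help="one or more video files")
    parser.add_argument("--model", default="base",
                        choices=["tiny", "base", "small", "medium", "large", "turbo"],
                        help="Whisper model size")
    parser.add_argument("--embed", action="store_true",
                        help="add the SRT as a soft subtitle track")
    parser.add_argument("--hardcode", action="store_true",
                        help="burn subtitles into the video")
    parser.add_argument("--output-dir", "-o", default=".",
                        help="destination directory for generated files")
    parser.add_argument("--translate", "-t", action="store_true",
                        help="translate the transcription to English")
    return parser

args = build_parser().parse_args(["talk.mp4", "--model", "small", "--embed"])
print(args.model, args.embed)  # small True
```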
## 📦 Batch Processing

No need to run the command separately for every file; you can pass multiple videos at once:

```shell
# Process all mp4 files in the current directory
vid2cc *.mp4 --model small --embed

# Process multiple specific files
vid2cc video1.mp4 video2.mkv video3.mov --model base --embed
```
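Note that the `*.mp4` pattern above is expanded by the shell. On shells that don't expand wildcards (e.g. Windows `cmd.exe`), a tool can expand patterns itself with the standard `glob` module. A minimal sketch, with a hypothetical helper name:

```python
import glob

def expand_patterns(patterns: list[str]) -> list[str]:
    """Expand shell-style wildcards into a sorted, de-duplicated file list.
    Literal paths with no wildcard match are kept as-is so the CLI can
    report a clear 'file not found' error later."""
    files: list[str] = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern))
        files.extend(matches if matches else [pattern])
    # de-duplicate while preserving order
    return list(dict.fromkeys(files))
```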
## 📦 Usage as a Library

You can integrate vid2cc-AI directly into your Python projects:

```python
from vid2cc_ai import Transcriber, extract_audio

# Extract the audio track, then transcribe it
extract_audio("video.mp4", "audio.wav")
ts = Transcriber("base")
segments = ts.transcribe("audio.wav")

for s in segments:
    print(f"[{s['start']:.2f}s] {s['text']}")
```
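If you want to serialize the returned segments to an `.srt` file yourself, the SRT timestamp format is `HH:MM:SS,mmm`. A sketch, assuming each segment is a dict with `start`/`end` seconds and a `text` key as in the loop above (`write_srt` is a hypothetical helper, not part of the library):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 3661.5 -> '01:01:01,500'."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def write_srt(segments, path: str) -> None:
    """Write segments as a numbered SRT cue list."""
    with open(path, "w", encoding="utf-8") as f:
        for i, seg in enumerate(segments, start=1):
            f.write(f"{i}\n")
            f.write(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n")
            f.write(seg["text"].strip() + "\n\n")
```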
## ☁️ Run on Google Colab (with UI)

You can run vid2cc-AI directly in your browser using Google Colab. This version includes a friendly interface to manage your Google Drive files and transcription settings without writing code.

1. Install & Mount: Run the first cell to install `vid2cc-ai` and connect your Google Drive.
2. Configure UI:
   - Video Path: Right-click your video in the Colab file sidebar and select "Copy Path."
   - Model: Choose `turbo` or `small` for speed, `large` for accuracy.
   - Output: Select Soft Subtitles (toggleable) or Hardcoded (burned-in).
3. Start: Click "Start Processing" and find your result in your Drive folder.

⚡ For 10x faster transcription, ensure your Colab runtime is set to GPU (Runtime > Change runtime type > T4 GPU).
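The GPU speed-up above relies on the automatic CUDA detection mentioned under Key Features. The check typically amounts to something like the following sketch (with a CPU fallback when PyTorch is absent; this is an illustration, not the project's actual code):

```python
def pick_device() -> str:
    """Return 'cuda' when a CUDA-capable GPU is visible to PyTorch, else 'cpu'."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # PyTorch not installed: transcription would run on CPU anyway
        return "cpu"

print(f"Whisper will run on: {pick_device()}")
```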
## 🧪 Testing

```shell
# Install test dependencies
pip install pytest

# Run the test suite
pytest
```
## 🗺️ Roadmap

- [x] Local video → SRT subtitle/transcription
- [x] Embed subtitles into video containers (`--embed`)
- [x] Burn-in subtitles (`--hardcode`)
- [x] Set custom output directory (`--output-dir`)
- [x] Multilingual transcription
- [x] Support translation to English
- [ ] Transcription from YouTube/Vimeo URLs (yt-dlp)
- [x] Google Colab notebook support
## 🛠️ Tech Stack
- Inference: OpenAI Whisper
- Media Engine: FFmpeg
- Core: Python 3.9+, PyTorch
- CLI Framework: Argparse
## 📄 License
Distributed under the MIT License.
See LICENSE for more information.
## Project details

### Download files
### File details

Details for the file `vid2cc_ai-0.1.5.tar.gz`.

#### File metadata

- Download URL: vid2cc_ai-0.1.5.tar.gz
- Upload date:
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e818d1d562d8fb88c35d8ed2a7cbb5ca4076baf11e8c322a4a30845ebfd00da0` |
| MD5 | `e7b6a091cab6e0d564140863d7dd298a` |
| BLAKE2b-256 | `66110a834961b3b664f1a7160f3d58223d74b4fd61d7eadc934ece5af4ef89fe` |
#### Provenance

The following attestation bundles were made for `vid2cc_ai-0.1.5.tar.gz`:

Publisher: `publish.yml` on `0xdilshan/vid2cc-AI`

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vid2cc_ai-0.1.5.tar.gz
- Subject digest: e818d1d562d8fb88c35d8ed2a7cbb5ca4076baf11e8c322a4a30845ebfd00da0
- Sigstore transparency entry: 969329073
- Sigstore integration time:
- Permalink: 0xdilshan/vid2cc-AI@e5f078544556cf73bb3e5fd7df924c65a6c38245
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/0xdilshan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e5f078544556cf73bb3e5fd7df924c65a6c38245
- Trigger Event: release
### File details

Details for the file `vid2cc_ai-0.1.5-py3-none-any.whl`.

#### File metadata

- Download URL: vid2cc_ai-0.1.5-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
#### File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ccf6d5314a91c9e2146ba92cd629136c9957d2c90632018736ed8d7d567a4abd` |
| MD5 | `ea6d92fcca6d5fc9c02ec5be804ac995` |
| BLAKE2b-256 | `d26d3ff38d4c813be177b6d2f2f32d54692588e36c1cbb477120117d16cd3573` |
#### Provenance

The following attestation bundles were made for `vid2cc_ai-0.1.5-py3-none-any.whl`:

Publisher: `publish.yml` on `0xdilshan/vid2cc-AI`

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vid2cc_ai-0.1.5-py3-none-any.whl
- Subject digest: ccf6d5314a91c9e2146ba92cd629136c9957d2c90632018736ed8d7d567a4abd
- Sigstore transparency entry: 969329076
- Sigstore integration time:
- Permalink: 0xdilshan/vid2cc-AI@e5f078544556cf73bb3e5fd7df924c65a6c38245
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/0xdilshan
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e5f078544556cf73bb3e5fd7df924c65a6c38245
- Trigger Event: release