
AI-powered subtitle generator using Whisper and FFmpeg

Project description

vid2cc-AI 🎙️🎬


vid2cc-AI is a high-performance CLI tool designed to bridge the gap between raw video and accessible content. By leveraging OpenAI's Whisper models and FFmpeg's robust media handling, it automates the creation of perfectly synced .srt subtitles.




🚀 Key Features

  • AI-Driven Transcription: Powered by OpenAI Whisper for industry-leading accuracy.
  • Hardware Acceleration: Automatic CUDA detection for GPU-accelerated processing.
  • Intelligent Pre-processing: FFmpeg-based audio extraction optimized for speech recognition (16 kHz mono); see the sketch after this list.
  • Professional Packaging: Fully installable via pip with a dedicated command-line entry point.
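
The pre-processing step boils down to a standard FFmpeg resample to 16 kHz mono, the format Whisper expects. The package does this for you (see extract_audio in the library section below), but a minimal sketch of the equivalent call from Python, assuming ffmpeg is on your PATH, looks like this:

import subprocess

def extract_speech_audio(video_path: str, wav_path: str) -> None:
    # Drop the video stream and resample the audio track to 16 kHz mono WAV,
    # the input format Whisper works best with.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path,
         "-vn",             # no video
         "-ac", "1",        # mono
         "-ar", "16000",    # 16 kHz sample rate
         wav_path],
        check=True,
    )

extract_speech_audio("example.mp4", "example.wav")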

⚙️ Installation

1. Prerequisite: FFmpeg

This tool requires FFmpeg to be installed on your system.

For a complete step-by-step guide on how to install FFmpeg on Windows (Winget/Choco), macOS (Homebrew), or Linux (Apt/Dnf/Pacman), please refer to the dedicated guide:

👉 FFmpeg Installation Guide
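
Before installing, you can confirm that FFmpeg is actually visible on your PATH with a couple of lines of standard-library Python (a quick sanity check, nothing vid2cc-specific):

import shutil, subprocess

if shutil.which("ffmpeg") is None:
    print("ffmpeg was not found on PATH - install it first (see the guide above)")
else:
    subprocess.run(["ffmpeg", "-version"], check=True)   # prints the FFmpeg build info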

2. Install vid2cc-AI

pip install vid2cc-ai

Or install directly from source for development:

git clone https://github.com/0xdilshan/vid2cc-AI.git
cd vid2cc-AI
pip install -e .

📖 How To Use

Once installed, the vid2cc command is available globally in your terminal.

Examples

For maximum accuracy with toggleable subtitles:

vid2cc example.mp4 --model large --embed

🛠️ Advanced Options

Fine-tune your output using the following flags:

Flag Description
--model [size] Choose the Whisper model: tiny, base, small, medium, large, or turbo.
--embed Soft Subtitles: adds the SRT as a metadata track. Fast, and lets viewers toggle subtitles on/off in players like VLC.
--hardcode Burn-in Subtitles: permanently draws the subtitles onto the video frames. Essential for social media (Instagram/TikTok), where players don't support SRT files. See the sketch after this table.
--output-dir or -o Set Output Directory: creates the destination directory if it doesn't exist and saves all generated files (SRT, audio, and video) there.
--translate or -t Translate to English: automatically translates the transcription from any supported language into English.
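
For context on what --hardcode does: burning in subtitles is typically an FFmpeg subtitles filter pass. The sketch below shows the equivalent standalone call (an illustration only, not necessarily what vid2cc runs internally; it assumes an FFmpeg build with libass):

import subprocess

# Draw example.srt permanently onto the video frames ("burned-in" subtitles).
subprocess.run([
    "ffmpeg", "-y", "-i", "example.mp4",
    "-vf", "subtitles=example.srt",
    "-c:a", "copy",          # keep the original audio untouched
    "example_hardcoded.mp4",
], check=True)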

📦 Batch Processing

No need to run the command for every single file. You can pass multiple videos at once:

# Process all mp4 files in the current directory
vid2cc *.mp4 --model small --embed

# Process multiple specific files
vid2cc video1.mp4 video2.mkv video3.mov --model base --embed
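
If you are scripting in Python rather than calling the CLI, the same batch idea is a short loop over the library API shown in the next section (a sketch; the glob pattern and file names are illustrative):

from pathlib import Path
from vid2cc_ai import Transcriber, extract_audio

ts = Transcriber("small")                 # load the model once, reuse it for every file
for video in sorted(Path(".").glob("*.mp4")):
    wav = video.with_suffix(".wav")
    extract_audio(str(video), str(wav))   # 16 kHz mono extraction handled for you
    segments = ts.transcribe(str(wav))
    print(f"{video.name}: {len(segments)} segments")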

📦 Usage as a Library

You can integrate vid2cc-AI directly into your Python projects:

from vid2cc_ai import Transcriber, extract_audio

# Extract and Transcribe
extract_audio("video.mp4", "audio.wav")
ts = Transcriber("base")
segments = ts.transcribe("audio.wav")

for s in segments:
    print(f"[{s['start']:.2f}s] {s['text']}")

Run on Google Colab (with UI)

You can run vid2cc-ai directly in your browser using Google Colab. This version includes a friendly interface to manage your Google Drive files and transcription settings without writing code.

  1. Open the Notebook: Open in Colab.

  2. Install & Mount: Run the first cell to install vid2cc-ai and connect your Google Drive.

  3. Configure the UI:

    • Video Path: Right-click your video in the Colab file sidebar and select "Copy Path."
    • Model: Choose turbo or small for speed, large for accuracy.
    • Output: Select whether you want Soft Subtitles (toggleable) or Hardcoded (burned-in).
  4. Start: Click "Start Processing" and find your result in your Drive folder.

⚡ For 10x faster transcription, ensure your Colab runtime is set to GPU (Runtime > Change runtime type > T4 GPU).
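
A quick way to confirm from a notebook cell that the GPU runtime is actually visible to PyTorch (PyTorch comes preinstalled on Colab):

import torch

if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - transcription will run on the CPU")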



🧪 Testing

# Install test dependencies
pip install pytest

# Run the test suite
pytest

🗺️ Roadmap

  • Local video → SRT subtitle / transcription
  • Embed subtitles into video containers (--embed)
  • Burn-in subtitles (--hardcode)
  • Set custom output directory (--output-dir)
  • Multilingual transcription
  • Support translation to English
  • Transcription from YouTube/Vimeo URLs via yt-dlp (planned)
  • Google Colab notebook support

🛠️ Tech Stack

  • Inference: OpenAI Whisper
  • Media Engine: FFmpeg
  • Core: Python 3.9+, PyTorch
  • CLI Framework: Argparse

📄 License

Distributed under the MIT License.
See LICENSE for more information.



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vid2cc_ai-0.1.5.tar.gz (11.9 kB)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vid2cc_ai-0.1.5-py3-none-any.whl (10.5 kB)

Uploaded Python 3

File details

Details for the file vid2cc_ai-0.1.5.tar.gz.

File metadata

  • Download URL: vid2cc_ai-0.1.5.tar.gz
  • Upload date:
  • Size: 11.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vid2cc_ai-0.1.5.tar.gz
Algorithm Hash digest
SHA256 e818d1d562d8fb88c35d8ed2a7cbb5ca4076baf11e8c322a4a30845ebfd00da0
MD5 e7b6a091cab6e0d564140863d7dd298a
BLAKE2b-256 66110a834961b3b664f1a7160f3d58223d74b4fd61d7eadc934ece5af4ef89fe

See more details on using hashes here.
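
If you want to check a downloaded archive against the digest above, hashlib from the standard library is enough (a quick sketch; it assumes the file sits in the current directory):

import hashlib

expected = "e818d1d562d8fb88c35d8ed2a7cbb5ca4076baf11e8c322a4a30845ebfd00da0"
digest = hashlib.sha256(open("vid2cc_ai-0.1.5.tar.gz", "rb").read()).hexdigest()
print("OK" if digest == expected else "MISMATCH")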

Provenance

The following attestation bundles were made for vid2cc_ai-0.1.5.tar.gz:

Publisher: publish.yml on 0xdilshan/vid2cc-AI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vid2cc_ai-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: vid2cc_ai-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 10.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vid2cc_ai-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 ccf6d5314a91c9e2146ba92cd629136c9957d2c90632018736ed8d7d567a4abd
MD5 ea6d92fcca6d5fc9c02ec5be804ac995
BLAKE2b-256 d26d3ff38d4c813be177b6d2f2f32d54692588e36c1cbb477120117d16cd3573

See more details on using hashes here.

Provenance

The following attestation bundles were made for vid2cc_ai-0.1.5-py3-none-any.whl:

Publisher: publish.yml on 0xdilshan/vid2cc-AI

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
