AI Sub: AI-Powered Subtitle Generation with Translation
Overview
AI Sub is a command-line tool that leverages Google's Gemini models to generate high-quality, audio-synchronized subtitles. It is designed to produce precise English and Japanese subtitles by analyzing both audio and visual cues.
Key Features:
- Multimodal Understanding: Utilizes video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
- Dual-Language Support: Generates verbatim transcriptions and translations for English and Japanese.
- Automatic Segmentation: Automatically splits long videos into smaller segments for efficient processing.
Showcase
Here's an example of subtitles generated by AI Sub:
For more examples, please visit the showcase directory.
How It Works
- Preprocessing: The input video is segmented into smaller chunks to fit within API context windows and file size limits.
- AI Processing: Each segment is sent to Google Gemini. The AI analyzes the audio for speech and the video for context, following strict prompting rules to generate subtitles.
- Compilation: Generated subtitles from all segments are merged into a final, chronologically sorted SRT file.
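The compilation step above can be sketched as follows. This is a minimal illustration, not AI Sub's actual implementation: the `Cue` class, the `merge_segments` function, and the assumption that each segment reports times relative to its own start are all hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Cue:
    start: timedelta  # segment-relative on input, absolute after merging
    end: timedelta    # segment-relative on input, absolute after merging
    text: str


def format_timestamp(t: timedelta) -> str:
    """Render a timedelta as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(t.total_seconds() * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def merge_segments(segments: list[tuple[timedelta, list[Cue]]]) -> str:
    """Shift each segment's cues by the segment offset, sort, and render SRT."""
    merged = [
        Cue(offset + cue.start, offset + cue.end, cue.text)
        for offset, cues in segments
        for cue in cues
    ]
    # Sort chronologically so segments processed out of order still line up.
    merged.sort(key=lambda cue: (cue.start, cue.end))
    blocks = [
        f"{index}\n{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}\n{cue.text}\n"
        for index, cue in enumerate(merged, start=1)
    ]
    return "\n".join(blocks)
```

Each element of `segments` pairs a segment's start offset in the full video with the cues generated for that segment, so segments may be supplied in any order.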
Installation
Prerequisites: Python 3.10 or higher.
- Set up a Python virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
- Install AI Sub:
    pip install --upgrade ai-sub
Usage
You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.
Option 1: Using Google AI Studio API Key
- Obtain your API Key:
  - Sign in to Google AI Studio.
  - Click "Create API Key".
  - Copy and securely store your key. Never disclose your API key publicly.
- Run the application:
    ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
Note: Replace YOUR_API_KEY with your actual key and "path/to/your/video.mp4" with the video file path.
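To keep the key out of shell history and scripts, the command above could be wrapped in a small Python launcher that reads the key from an environment variable. This is only a sketch: the variable name `GEMINI_API_KEY` and both function names are hypothetical, and the flags are exactly those shown in the command above.

```python
import os
import subprocess


def build_command(video_path: str, api_key: str,
                  model: str = "google-gla:gemini-3-flash-preview") -> list[str]:
    """Assemble the ai-sub invocation as an argument list (no shell quoting needed)."""
    return ["ai-sub", "--ai.google.key", api_key, f"--ai.model={model}", video_path]


def run_ai_sub(video_path: str) -> None:
    """Run ai-sub on one video, taking the key from the environment."""
    # GEMINI_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ["GEMINI_API_KEY"]
    # check=True raises CalledProcessError if ai-sub exits with a non-zero status.
    subprocess.run(build_command(video_path, api_key), check=True)
```

Note that the key is still passed as a command-line argument, so it may be visible to other users of the same machine through the process list.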
Option 2: Using Gemini CLI
- Install and authenticate the Gemini CLI:
  - Install:
      npm install -g @google/gemini-cli
  - Authenticate: Follow instructions at gemini-cli.
- Run the application:
    ai-sub --ai.model=gemini-cli:gemini-3-pro-preview --split.re-encode.enabled=True --thread.subtitles=1 "path/to/your/video.mp4"
Important Notes for CLI Mode:
- No API key is required; the tool uses your authenticated Gemini CLI instance.
- Additional arguments are required to split and re-encode the video because the Gemini CLI has a 20MB upload limit per chunk.
- Re-encoding is resource-intensive and will increase processing time.
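To see why the 20MB limit forces short, re-encoded chunks, the arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch, not how AI Sub chooses its segment length: the function name, the decimal interpretation of "MB", and the 10% safety margin are all assumptions.

```python
def max_segment_seconds(total_bitrate_kbps: float, limit_mb: float = 20.0,
                        safety_margin: float = 0.9) -> float:
    """Estimate how many seconds of video fit under an upload size limit.

    total_bitrate_kbps is the combined audio + video bitrate in kilobits/second.
    """
    if total_bitrate_kbps <= 0:
        raise ValueError("bitrate must be positive")
    # Assumption: 1 MB = 1,000,000 bytes; leave headroom for container overhead.
    usable_bits = limit_mb * 1_000_000 * 8 * safety_margin
    return usable_bits / (total_bitrate_kbps * 1000)
```

Under these assumptions, a stream at 1000 kbps fits roughly 144 seconds per chunk, while a 8000 kbps source fits only about 18 seconds, which is why re-encoding to a lower bitrate helps.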
Known Limitations
- Timestamp Accuracy: Subtitle timestamps may occasionally be inaccurate. This is an inherent characteristic of the Gemini AI model. Shorter video segments generally yield better accuracy.
- AI Hallucinations: Like all LLMs, Gemini may occasionally produce "hallucinations" or inaccurate information.
If you encounter issues, consider re-processing specific video segments as detailed below.
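One way to spot segments worth re-processing is to scan the generated cues for implausible timings. The check below is a hypothetical helper, not part of AI Sub; it assumes cues are available as `(start_seconds, end_seconds, text)` tuples, and the 10-second duration threshold is an arbitrary choice.

```python
def find_suspicious_cues(cues: list[tuple[float, float, str]],
                         max_duration: float = 10.0) -> list[tuple[int, str]]:
    """Return (1-based cue number, reason) pairs for cues with doubtful timing."""
    problems = []
    previous_end = 0.0
    for index, (start, end, _text) in enumerate(cues, start=1):
        if end <= start:
            problems.append((index, "end is not after start"))
        elif end - start > max_duration:
            problems.append((index, "unusually long duration"))
        if start < previous_end:
            problems.append((index, "overlaps previous cue"))
        # Track the latest end time seen so far to detect overlaps.
        previous_end = max(previous_end, end)
    return problems
```

A flagged cue is only a hint: overlapping subtitles can be legitimate when two people speak at once, so the result is best used to decide where to look first.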
Advanced: Re-processing Segments
Intermediate files are stored in a temporary directory (default: tmp_<input_file_name>). You can customize this location using the --dir.tmp flag.
To re-process a specific segment:
- Navigate to the temporary directory.
- Locate and delete the corresponding part_XXX.json file.
- Re-run the script. It will automatically detect missing files and re-process only those segments.
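The deletion step above can also be scripted. The sketch below assumes that XXX in part_XXX.json is a zero-padded three-digit segment index; that naming detail is a guess, so check the actual file names in your temporary directory first. The function name is hypothetical.

```python
from pathlib import Path


def drop_segment_result(tmp_dir: str, segment_index: int) -> bool:
    """Delete one segment's JSON result so the next run regenerates it.

    Returns True if a file was removed, False if it was not there.
    """
    # Assumption: XXX in part_XXX.json is a zero-padded three-digit index.
    target = Path(tmp_dir) / f"part_{segment_index:03d}.json"
    if target.is_file():
        target.unlink()
        return True
    return False
```

After removing the files for the affected segments, re-run ai-sub with the same arguments as before.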
File details
Details for the file ai_sub-1.6.0.tar.gz.
File metadata
- Download URL: ai_sub-1.6.0.tar.gz
- Upload date:
- Size: 26.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c7d20c695f4b7b5e34a38e419af50b000d19b372acd03d823213df5310c603be |
| MD5 | b96808b4dc6681f072499ba29a709011 |
| BLAKE2b-256 | 477895fae796c88658f01a9f883859cbfc3baa925f5cb37af548b43ebffde08b |
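A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a generic sketch; the function names are illustrative and the file path is whatever location you downloaded the archive to.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("ai_sub-1.6.0.tar.gz", "c7d20c695f4b7b5e34a38e419af50b000d19b372acd03d823213df5310c603be")` should return True for an intact copy of the source distribution.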
Provenance
The following attestation bundles were made for ai_sub-1.6.0.tar.gz:
Publisher: publish.yml on FlippFuzz/ai-sub
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_sub-1.6.0.tar.gz
- Subject digest: c7d20c695f4b7b5e34a38e419af50b000d19b372acd03d823213df5310c603be
- Sigstore transparency entry: 815417805
- Sigstore integration time:
- Permalink: FlippFuzz/ai-sub@375df61ecb1b30d91b66949b679651f8228e8cb0
- Branch / Tag: refs/tags/v1.6.0
- Owner: https://github.com/FlippFuzz
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@375df61ecb1b30d91b66949b679651f8228e8cb0
- Trigger Event: release
File details
Details for the file ai_sub-1.6.0-py3-none-any.whl.
File metadata
- Download URL: ai_sub-1.6.0-py3-none-any.whl
- Upload date:
- Size: 29.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6f0d001f0aec009118d6cd0b52a577ef764d8866fdf79c3ebbe50780c0dc807 |
| MD5 | fc737d85731cbc404fa6a00753ff5644 |
| BLAKE2b-256 | 86d8fdbbc8c81f57bf94c88842f0ee5d4c7a8f4e670a98c787025f54471907cf |
Provenance
The following attestation bundles were made for ai_sub-1.6.0-py3-none-any.whl:
Publisher: publish.yml on FlippFuzz/ai-sub
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_sub-1.6.0-py3-none-any.whl
- Subject digest: f6f0d001f0aec009118d6cd0b52a577ef764d8866fdf79c3ebbe50780c0dc807
- Sigstore transparency entry: 815417813
- Sigstore integration time:
- Permalink: FlippFuzz/ai-sub@375df61ecb1b30d91b66949b679651f8228e8cb0
- Branch / Tag: refs/tags/v1.6.0
- Owner: https://github.com/FlippFuzz
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@375df61ecb1b30d91b66949b679651f8228e8cb0
- Trigger Event: release