AI Sub: AI-Powered Subtitle Generation with Translation
Project Overview
AI Sub uses AI (currently Google Gemini) to generate English and Japanese subtitles for videos, translating between the two languages as needed. It is designed for and primarily tested on Hololive concert and cover videos, but may work on other content.
Showcase
Examples of subtitles generated by AI Sub are available in the showcase directory.
Pros and cons of using Gemini as the AI model
Pros:
- Multimodal Context: Gemini is multimodal, so it can analyze the video content as a whole, including on-screen text, which gives it more context for accurate subtitles.
- Cloud-Based Processing: All processing runs on Google Gemini's infrastructure, so no local GPU or heavy computation is needed on your machine.
Cons:
- Timestamp Precision: Subtitle timestamps may exhibit a minor offset of a few seconds.
- Network Usage: Uploading entire video files to Google's services will consume network bandwidth.
How AI Sub Works
- Video Segmentation: The input video is first segmented into 180-second segments. This duration is configurable via the --split_seconds flag.
- Concurrent Processing: Each video segment is then sent to the AI model (Google Gemini) for subtitle generation. You can adjust the number of concurrent processing threads using the --num_processing_threads flag to optimize performance.
- Subtitle Compilation: All generated subtitle parts are then combined into a single, final subtitle file.
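The three steps above can be sketched in a few lines of Python. This is only an illustration of the split, process, and combine flow, not AI Sub's actual code: the function names, the cue dictionary layout, and the stand-in `fake_subtitle_request` (which replaces the real Gemini call) are all hypothetical, and it assumes each segment's cues are timed relative to the start of that segment.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_bounds(duration_seconds: int, split_seconds: int = 180) -> list[tuple[int, int]]:
    """Return (start, end) offsets in seconds that cover the whole video."""
    return [
        (start, min(start + split_seconds, duration_seconds))
        for start in range(0, duration_seconds, split_seconds)
    ]


def fake_subtitle_request(bounds: tuple[int, int]) -> list[dict]:
    """Hypothetical stand-in for the model call.

    Returns one cue timed relative to the start of the segment.
    """
    start, end = bounds
    return [{"start": 0, "end": min(5, end - start), "text": f"segment starting at {start}s"}]


def generate_subtitles(
    duration_seconds: int, split_seconds: int = 180, num_threads: int = 4
) -> list[dict]:
    """Split, process segments concurrently, then merge into one cue list."""
    bounds = segment_bounds(duration_seconds, split_seconds)
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() yields results in input order, so the parts stay sorted.
        parts = list(pool.map(fake_subtitle_request, bounds))

    merged = []
    for (offset, _), cues in zip(bounds, parts):
        for cue in cues:
            # Shift segment-relative times to positions in the full video.
            merged.append(
                {"start": cue["start"] + offset, "end": cue["end"] + offset, "text": cue["text"]}
            )
    return merged
```

For a 400-second video with the default 180-second split, `segment_bounds(400)` gives three segments, the last one only 40 seconds long.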
Getting Started: A Quick Guide
1. Obtain Your Google Gemini API Key
Follow these simple steps to acquire your API key:
- Sign in to Google AI Studio.
- Click "Create API Key."
- Copy and securely store your API key. Never disclose your API key publicly.
2. Set Up Your Python Environment (Python 3.10+ Required)
Create and activate a Python virtual environment, then install AI Sub:
python -m venv venv
source venv/bin/activate # On Windows, use `venv\Scripts\activate.bat`
pip install --upgrade ai-sub
3. Execute the Script
Run the application with your video file:
ai-sub --api_key=YOUR_API_KEY "path/to/your/video.mp4"
Note: Replace YOUR_API_KEY with your actual Google Gemini API key and "path/to/your/video.mp4" with the full path to your video file.
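If you call AI Sub from a script, one way to keep the key out of your source files is to read it from an environment variable and build the command shown above. This is a minimal sketch: the variable name `GEMINI_API_KEY` and the helper names are assumptions made for illustration, and only the `--api_key` flag and the positional video path come from the command above.

```python
import os
import subprocess


def build_command(video_path: str, api_key: str) -> list[str]:
    """Build the ai-sub invocation as an argument list (no shell quoting needed)."""
    return ["ai-sub", f"--api_key={api_key}", video_path]


def run_ai_sub(video_path: str) -> None:
    """Run ai-sub on one video, reading the key from the environment."""
    # GEMINI_API_KEY is a hypothetical variable name chosen for this example.
    api_key = os.environ["GEMINI_API_KEY"]
    # check=True raises CalledProcessError if ai-sub exits with an error.
    subprocess.run(build_command(video_path, api_key), check=True)
```

Passing the arguments as a list avoids problems with spaces in the video path.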
Known Limitations
- Timestamp Accuracy: Subtitle timestamps may be inaccurate. This is an inherent characteristic of the Gemini AI model.
  - Observations indicate that shorter video segments generally lead to improved timestamp accuracy.
  - Requesting timestamps with second-level precision generally yields more accurate results than requesting millisecond-level precision, so the current implementation requests second-level timestamps.
- AI Hallucinations: Like all AI models, Gemini may occasionally produce "hallucinations" or inaccurate information. This is a known characteristic of current AI technology.
If you encounter issues related to these limitations, consider re-processing specific video segments as detailed in the "Re-processing Specific Video Segments" section below.
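To make the second-level precision point concrete, the sketch below formats whole-second timestamps as subtitle cues. It assumes SubRip (SRT) output purely for illustration, since the format AI Sub writes is not stated here, and the function names are hypothetical. Because the model is only asked for whole seconds, the millisecond field is always 000.

```python
def format_srt_timestamp(total_seconds: int) -> str:
    """Format a whole number of seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    # Second-level precision: the millisecond part is fixed at 000.
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},000"


def to_srt_cue(index: int, start: int, end: int, text: str) -> str:
    """Render one numbered SRT cue from start/end times given in seconds."""
    return f"{index}\n{format_srt_timestamp(start)} --> {format_srt_timestamp(end)}\n{text}\n"
```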
Re-processing Specific Video Segments
Intermediate files generated during processing are stored in the temporary directory, which defaults to tmp_<input_file_name> but can be specified using the --temp_dir CLI flag.
Users can examine these part_XXX.json files within this directory to review the AI's results for individual segments.
To re-process a specific video segment, simply delete its corresponding part_XXX.json file.
Upon subsequent execution, the script will automatically re-process only those segments for which the part_XXX.json file is absent.
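Deleting several part files by hand can be tedious, so a small helper along these lines may be useful. It is a sketch under assumptions: it takes XXX in part_XXX.json to be a run of digits and matches segment numbers exactly as they appear in the file names (check your temporary directory for the actual numbering), and the function name is hypothetical.

```python
import re
from pathlib import Path

# Assumes the XXX in part_XXX.json is a run of digits.
PART_FILE = re.compile(r"part_(\d+)\.json")


def delete_parts(temp_dir: str, segment_numbers: set[int]) -> list[str]:
    """Delete the part files whose number is in segment_numbers.

    Returns the names of the deleted files so the caller can confirm
    which segments will be re-processed on the next run.
    """
    deleted = []
    for path in sorted(Path(temp_dir).glob("part_*.json")):
        match = PART_FILE.fullmatch(path.name)
        if match and int(match.group(1)) in segment_numbers:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

After calling, for example, `delete_parts("tmp_video.mp4", {3, 7})`, run the same ai-sub command again so that only those segments are regenerated. The directory name in this call is only an example of the tmp_<input_file_name> default.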