AI-Powered Subtitle Generation with Translation

AI Sub: AI-Powered Subtitle Generation with Translation

Overview

AI Sub is a command-line tool that uses Google's Gemini models to generate high-quality subtitles synchronized to a video's audio. It produces English and Japanese subtitles by analyzing both audio and visual cues.

Key Features:

- Multimodal Understanding: Uses video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
- Dual-Language Support: Generates verbatim transcriptions and translations for English and Japanese.
- Automatic Segmentation: Splits long videos into smaller segments for efficient processing.

Showcase

Here's an example of subtitles generated by AI Sub:

For more examples, please visit ai-sub-showcase.

How It Works

1. Preprocessing: The input video is split into smaller chunks to fit within API context windows and file size limits.
2. AI Processing: Each segment is sent to the configured AI model (Google Gemini by default). The model analyzes the audio for speech and the video frames for context, following strict prompting rules. Generation runs in two passes: a first pass that transcribes the segment and a second pass for QA and refinement.
3. Compilation: The subtitles generated for all segments are merged into a final, chronologically sorted SRT file.
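The compilation step can be illustrated with a minimal sketch. It assumes each segment's cues carry timestamps relative to the start of that segment and that each segment's start offset (in seconds) is known; `Cue`, `format_timestamp`, and `merge_segments` are hypothetical names, not AI Sub's internals.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds, relative to the start of its segment
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_segments(segments: list[tuple[float, list[Cue]]]) -> str:
    """Shift each segment's cues by its offset, sort them, and emit one SRT document."""
    absolute = [
        Cue(offset + cue.start, offset + cue.end, cue.text)
        for offset, cues in segments
        for cue in cues
    ]
    # Segments may finish out of order when processed concurrently, so sort by time.
    absolute.sort(key=lambda c: (c.start, c.end))
    blocks = [
        f"{i}\n{format_timestamp(c.start)} --> {format_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(absolute, start=1)
    ]
    return "\n".join(blocks)


segments = [
    (300.0, [Cue(1.5, 3.0, "Second chunk")]),
    (0.0, [Cue(0.0, 2.25, "First chunk")]),
]
print(merge_segments(segments))
```

Sorting after shifting is what makes the output chronological even when later segments finish first.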

Installation

Prerequisites: Python 3.10 or higher.

Set up a Python virtual environment:

```
python -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
```

Install AI Sub:
```
pip install --upgrade ai-sub
```

Usage

You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.

Option 1: Using Google AI Studio API Key

Obtain your API Key:
- Sign in to Google AI Studio.
- Click "Create API Key".
- Copy and securely store your key. Never disclose your API key publicly.
Run the application:
```
ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
```
Note: Replace YOUR_API_KEY with your actual key and "path/to/your/video.mp4" with the video file path.
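Because `--ai.google.key` falls back to the `GOOGLE_API_KEY` or `GEMINI_API_KEY` environment variable when omitted (see Google AI Settings), the key can be kept out of shell history and the process argument list. A minimal sketch of launching AI Sub from Python this way; the helper names are hypothetical and only the flags shown above are used.

```python
import os
import subprocess


def build_ai_sub_command(
    video_path: str, model: str = "google-gla:gemini-3-flash-preview"
) -> list[str]:
    """Build the ai-sub argument list without embedding the API key."""
    return ["ai-sub", f"--ai.model={model}", video_path]


def run_ai_sub(video_path: str, api_key: str) -> subprocess.CompletedProcess:
    # Pass the key through the environment instead of --ai.google.key.
    env = {**os.environ, "GEMINI_API_KEY": api_key}
    return subprocess.run(build_ai_sub_command(video_path), env=env, check=True)
```

For example, `run_ai_sub("path/to/your/video.mp4", api_key=key)`, where `key` is loaded from wherever you store secrets.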

Option 2: Using Gemini CLI

Install and Authenticate Gemini CLI:
- Install: npm install -g @google/gemini-cli
- Authenticate: Follow instructions at gemini-cli.
Run the application:
```
ai-sub --ai.model=gemini-cli:gemini-3-flash-preview --split.re-encode.enabled=True "path/to/your/video.mp4"
```
Important Notes for CLI Mode:
- No API key is required; the tool uses your authenticated Gemini CLI instance.
- The `--split.re-encode.enabled=True` argument is required because the Gemini CLI has a 20 MB upload limit per chunk; re-encoding shrinks each chunk below that limit. The default re-encoding settings are aggressive and should work for most inputs.
- Re-encoding is resource-intensive and will increase processing time.
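A rough size check shows why the defaults fit under the 20 MB limit. This sketch takes `--split.re-encode.bitrate-kb` literally as kilobytes per second and ignores audio and container overhead, so real files will differ; if the value is actually kilobits per second, the estimate is about eight times smaller. The function name is hypothetical.

```python
def estimated_chunk_mb(bitrate_kb_per_s: float, chunk_seconds: float) -> float:
    """Rough video size of one re-encoded chunk, in MB (1 MB = 1000 KB).

    Assumes the bitrate is in kilobytes per second and ignores audio
    and container overhead.
    """
    return bitrate_kb_per_s * chunk_seconds / 1000


# Default --split.re-encode.bitrate-kb (35) and --split.max-seconds (300).
size = estimated_chunk_mb(35, 300)
print(f"{size:.1f} MB, under the 20 MB limit: {size < 20}")
```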

Configuration

All settings can be configured via command-line arguments (e.g., --ai.rpm 10) or environment variables with the AISUB_ prefix (e.g., AISUB_AI_RPM=10).
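A minimal sketch of how a flag name might map to its environment variable. Only the `--ai.rpm` / `AISUB_AI_RPM` pair is documented; the rule that dots and hyphens both become underscores is an assumption inferred from that single example, so verify any other name against the tool before relying on it. The function name is hypothetical.

```python
def flag_to_env_var(flag: str, prefix: str = "AISUB_") -> str:
    """Map a flag such as --ai.rpm to an environment variable such as AISUB_AI_RPM.

    Assumption: dots and hyphens both become underscores and the result is
    upper-cased. Only the --ai.rpm example is documented.
    """
    name = flag.removeprefix("--")
    return prefix + name.replace(".", "_").replace("-", "_").upper()


print(flag_to_env_var("--ai.rpm"))
```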

AI Settings (`--ai.*`)

Argument	Description	Default
`--ai.model <model>`	A shorthand that sets both `--ai.pass1-model` and `--ai.pass2-model` to the same value.	`None`
`--ai.pass1-model <model>`	The AI model for the first pass of subtitle generation (Transcription). Use 'google-gla:<model>' for Google models, 'gemini-cli:<model>' for the Gemini CLI, 'openai:<model>' for OpenAI, or 'custom:<url>' for a custom endpoint.	`google-gla:gemini-3-flash-preview`
`--ai.pass2-model <model>`	The AI model for the second pass of subtitle generation (QA & Refinement).	`google-gla:gemini-3-flash-preview`
`--ai.rpm <int>`	Maximum requests per minute for the AI model.	`4`
`--ai.tpm <int>`	Maximum tokens per minute for the AI model.	`250000`
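`--ai.rpm` caps how many requests are sent per minute. The sketch below shows one common way to enforce such a cap, a sliding 60-second window; it is an illustration under that assumption, not AI Sub's actual limiter, and it does not model `--ai.tpm`. The class name is hypothetical, and the clock is injectable so the logic can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable


class RequestsPerMinuteLimiter:
    """Sliding-window limiter: at most `rpm` requests in any 60-second window."""

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.clock = clock
        self.sent: deque[float] = deque()  # send times inside the current window

    def wait_time(self) -> float:
        """Seconds to wait before the next request may be sent (0.0 if none)."""
        now = self.clock()
        # Forget requests that have left the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) < self.rpm:
            return 0.0
        return 60 - (now - self.sent[0])

    def record(self) -> None:
        """Call right after a request is sent."""
        self.sent.append(self.clock())
```

With the default of 4 requests per minute, a fifth request sent 4 seconds after the first must wait another 56 seconds.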

Google AI Settings (`--ai.google.*`)

Argument	Description	Default
`--ai.google.key <key>`	The API key for Google's generative language models.	`None` (loads from `GOOGLE_API_KEY` or `GEMINI_API_KEY`)
`--ai.google.file-cache-ttl <seconds>`	The time-to-live (TTL) in seconds for the Gemini file list cache.	`10`
`--ai.google.use-files-api <bool>`	Whether to use the Gemini Files API.	`True`
`--ai.google.base-url <url>`	The base URL for the Google AI API.	`None`

Gemini CLI Settings (`--ai.gemini-cli.*`)

Argument	Description	Default
`--ai.gemini-cli.timeout <seconds>`	The timeout in seconds for Gemini CLI operations.	`600`
`--ai.gemini-cli.overwrite-system-prompt <bool>`	Whether to overwrite the system prompt using `GEMINI_SYSTEM_MD`.	`False`

Splitting Settings (`--split.*`)

Argument	Description	Default
`--split.max-seconds <seconds>`	The maximum duration in seconds for each video chunk.	`300`
`--split.start-offset-min <minutes>`	The number of minutes to skip from the beginning of the video.	`0`
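To see how the two splitting settings interact, this minimal sketch computes chunk boundaries from a video's duration. It assumes fixed-length cuts; a real splitter may cut on keyframes, so actual boundaries can shift slightly. The function name is hypothetical.

```python
def chunk_boundaries(
    duration_s: float, max_seconds: int = 300, start_offset_min: int = 0
) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each chunk."""
    boundaries = []
    start = start_offset_min * 60.0  # skip the first N minutes
    while start < duration_s:
        end = min(start + max_seconds, float(duration_s))
        boundaries.append((start, end))
        start = end
    return boundaries


# A 12.5-minute video with the defaults yields three chunks.
print(chunk_boundaries(750))
```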

Re-Encode Settings (`--split.re-encode.*`)

Argument	Description	Default
`--split.re-encode.enabled <bool>`	Re-encode the video chunks to save bandwidth.	`False`
`--split.re-encode.fps <int>`	The framerate to re-encode the video to.	`1`
`--split.re-encode.height <int>`	The height (resolution) to re-encode the video to.	`360`
`--split.re-encode.bitrate-kb <int>`	The bitrate in KB/s to re-encode the video to.	`35`
`--split.re-encode.threshold-mb <int>`	The threshold in MB for re-encoding. Files smaller than this will not be re-encoded. Set to 0 to re-encode everything.	`20`
`--split.re-encode.encoder <encoder>`	The specific encoder to use (e.g., 'h264_nvenc').	`None` (auto-detected)

Directory Settings (`--dir.*`)

Argument	Description	Default
`--dir.tmp <path>`	Temporary directory for intermediate files.	`tmp_<video_name>` in output dir
`--dir.out <path>`	Output directory for the final subtitle files.	Same directory as input video

Concurrency Settings (`--thread.*`)

Argument	Description	Default
`--thread.uploads <int>`	The number of concurrent threads for uploading video segments. This is only used for Gemini (google-gla) models.	`4`
`--thread.re-encode <int>`	The number of concurrent threads for re-encoding video chunks.	`2`
`--thread.subtitles1 <int>`	The number of concurrent threads to use for Pass 1 (Transcription).	`4`
`--thread.subtitles2 <int>`	The number of concurrent threads to use for Pass 2 (QA).	`4`

Retry Settings (`--retry.*`)

Argument	Description	Default
`--retry.run <int>`	The maximum number of times to retry a failed job in this run of the program.	`3`
`--retry.max <int>`	The absolute maximum number of times a job can be retried in total.	`9`
`--retry.delay <seconds>`	The number of seconds to wait between retries.	`30`
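The two retry limits can be read as a per-run cap bounded by a lifetime cap. A minimal sketch under the assumption that the total retry count persists across runs, as "in total" suggests; the function name is hypothetical, and `--retry.delay` would be the pause between attempts.

```python
def retries_allowed(previous_retries: int, run_limit: int = 3, total_limit: int = 9) -> int:
    """Retries a job may still use in this run, given retries used in earlier runs."""
    return max(0, min(run_limit, total_limit - previous_retries))


# A job that already used 7 retries in earlier runs gets only 2 more.
print(retries_allowed(7))
```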

Logging Settings (`--log.*`)

Argument	Description	Default
`--log.level <level>`	The minimum log level to display.	`info`
`--log.timestamps <bool>`	Whether to include timestamps in the console output.	`False`
`--log.scrub <bool>`	Whether to scrub sensitive data from logs.	`True`

Known Limitations

- Timestamp Accuracy: Subtitle timestamps may occasionally be inaccurate. This is an inherent limitation of the Gemini model. Shorter video segments (see `--split.max-seconds`) generally yield better accuracy.
- AI Hallucinations: Like all LLMs, Gemini may occasionally produce "hallucinations": text that is inaccurate or does not appear in the video.

If you encounter issues, consider re-processing specific video segments as detailed below.

Advanced: Re-processing Segments

Intermediate files are stored in a temporary directory (default: `tmp_<video_name>` in the output directory). You can change this location with the `--dir.tmp` flag.

To re-process a specific segment:

1. Navigate to the temporary directory.
2. Locate and delete the corresponding `part_XXX.model_name.json` file.
3. Re-run the command. It automatically detects the missing files and re-processes only those segments.
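The deletion step can be scripted. This minimal sketch assumes `XXX` is a zero-padded, three-digit segment index (e.g. `part_007`); it removes the results from every model for that segment, so narrow the glob pattern to one model name if you only want to redo one pass. The function name is hypothetical.

```python
from pathlib import Path


def reset_segment(tmp_dir: str, part: int) -> list[Path]:
    """Delete cached result files for one segment so the next run redoes it."""
    # Assumption: files are named part_XXX.<model_name>.json with a 3-digit index.
    matches = sorted(Path(tmp_dir).glob(f"part_{part:03d}.*.json"))
    for path in matches:
        path.unlink()
    return matches
```

For example, `reset_segment("tmp_my_video", 7)` followed by re-running `ai-sub` on the same video.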

Project details

Maintainers

FlippFuzz

Release history

Version	Release date
2.9.1	May 20, 2026
2.9.0	May 20, 2026
2.9.0b1 (pre-release)	May 20, 2026
2.8.2	Apr 26, 2026
2.8.1	Apr 20, 2026
2.8.0	Apr 12, 2026
2.8.0b2 (pre-release)	Apr 11, 2026
2.8.0b1 (pre-release)	Apr 10, 2026
2.7.1b5 (pre-release)	Apr 5, 2026
2.7.1b4 (pre-release)	Apr 5, 2026
2.7.1b3 (pre-release)	Apr 5, 2026
2.7.1b2 (pre-release)	Apr 4, 2026
2.7.1b1 (pre-release)	Apr 4, 2026
2.7.0	Apr 4, 2026
2.6.1	Apr 2, 2026
2.6.1b3 (pre-release)	Mar 30, 2026
2.6.1b2 (pre-release)	Mar 30, 2026
2.6.1b1 (pre-release)	Mar 29, 2026
2.6.0	Mar 29, 2026
2.6.0b1 (pre-release)	Mar 29, 2026
2.5.0	Mar 21, 2026
2.5.0b1 (pre-release)	Mar 20, 2026
2.4.1	Mar 18, 2026
2.4.0	Mar 18, 2026
2.4.0b5 (pre-release)	Mar 18, 2026
2.4.0b4 (pre-release)	Mar 17, 2026
2.4.0b3 (pre-release)	Mar 17, 2026
2.4.0b2 (pre-release)	Mar 17, 2026
2.4.0b1 (pre-release)	Mar 15, 2026
2.3.1	Mar 14, 2026
2.3.0	Mar 11, 2026
2.2.0b1 (pre-release)	Mar 10, 2026
2.1.0	Mar 8, 2026
2.1.0b3 (pre-release)	Mar 8, 2026
2.1.0b2 (pre-release)	Mar 8, 2026
2.1.0b1 (pre-release)	Mar 8, 2026
2.0.0 (this version)	Mar 7, 2026
1.12.0	Mar 2, 2026
1.11.0	Feb 28, 2026
1.10.1	Feb 25, 2026
1.10.0	Feb 23, 2026
1.9.2	Feb 6, 2026
1.9.1	Feb 1, 2026
1.9.0	Feb 1, 2026
1.8.0	Feb 1, 2026
1.7.0	Jan 19, 2026
1.6.2	Jan 15, 2026
1.6.1	Jan 14, 2026
1.6.0	Jan 13, 2026
1.5.0	Jan 10, 2026
1.4.0	Jan 3, 2026
1.3.1	Dec 31, 2025
1.3.0	Dec 31, 2025
1.2.2	Dec 26, 2025
1.2.1	Dec 26, 2025
1.2.0	Dec 26, 2025
1.1.0	Dec 26, 2025
1.0.2	Dec 26, 2025
1.0.1	Dec 25, 2025
1.0.0	Dec 25, 2025
0.0.8	Nov 2, 2025
0.0.7	Jul 9, 2025
0.0.6	Jul 4, 2025
0.0.5	Jul 3, 2025
0.0.4	Jul 1, 2025
0.0.3	Jun 30, 2025
0.0.2	Jun 29, 2025
0.0.1	Jun 28, 2025

Download files

Source Distribution

ai_sub-2.0.0.tar.gz (32.0 kB)

Uploaded Mar 7, 2026 Source

Built Distribution

ai_sub-2.0.0-py3-none-any.whl (33.0 kB)

Uploaded Mar 7, 2026 Python 3

File details

Details for the file ai_sub-2.0.0.tar.gz.

File metadata

Download URL: ai_sub-2.0.0.tar.gz
Upload date: Mar 7, 2026
Size: 32.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_sub-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`32a5ed9158f55796d3380a7301ad6fdabdd1576aa24870294e4e403300ae413a`
MD5	`b4179bd92457790f8c44313f848c3405`
BLAKE2b-256	`ab65587d5678e068898ea4811a0608f772d42326815744878e76817f65e88d22`

Provenance

The following attestation bundles were made for ai_sub-2.0.0.tar.gz:

Publisher: publish.yml on FlippFuzz/ai-sub

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_sub-2.0.0.tar.gz
- Subject digest: 32a5ed9158f55796d3380a7301ad6fdabdd1576aa24870294e4e403300ae413a
- Sigstore transparency entry: 1053307813
- Sigstore integration time: Mar 7, 2026
Source repository:
- Permalink: FlippFuzz/ai-sub@3cb5d635043487d2f42248166bbd993e639501a9
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/FlippFuzz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3cb5d635043487d2f42248166bbd993e639501a9
- Trigger Event: release

File details

Details for the file ai_sub-2.0.0-py3-none-any.whl.

File metadata

Download URL: ai_sub-2.0.0-py3-none-any.whl
Upload date: Mar 7, 2026
Size: 33.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_sub-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64a7f89dfb291dd8874ec9acd4ef871ab37791fe524234afe3a3f61752a8672b`
MD5	`d4af9bfdfc0f5fc36bbbcd9257f38696`
BLAKE2b-256	`4af59045c0309cf631215e184b7e51d91d7f969101af85e90c9343ca8c6332ec`

Provenance

The following attestation bundles were made for ai_sub-2.0.0-py3-none-any.whl:

Publisher: publish.yml on FlippFuzz/ai-sub

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ai_sub-2.0.0-py3-none-any.whl
- Subject digest: 64a7f89dfb291dd8874ec9acd4ef871ab37791fe524234afe3a3f61752a8672b
- Sigstore transparency entry: 1053307828
- Sigstore integration time: Mar 7, 2026
Source repository:
- Permalink: FlippFuzz/ai-sub@3cb5d635043487d2f42248166bbd993e639501a9
- Branch / Tag: refs/tags/v2.0.0
- Owner: https://github.com/FlippFuzz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@3cb5d635043487d2f42248166bbd993e639501a9
- Trigger Event: release
