

AI Sub: AI-Powered Subtitle Generation with Translation




Overview

AI Sub is a command-line tool that uses Google's Gemini models to generate high-quality subtitles synchronized to a video's audio. By analyzing both audio and visual cues, it produces precise subtitles in the video's original language along with English translations.


Showcase

Please visit ai-sub-showcase.



How It Works

  1. Segmentation: Splits the input video into 5-minute segments by default (adjustable with --split.max_minutes).
  2. Re-encoding (Optional): Compresses segments to a lower frame rate and resolution (e.g., 1 fps, 360p) to reduce bandwidth and upload times.
  3. Upload (Optional): Uploads segments to the Gemini Files API for cloud-based processing. Re-encoding helps keep the segments below the API's file size limit.
  4. Lyrics Search and Scene Detection (Optional): Detects scenes and searches the web for official lyrics to improve transcription accuracy for songs.
  5. Generation: Generates precise, synchronized subtitles and translations using the AI model.
  6. Assembly: Stitches the generated subtitles back together into a final SRT file.
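The segmentation step can be estimated ahead of time. The following shell sketch is a hypothetical helper, not part of ai-sub: `segment_parts` prints the part names a video of a given length would produce, assuming fixed-length segments (5 minutes unless you pass another value), a shorter final segment, and zero-based three-digit numbering as used by the `part_XXX` state files described under Advanced: Re-processing Segments.

```shell
# Hypothetical helper (not part of ai-sub): print the part_XXX names a video
# of the given length would be split into. Assumes fixed-length segments of
# max_minutes (default 5), a shorter final segment, and numbering from 000.
segment_parts() {
  local duration_s=$1
  local max_minutes=${2:-5}
  local seg_s=$((max_minutes * 60))
  local count=$(( (duration_s + seg_s - 1) / seg_s ))  # ceiling division
  local i=0
  while [ "$i" -lt "$count" ]; do
    printf 'part_%03d\n' "$i"
    i=$((i + 1))
  done
}
```

For a 23-minute video (1380 seconds), `segment_parts 1380` prints part_000 through part_004, that is, five segments.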

Installation

Prerequisites: Python 3.10 or higher.

  1. Set up a Python virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
    
  2. Install AI Sub:

    pip install --upgrade ai-sub
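Before creating the virtual environment, you can check the Python prerequisite. This sketch (Linux/macOS shell; the function name `python_ok` is hypothetical) asks the interpreter itself whether it is 3.10 or newer:

```shell
# Check that an interpreter meets AI Sub's Python 3.10+ prerequisite.
# The function returns 0 if sys.version_info >= (3, 10), non-zero otherwise.
python_ok() {
  "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 10) else 1)'
}

if python_ok python3; then
  echo "Python version OK"
else
  echo "AI Sub requires Python 3.10 or higher" >&2
fi
```

On systems where the interpreter is called `python` rather than `python3`, run `python_ok python` instead.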
    

Usage

You can use AI Sub with a Google AI Studio API key, the Gemini CLI, or a mix of both.

Option 1: Google API Key for Lyrics Search + Gemini CLI for Subtitle Generation (Best for free tier users)

This option gives the best quality on the free tier but requires slightly more setup.

  • Free-tier quotas as of 18 Mar 2026:
    • Gemini API: 500 free requests/day for gemini-3.1-flash-lite-preview
    • Gemini CLI: 50 free requests/day for gemini-3-pro-preview and 1,000 free requests/day for gemini-3-flash-preview
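To put these quotas in perspective, here is a rough estimate of how much video one day of free requests covers. It assumes, without confirmation from the tool, one subtitle request per segment and the default 5-minute segments; retries also consume quota, so treat the result as an upper bound. `video_minutes_per_day` is a hypothetical helper.

```shell
# Rough estimate: minutes of video per day that a daily request quota covers.
# Assumes one subtitle request per segment and no retries; failed or retried
# requests also consume quota, so the result is an upper bound.
video_minutes_per_day() {
  local requests_per_day=$1
  local segment_minutes=${2:-5}
  echo $((requests_per_day * segment_minutes))
}

video_minutes_per_day 50     # gemini-3-pro-preview via Gemini CLI: prints 250
video_minutes_per_day 1000   # gemini-3-flash-preview via Gemini CLI: prints 5000
```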
  1. Obtain your API Key:

    • Sign in to Google AI Studio.
    • Click "Create API Key".
    • Copy and securely store your key. Never disclose your API key publicly.
  2. Install and Authenticate Gemini CLI:

    • Install: npm install -g @google/gemini-cli
    • Note: This requires Node.js and npm to be installed.
    • Authenticate: Follow the instructions at gemini-cli.
  3. Run the application:

    Linux:

    KEY="YOUR_API_KEY"
    FILE_NAME="path/to/your/video.mp4"
    
    ai-sub \
      --ai.model-lyrics=google-gla:gemini-3.1-flash-lite-preview \
      --ai.google.key="${KEY}" \
      --ai.model-subtitles=gemini-cli:gemini-3-pro-preview \
      --split.re-encode.enabled=True \
      "${FILE_NAME}"
    

    Windows:

    SET "KEY=YOUR_API_KEY"
    SET "FILE_NAME=path/to/your/video.mp4"
    
    ai-sub ^
      --ai.model-lyrics=google-gla:gemini-3.1-flash-lite-preview ^
      --ai.google.key="%KEY%" ^
      --ai.model-subtitles=gemini-cli:gemini-3-pro-preview ^
      --split.re-encode.enabled=True ^
      "%FILE_NAME%"
    

Option 2: Using Gemini CLI Only

I recommend against using the Gemini CLI for web searches, especially on the free tier: you will likely hit "429 RESOURCE_EXHAUSTED" errors caused by undocumented quota limits on web searches. The commands below therefore set --thread.lyrics=0, which skips the lyrics search step.

  1. Install and Authenticate Gemini CLI:

    • Install: npm install -g @google/gemini-cli
    • Note: This requires Node.js and npm to be installed.
    • Authenticate: Follow the instructions at gemini-cli.
  2. Run the application:

    Linux:

    FILE_NAME="path/to/your/video.mp4"
    
    ai-sub \
      --ai.model-subtitles=gemini-cli:gemini-3-pro-preview \
      --split.re-encode.enabled=True \
      --thread.lyrics=0 \
      "${FILE_NAME}"
    

    Windows:

    SET "FILE_NAME=path/to/your/video.mp4"
    
    ai-sub ^
      --ai.model-subtitles=gemini-cli:gemini-3-pro-preview ^
      --split.re-encode.enabled=True ^
      --thread.lyrics=0 ^
      "%FILE_NAME%"
    

    Important Notes for CLI Mode:

    • No API key is required; the tool uses your authenticated Gemini CLI instance.
    • Re-encoding must be enabled (--split.re-encode.enabled=True) because the Gemini CLI has a 20MB upload limit per chunk. The default re-encoding settings are aggressive and should work for most inputs.
    • Re-encoding is resource-intensive and will increase processing time.
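If you want to confirm that your files fit under that limit, a check like the following can help. It is a hypothetical helper, not an ai-sub feature, and it assumes "20MB" means 20 * 1024 * 1024 bytes; set LIMIT_BYTES to use a different threshold.

```shell
# Hypothetical helper (not part of ai-sub): list files above the Gemini CLI's
# per-chunk upload limit. Assumes "20MB" means 20 * 1024 * 1024 bytes;
# set LIMIT_BYTES beforehand to override.
LIMIT_BYTES=${LIMIT_BYTES:-$((20 * 1024 * 1024))}

oversized() {
  local f size
  for f in "$@"; do
    [ -f "$f" ] || continue            # skip directories and unmatched globs
    size=$(($(wc -c < "$f")))          # arithmetic strips wc's padding
    if [ "$size" -gt "$LIMIT_BYTES" ]; then
      printf '%s\t%s bytes\n' "$f" "$size"
    fi
  done
}
```

For example, `oversized tmp_video/*` lists any file in the temporary directory that exceeds the limit; `tmp_video` is a placeholder for your actual temporary directory (by default tmp_<input_file_name>).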

Option 3: Using Google API Key Only

This is the easiest option if you don't want to set up the Gemini CLI, but quality is lower because the only free model is the weakest one.

  • Free-tier quota as of 18 Mar 2026:
    • Gemini API: 500 free requests/day for gemini-3.1-flash-lite-preview
    • "flash-lite" is the weakest model. More capable models are not free; you need to set up billing to use them.
  1. Obtain your API Key:

    • Sign in to Google AI Studio.
    • Click "Create API Key".
    • Copy and securely store your key. Never disclose your API key publicly.
  2. Run the application:

    Linux:

    KEY="YOUR_API_KEY"
    FILE_NAME="path/to/your/video.mp4"
    
    ai-sub \
      --ai.model=google-gla:gemini-3.1-flash-lite-preview \
      --ai.google.key="${KEY}" \
      "${FILE_NAME}"
    

    Windows:

    SET "KEY=YOUR_API_KEY"
    SET "FILE_NAME=path/to/your/video.mp4"
    
    ai-sub ^
      --ai.model=google-gla:gemini-3.1-flash-lite-preview ^
      --ai.google.key="%KEY%" ^
      "%FILE_NAME%"
    

    Note: Replace YOUR_API_KEY with your actual key and "path/to/your/video.mp4" with the video file path.


Configuration

For a detailed list of all configuration options, including AI models, re-encoding settings, and concurrency controls, please refer to CONFIGURATION.md.

All settings can be configured via command-line arguments (e.g., --ai.model=google-gla:gemini-3-flash-preview) or via environment variables with the AISUB_ prefix (e.g., AISUB_AI_MODEL=gemini-cli:gemini-3-flash-preview).
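The option-to-variable mapping appears to follow a simple pattern. This sketch derives the environment variable name for a command-line option; only the `--ai.model` to `AISUB_AI_MODEL` mapping is confirmed above, and treating hyphens like dots is an assumption, so check CONFIGURATION.md before relying on it. `flag_to_env` is a hypothetical helper.

```shell
# Hypothetical helper: derive the AISUB_ environment variable for a CLI option.
# Strips the leading "--", turns dots (and, by assumption, hyphens) into
# underscores, and upper-cases the result.
flag_to_env() {
  printf 'AISUB_%s\n' "$(printf '%s' "${1#--}" | sed 's/[.-]/_/g' | tr '[:lower:]' '[:upper:]')"
}

flag_to_env --ai.model                  # prints AISUB_AI_MODEL
flag_to_env --split.re-encode.enabled   # prints AISUB_SPLIT_RE_ENCODE_ENABLED (assumed mapping)
```

For example, running `export AISUB_AI_MODEL=gemini-cli:gemini-3-flash-preview` before `ai-sub` has the same effect as passing `--ai.model=gemini-cli:gemini-3-flash-preview` on the command line.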


Release Notes

For the latest updates and bug fixes, please refer to RELEASE_NOTES.md.


Known Limitations

  1. Timestamp Accuracy: Subtitle timestamps may occasionally be inaccurate due to limitations of the Gemini models. Shorter video segments generally yield more accurate timestamps, so experiment with the --split.max_minutes setting.
  2. AI Hallucinations: Like all LLMs, Gemini may occasionally hallucinate, producing subtitles or translations that do not match the video.

If you encounter issues, consider re-processing specific video segments as detailed below.


Advanced: Re-processing Segments

Intermediate files and job states are stored in a temporary directory (default: tmp_<input_file_name>). You can customize this location using the --dir.tmp flag.

The application creates separate state files for each processing stage (e.g., lyrics detection, subtitle generation). To re-process a specific segment, you must delete the state file for the stage you want to re-run.

File naming format: part_XXX.<stage>.<model_name>.json

Example: To re-run subtitle generation for the third segment:

  1. Navigate to the temporary directory.
  2. Identify the model name used for subtitles (e.g., gemini-3-pro-preview).
  3. Delete the corresponding state file. Parts are numbered from 000, so for the third segment this is, for example, part_002.subtitles.gemini-3-pro-preview.json.
  4. Re-run the same ai-sub command. It will detect the missing subtitle job state and re-process only that segment, reusing any existing lyrics data.

To re-run the entire pipeline for that segment (including lyrics search), delete both the lyrics and subtitles JSON files for that part.
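The manual steps above can be scripted. In this sketch, `state_file` and `reset_stage` are hypothetical helpers (not part of ai-sub) that build the path from the documented part_XXX.<stage>.<model_name>.json format, converting a 1-based segment number to the zero-based part number.

```shell
# Hypothetical helpers (not part of ai-sub).
# state_file TMP_DIR SEGMENT STAGE MODEL prints the state file path for a
# 1-based segment number (third segment -> part_002).
state_file() {
  printf '%s/part_%03d.%s.%s.json\n' "$1" "$(($2 - 1))" "$3" "$4"
}

# reset_stage TMP_DIR SEGMENT STAGE MODEL deletes that state file so the next
# ai-sub run redoes only that stage; returns 1 if the file does not exist.
reset_stage() {
  local f
  f=$(state_file "$@")
  if [ -e "$f" ]; then
    rm -- "$f" && echo "removed $f"
  else
    echo "not found: $f" >&2
    return 1
  fi
}
```

For the example above, `reset_stage tmp_video 3 subtitles gemini-3-pro-preview` removes part_002.subtitles.gemini-3-pro-preview.json, where `tmp_video` stands for your temporary directory. To redo the whole segment, also remove its lyrics state file; the stage name `lyrics` and the model name it carries (for example gemini-3.1-flash-lite-preview in Option 1) are assumptions, so list the directory to confirm the exact file name first.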




Download files


Source Distribution

ai_sub-2.4.0b5.tar.gz (39.7 kB)

Built Distribution


ai_sub-2.4.0b5-py3-none-any.whl (41.1 kB)

File details

Details for the file ai_sub-2.4.0b5.tar.gz.

File metadata

  • Download URL: ai_sub-2.4.0b5.tar.gz
  • Upload date:
  • Size: 39.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_sub-2.4.0b5.tar.gz
Algorithm Hash digest
SHA256 14f496c9585d253c7f88f06180548895fbfac45b9ede22fd55ed0338b01e88cb
MD5 279aba527090e59ce2986c797d72d68b
BLAKE2b-256 078b9e687ae46e55a68117cd3c34d5a7a01f2fc52a6e1d9d8c0141b0b82f93a7


Provenance

The following attestation bundles were made for ai_sub-2.4.0b5.tar.gz:

Publisher: publish.yml on FlippFuzz/ai-sub


File details

Details for the file ai_sub-2.4.0b5-py3-none-any.whl.

File metadata

  • Download URL: ai_sub-2.4.0b5-py3-none-any.whl
  • Upload date:
  • Size: 41.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_sub-2.4.0b5-py3-none-any.whl
Algorithm Hash digest
SHA256 50856d019e621e4626af37f819d0dd7dab5f0b23737ce0709427c996254ea0db
MD5 209bd29fe64bd6d0f1f64137bf1a4e8b
BLAKE2b-256 9094b305d6ccdcaa655ced45c461e92932eabe73f2a486c9112d3ec05efd8abb
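To check a downloaded file against the digests listed here, compare its SHA256 hash with the published value. This sketch uses `sha256sum` from GNU coreutils (on macOS, `shasum -a 256` produces the same digest); `verify_sha256` is a hypothetical helper name.

```shell
# Compare a file's SHA256 digest with an expected hex digest.
# Reads the file via stdin so the output contains no (possibly escaped) filename.
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum < "$file" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

For example, after `pip download ai-sub==2.4.0b5 --no-deps -d dist`, run `verify_sha256 dist/ai_sub-2.4.0b5-py3-none-any.whl 50856d019e621e4626af37f819d0dd7dab5f0b23737ce0709427c996254ea0db`.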


Provenance

The following attestation bundles were made for ai_sub-2.4.0b5-py3-none-any.whl:

Publisher: publish.yml on FlippFuzz/ai-sub

