

AI Sub: AI-Powered Subtitle Generation with Translation



Overview

AI Sub is a command-line tool that uses Google's Gemini models to generate high-quality subtitles synchronized with the audio. It produces English and Japanese subtitles by combining the audio track with visual cues from the video frames.

Key Features:

  • Multimodal Understanding: Utilizes video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
  • Dual-Language Support: Generates verbatim transcriptions and translations for English and Japanese.
  • Automatic Segmentation: Automatically splits long videos into smaller segments for efficient processing.

Showcase

Here's an example of subtitles generated by AI Sub:

[Image: screenshot of a video with subtitles generated by AI Sub]

For more examples, please visit ai-sub-showcase.


How It Works

  1. Preprocessing: The input video is segmented into smaller chunks to fit within API context windows and file size limits.
  2. AI Processing: Each segment is sent to Google Gemini. The AI analyzes the audio for speech and the video for context, following strict prompting rules to generate subtitles.
  3. Compilation: Generated subtitles from all segments are merged into a final, chronologically sorted SRT file.
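As a rough illustration of the compilation step, the Python sketch below shifts each segment's cues by that segment's start offset, sorts them, and renders a single SRT document. It assumes cue timestamps are relative to the start of their own segment and that each segment's offset in the full video is known; `Cue`, `format_srt_time`, and `merge_segments` are hypothetical names, not AI Sub's internal API.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds, relative to the start of the cue's own segment
    end: float
    text: str


def format_srt_time(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def merge_segments(segments: list[tuple[float, list[Cue]]]) -> str:
    """Merge (segment_offset_seconds, cues) pairs into one chronologically sorted SRT string."""
    # Convert segment-relative timestamps to absolute video timestamps.
    absolute = [
        Cue(offset + cue.start, offset + cue.end, cue.text)
        for offset, cues in segments
        for cue in cues
    ]
    # Sort by start time (then end time) so the output is chronological.
    absolute.sort(key=lambda c: (c.start, c.end))
    # Each SRT block: index, time range, text; blocks are separated by a blank line.
    blocks = [
        f"{i}\n{format_srt_time(c.start)} --> {format_srt_time(c.end)}\n{c.text}\n"
        for i, c in enumerate(absolute, start=1)
    ]
    return "\n".join(blocks)
```

Sorting after the offset shift keeps the output chronological even when segments are supplied out of order.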

Installation

Prerequisites: Python 3.10 or higher.

  1. Set up a Python virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
    
  2. Install AI Sub:

    pip install --upgrade ai-sub
    

Usage

You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.

Option 1: Using Google AI Studio API Key

  1. Obtain your API Key:

    • Sign in to Google AI Studio.
    • Click "Create API Key".
    • Copy and securely store your key. Never disclose your API key publicly.
  2. Run the application:

    ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
    

    Note: Replace YOUR_API_KEY with your actual key and "path/to/your/video.mp4" with the video file path.
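If you have several videos to process, a small Python wrapper can call the command above once per file. This is a minimal sketch that uses only the flags shown in step 2; `build_command` and `subtitle_directory` are hypothetical helpers, and the `*.mp4` pattern is an assumption about your files.

```python
import subprocess
from pathlib import Path


def build_command(
    video: Path,
    api_key: str,
    model: str = "google-gla:gemini-3-flash-preview",
) -> list[str]:
    """Build the ai-sub argument list from Option 1 for a single video."""
    return ["ai-sub", "--ai.google.key", api_key, f"--ai.model={model}", str(video)]


def subtitle_directory(folder: Path, api_key: str) -> None:
    """Run ai-sub once per .mp4 file in `folder`, stopping on the first failure."""
    for video in sorted(folder.glob("*.mp4")):
        # check=True raises CalledProcessError if ai-sub exits with a non-zero status.
        subprocess.run(build_command(video, api_key), check=True)
```

For example, call `subtitle_directory(Path("videos"), api_key)` with a key loaded from wherever you store it securely. Passing the argument list to `subprocess.run` (rather than a shell string) avoids quoting problems with file names that contain spaces.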

Option 2: Using Gemini CLI

  1. Install and Authenticate Gemini CLI:

    • Install: npm install -g @google/gemini-cli
    • Authenticate: Follow instructions at gemini-cli.
  2. Run the application:

    ai-sub --ai.model=gemini-cli:gemini-3-flash-preview --split.re-encode.enabled=True "path/to/your/video.mp4"
    

    Important Notes for CLI Mode:

    • No API key is required; the tool uses your authenticated Gemini CLI instance.
    • Splitting and re-encoding must be enabled (--split.re-encode.enabled=True) because the Gemini CLI limits each uploaded chunk to 20MB. The default re-encoding settings are aggressive and should keep most inputs within this limit.
    • Re-encoding is resource-intensive and will increase processing time.
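The 20MB limit can be turned into a bitrate budget, since a segment's file size is roughly its average bitrate multiplied by its duration. The sketch below assumes 1 MB = 1,000,000 bytes and reserves 10% for container overhead; the function name, the headroom figure, and the 300-second example segment are illustrative assumptions, not AI Sub's actual re-encoding logic or defaults.

```python
def max_average_bitrate_kbps(
    limit_mb: float,
    segment_seconds: float,
    headroom: float = 0.9,
) -> float:
    """Largest combined audio+video bitrate (kbit/s) that keeps one segment under limit_mb.

    Assumes 1 MB = 1,000,000 bytes and uses only `headroom` (default 90%)
    of the budget, leaving the rest for container overhead.
    """
    budget_bits = limit_mb * 1_000_000 * 8 * headroom
    return budget_bits / segment_seconds / 1000


# Example: a hypothetical 300-second segment under the 20MB limit.
print(max_average_bitrate_kbps(20, 300))  # → 480.0
```

Halving the segment length doubles the available bitrate, which is one reason shorter segments are easier to fit under the upload limit.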

Configuration

For a detailed list of all configuration options, including AI models, re-encoding settings, and concurrency controls, please refer to CONFIGURATION.md.

All settings can be configured via command-line arguments (e.g., --ai.rpm 10) or environment variables with the AISUB_ prefix (e.g., AISUB_AI_RPM=10).
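Going by the one documented pair (--ai.rpm and AISUB_AI_RPM), the mapping appears to be: drop the leading dashes, replace dots with underscores, upper-case the result, and add the AISUB_ prefix. The Python sketch below encodes that assumption. The hyphen handling (for flags such as --split.re-encode.enabled) is a guess, so confirm the exact names in CONFIGURATION.md; `cli_flag_to_env_var`, `build_env`, and `run_ai_sub` are hypothetical helpers.

```python
import os
import subprocess


def cli_flag_to_env_var(flag: str, prefix: str = "AISUB_") -> str:
    """Map a dotted CLI flag (e.g. --ai.rpm) to an environment variable name (AISUB_AI_RPM).

    Assumption: dots become underscores and the name is upper-cased, matching the
    documented --ai.rpm / AISUB_AI_RPM pair. Treating hyphens the same way is a guess.
    """
    name = flag.lstrip("-").replace(".", "_").replace("-", "_")
    return prefix + name.upper()


def build_env(settings: dict[str, str]) -> dict[str, str]:
    """Copy the current environment and add one AISUB_* variable per flag/value pair."""
    env = dict(os.environ)
    for flag, value in settings.items():
        env[cli_flag_to_env_var(flag)] = value
    return env


def run_ai_sub(video: str, settings: dict[str, str]) -> None:
    """Invoke ai-sub with settings supplied through environment variables instead of flags."""
    subprocess.run(["ai-sub", video], env=build_env(settings), check=True)
```

Under this assumption, `run_ai_sub("path/to/your/video.mp4", {"--ai.rpm": "10"})` behaves like running `AISUB_AI_RPM=10 ai-sub "path/to/your/video.mp4"` in a POSIX shell.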


Known Limitations

  1. Timestamp Accuracy: Subtitle timestamps may occasionally be inaccurate. This is an inherent characteristic of the Gemini AI model. Shorter video segments generally yield better accuracy.
  2. AI Hallucinations: Like all LLMs, Gemini may occasionally produce "hallucinations" or inaccurate information.

If you encounter issues, consider re-processing specific video segments as detailed below.


Advanced: Re-processing Segments

Intermediate files are stored in a temporary directory (default: tmp_<input_file_name>). You can customize this location using the --dir.tmp flag.

To re-process a specific segment:

  1. Navigate to the temporary directory (default: tmp_<input_file_name>).
  2. Locate and delete the corresponding part_XXX.json state file for the segment you want to re-process.
  3. Re-run the script. It will automatically detect the missing state file and re-process only that segment from the beginning.
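The same steps can be scripted. This minimal sketch lists the state files in a temporary directory and deletes one by name; it assumes state files match the pattern part_*.json as described above, and `list_segment_states` and `reset_segment` are hypothetical helpers. The directory is passed in explicitly because it depends on your input file name or on --dir.tmp.

```python
from pathlib import Path


def list_segment_states(tmp_dir: Path) -> list[str]:
    """Return the stems of the part_*.json state files in tmp_dir, sorted by name."""
    return sorted(p.stem for p in tmp_dir.glob("part_*.json"))


def reset_segment(tmp_dir: Path, part_name: str) -> bool:
    """Delete <part_name>.json so the next ai-sub run re-processes that segment.

    `part_name` is the file name without .json, exactly as returned by
    list_segment_states. Returns True if a state file was removed.
    """
    state_file = tmp_dir / f"{part_name}.json"
    if state_file.is_file():
        state_file.unlink()
        return True
    return False
```

Listing first and then deleting by exact name avoids guessing how segment numbers are padded in the file names.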


Download files


Source Distribution

ai_sub-2.4.0b2.tar.gz (36.9 kB)

Built Distribution


ai_sub-2.4.0b2-py3-none-any.whl (39.3 kB)

File details

Details for the file ai_sub-2.4.0b2.tar.gz.

File metadata

  • Download URL: ai_sub-2.4.0b2.tar.gz
  • Upload date:
  • Size: 36.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_sub-2.4.0b2.tar.gz
Algorithm    Hash digest
SHA256       8310bccdd9f47fb8d77b22cfde31896839db2869f5d8f0d18f44f95ca464e0c3
MD5          375e1e5d9da54d623a427f1e35800cc3
BLAKE2b-256  3aa004df633ae3693c6e8af86e949fbea9bc6da65c32a81bf97c013dc5eef405
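To check a downloaded file against the digests listed here, hash it locally and compare. The sketch below uses Python's standard `hashlib` module and streams the file in chunks so large files are not read into memory at once; `sha256_of` and `verify` are illustrative names, and the expected value is the SHA256 digest listed above for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for ai_sub-2.4.0b2.tar.gz.
EXPECTED_SDIST_SHA256 = "8310bccdd9f47fb8d77b22cfde31896839db2869f5d8f0d18f44f95ca464e0c3"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

`verify(Path("ai_sub-2.4.0b2.tar.gz"), EXPECTED_SDIST_SHA256)` should return True for an intact download. pip can enforce the same kind of check at install time through its hash-checking mode (`--require-hashes`).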


Provenance

The following attestation bundles were made for ai_sub-2.4.0b2.tar.gz:

Publisher: publish.yml on FlippFuzz/ai-sub

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ai_sub-2.4.0b2-py3-none-any.whl.

File metadata

  • Download URL: ai_sub-2.4.0b2-py3-none-any.whl
  • Upload date:
  • Size: 39.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ai_sub-2.4.0b2-py3-none-any.whl
Algorithm    Hash digest
SHA256       6f13afe7179131675b96e1bdadc746b617b9f4e419367e5353b3d0f462742cf1
MD5          f0079007c56533343e2e107f51645cdd
BLAKE2b-256  3452674136b883045ff09e58c12f56222e7b7a6624b440ea0d9a391195f1f87a


Provenance

The following attestation bundles were made for ai_sub-2.4.0b2-py3-none-any.whl:

Publisher: publish.yml on FlippFuzz/ai-sub

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
