A set of UNIX-style tools for generating and translating subtitles using AI.

AI Subtitle Assistant

中文版本 (Chinese Version)

AI Subtitle Assistant is a command-line tool that uses AI technologies (Whisper and Large Language Models) to generate high-quality subtitles for video and audio files.

Features

  • Multi-format Support: Handles various common video and audio file formats.
  • High-precision Speech Recognition: Uses OpenAI's Whisper model for accurate audio transcription.
  • Intelligent Translation and Correction: Leverages Large Language Models (LLMs) for translation, correcting errors based on context, and identifying proper nouns.
  • Flexible LLM Configuration: Allows users to customize the API base URL, key, and model for their LLM provider.
  • Model Selection: Choose which model the LLM provider uses for translation tasks.
  • Concurrent Translation: Processes multiple translation requests concurrently for faster performance.
  • Translation Validation: Verifies that the original text returned by the LLM matches the input text to prevent hallucinations.
  • Improved Translation Quality: Weights accuracy against fluency at a ratio of 1:0.6 to better balance the two.
  • Context Limit Handling: Detects and warns about model context limits that may cause truncated outputs.
  • Debug Mode: Enables detailed output of intermediate JSON data for troubleshooting.
  • Standard Subtitle Output: Generates standard UTF-8 encoded SRT subtitle files.
  • Bilingual Subtitles: Can generate bilingual subtitles for language learning.
  • Embedded Subtitle Extraction: Can detect and extract existing subtitle tracks from video files.
  • Internationalization: Supports multiple languages for the user interface (currently English and Chinese).
  • User-friendly CLI: Provides a clear and easy-to-use command-line interface with subcommands.
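
The translation validation feature above can be pictured with a small sketch. This is not the tool's actual code: the function name `find_mismatches` and the reply shape (a list of dicts with `original` and `translation` keys) are assumptions made for illustration only.

```python
def find_mismatches(source_lines, llm_items):
    """Return indices where the LLM's echoed original differs from the input.

    Assumed (hypothetical) reply shape: one dict per subtitle line with an
    "original" key echoing the input and a "translation" key.
    """
    mismatches = []
    for i, src in enumerate(source_lines):
        # A missing item counts as a mismatch: the translation was dropped.
        if i >= len(llm_items):
            mismatches.append(i)
            continue
        echoed = llm_items[i].get("original", "")
        # Collapse whitespace so harmless reflow is not flagged.
        if " ".join(echoed.split()) != " ".join(src.split()):
            mismatches.append(i)
    return mismatches


source = ["Hello there.", "How are you?"]
reply = [
    {"original": "Hello there.", "translation": "你好。"},
    {"original": "How is it going?", "translation": "你好吗？"},
]
print(find_mismatches(source, reply))  # [1]
```

A caller could retry only the flagged indices rather than the whole batch; whether the tool does so is not stated here.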

Installation

Recommended: Install from PyPI

pip install ai-subtitle

Or install from source:

git clone https://github.com/shdancer/ai-subtitle
cd ai-subtitle
python3 -m venv venv
source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
pip install .

ffmpeg is required for audio processing.

  • On macOS (using Homebrew):
    brew install ffmpeg
    
  • On Debian/Ubuntu:
    sudo apt update && sudo apt install ffmpeg
    
  • On Windows (using Chocolatey):
    choco install ffmpeg
    

Usage

After installation, you can use the ai-subtitle command.

ai-subtitle <command> [options]

Global Options

  • --language {en,zh}: Sets the display language for the tool. Defaults to your system's language.

Commands

transcribe

Transcribes an audio/video file to an SRT file.

Usage: ai-subtitle transcribe <input_file> [options]

Arguments:

  • input_file: Path to the input video or audio file.

Options:

  • -o, --output: Path to the output SRT file. If not specified, prints to standard output.
  • -m, --model: The Whisper model to use (e.g., tiny, base, small, medium, large). Default is base.
  • --force-transcribe: Force transcription even if embedded subtitles are found.

Example:

# Transcribe a video and save to a file
ai-subtitle transcribe my_video.mp4 -o my_video.srt

# If the file already contains an embedded subtitle track, it is extracted
# instead of transcribed (pass --force-transcribe to transcribe anyway)
ai-subtitle transcribe my_movie.mkv -o movie_subs.srt
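
For readers curious what the SRT output of transcription looks like, here is a minimal sketch that turns timed segments into SRT text. The segment shape (dicts with `start`, `end` in seconds, and `text`) mirrors what the openai-whisper package returns, but the helper names `srt_timestamp` and `segments_to_srt` are hypothetical, and this is not necessarily how the tool implements it.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render timed segments as numbered SRT cues separated by blank lines."""
    blocks = []
    for number, seg in enumerate(segments, start=1):
        times = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{number}\n{times}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello."},
    {"start": 3661.25, "end": 3663.0, "text": " Bye."},
]
print(segments_to_srt(segments))
```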

translate

Translates an existing SRT file into a bilingual SRT file.

Usage: ai-subtitle translate [input_file] [options]

Arguments:

  • input_file: Path to the input SRT file. Reads from standard input if not provided.

Options:

  • -o, --output: Path to the output bilingual SRT file. Prints to standard output if not specified.
  • -t, --target-language: The target language for translation (e.g., "Chinese", "English"). Default is "Chinese".
  • --model: Select the model to use for translation (e.g., "gpt-3.5-turbo", "gpt-4"). Default is "gpt-3.5-turbo".
  • --max-workers: Maximum number of concurrent translation requests. Default is 5.
  • --list-models: List available models from the API and exit.
  • --api-base-url: Custom base URL for the LLM provider.
  • --api-key: Custom API key for the LLM provider.
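
To illustrate what --max-workers controls, the sketch below runs several translation requests at once with a thread pool. It assumes a thread-based design, which this document does not confirm; `translate_chunk` is a hypothetical stand-in for a real LLM request and only tags the text.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_chunk(chunk):
    """Hypothetical stand-in for one LLM request; it only tags each line."""
    return [f"[zh] {line}" for line in chunk]


def translate_all(chunks, max_workers=5):
    """Translate chunks concurrently, returning results in input order."""
    # pool.map yields results in the order of `chunks`, even when the
    # underlying requests finish out of order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(translate_chunk, chunks))


chunks = [["Hello."], ["How are you?", "Fine."]]
result = translate_all(chunks, max_workers=2)
print(result)
```

Keeping results in input order matters for subtitles, since each translated chunk must line up with its original timing.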

Example:

# Transcribe and then translate
ai-subtitle transcribe my_video.mp4 | ai-subtitle translate -t "Japanese" -o bilingual.srt
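
The pipeline above hands SRT text from one command to the next. As a rough sketch of what the receiving side has to do, the code below parses SRT text (keeping multi-line cues intact) and appends a translation under each cue. The function names are hypothetical, and placing the original above the translation is an assumption; the tool's actual bilingual layout may differ.

```python
import re


def parse_srt(text):
    """Split SRT text into (index, timing, lines) tuples, keeping multi-line cues."""
    cues = []
    normalized = text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        cues.append((lines[0].strip(), lines[1].strip(), lines[2:]))
    return cues


def to_bilingual(cues, translations):
    """Append one translated line under each cue's original text (assumed layout)."""
    out = []
    for (index, timing, lines), translated in zip(cues, translations):
        out.append("\n".join([index, timing, *lines, translated]) + "\n")
    return "\n".join(out)


sample = (
    "1\n00:00:00,000 --> 00:00:02,000\nHello\nworld\n\n"
    "2\n00:00:03,000 --> 00:00:04,000\nBye\n"
)
cues = parse_srt(sample)
print(to_bilingual(cues, ["你好，世界", "再见"]))
```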

config

Manages configuration settings for the AI Subtitle Assistant.

Usage: ai-subtitle config [options]

Options:

  • --show-path: Show the configuration file path.
  • --create: Create or update configuration interactively.

Example:

# Show the configuration file path
ai-subtitle config --show-path

# Create or update configuration
ai-subtitle config --create

How It Works

  1. Audio Extraction/Transcription: The transcribe command either extracts an existing embedded subtitle track or uses ffmpeg to extract the audio and Whisper to transcribe it into timed text segments.
  2. Chunking & Translation: The translate command reads an SRT file, splits the text into chunks that fit the LLM's context window, and sends each chunk for translation.
  3. LLM Processing: The text is sent to the configured LLM for translation and refinement. The process includes retries and a progress bar.
  4. SRT Generation: The final processed text is formatted into a standard .srt file, either as a simple transcription or a bilingual subtitle.
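
The chunking in step 2 can be sketched as follows. This is an illustration under stated assumptions: it uses a character budget as a crude proxy for the model's token limit, and both `chunk_lines` and the `max_chars` parameter are hypothetical names, not the tool's API.

```python
def chunk_lines(lines, max_chars=2000):
    """Group consecutive subtitle lines so each chunk stays within max_chars.

    max_chars is a rough stand-in for a token budget (an assumption).
    A single line longer than the budget still gets its own chunk.
    """
    chunks, current, size = [], [], 0
    for line in lines:
        # Start a new chunk when adding this line would exceed the budget.
        if current and size + len(line) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append(current)
    return chunks


lines = ["aaaa", "bbbb", "cc", "dddddd"]
print(chunk_lines(lines, max_chars=8))
# [['aaaa', 'bbbb'], ['cc', 'dddddd']]
```

Keeping lines in their original order within and across chunks makes it straightforward to map translations back onto the cue timings.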

Changelog

v0.1.4

  • Added: Model selection feature for translation with --model option
  • Added: List available models with --list-models option
  • Added: Debug mode for troubleshooting translation issues (enabled via AI_SUBTITLE_DEBUG=1 environment variable)
  • Improved: Translation prompt to better handle different language structures and prevent subtitle misalignment
  • Added: Graceful exit handling for keyboard interrupts (Ctrl+C)
  • Added: Validation to check for missing translations
  • Added: Concurrent translation processing for improved performance
  • Added: Translation validation to verify original text matches input text
  • Improved: Translation quality by adjusting importance weights to better balance accuracy and fluency (1:0.6)
  • Added: --max-workers option to control the number of concurrent translation requests
  • Added: Context limit handling to detect and warn about truncated outputs

v0.1.2

  • Fixed: SRT multi-line content parsing bug, now all lines are preserved and correctly translated
  • Improved: Documentation and README structure
  • Updated: PyPI installation instructions and project metadata

v0.1.1

  • Fixed: Debug output interfering with standard output in pipeline operations
  • Improved: Error handling and messaging
  • Updated: Documentation with installation instructions from PyPI

v0.1.0

  • Initial release with core functionality
