AI-powered subtitle generator from YouTube URLs or local media files

Maintainer: e7217

Project description

SOGON

An AI-powered automation tool that extracts audio from video URLs or media files and generates subtitles using advanced speech recognition technology.

Key Features

Flexible Audio Extraction: Extracts high-quality audio from video URLs or local media files
AI Speech Recognition: Accurate Korean speech recognition with AI speech models
Large File Processing: Files larger than 24MB are split automatically to stay within the speech API's upload limit
Precise Timestamps: Segment-level timing in HH:mm:ss.SSS format
Intelligent Text Correction: Dual correction system combining pattern-based and AI-based correction
Systematic Output: Original and corrected versions are saved separately

Installation

Method 1: Install with pipx (Recommended)

# Install globally with pipx
pipx install sogon

# Use the CLI tool
sogon run "https://www.youtube.com/watch?v=VIDEO_ID"

Package Management

# Upgrade to latest version
pipx upgrade sogon

# Check installed version
pipx list

# Uninstall
pipx uninstall sogon

# Reinstall (if needed)
pipx reinstall sogon

Method 2: Development Setup

# Clone and install dependencies
git clone <repository-url>
cd sogon
uv sync

Quick Start

1. API Key Setup

Create a .env file and set your Groq API key:

GROQ_API_KEY=your_groq_api_key_here
OPENAI_API_KEY=your_openai_api_key_here  # Optional: for AI text correction
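To check a `.env` file before running the tool, a small parser is enough. This is a minimal sketch, assuming plain `KEY=VALUE` lines with optional `#` comments; it does not describe how sogon itself loads the file, and the helper names `parse_env` and `missing_keys` are hypothetical.

```python
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments.
    Trailing ' # ...' comments are dropped from unquoted values."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if value[:1] in {'"', "'"} and len(value) >= 2 and value.endswith(value[0]):
            value = value[1:-1]  # strip matching quotes, keep '#' inside them
        else:
            value = value.split(" #", 1)[0].strip()  # drop inline comment
        values[key.strip()] = value
    return values


def missing_keys(env_path: Path, required: tuple[str, ...] = ("GROQ_API_KEY",)) -> list[str]:
    """Return required keys that are absent or empty in the .env file."""
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    values = parse_env(text)
    return [key for key in required if not values.get(key)]
```

Only `GROQ_API_KEY` is treated as required here because the OpenAI key is documented as optional.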

2. Basic Usage

# Process video URL
sogon run "https://www.youtube.com/watch?v=VIDEO_ID"

# Process local media file
sogon run "/path/to/video/file.mp4"

System Architecture

Video URL/File → Audio Extract → Speech Recognition → Text Correction → File Save
      ↓             ↓                ↓                 ↓              ↓
  Downloader    Audio Tool    AI Speech Model     AI Correction   result/
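The diagram describes a linear pipeline in which each stage consumes the previous stage's output. The sketch below shows that composition with injected stage functions; `Segment`, `run_pipeline`, and the stage signatures are hypothetical illustrations, not sogon's internal API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


# Hypothetical stage signatures; sogon's real functions may differ.
Extract = Callable[[str], str]                          # URL/file -> audio path
Recognize = Callable[[str], list[Segment]]              # audio -> timestamped segments
Correct = Callable[[list[Segment]], list[Segment]]      # segments -> corrected segments
Save = Callable[[list[Segment], list[Segment]], str]    # (original, corrected) -> output dir


def run_pipeline(source: str, extract: Extract, recognize: Recognize,
                 correct: Correct, save: Save) -> str:
    audio = extract(source)           # Audio Extract
    original = recognize(audio)       # Speech Recognition
    corrected = correct(original)     # Text Correction
    return save(original, corrected)  # File Save (both versions are kept)
```

Passing the stages in as callables keeps each step swappable and testable on its own, which matches the separate original/corrected outputs described below.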

Processing Steps

Audio Extraction: Extract audio from video URLs or local files using media processing tools
File Processing: Split large files to comply with API limitations
Speech Recognition: Process audio with advanced AI models for Korean text
Text Correction: Apply pattern-based and AI-based corrections
Output Generation: Save original and corrected versions with timestamps
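The File Processing step splits audio that exceeds the 24MB limit. One simple way to plan the split is to divide the duration into equal time ranges, assuming a roughly constant bitrate. This is a sketch under that assumption (and reading "24MB" as 24 MiB); `plan_chunks` is a hypothetical helper, not sogon's splitting code.

```python
import math

LIMIT_BYTES = 24 * 1024 * 1024  # assumption: "24MB" interpreted as 24 MiB


def plan_chunks(file_size: int, duration: float,
                limit: int = LIMIT_BYTES) -> list[tuple[float, float]]:
    """Split [0, duration] into equal time ranges so that each range,
    at a constant bitrate, stays at or under `limit` bytes."""
    if file_size <= limit:
        return [(0.0, duration)]
    count = math.ceil(file_size / limit)
    step = duration / count
    return [(i * step, min((i + 1) * step, duration)) for i in range(count)]
```

For a 50 MiB, 300-second file this yields three 100-second chunks. Real encoders vary bitrate, so a practical splitter would usually leave some safety margin below the limit.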

Output File Structure

Organized by Date/Time/Title:

result/
└── yyyyMMdd_HHmmss_video_title/         # Timestamped folder for each video
    ├── video_title.txt                  # Original continuous text
    ├── video_title_metadata.json        # Original metadata
    ├── video_title_timestamps.txt       # Original timestamps
    ├── video_title_corrected.txt        # Corrected text
    ├── video_title_corrected_metadata.json # Corrected metadata
    └── video_title_corrected_timestamps.txt # Corrected timestamps
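The folder name above combines the run date, the run time, and the video title. A sketch of building such a name follows; the character-replacement rule is an assumption for illustration, and `result_folder` is a hypothetical helper rather than sogon's actual naming code.

```python
import re
from datetime import datetime


def result_folder(title: str, when: datetime) -> str:
    """Build a '<yyyyMMdd>_<HHmmss>_<title>' folder name."""
    # Assumed rule: replace characters that are invalid in Windows/Unix
    # file names, then turn runs of whitespace into single underscores.
    safe = re.sub(r'[<>:"/\\|?*]', "_", title)
    safe = re.sub(r"\s+", "_", safe.strip())
    return f"{when:%Y%m%d_%H%M%S}_{safe}"
```

Putting the timestamp first means the folders sort chronologically in a plain directory listing.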

Timestamp File Format

Subtitle with Timestamps (Corrected)
==================================================

[00:00:00.560 → 00:00:03.520] Hello. Actually, I was going to continue the visual story writing series,
[00:00:03.520 → 00:00:12.839] but there was a problem in the middle,
[00:00:12.839 → 00:00:14.039] I did up to episode 4, filmed episode 5 and need to upload it, but it's not easy.
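Each line pairs a start and end time in the HH:mm:ss.SSS format with the segment text. The sketch below reproduces that line format from times in seconds; `format_timestamp` and `timestamp_line` are hypothetical names.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))  # round to avoid float artifacts like 559.999
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def timestamp_line(start: float, end: float, text: str) -> str:
    """Render one '[start → end] text' line as in the sample above."""
    return f"[{format_timestamp(start)} → {format_timestamp(end)}] {text}"
```

For example, `timestamp_line(0.56, 3.52, "Hello.")` gives `[00:00:00.560 → 00:00:03.520] Hello.`.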

Tech Stack

Component	Tool	Role
Audio Extraction	Media Downloader + Audio Processor	Video URL/File → Audio conversion
Audio Processing	Audio Library	File splitting, format conversion
Speech Recognition	AI Speech Model	Speech → Text + metadata
AI Correction	Large Language Model	Text correction
Environment Management	Configuration Manager	API key management

Output Files

The tool generates organized output files with timestamps and metadata for both original and corrected versions.

Advanced Features

Existing File Correction

The tool can also apply AI-based correction to existing transcript files.
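Correcting an existing timestamps file means rewriting only the text after each `[start → end]` prefix, so the timing is preserved. The sketch below shows that idea with a pattern-based pass (the first half of the dual correction system); the two rules are hypothetical examples, not sogon's rule set, and the AI-based pass is not shown.

```python
import re

# Hypothetical rules for illustration; sogon's actual patterns are not documented here.
RULES: list[tuple[str, str]] = [
    (r"\s+([,.!?])", r"\1"),  # remove whitespace before punctuation
    (r"\s{2,}", " "),         # collapse repeated spaces
]

LINE = re.compile(r"^(\[[^\]]+\])\s*(.*)$")  # '[start → end] text'


def correct_text(text: str) -> str:
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text.strip()


def correct_timestamp_lines(lines: list[str]) -> list[str]:
    """Correct the text after each timestamp prefix; header lines and other
    non-matching lines pass through unchanged."""
    out = []
    for line in lines:
        match = LINE.match(line)
        out.append(f"{match.group(1)} {correct_text(match.group(2))}" if match else line)
    return out
```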

CLI Options

Option	Description	Default
`--format`, `-f`	Output subtitle format (txt, srt, vtt, json)	txt
`--output-dir`, `-o`	Custom output directory	./result
`--no-correction`	Disable all text correction	False
`--no-ai-correction`	Disable only the AI-based text correction	False
`--keep-audio`	Keep downloaded audio files	False
`--translate`	Enable subtitle translation	False
`--target-language`, `-t`	Target language for translation	None
`--source-language`, `-s`	Source language hint for the Whisper speech model	auto-detect
`--log-level`	Logging level (DEBUG, INFO, WARNING, ERROR)	INFO

Error Handling

Automatic file splitting for large files (>24MB)
Partial result saving on failures
Automatic cleanup of temporary files
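When a file is split, each chunk's timestamps start at zero, so they must be shifted by the chunk's start offset before merging; and if a chunk fails, the segments already transcribed can still be saved. This is a sketch of both ideas with a caller-supplied `transcribe` function; `transcribe_chunks` and its signature are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def transcribe_chunks(chunks: list[tuple[float, str]],
                      transcribe: Callable[[str], list[Segment]],
                      ) -> tuple[list[Segment], Optional[str]]:
    """Transcribe (offset_seconds, chunk_path) pairs in order.

    Segment times are shifted by each chunk's offset so the merged timeline
    matches the original file. On the first failure, the segments collected
    so far are returned together with an error message (partial result)."""
    merged: list[Segment] = []
    for offset, path in chunks:
        try:
            segments = transcribe(path)
        except Exception as exc:  # keep what succeeded instead of discarding it
            return merged, f"{path}: {exc}"
        merged.extend(Segment(s.start + offset, s.end + offset, s.text) for s in segments)
    return merged, None
```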

CLI Usage Examples

Basic Usage

# Process video URL
sogon run "https://www.youtube.com/watch?v=VIDEO_ID"

# Process local media file
sogon run "/path/to/video/file.mp4"

Advanced Options

# Specify output format
sogon run "video.mp4" --format srt

# Disable text correction
sogon run "video.mp4" --no-correction

# Set custom output directory
sogon run "video.mp4" --output-dir ./my-results

# Keep downloaded audio files
sogon run "https://youtube.com/watch?v=..." --keep-audio

# Enable translation to Korean
sogon run "video.mp4" --translate --target-language ko

# Set source language for better transcription
sogon run "video.mp4" --source-language en

# Adjust logging level
sogon run "video.mp4" --log-level DEBUG
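To process several files with the same options, the documented flags can be assembled from Python and passed to `subprocess`. This sketch uses only the options listed in the CLI table; `build_command` and `run_batch` are hypothetical helpers, and `dry_run=True` only prints the commands.

```python
import shlex
import subprocess
from typing import Optional


def build_command(source: str, fmt: str = "txt", output_dir: Optional[str] = None,
                  source_language: Optional[str] = None,
                  target_language: Optional[str] = None,
                  no_correction: bool = False) -> list[str]:
    """Assemble a `sogon run` argument list from the documented options."""
    cmd = ["sogon", "run", source, "--format", fmt]
    if output_dir:
        cmd += ["--output-dir", output_dir]
    if source_language:
        cmd += ["--source-language", source_language]
    if target_language:
        cmd += ["--translate", "--target-language", target_language]
    if no_correction:
        cmd.append("--no-correction")
    return cmd


def run_batch(sources: list[str], dry_run: bool = True, **options) -> list[list[str]]:
    """Run one `sogon run` per source, or just print the commands when dry_run is set."""
    commands = [build_command(src, **options) for src in sources]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # shell-quoted, safe to copy and paste
        else:
            subprocess.run(cmd, check=True)  # raises on the first non-zero exit
    return commands
```

Passing an argument list (rather than a shell string) avoids quoting problems with URLs and file names that contain spaces or `&`.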

Translation Features

# List supported languages
sogon list-languages

# Translate to different languages
sogon run "video.mp4" --translate --target-language en  # English
sogon run "video.mp4" --translate --target-language ko  # Korean

Output Formats

# Different subtitle formats
sogon run "video.mp4" --format txt   # Plain text (default)
sogon run "video.mp4" --format srt   # SubRip subtitle format
sogon run "video.mp4" --format vtt   # WebVTT format
sogon run "video.mp4" --format json  # JSON format with metadata
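SRT and WebVTT differ mainly in framing: SRT numbers each cue and uses a comma before the milliseconds, while WebVTT starts with a `WEBVTT` header and uses a period. The sketch below converts `(start, end, text)` tuples to both formats; it illustrates the formats themselves, not sogon's writer, and the function names are hypothetical.

```python
def _clock(seconds: float, sep: str) -> str:
    """HH:MM:SS<sep>mmm, with ',' for SRT and '.' for WebVTT."""
    total_ms = int(round(seconds * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Numbered cues separated by blank lines."""
    blocks = [
        f"{i}\n{_clock(start, ',')} --> {_clock(end, ',')}\n{text}\n"
        for i, (start, end, text) in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)


def to_vtt(segments: list[tuple[float, float, str]]) -> str:
    """WEBVTT header, then unnumbered cues separated by blank lines."""
    cues = [f"{_clock(start, '.')} --> {_clock(end, '.')}\n{text}\n" for start, end, text in segments]
    return "WEBVTT\n\n" + "\n".join(cues)
```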

Requirements

System Requirements

Python 3.12+
Audio processing tools
Internet connection (for video URL download and AI API access)

Dependencies

The project requires various Python packages for audio processing, AI integration, and configuration management. See `pyproject.toml` for the exact requirements.

Troubleshooting

Audio Tools: Install required audio processing tools via package manager
API Key: Make sure a valid API key (GROQ_API_KEY) is set in the .env file
Network Issues: Ensure stable internet connection
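The first two troubleshooting items can be checked before a run. This sketch assumes that FFmpeg is the required audio tool (the README does not name it) and checks only the process environment, so a key stored solely in `.env` is reported as missing unless it has been loaded first; `preflight` is a hypothetical helper.

```python
import os
import shutil
import sys
from typing import Callable, Mapping, Optional


def preflight(env: Mapping[str, str] = os.environ,
              which: Callable[[str], Optional[str]] = shutil.which,
              tools: tuple[str, ...] = ("ffmpeg",),  # assumption: FFmpeg is the audio tool
              version: tuple[int, ...] = tuple(sys.version_info[:2])) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if version < (3, 12):
        problems.append(f"Python 3.12+ required, found {version[0]}.{version[1]}")
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set in the environment")
    for tool in tools:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems
```

The environment, PATH lookup, and version are injectable so the check can be exercised without touching the real system.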

License

This project is distributed under the MIT License.

Contributing

Fork the Project
Create your Feature Branch (git checkout -b feature/AmazingFeature)
Commit your Changes (git commit -m 'Add some AmazingFeature')
Push to the Branch (git push origin feature/AmazingFeature)
Open a Pull Request

Support

If you encounter any issues or have questions, please open an issue on GitHub.

Release history

0.1.2 (this version): Sep 23, 2025
0.1.1: Jul 2, 2025
0.1.0: Jul 2, 2025
0.0.1: Jul 2, 2025

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

sogon-0.1.2-py3-none-any.whl (82.3 kB)

Uploaded Sep 23, 2025 Python 3

File details

Details for the file sogon-0.1.2-py3-none-any.whl.

File metadata

Download URL: sogon-0.1.2-py3-none-any.whl
Upload date: Sep 23, 2025
Size: 82.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for sogon-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7ab0ffaeb41955849204db373e3e7d1ace3dcbefb83000b436dcc775114cc975`
MD5	`ebdb9a8d36be8007ad1f99bd16ead829`
BLAKE2b-256	`c38cc41e0d2825d793a03a530cf5a1f522fe05e11f03be597ace07f7a3140b4d`
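To confirm that a downloaded wheel matches the SHA256 digest listed above, hash the file in streaming chunks and compare. This is a minimal sketch with hypothetical helper names; pip can also enforce digests itself in hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for sogon-0.1.2-py3-none-any.whl
EXPECTED_SHA256 = "7ab0ffaeb41955849204db373e3e7d1ace3dcbefb83000b436dcc775114cc975"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file 1 MiB at a time so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

Usage: `verify_wheel(Path("sogon-0.1.2-py3-none-any.whl"))` after downloading the file into the current directory.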

Provenance

The following attestation bundles were made for sogon-0.1.2-py3-none-any.whl:

Publisher: release.yml on e7217/sogon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sogon-0.1.2-py3-none-any.whl
- Subject digest: 7ab0ffaeb41955849204db373e3e7d1ace3dcbefb83000b436dcc775114cc975
- Sigstore transparency entry: 551486870
- Sigstore integration time: Sep 23, 2025
Source repository:
- Permalink: e7217/sogon@7360dd05e29af1e0462cf7abb456e15473a269c7
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/e7217
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@7360dd05e29af1e0462cf7abb456e15473a269c7
- Trigger Event: release
