AI-powered subtitle generator from YouTube URLs or local media files
Project description
SOGON
An AI-powered automation tool that extracts audio from video URLs or media files and generates subtitles using advanced speech recognition technology.
Key Features
- Flexible Audio Extraction: High-quality audio extraction from video URLs or local media files
- AI Speech Recognition: Accurate Korean speech recognition with advanced AI models
- Large File Processing: Automatic workaround for 24MB limit (file splitting)
- Precise Timestamps: Segment-level time information in HH:mm:ss.SSS format
- Intelligent Text Correction: Dual correction system (pattern-based + AI-based)
- Systematic Output: Separate storage of original/corrected versions
Installation
Method 1: Install with pipx (Recommended)
# Install globally with pipx
pipx install sogon
# Use the CLI tool
sogon run "https://www.youtube.com/watch?v=VIDEO_ID"
Package Management
# Upgrade to latest version
pipx upgrade sogon
# Check installed version
pipx list
# Uninstall
pipx uninstall sogon
# Reinstall (if needed)
pipx reinstall sogon
Method 2: Development Setup
# Clone and install dependencies
git clone <repository-url>
cd sogon
uv sync
Quick Start
1. API Key Setup
Create a .env file and set your Groq API key:
GROQ_API_KEY=your_groq_api_key_here
OPENAI_API_KEY=your_openai_api_key_here # Optional: for AI text correction
2. Basic Usage
# Process video URL
sogon run "https://www.youtube.com/watch?v=VIDEO_ID"
# Process local media file
sogon run "/path/to/video/file.mp4"
System Architecture
Video URL/File → Audio Extract → Speech Recognition → Text Correction → File Save
↓ ↓ ↓ ↓ ↓
Downloader Audio Tool AI Speech Model AI Correction result/
Processing Steps
- Audio Extraction: Extract audio from video URLs or local files using media processing tools
- File Processing: Split large files to comply with API limitations
- Speech Recognition: Process audio with advanced AI models for Korean text
- Text Correction: Apply pattern-based and AI-based corrections
- Output Generation: Save original and corrected versions with timestamps
Output File Structure
Organized by Date/Time/Title:
result/
└── yyyyMMDD_HHmmss_video_title/ # Timestamped folder for each video
├── video_title.txt # Original continuous text
├── video_title_metadata.json # Original metadata
├── video_title_timestamps.txt # Original timestamps
├── video_title_corrected.txt # Corrected text
├── video_title_corrected_metadata.json # Corrected metadata
└── video_title_corrected_timestamps.txt # Corrected timestamps
Timestamp File Format
Subtitle with Timestamps (Corrected)
==================================================
[00:00:00.560 → 00:00:03.520] Hello. Actually, I was going to continue the visual story writing series,
[00:00:03.520 → 00:00:12.839] but there was a problem in the middle,
[00:00:12.839 → 00:00:14.039] I did up to episode 4, filmed episode 5 and need to upload it, but it's not easy.
Tech Stack
| Component | Function | Role |
|---|---|---|
| Audio Extraction | Media Downloader + Audio Processor | Video URL/File → Audio conversion |
| Audio Processing | Audio Library | File splitting, format conversion |
| Speech Recognition | AI Speech Model | Speech → Text + metadata |
| AI Correction | Large Language Model | Text correction |
| Environment Management | Configuration Manager | API key management |
Output Files
The tool generates organized output files with timestamps and metadata for both original and corrected versions.
Advanced Features
Existing File Correction
The tool provides functionality to correct existing transcript files with AI-based improvements.
CLI Options
| Option | Description | Default |
|---|---|---|
--format, -f |
Output subtitle format (txt, srt, vtt, json) | txt |
--output-dir, -o |
Custom output directory | ./result |
--no-correction |
Disable text correction | False |
--no-ai-correction |
Disable AI-based text correction | False |
--keep-audio |
Keep downloaded audio files | False |
--translate |
Enable subtitle translation | False |
--target-language, -t |
Target language for translation | None |
--source-language, -s |
Source language for Whisper | auto-detect |
--log-level |
Logging level (DEBUG, INFO, WARNING, ERROR) | INFO |
Error Handling
- Automatic file splitting for large files (>24MB)
- Partial result saving on failures
- Automatic cleanup of temporary files
CLI Usage Examples
Basic Usage
# Process video URL
sogon run "https://www.youtube.com/watch?v=VIDEO_ID"
# Process local media file
sogon run "/path/to/video/file.mp4"
Advanced Options
# Specify output format
sogon run "video.mp4" --format srt
# Disable text correction
sogon run "video.mp4" --no-correction
# Set custom output directory
sogon run "video.mp4" --output-dir ./my-results
# Keep downloaded audio files
sogon run "https://youtube.com/watch?v=..." --keep-audio
# Enable translation to Korean
sogon run "video.mp4" --translate --target-language ko
# Set source language for better transcription
sogon run "video.mp4" --source-language en
# Adjust logging level
sogon run "video.mp4" --log-level DEBUG
Translation Features
# List supported languages
sogon list-languages
# Translate to different languages
sogon run "video.mp4" --translate --target-language en # English
sogon run "video.mp4" --translate --target-language ko # Korean
Output Formats
# Different subtitle formats
sogon run "video.mp4" --format txt # Plain text (default)
sogon run "video.mp4" --format srt # SubRip subtitle format
sogon run "video.mp4" --format vtt # WebVTT format
sogon run "video.mp4" --format json # JSON format with metadata
Requirements
System Requirements
- Python 3.12+
- Audio processing tools
- Internet connection (for video URL download and AI API access)
Dependencies
The project requires various Python packages for audio processing, AI integration, and configuration management. See the project configuration file for specific requirements.
Troubleshooting
- Audio Tools: Install required audio processing tools via package manager
- API Key: Set up valid AI service API key in
.envfile - Network Issues: Ensure stable internet connection
License
This project is distributed under the MIT License.
Contributing
- Fork the Project
- Create your Feature Branch (
git checkout -b feature/AmazingFeature) - Commit your Changes (
git commit -m 'Add some AmazingFeature') - Push to the Branch (
git push origin feature/AmazingFeature) - Open a Pull Request
Support
If you encounter any issues or have questions, please contact us through GitHub Issues.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distributions
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sogon-0.1.2-py3-none-any.whl.
File metadata
- Download URL: sogon-0.1.2-py3-none-any.whl
- Upload date:
- Size: 82.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7ab0ffaeb41955849204db373e3e7d1ace3dcbefb83000b436dcc775114cc975
|
|
| MD5 |
ebdb9a8d36be8007ad1f99bd16ead829
|
|
| BLAKE2b-256 |
c38cc41e0d2825d793a03a530cf5a1f522fe05e11f03be597ace07f7a3140b4d
|
Provenance
The following attestation bundles were made for sogon-0.1.2-py3-none-any.whl:
Publisher:
release.yml on e7217/sogon
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
sogon-0.1.2-py3-none-any.whl -
Subject digest:
7ab0ffaeb41955849204db373e3e7d1ace3dcbefb83000b436dcc775114cc975 - Sigstore transparency entry: 551486870
- Sigstore integration time:
-
Permalink:
e7217/sogon@7360dd05e29af1e0462cf7abb456e15473a269c7 -
Branch / Tag:
refs/tags/v0.1.2 - Owner: https://github.com/e7217
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@7360dd05e29af1e0462cf7abb456e15473a269c7 -
Trigger Event:
release
-
Statement type: