AI-powered subtitle generator from YouTube URLs or local media files
SOGON
An AI-powered automation tool that extracts audio from video URLs or media files and generates subtitles using advanced speech recognition technology.
Key Features
- Flexible Audio Extraction: High-quality audio extraction from video URLs or local media files
- AI Speech Recognition: Accurate Korean speech recognition with advanced AI models
- Large File Processing: Automatically splits files that exceed the 24MB API upload limit
- Precise Timestamps: Segment-level time information in HH:mm:ss.SSS format
- Intelligent Text Correction: Dual correction system (pattern-based + AI-based)
- Systematic Output: Separate storage of original/corrected versions
Quick Start
1. Environment Setup
# Install dependencies
uv sync
2. API Key Setup
Create a .env file and set your Groq API key:
GROQ_API_KEY=your_groq_api_key_here
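The README does not show how SOGON reads this file. The minimal, stdlib-only sketch below loads the key and fails early if it is missing or still set to the placeholder value. The helper names `parse_env` and `require_groq_key` are hypothetical.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks, # comments, and malformed lines."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes from the value
        values[key.strip()] = value.strip().strip("\"'")
    return values


def require_groq_key(env_file: str = ".env") -> str:
    """Return GROQ_API_KEY from the environment or .env, raising if it is unset."""
    path = Path(env_file)
    if path.exists():
        for key, value in parse_env(path.read_text(encoding="utf-8")).items():
            os.environ.setdefault(key, value)  # variables already exported take precedence
    key = os.environ.get("GROQ_API_KEY", "")
    if not key or key == "your_groq_api_key_here":
        raise RuntimeError("GROQ_API_KEY is not set; add it to your .env file")
    return key
```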
3. Run
# Process video URL
python main.py "https://www.youtube.com/watch?v=VIDEO_ID"
# Process local media file
python main.py "/path/to/video/file.mp4"
System Architecture
Video URL/File (Downloader) → Audio Extract (Audio Tool) → Speech Recognition (AI Speech Model) → Text Correction (AI Correction) → File Save (result/)
Processing Steps
- Audio Extraction: Extract audio from video URLs or local files using media processing tools
- File Processing: Split large files to comply with API limitations
- Speech Recognition: Process audio with advanced AI models for Korean text
- Text Correction: Apply pattern-based and AI-based corrections
- Output Generation: Save original and corrected versions with timestamps
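The README describes the splitting step only as a workaround for the 24MB limit. One way to plan it is to cut the audio into equal time windows whose estimated size stays under the limit, then shift each chunk's segment times by the chunk's start offset so the timestamps line up with the full file. This sketch assumes a roughly constant bitrate, reads "24MB" as 24 × 1024 × 1024 bytes, and applies a 10% safety margin. All three are assumptions, and the names are hypothetical.

```python
import math
from dataclasses import dataclass

# Assumption: "24MB" interpreted as 24 * 1024 * 1024 bytes
MAX_UPLOAD_BYTES = 24 * 1024 * 1024


@dataclass(frozen=True)
class Chunk:
    index: int
    start_sec: float
    end_sec: float


def plan_chunks(file_size: int, duration_sec: float,
                max_bytes: int = MAX_UPLOAD_BYTES, safety: float = 0.9) -> list[Chunk]:
    """Split [0, duration) into equal windows, assuming size scales linearly with time."""
    if file_size <= max_bytes:
        return [Chunk(0, 0.0, duration_sec)]
    n = math.ceil(file_size / (max_bytes * safety))
    step = duration_sec / n
    return [Chunk(i, i * step, min((i + 1) * step, duration_sec)) for i in range(n)]


def offset_segments(segments: list[tuple[float, float, str]],
                    chunk: Chunk) -> list[tuple[float, float, str]]:
    """Shift chunk-relative (start, end, text) segments onto the full-file timeline."""
    return [(s + chunk.start_sec, e + chunk.start_sec, t) for s, e, t in segments]
```

Planning by duration rather than by raw byte offsets lets an audio tool cut each window into a separately decodable file. The cutting itself is not shown.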
Output File Structure
Organized by Date/Time/Title:
result/
└── yyyyMMdd_HHmmss_video_title/              # Timestamped folder for each video
    ├── video_title.txt                       # Original continuous text
    ├── video_title_metadata.json             # Original metadata
    ├── video_title_timestamps.txt            # Original timestamps
    ├── video_title_corrected.txt             # Corrected text
    ├── video_title_corrected_metadata.json   # Corrected metadata
    └── video_title_corrected_timestamps.txt  # Corrected timestamps
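The sketch below shows how the folder and file names in this tree could be derived from a title and a timestamp. The title sanitization rules are assumptions, not SOGON's documented behavior: file-system-unsafe characters and whitespace become `_`, and the result is truncated. The helper names are hypothetical.

```python
import re
from datetime import datetime
from pathlib import Path


def safe_title(title: str, max_len: int = 80) -> str:
    """Assumed sanitization: replace unsafe characters/whitespace runs with '_'."""
    cleaned = re.sub(r'[\\/:*?"<>|\s]+', "_", title).strip("_")
    return cleaned[:max_len] or "untitled"


def output_paths(title: str, when: datetime, root: str = "result") -> dict[str, Path]:
    """Build result/<yyyyMMdd_HHmmss>_<title>/ and the six original/corrected file paths."""
    name = safe_title(title)
    folder = Path(root) / f"{when:%Y%m%d_%H%M%S}_{name}"
    paths: dict[str, Path] = {}
    for suffix in ("", "_corrected"):
        paths[f"text{suffix}"] = folder / f"{name}{suffix}.txt"
        paths[f"metadata{suffix}"] = folder / f"{name}{suffix}_metadata.json"
        paths[f"timestamps{suffix}"] = folder / f"{name}{suffix}_timestamps.txt"
    return paths
```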
Timestamp File Format
Subtitle with Timestamps (Corrected)
==================================================
[00:00:00.560 → 00:00:03.520] Hello. Actually, I was going to continue the visual story writing series,
[00:00:03.520 → 00:00:12.839] but there was a problem in the middle,
[00:00:12.839 → 00:00:14.039] I did up to episode 4, filmed episode 5 and need to upload it, but it's not easy.
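Segment times in seconds can be turned into the HH:mm:ss.SSS stamps above by rounding once to integer milliseconds and then splitting with `divmod`. This avoids float truncation errors in the millisecond field. The following is a hedged sketch with hypothetical function names that renders (start, end, text) segments in this layout.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to HH:mm:ss.SSS using integer milliseconds."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_timestamps(segments: list[tuple[float, float, str]],
                      title: str = "Subtitle with Timestamps (Corrected)") -> str:
    """Render a header, a row of '=' characters, and one bracketed line per segment."""
    lines = [title, "=" * 50]
    for start, end, text in segments:
        lines.append(f"[{format_timestamp(start)} → {format_timestamp(end)}] {text.strip()}")
    return "\n".join(lines)
```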
Tech Stack
| Component | Tool | Role |
|---|---|---|
| Audio Extraction | Media Downloader + Audio Processor | Video URL/File → Audio conversion |
| Audio Processing | Audio Library | File splitting, format conversion |
| Speech Recognition | AI Speech Model | Speech → Text + metadata |
| AI Correction | Large Language Model | Text correction |
| Environment Management | Configuration Manager | API key management |
Output Files
The tool generates organized output files with timestamps and metadata for both original and corrected versions.
Advanced Features
Existing File Correction
The tool can also apply AI-based correction to existing transcript files.
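The README does not document which input formats this feature accepts. Assuming it works from the `_timestamps.txt` layout shown in the Timestamp File Format section, the sketch below parses those lines back into (start, end, text) segments and skips the header and separator lines. `parse_timestamp_file` is a hypothetical name.

```python
import re

# Matches lines like "[00:00:00.560 → 00:00:03.520] Hello."
LINE_RE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3}) → (\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\s*(.*)$"
)


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_timestamp_file(text: str) -> list[tuple[float, float, str]]:
    """Recover segments from timestamp lines; non-matching lines are ignored."""
    segments = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            g = match.groups()
            segments.append((_to_seconds(*g[0:4]), _to_seconds(*g[4:8]), g[8]))
    return segments
```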
Configuration Options
Various options are available to control correction features, output formats, and processing behavior.
Error Handling
- Automatic file splitting for large files (>24MB)
- Partial result saving on failures
- Automatic cleanup of temporary files
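The README lists these behaviors without detail. The following minimal sketch shows two common patterns for them. Chunks that finish before a failure are written to disk before the error propagates. The scratch directory for split audio is removed by `tempfile.TemporaryDirectory` whether or not the work succeeds. The `transcribe` callable and the `partial_result.json` file name are hypothetical.

```python
import json
import tempfile
from collections.abc import Callable, Iterable
from pathlib import Path


def transcribe_all(chunk_files: Iterable, transcribe: Callable, out_dir: Path) -> list[str]:
    """Transcribe chunks in order; on failure, save completed results, then re-raise."""
    results: list[str] = []
    try:
        for chunk in chunk_files:
            results.append(transcribe(chunk))
    except Exception:
        out_dir.mkdir(parents=True, exist_ok=True)
        partial = {"completed_chunks": len(results), "texts": results}
        (out_dir / "partial_result.json").write_text(
            json.dumps(partial, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        raise
    return results


def run_with_scratch_dir(work: Callable[[Path], object]) -> object:
    """Give `work` a temporary directory that is deleted on exit, even after errors."""
    with tempfile.TemporaryDirectory(prefix="sogon_") as tmp:
        return work(Path(tmp))
```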
Usage Examples
Basic Usage
# Process video URL
python main.py "https://www.youtube.com/watch?v=VIDEO_ID"
# Process local media file
python main.py "/path/to/video.mp4"
Requirements
System Requirements
- Python 3.12+
- Audio processing tools
- Internet connection (for video URL download and AI API access)
Dependencies
The project requires various Python packages for audio processing, AI integration, and configuration management. See the project configuration file for specific requirements.
Troubleshooting
- Audio Tools: Install the required audio processing tools via your package manager
- API Key: Set a valid AI service API key in the .env file
- Network Issues: Ensure a stable internet connection
License
This project is distributed under the MIT License.
Contributing
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Support
If you encounter any issues or have questions, please contact us through GitHub Issues.
File details
Details for the file sogon-0.0.1-py3-none-any.whl.
File metadata
- Download URL: sogon-0.0.1-py3-none-any.whl
- Upload date:
- Size: 74.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5786bc17928b1dd16d058e68201f2684a31c6d0bfd1b8ee1d1160d6fe6727a9c |
| MD5 | 0ced991cf373ac60551784e1fd8430d4 |
| BLAKE2b-256 | 2811605cba56ed5b4c953c3bbc7cd20ca0c5758294e7b6ba2c10a583aeb4eab8 |