🎙️ SonicScribe - Transcribe & Translate audio/video files using Whisper and GPT models.
🎙️ SonicScribe
A powerful CLI tool and Python module for transcribing and translating audio/video files using OpenAI's Whisper and GPT models.
Features
- 🎬 Extract audio from various video and audio formats
- 🔤 Transcribe audio using OpenAI's Whisper API
- 🌐 Translate transcriptions to English or other languages
- 📝 Generate transcript and subtitle files (SRT)
- 🗣️ Support for bilingual subtitles
- 📊 Smart handling of large files by automatic chunking
- ⚡ Interactive language selection with auto-detection support
- 🛠️ Modular design for use as a Python library
- 📈 Progress indicators and detailed logging
Installation
Prerequisites
- Python 3.8 or higher
- OpenAI API key
Setup
- Clone the repository:
  git clone https://github.com/tuklu/SonicScribe.git
  cd SonicScribe/SonicScribe
- Create a virtual environment and activate it:
  python -m venv venv
  # On Windows
  venv\Scripts\activate
  # On macOS/Linux
  source venv/bin/activate
- Install the required packages:
  pip install -r requirements.txt
- Create a .env file in the root directory with your OpenAI API key:
  OPENAI_API_KEY=your_api_key_here
Usage
Main Transcription and Translation
The main script provides full functionality for processing audio/video files:
python main.py --input "path/to/your/video.mp4" --translate --output-dir "output/folder"
Options
- --input: Path to input audio/video file (required)
- --translate: Enable translation to English (optional)
- --language: Specify the language of the input file (e.g., en, fr, es). If not provided, auto-detection will be used.
- --output-dir: Directory to save output files (default: output/transcripts)
- --whisper-model: Model to use for transcription (default: whisper-1)
- --gpt-model: Model to use for translation (default: gpt-4o-mini)
- --chunk-size: Size of chunks in MB for large files (default: 20)
- --verbose or -v: Enable verbose logging
- --bilingual: Create bilingual subtitles with both original and translated text
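The options above map naturally onto an `argparse` parser. The following is a minimal sketch of what such a parser could look like, not the actual contents of main.py: the function name `build_parser` is hypothetical, and treating `--chunk-size` as an integer is an assumption. The flag names and defaults are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--input", required=True, help="Path to input audio/video file")
    parser.add_argument("--translate", action="store_true", help="Translate to English")
    parser.add_argument("--language", default=None, help="Input language code, e.g. en, fr, es")
    parser.add_argument("--output-dir", default="output/transcripts", help="Output directory")
    parser.add_argument("--whisper-model", default="whisper-1", help="Transcription model")
    parser.add_argument("--gpt-model", default="gpt-4o-mini", help="Translation model")
    # Assumption: chunk size is parsed as a whole number of megabytes.
    parser.add_argument("--chunk-size", type=int, default=20, help="Chunk size in MB")
    parser.add_argument("-v", "--verbose", action="store_true", help="Verbose logging")
    parser.add_argument("--bilingual", action="store_true", help="Bilingual subtitles")
    return parser


# Example: parse the arguments of a typical invocation.
args = build_parser().parse_args(["--input", "lecture.mp4", "--translate"])
print(args.input, args.translate, args.output_dir, args.chunk_size)
```

Leaving `--language` as `None` when omitted gives the caller a simple way to decide whether to fall back to auto-detection.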
Interactive Language Selection
When processing files, SonicScribe provides an interactive language selection feature:
- You can select a language from a predefined list using arrow keys.
- You can manually input a language code by typing /.
- If no language is selected, SonicScribe will auto-detect the language using GPT, with a warning about potential additional API costs.
Standalone SRT Translation
If you already have an SRT file and just want to translate it:
python translate_srt.py --input "path/to/your/subtitles.srt" --bilingual
Options
- --input: Path to input SRT file (required)
- --output: Path to output SRT file (default: input_english.srt)
- --model: GPT model to use for translation (default: gpt-4o-mini)
- --bilingual: Create bilingual SRT with original and translated text
- --language: Specify the language of the input subtitles. If not provided, auto-detection will be used.
Examples
Basic Transcription
python main.py --input "lecture.mp4"
This will:
- Extract audio from lecture.mp4
- Transcribe the audio using Whisper API
- Save a transcript and SRT file to the default output directory
Transcription with Translation
python main.py --input "foreign_movie.mp4" --translate --gpt-model "gpt-4o"
This will:
- Extract audio from foreign_movie.mp4
- Transcribe the audio using Whisper API
- Translate the transcription to English using GPT-4o
- Save all output files to the default directory
Bilingual Subtitles
python main.py --input "interview.mp3" --translate --bilingual
This will:
- Extract audio from interview.mp3
- Transcribe the audio using Whisper API
- Translate the transcription to English
- Create a bilingual SRT file with both original and translated text
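To make the idea of a bilingual SRT file concrete, here is a small sketch that builds a single subtitle cue with the original line above its translation. The exact layout SonicScribe writes is not documented here, so the line order is an assumption, and the helper names `format_timestamp` and `bilingual_entry` are hypothetical. The timestamp form `HH:MM:SS,mmm` is the standard SRT convention.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def bilingual_entry(index: int, start: float, end: float,
                    original: str, translated: str) -> str:
    """Build one SRT cue; assumed layout: original text, then translation."""
    return (
        f"{index}\n"
        f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
        f"{original}\n"
        f"{translated}\n"
    )


# Example cue for a French line and its English translation.
print(bilingual_entry(1, 0.0, 2.5, "Bonjour tout le monde", "Hello everyone"))
```

In a full file, cues are separated by a blank line and numbered consecutively from 1.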
Translating Existing Subtitles
python translate_srt.py --input "movie.srt" --bilingual --model "gpt-4o-mini"
This will:
- Read the existing SRT file
- Translate the subtitles to English using GPT-4o-mini
- Create a bilingual SRT with both original and translated text
Output Files
SonicScribe generates several types of output files:
- {filename}_transcribed.txt: Plain text transcript
- {filename}.srt: SRT subtitle file with timestamps
- {filename}_bilingual.srt: Optional bilingual SRT file (when using --bilingual)
- {filename}_english.srt: Translated SRT file (when using translate_srt.py)
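If you script around SonicScribe, it can help to compute the expected output paths up front. The sketch below follows the naming patterns listed above and assumes that `{filename}` means the input file's name without its extension. The function `expected_outputs` is a hypothetical helper, not part of the package.

```python
from pathlib import Path


def expected_outputs(input_path: str,
                     output_dir: str = "output/transcripts",
                     bilingual: bool = False) -> list:
    """Return the output paths main.py is documented to produce."""
    stem = Path(input_path).stem  # assumption: {filename} excludes the extension
    out = Path(output_dir)
    paths = [out / f"{stem}_transcribed.txt", out / f"{stem}.srt"]
    if bilingual:
        paths.append(out / f"{stem}_bilingual.srt")
    return paths


# Example: list the files expected for a bilingual run.
for path in expected_outputs("videos/interview.mp3", bilingual=True):
    print(path)
```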
Handling Large Files
SonicScribe automatically handles large audio files:
- Files smaller than 25 MB are processed directly through the Whisper API.
- Larger files are split into chunks, processed separately, and then recombined.
- The --chunk-size parameter controls the size of these chunks (default: 20 MB).
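The arithmetic behind the chunking rule can be sketched as follows. This is only an estimate of how many pieces a file would be split into, based on the 25 MB threshold and the chunk size described above. The tool itself may split on audio duration rather than raw bytes, and `plan_chunks` is a hypothetical helper.

```python
import math

DIRECT_LIMIT_MB = 25  # threshold stated above for direct processing


def plan_chunks(file_size_mb: float, chunk_size_mb: float = 20) -> int:
    """Return the estimated number of pieces (1 means no splitting)."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk_size_mb must be positive")
    if file_size_mb < DIRECT_LIMIT_MB:
        return 1
    # Round up so the final, shorter piece is counted too.
    return math.ceil(file_size_mb / chunk_size_mb)


# Example: a 90 MB file with the default 20 MB chunk size.
print(plan_chunks(90))
```

Lowering the chunk size increases the number of API requests but reduces the amount of audio held in memory at once.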
Troubleshooting
API Key Issues
If you encounter "API key not found" errors:
- Ensure your .env file exists in the project root directory.
- Verify that your API key is correct and active.
- Try setting the API key directly in your environment.
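A quick way to check the first and third points is a short diagnostic script. The sketch below uses only the standard library and looks for `OPENAI_API_KEY` in the process environment first, then in a .env file. It is not how SonicScribe itself loads the key (that is not documented here), and both helper names are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines; return {} if the file is missing."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_api_key(env_file: str = ".env"):
    """Prefer the process environment, then fall back to the .env file."""
    return (os.environ.get("OPENAI_API_KEY")
            or read_env_file(env_file).get("OPENAI_API_KEY"))


if __name__ == "__main__":
    print("OPENAI_API_KEY found" if find_api_key() else "OPENAI_API_KEY not found")
```

Run it from the project root so that the relative `.env` path resolves to the file you created during setup.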
File Format Problems
If SonicScribe fails to process your file:
- Verify the file exists and is not corrupted.
- Check that the format is supported (mp4, mkv, mov, mp3, wav, etc.).
- Try converting the file to a more standard format like MP4 or WAV.
Memory Issues with Large Files
If you encounter memory errors with very large files:
- Try reducing the --chunk-size parameter.
- Ensure your system has sufficient free memory.
- Consider pre-splitting very large files manually.
Limitations
- OpenAI API rate limits may affect processing speed.
- Transcription quality depends on audio clarity.
- Translation quality varies by language and content complexity.
- Processing very large files (multiple hours) can take significant time.
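If you call SonicScribe or the OpenAI API from your own scripts, a common way to cope with rate limits is to retry with exponential backoff. The sketch below shows the general pattern only. It is not taken from SonicScribe, `with_backoff` is a hypothetical helper, and real code would catch the client library's specific rate-limit exception rather than every `Exception`.

```python
import time


def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Give up after the final attempt and surface the error.
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


# Example with a stand-in function that fails twice before succeeding.
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


print(with_backoff(flaky_request, base_delay=0.01))
```

The `sleep` parameter is injectable so the waiting behavior can be replaced in tests.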
Logging
SonicScribe logs all operations to the logs directory. If you encounter issues, check the log files for detailed information. Use the --verbose flag for more detailed logging.
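For readers building on the module, the relationship between a verbose flag and log detail can be sketched with the standard `logging` package. This is an illustration under assumptions, not SonicScribe's actual logging setup: the logger name, the file name `example.log`, and the function `configure_logging` are all hypothetical.

```python
import logging
from pathlib import Path


def configure_logging(verbose: bool = False, log_dir: str = "logs") -> logging.Logger:
    """Log to a file in log_dir; DEBUG level when verbose, INFO otherwise."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("sonicscribe_example")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    # Remove handlers from earlier calls so messages are not duplicated.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()
    handler = logging.FileHandler(Path(log_dir) / "example.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Calling `configure_logging(verbose=True)` would then record debug messages that are dropped at the default level.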
License
[Your license information here]
Support
For issues, questions, or contributions, please create an issue on the GitHub repository.