Command line tool to transcribe & translate audio from livestreams in real time
stream-translator-gpt
Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and Whisper / Faster-Whisper for transcription.
This fork optimizes the VAD-based audio slicing logic, adds GPT API / Gemini API translation so that target languages are no longer limited to English, and supports input from audio devices.
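To illustrate what slicing audio at pauses (rather than at fixed intervals) means, here is a minimal sketch. It is not the project's implementation: a plain energy threshold stands in for a real VAD model, and the function name `slice_on_silence` and all of its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def slice_on_silence(
    samples: Sequence[float],
    frame_len: int = 4,
    threshold: float = 0.1,
    min_silence_frames: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample-index pairs for runs of speech.

    Assumption: a frame counts as speech when its mean absolute amplitude
    exceeds `threshold`. A real VAD model would replace this test.
    A segment is closed after `min_silence_frames` consecutive silent frames.
    """
    segments: List[Tuple[int, int]] = []
    start = None          # start index of the segment being built, if any
    voiced_end = 0        # end index of the last speech frame seen
    silent_run = 0        # consecutive silent frames inside a segment

    for offset in range(0, len(samples), frame_len):
        frame = samples[offset:offset + frame_len]
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy > threshold:
            if start is None:
                start = offset
            voiced_end = offset + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                segments.append((start, voiced_end))
                start = None
                silent_run = 0

    # Flush a segment that was still open when the audio ended.
    if start is not None:
        segments.append((start, voiced_end))
    return segments
```

Each returned segment could then be handed to the transcriber as one unit, so that a sentence is less likely to be cut in half by a fixed-length window.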
Prerequisites
Linux or Windows:
- Python >= 3.8 (>= 3.10 recommended)
- Install CUDA 11 on your system. (Faster-Whisper is not yet compatible with CUDA 12.)
- Install cuDNN to your CUDA directory if you want to use Faster-Whisper.
- Install PyTorch (with CUDA) in your Python environment.
- Create a Google API key if you want to use the Gemini API for translation. (Recommended; free for 60 requests / minute.)
- Create an OpenAI API key if you want to use the Whisper API for transcription or the GPT API for translation.
If you are on Windows, you also need to:
- Install and add ffmpeg to your PATH.
- Install yt-dlp and add it to your PATH.
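Some of the prerequisites above can be checked quickly before installing. The following is a minimal sketch using only the Python standard library; the helper name `check_prerequisites` is hypothetical and not part of stream-translator-gpt, and it only checks the Python version and whether the tools are on PATH (it does not verify CUDA, cuDNN, or PyTorch).

```python
import shutil
import sys


def check_prerequisites(tools=("ffmpeg", "yt-dlp"), min_python=(3, 8)):
    """Report whether Python is new enough and each tool is found on PATH."""
    report = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```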
Installation
Install the release version from PyPI (recommended):
pip install stream-translator-gpt
stream-translator-gpt
or
Clone the master branch from GitHub:
git clone https://github.com/ionic-bond/stream-translator-gpt.git
pip install -r ./stream-translator-gpt/requirements.txt
python3 ./stream-translator-gpt/translator.py
Usage
- Transcribe a livestream (uses Whisper by default):
stream-translator-gpt {URL} --model large --language {input_language}
- Transcribe with Faster-Whisper:
stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper
- Transcribe with the Whisper API:
stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}
- Translate to another language with Gemini:
stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
- Translate to another language with GPT:
stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}
- Use the Whisper API and Gemini together:
stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
- Use a local video/audio file as input:
stream-translator-gpt /path/to/file --model large --language {input_language}
- Use the computer microphone as input:
stream-translator-gpt device --model large --language {input_language}
This uses the system's default audio device as input.
To use a different audio input device, list the available devices to find its index:
stream-translator-gpt device --print_all_devices
then run the CLI with --device_index {index}.
To use the audio output of another program as input, you need to enable stereo mix.
- Send results to Cqhttp:
stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}
- Send results to Discord:
stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}
- Save results to a .srt subtitle file:
stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key} --hide_transcribe_result --output_timestamps --output_file_path ./result.srt
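For readers unfamiliar with .srt files, the sketch below shows the standard SubRip cue layout: a running index, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, and the text. It illustrates the file format only; the helper names `srt_timestamp` and `srt_cue` are hypothetical, and the exact output this tool writes has not been verified here.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SRT cue; cues in a file are separated by a blank line."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"


if __name__ == "__main__":
    cues = [srt_cue(1, 0.0, 2.25, "Hello"), srt_cue(2, 2.25, 5.0, "World")]
    print("\n".join(cues))
```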
Hashes for stream_translator_gpt-2024.4.24.tar.gz
Algorithm | Hash digest
---|---
SHA256 | a59c4300e6c0761d030d16346123a0f12c9e05df675779b884abdb404c81131a
MD5 | e153405949aec4bb4daca3139ac11446
BLAKE2b-256 | 50f399307737ccdb0ace825a64c39cfc5f793792df32bf9da9c75e3bf7d4b1b7
Hashes for stream_translator_gpt-2024.4.24-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 36720d90a1c36b0bfcf0ae466c548897649a5d144612336b5b7f6b53e9ed3c7d
MD5 | f67285974364f6868dc296a666d16972
BLAKE2b-256 | 83d362a6d534b4724e38cddd4421cf178ddc545caeda19027a83e1b4f0918cd1