# stream-translator-gpt
Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and OpenAI's Whisper for transcription/translation.

This fork optimizes the VAD-based audio slicing logic, introduces OpenAI's GPT API / Google's Gemini API to support translation into languages other than English, and supports capturing audio from local devices.
## Prerequisites

- Install ffmpeg and add it to your PATH.
- Install CUDA on your system. You can check the installed CUDA version with `nvcc --version`.
## Setup

- Set up a virtual environment.
- Install the package: `pip install stream-translator-gpt`
- Make sure that PyTorch is installed with CUDA support. Whisper will probably not run in real time on a CPU.
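A quick way to check the last point (a minimal sketch; it only reports the result and does not fix the installation):

```python
# Check whether PyTorch can see a CUDA device; real-time Whisper
# effectively requires this to be True.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    print("CUDA available:", cuda_ok)
except ImportError:
    cuda_ok = None
    print("PyTorch is not installed")
```

If this prints `CUDA available: False`, reinstall PyTorch with the CUDA build that matches your driver.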
## Usage

- Translate a live stream:

  ```
  stream-translator-gpt {URL} {flags...}
  ```

  By default, the URL can be of the form `twitch.tv/forsen`, and yt-dlp is used to obtain the .m3u8 link, which is passed to ffmpeg.

- Translate PC device audio:

  ```
  stream-translator-gpt device {flags...}
  ```

  This uses the system's default audio device as input. To use a different input device, run `stream-translator-gpt device --print_all_devices` to get the device index, then run the CLI with `--device_index {index}`.
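For example (the stream URL and device index below are placeholders; adjust them to your own stream and hardware):

```shell
# Transcribe a Twitch stream and translate the speech to English:
stream-translator-gpt twitch.tv/forsen --task translate --language auto

# Capture from audio input device 3 instead of the system default:
stream-translator-gpt device --device_index 3
```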
## Flags
Flag | Default Value | Description |
---|---|---|
`URL` | | The URL of the stream. If set to `device`, audio will be captured from your PC's audio device. |
`--format` | wa* | Stream format code; passed directly to yt-dlp. |
`--cookies` | | Used to open member-only streams; passed directly to yt-dlp. |
`--device_index` | | The index of the device to record from. If not set, the system default recording device is used. |
`--frame_duration` | 0.1 | The unit, in seconds, in which live stream data is processed. |
`--continuous_no_speech_threshold` | 0.8 | Slice if there is no speech for this continuous period, in seconds. |
`--min_audio_length` | 3.0 | Minimum slice audio length in seconds. |
`--max_audio_length` | 30.0 | Maximum slice audio length in seconds. |
`--prefix_retention_length` | 0.8 | The length of the prefix audio retained when slicing, in seconds. |
`--vad_threshold` | 0.5 | The threshold of voice activity detection: if the speech probability of a frame is higher than this value, the frame is treated as speech. |
`--model` | small | Select model size. See here for available models. |
`--task` | translate | Whether to transcribe the audio (keep the original language) or translate it to English. |
`--language` | auto | Language spoken in the stream. See here for available languages. |
`--beam_size` | 5 | Number of beams in beam search. Set to 0 to use a greedy algorithm instead (faster but less accurate). |
`--best_of` | 5 | Number of candidates when sampling with non-zero temperature. |
`--direct_url` | | Set this flag to pass the URL directly to ffmpeg. Otherwise, yt-dlp is used to obtain the stream URL. |
`--use_faster_whisper` | | Set this flag to use the faster-whisper implementation instead of the original OpenAI implementation. |
`--use_whisper_api` | | Set this flag to use the OpenAI Whisper API instead of running Whisper locally. |
`--whisper_filters` | emoji_filter | Filters applied to Whisper results, separated by ",". |
`--hide_whisper_result` | | Hide the result of the Whisper transcription. |
`--openai_api_key` | | OpenAI API key, required if using GPT translation / the Whisper API. |
`--google_api_key` | | Google API key, required if using Gemini translation. |
`--gpt_model` | gpt-3.5-turbo | GPT model name: gpt-3.5-turbo or gpt-4. (No need to change this when using Gemini.) |
`--gpt_translation_prompt` | | If set, the result text will be translated to the target language via the GPT / Gemini API (depending on which API key is provided). Example: "Translate from Japanese to Chinese". |
`--gpt_translation_history_size` | 0 | The number of previous messages sent when calling the GPT / Gemini API. If the history size is 0, translations run in parallel; if it is greater than 0, they run serially. |
`--gpt_translation_timeout` | 15 | If a GPT / Gemini translation takes longer than this number of seconds, it is discarded. |
`--gpt_base_url` | | Customize the GPT API endpoint. |
`--retry_if_translation_fails` | | Retry when a translation times out or fails. Useful for generating subtitles offline. |
`--output_timestamps` | | Output the timestamp of the text along with the text. |
`--cqhttp_url` | | If set, the result text will be sent to the cqhttp server. |
`--cqhttp_token` | | Token for cqhttp. Can be omitted if it is not set on the server side. |
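The slicing flags above interact roughly as follows (a minimal sketch, not the project's actual code; frame-level speech probabilities would come from a VAD model, and `--prefix_retention_length` is omitted for brevity):

```python
# Hypothetical helper illustrating VAD-based slicing. Each frame is
# frame_duration seconds long; speech_probs holds per-frame speech
# probabilities from a VAD model.
def slice_points(speech_probs,
                 frame_duration=0.1,
                 vad_threshold=0.5,
                 continuous_no_speech_threshold=0.8,
                 min_audio_length=3.0,
                 max_audio_length=30.0):
    """Return frame indices after which the audio buffer is sliced."""
    cuts = []
    buffer_frames = 0   # frames accumulated since the last slice
    silence_frames = 0  # trailing frames below the VAD threshold
    for i, p in enumerate(speech_probs):
        buffer_frames += 1
        silence_frames = silence_frames + 1 if p < vad_threshold else 0
        buffered = buffer_frames * frame_duration
        silence = silence_frames * frame_duration
        # Slice on a long-enough pause, but only once the buffer has
        # reached the minimum length; force a slice at the maximum.
        if (silence >= continuous_no_speech_threshold
                and buffered >= min_audio_length) or buffered >= max_audio_length:
            cuts.append(i)
            buffer_frames = 0
            silence_frames = 0
    return cuts


# 5 s of speech followed by 1 s of silence: one cut during the pause.
probs = [0.9] * 50 + [0.1] * 10
print(slice_points(probs))  # → [57]
```

Raising `--continuous_no_speech_threshold` makes slices end on longer pauses (fewer mid-sentence cuts, more latency); `--max_audio_length` bounds the latency regardless of pauses.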
## Using faster-whisper

faster-whisper provides significant performance upgrades over the original OpenAI implementation (~4x faster, ~2x less memory). To use it, install cuDNN into your CUDA directory, then run the CLI with `--use_faster_whisper`.
## Contact me

Telegram: @ionic_bond

## Donate