Command line tool to transcribe & translate audio from livestreams in real time
stream-translator-gpt
flowchart LR
subgraph ga["`**Input**`"]
direction LR
aa("`**FFmpeg**`")
ab("`**Device audio**`")
ac("`**yt-dlp**`")
ad("`**Local video file**`")
ae("`**Live streaming**`")
ac --> aa
ad --> aa
ae --> ac
end
subgraph gb["`**Audio Slicing**`"]
direction LR
ba("`**VAD**`")
end
subgraph gc["`**Transcription**`"]
direction LR
ca("`**Whisper**`")
cb("`**Faster-Whisper**`")
cc("`**Whisper API**`")
end
subgraph gd["`**Translation**`"]
direction LR
da("`**GPT API**`")
db("`**Gemini API**`")
end
subgraph ge["`**Output**`"]
direction LR
ea("`**Print to stdout**`")
eb("`**Cqhttp**`")
end
aa --> gb
ab --> gb
gb ==> gc
gc ==> gd
gd ==> ge
Command line utility to transcribe or translate audio from livestreams in real time. It uses yt-dlp to get livestream URLs from various services and Whisper / Faster-Whisper for transcription.
This fork optimizes the VAD-based audio slicing logic, adds GPT API / Gemini API support for translation into languages other than English, and supports input from audio devices.
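The input stage described above can be pictured as two external commands chained together. The following is a minimal sketch, not the project's actual code: the helper names `ytdlp_url_command` and `ffmpeg_pcm_command` are hypothetical, and the exact arguments the tool passes to yt-dlp and ffmpeg are an assumption. It only builds the argument lists and does not run anything.

```python
from typing import List

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono audio


def ytdlp_url_command(page_url: str, format_code: str = "wa*") -> List[str]:
    """Build a yt-dlp command that prints the direct media URL without downloading.

    Hypothetical helper: "-g" prints the URL, "-f" selects the format code.
    """
    return ["yt-dlp", "-g", "-f", format_code, page_url]


def ffmpeg_pcm_command(media_url: str, sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that decodes the input to raw 16-bit mono PCM on stdout.

    Hypothetical helper: the real tool may use different ffmpeg arguments.
    """
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", media_url,
        "-f", "s16le", "-ac", "1", "-ar", str(sample_rate),
        "-",  # write the decoded audio to stdout
    ]
```

Either list can be handed to `subprocess.run` or `subprocess.Popen`. Keeping the two steps separate mirrors the --direct_url option, which skips yt-dlp and gives the URL straight to ffmpeg.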
Prerequisites
Linux or Windows:
- Python >= 3.8 (>= 3.10 recommended)
- Install CUDA on your system. You can check the installed CUDA version with nvcc --version.
- Install cuDNN to your CUDA directory if you want to use Faster-Whisper.
- Install PyTorch (with CUDA) into your Python environment.
- Create a Google API key if you want to use the Gemini API for translation. (Recommended; free for 60 requests per minute.)
- Create an OpenAI API key if you want to use the Whisper API for transcription or the GPT API for translation.
If you are on Windows, you also need to:
- Install and add ffmpeg to your PATH.
- Install yt-dlp and add it to your PATH.
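Before installing, it can help to confirm which of these prerequisites are already in place. The script below is a rough sketch using only the Python standard library; the function name `check_prerequisites` and the report labels are hypothetical. Finding nvcc on PATH is only a hint that the CUDA toolkit is installed, and the script does not verify cuDNN or that PyTorch was built with CUDA.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites from the list above appear to be present."""
    return {
        "python>=3.8": sys.version_info >= (3, 8),
        "ffmpeg on PATH": shutil.which("ffmpeg") is not None,
        "yt-dlp on PATH": shutil.which("yt-dlp") is not None,
        # nvcc ships with the CUDA toolkit; its absence does not prove CUDA is missing.
        "nvcc on PATH": shutil.which("nvcc") is not None,
        # Only checks that the package can be found, without importing it.
        "torch installed": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'}  {name}")
```

If PyTorch is installed, `torch.cuda.is_available()` is a more direct check that the GPU can actually be used.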
Installation
Install the release version from PyPI (recommended):
pip install stream-translator-gpt
stream-translator-gpt
or
Clone the master branch from GitHub:
git clone https://github.com/ionic-bond/stream-translator-gpt.git
pip install -r ./stream-translator-gpt/requirements.txt
python3 ./stream-translator-gpt/translator.py
Usage
- Transcribe a livestream (uses Whisper by default):
stream-translator-gpt {URL} --model large --language {input_language}
- Transcribe with Faster-Whisper:
stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper
- Transcribe with the Whisper API:
stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}
- Translate to another language with Gemini:
stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
- Translate to another language with GPT:
stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}
- Use the Whisper API and Gemini at the same time:
stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
- Use a local video/audio file as input:
stream-translator-gpt /path/to/file --model large --language {input_language}
- Use the computer microphone as input:
stream-translator-gpt device --model large --language {input_language}
This uses the system's default audio device as input.
To use a different audio input device, run
stream-translator-gpt device --print_all_devices
to get the device index, then run the CLI with --device_index {index}.
To use the audio output of another program as input, you need to enable stereo mix.
- Send the result to Cqhttp:
stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}
- Send the result to Discord:
stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}
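When the tool is launched from a script rather than typed by hand, the commands above can be assembled programmatically. The following is a small sketch under stated assumptions: `build_command` is a hypothetical helper, it covers only some of the flags shown in this section, and the URL and API key are placeholders.

```python
import shlex
from typing import List, Optional


def build_command(
    source: str,
    model: str = "small",
    language: str = "auto",
    use_faster_whisper: bool = False,
    translation_prompt: Optional[str] = None,
    google_api_key: Optional[str] = None,
    openai_api_key: Optional[str] = None,
    discord_webhook_url: Optional[str] = None,
) -> List[str]:
    """Assemble a stream-translator-gpt argument list from the flags shown above."""
    cmd = ["stream-translator-gpt", source, "--model", model, "--language", language]
    if use_faster_whisper:
        cmd.append("--use_faster_whisper")
    if translation_prompt:
        cmd += ["--gpt_translation_prompt", translation_prompt]
    if google_api_key:
        cmd += ["--google_api_key", google_api_key]
    if openai_api_key:
        cmd += ["--openai_api_key", openai_api_key]
    if discord_webhook_url:
        cmd += ["--discord_webhook_url", discord_webhook_url]
    return cmd


cmd = build_command(
    "https://example.com/live",  # placeholder URL
    model="large",
    language="ja",
    use_faster_whisper=True,
    translation_prompt="Translate from Japanese to Chinese",
    google_api_key="YOUR_GOOGLE_KEY",  # placeholder key
)
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

Passing the list to `subprocess.run(cmd)` avoids shell quoting problems with the translation prompt, which contains spaces.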
All options
Option | Default Value | Description |
---|---|---|
Input Options | | |
URL | | The URL of the stream. If a local file path is given, the file is used as input. If "device" is given, input is captured from an audio device on your PC. |
--format | wa* | Stream format code. This parameter is passed directly to yt-dlp. |
--cookies | | Used to open member-only streams. This parameter is passed directly to yt-dlp. |
--direct_url | | Set this flag to pass the URL directly to ffmpeg. Otherwise, yt-dlp is used to obtain the stream URL. |
--device_index | | The index of the device to record from. If not set, the system default recording device is used. |
Audio Slicing Options | | |
--frame_duration | 0.1 | The unit, in seconds, in which live streaming data is processed. |
--continuous_no_speech_threshold | 0.8 | Slice if there is no speech for this many continuous seconds. |
--min_audio_length | 3.0 | Minimum slice audio length in seconds. |
--max_audio_length | 30.0 | Maximum slice audio length in seconds. |
--prefix_retention_length | 0.8 | The length of the retention prefix audio during slicing. |
--vad_threshold | 0.5 | The voice activity detection threshold. If the speech probability of a frame is higher than this value, the frame is treated as speech. |
Transcription Options | | |
--model | small | Model size. See the Whisper documentation for available models. |
--language | auto | Language spoken in the stream. See the Whisper documentation for available languages. |
--beam_size | 5 | Number of beams in beam search. Set to 0 to use the greedy algorithm instead (faster but less accurate). |
--best_of | 5 | Number of candidates when sampling with non-zero temperature. |
--use_faster_whisper | | Set this flag to use the Faster-Whisper implementation instead of the original OpenAI implementation. |
--use_whisper_api | | Set this flag to use the OpenAI Whisper API instead of the original local Whisper. |
--whisper_filters | emoji_filter | Filters applied to Whisper results, separated by ",". |
Translation Options | | |
--openai_api_key | | OpenAI API key, if using GPT translation / Whisper API. |
--google_api_key | | Google API key, if using Gemini translation. |
--gpt_model | gpt-3.5-turbo | GPT model name, gpt-3.5-turbo or gpt-4. (If using Gemini, there is no need to change this.) |
--gpt_translation_prompt | | If set, the result text is translated to the target language via the GPT / Gemini API (depending on which API key is provided). Example: "Translate from Japanese to Chinese" |
--gpt_translation_history_size | 0 | The number of previous messages sent when calling the GPT / Gemini API. If the history size is 0, translations run in parallel. If the history size is > 0, translations run serially. |
--gpt_translation_timeout | 10 | If a GPT / Gemini translation takes longer than this number of seconds, the translation is discarded. |
--gpt_base_url | | Customize the API endpoint of GPT. |
--retry_if_translation_fails | | Retry when a translation times out or fails. Used to generate subtitles offline. |
Output Options | | |
--output_timestamps | | Output the timestamp of the text along with the text. |
--hide_transcribe_result | | Hide the Whisper transcription result. |
--cqhttp_url | | If set, the result text is sent to the cqhttp server. |
--cqhttp_token | | Token for cqhttp. If it is not set on the server side, it does not need to be provided. |
--discord_webhook_url | | If set, the result text is sent to the Discord channel. |
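To make the audio slicing options more concrete, here is one way the thresholds could interact. This is a simplified sketch and not the project's actual slicing code: the function `slice_frames` is hypothetical, it takes per-frame speech probabilities as given instead of running a VAD model, and it leaves out --prefix_retention_length and the flushing of a trailing unfinished slice.

```python
from typing import Iterable, List, Tuple


def slice_frames(
    speech_probs: Iterable[float],
    frame_duration: float = 0.1,
    continuous_no_speech_threshold: float = 0.8,
    min_audio_length: float = 3.0,
    max_audio_length: float = 30.0,
    vad_threshold: float = 0.5,
) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, one pair per slice."""
    slices: List[Tuple[int, int]] = []
    start = None        # index of the first frame of the slice being built
    silent_frames = 0   # consecutive non-speech frames seen so far
    silence_limit = round(continuous_no_speech_threshold / frame_duration)
    min_frames = round(min_audio_length / frame_duration)
    max_frames = round(max_audio_length / frame_duration)

    for i, prob in enumerate(speech_probs):
        is_speech = prob > vad_threshold
        if start is None:
            if not is_speech:
                continue  # skip silence before a slice begins
            start = i
        silent_frames = 0 if is_speech else silent_frames + 1
        length = i + 1 - start
        # Cut on a long enough pause, but only once the slice is long enough.
        long_pause = silent_frames >= silence_limit and length >= min_frames
        if long_pause or length >= max_frames:
            slices.append((start, i + 1))
            start = None
    return slices


# 3 s of speech, 0.8 s of silence, then 0.5 s of speech (synthetic probabilities)
probs = [0.9] * 30 + [0.1] * 8 + [0.9] * 5
print(slice_frames(probs))
```

In this sketch a pause never closes a slice shorter than --min_audio_length, and --max_audio_length forces a cut even when speech never pauses, which bounds the delay before transcription can start.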
Contact me
Telegram: @ionic_bond