Command line tool to transcribe & translate audio from livestreams in real time

Project description

stream-translator-gpt

Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and OpenAI's Whisper for transcription/translation.

This fork optimizes the audio slicing logic using voice activity detection (VAD), adds OpenAI's GPT API and Google's Gemini API to support translation into languages other than English, and supports capturing audio from local devices.

Prerequisites

Install ffmpeg and add it to your PATH.
Install CUDA on your system. You can check the installed CUDA version with nvcc --version.
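
Both prerequisites can be checked from Python before going further. The following is a minimal sketch, assuming that ffmpeg and nvcc are expected to be reachable through PATH; the helper name find_missing_tools is hypothetical and not part of this project.

```python
import shutil


def find_missing_tools(tools=("ffmpeg", "nvcc")):
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and nvcc are both on PATH.")
```

Note that a CUDA runtime can be present even when nvcc is not on PATH, so a missing nvcc is a hint rather than proof that CUDA is absent.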

Setup

Set up a virtual environment.
git clone https://github.com/ionic-bond/stream-translator-gpt
pip install -r requirements.txt
Make sure that PyTorch is installed with CUDA support. Whisper will probably not run in real time on a CPU.
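
One way to confirm that the installed PyTorch build can see a GPU is sketched below. It relies only on torch.cuda.is_available() and torch.cuda.get_device_name(); the cuda_status helper is a hypothetical name used for illustration.

```python
import importlib.util


def cuda_status():
    """Describe whether PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment."
    import torch  # imported lazily so the check also works without PyTorch

    if torch.cuda.is_available():
        return f"CUDA is available: {torch.cuda.get_device_name(0)}"
    return "PyTorch is installed, but CUDA is not available (CPU-only build or no GPU)."


print(cuda_status())
```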

Usage

Translate live streaming audio:

python translator.py {URL} {flags...}

By default, the URL can be of the form twitch.tv/forsen and yt-dlp is used to obtain the .m3u8 link which is passed to ffmpeg.

Translate PC device audio:

python translator.py device {flags...}

This uses the system's default audio device as input.

To use a different audio input device, run python print_all_devices.py to get its device index, then run the CLI with --device_index.
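
When the tool is launched from another script, the two invocations above can be assembled programmatically. This is a sketch only: build_command is a hypothetical helper, the device index 2 is a made-up example value, and the flag names are taken from the Flags table in this README.

```python
import shlex
import sys


def build_command(source, **flags):
    """Build an argv list for translator.py.

    `source` is a stream URL or the literal string "device".
    Flags set to True are emitted as bare switches; None/False are skipped.
    """
    argv = [sys.executable, "translator.py", source]
    for name, value in flags.items():
        if value is None or value is False:
            continue
        argv.append(f"--{name}")
        if value is not True:
            argv.append(str(value))
    return argv


stream_cmd = build_command("twitch.tv/forsen", model="small", task="translate")
device_cmd = build_command("device", device_index=2, use_faster_whisper=True)

print(shlex.join(stream_cmd))
print(shlex.join(device_cmd))
# To launch: subprocess.run(stream_cmd, check=True) from the repository directory.
```

Passing the arguments as a list avoids shell quoting problems, which matters for values that contain spaces such as a translation prompt.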

Flags

Flag	Default Value	Description
`URL`		The URL of the stream. If set to "device", audio is captured from your PC's audio device.
`--format`	wa*	Stream format code; this parameter is passed directly to yt-dlp.
`--cookies`		Used to open members-only streams; this parameter is passed directly to yt-dlp.
`--device_index`		The index of the device to record from. If not set, the system default recording device is used.
`--frame_duration`	0.1	The unit, in seconds, in which live streaming data is processed.
`--continuous_no_speech_threshold`	0.8	Slice the audio if no speech is detected for this continuous period, in seconds.
`--min_audio_length`	3.0	Minimum slice audio length in seconds.
`--max_audio_length`	30.0	Maximum slice audio length in seconds.
`--prefix_retention_length`	0.8	The length of prefix audio retained during slicing.
`--vad_threshold`	0.5	The threshold for voice activity detection. If the speech probability of a frame is higher than this value, the frame is treated as speech.
`--model`	small	Whisper model size. See the Whisper repository for available models.
`--task`	translate	Whether to transcribe the audio (keep the original language) or translate it to English.
`--language`	auto	Language spoken in the stream. See the Whisper repository for available languages.
`--beam_size`	5	Number of beams in beam search. Set to 0 to use greedy decoding instead (faster but less accurate).
`--best_of`	5	Number of candidates when sampling with non-zero temperature.
`--direct_url`		Set this flag to pass the URL directly to ffmpeg. Otherwise, yt-dlp is used to obtain the stream URL.
`--use_faster_whisper`		Set this flag to use the faster_whisper implementation instead of the original OpenAI implementation.
`--use_whisper_api`		Set this flag to use the OpenAI Whisper API instead of running Whisper locally.
`--whisper_filters`	emoji_filter	Filters applied to Whisper results, separated by ",".
`--hide_whisper_result`		Hide the Whisper transcription result.
`--openai_api_key`		OpenAI API key, required if using GPT translation or the Whisper API.
`--google_api_key`		Google API key, required if using Gemini translation.
`--gpt_model`	gpt-3.5-turbo	GPT model name: gpt-3.5-turbo or gpt-4. (No need to change this if using Gemini.)
`--gpt_translation_prompt`		If set, the result text is translated to the target language via the GPT / Gemini API (depending on which API key is provided). Example: "Translate from Japanese to Chinese"
`--gpt_translation_history_size`	0	The number of previous messages sent when calling the GPT / Gemini API. If the history size is 0, translations run in parallel. If the history size is greater than 0, translations run serially.
`--gpt_translation_timeout`	15	If the GPT / Gemini translation exceeds this number of seconds, the translation is discarded.
`--gpt_base_url`	`https://api.openai.com/v1/`	Custom API endpoint for the GPT API.
`--retry_if_translation_fails`		Retry when translation times out/fails. Used to generate subtitles offline.
`--output_timestamps`		Output timestamps along with the text.
`--cqhttp_url`		If set, the result text is sent to the cqhttp server.
`--cqhttp_token`		cqhttp token. Not needed if no token is set on the server side.
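
To show how the slicing flags interact, here is a simplified sketch of VAD-based slicing. It is an illustration under assumptions, not the project's actual code: slice_by_vad is a hypothetical function, it takes precomputed per-frame speech probabilities instead of running a VAD model, and it ignores --prefix_retention_length. A slice is closed once it is at least min_audio_length long and ends in a pause of at least continuous_no_speech_threshold, or once it reaches max_audio_length.

```python
def slice_by_vad(speech_probs, frame_duration=0.1, vad_threshold=0.5,
                 continuous_no_speech_threshold=0.8,
                 min_audio_length=3.0, max_audio_length=30.0):
    """Return (start_frame, end_frame) pairs, end exclusive, one per slice."""
    # Work in whole frames to avoid floating-point drift when summing durations.
    pause_frames = max(1, round(continuous_no_speech_threshold / frame_duration))
    min_frames = max(1, round(min_audio_length / frame_duration))
    max_frames = max(1, round(max_audio_length / frame_duration))

    slices = []
    start = None       # index of the first frame of the open slice, or None
    silent_run = 0     # consecutive non-speech frames at the end of the slice
    for i, prob in enumerate(speech_probs):
        is_speech = prob > vad_threshold
        if start is None:
            if not is_speech:
                continue  # skip silence until speech begins
            start = i
        silent_run = 0 if is_speech else silent_run + 1
        length = i + 1 - start
        long_pause = silent_run >= pause_frames
        if (long_pause and length >= min_frames) or length >= max_frames:
            slices.append((start, i + 1))
            start, silent_run = None, 0
    return slices


# 4 s of speech, 1 s of silence, then 0.5 s of speech (0.1 s frames).
probs = [0.9] * 40 + [0.1] * 10 + [0.9] * 5
print(slice_by_vad(probs))
```

Audio still in the open slice when the input ends is not emitted here, since in a live stream more frames would normally follow.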

Using faster-whisper

faster-whisper provides significant performance improvements over the original OpenAI implementation (about 4x faster, with about half the memory). To use it, install cuDNN into your CUDA directory, then run the CLI with --use_faster_whisper.
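
For reference, calling faster-whisper directly looks roughly like the sketch below. This is not how translator.py is necessarily wired internally: transcribe_file is a hypothetical helper, the float16 compute type is an assumption suited to CUDA GPUs, and the defaults simply mirror the --model, --task and --beam_size defaults from the Flags table.

```python
def transcribe_file(path, model_size="small", task="translate", beam_size=5):
    """Transcribe or translate one audio file with faster-whisper on a CUDA GPU."""
    # Imported lazily so this module can be loaded without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    segments, info = model.transcribe(path, task=task, beam_size=beam_size)
    # `segments` is a generator; decoding happens as it is consumed.
    return info.language, [(s.start, s.end, s.text) for s in segments]


# Example (assumes a local file named clip.wav and a working CUDA + cuDNN setup):
# language, lines = transcribe_file("clip.wav")
```

If this call fails with a cuDNN loading error, that usually indicates the cuDNN installation step above is incomplete.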

Contact me

Telegram: @ionic_bond

Donate

PayPal

Project details

Release history

Version	Release date
2026.3.2	Mar 2, 2026
2026.3.1	Feb 28, 2026
2026.2.26	Feb 26, 2026
2026.1.31	Jan 30, 2026
2026.1.29	Jan 28, 2026
2026.1.12	Jan 12, 2026
2025.12.31.1	Dec 30, 2025
2025.12.31	Dec 30, 2025
2025.12.30.2	Dec 30, 2025
2025.12.30.1	Dec 30, 2025
2025.12.30	Dec 30, 2025
2025.12.25	Dec 25, 2025
2025.12.24	Dec 24, 2025
2025.12.19	Dec 19, 2025
2025.12.16.1	Dec 16, 2025
2025.12.16	Dec 16, 2025
2025.12.16.dev2 (pre-release)	Dec 16, 2025
2025.12.16.dev1 (pre-release)	Dec 16, 2025
2025.12.16.dev0 (pre-release)	Dec 16, 2025
2025.12.8	Dec 7, 2025
2025.12.1	Nov 30, 2025
2025.11.9	Nov 9, 2025
2025.10.30	Oct 29, 2025
2025.10.29.2	Oct 29, 2025
2025.8.10	Aug 9, 2025
2025.7.28	Jul 27, 2025
2025.5.14	May 13, 2025
2025.5.13	May 12, 2025
2025.2.8	Feb 7, 2025
2025.1.13	Jan 13, 2025
2024.12.24	Dec 24, 2024
2024.12.17	Dec 17, 2024
2024.12.11	Dec 11, 2024
2024.12.4	Dec 4, 2024
2024.12.4.dev1 (pre-release)	Dec 4, 2024
2024.12.4.dev0 (pre-release)	Dec 4, 2024
2024.11.11	Nov 11, 2024
2024.10.11	Oct 11, 2024
2024.9.20	Sep 20, 2024
2024.8.20	Aug 20, 2024
2024.8.19	Aug 18, 2024
2024.8.17	Aug 17, 2024
2024.7.19	Jul 19, 2024
2024.5.28	May 27, 2024
2024.5.25	May 24, 2024
2024.5.4	May 3, 2024
2024.4.24	Apr 24, 2024
2024.3.26	Mar 26, 2024
2024.3.25	Mar 24, 2024
2024.3.22	Mar 22, 2024
2024.3.9	Mar 9, 2024
2024.3.9.dev2 (pre-release)	Mar 9, 2024
2024.3.9.dev1 (pre-release)	Mar 9, 2024
2024.3.9.dev0 (pre-release)	Mar 9, 2024
2024.3.6	Mar 5, 2024
2024.3.3	Mar 3, 2024
2024.3.3.dev3 (pre-release)	Mar 3, 2024
2024.3.3.dev2 (pre-release, this version)	Mar 3, 2024
2024.3.3.dev1 (pre-release)	Mar 3, 2024
2024.3.3.dev0 (pre-release)	Mar 3, 2024

Download files

Download the file for your platform.

Source Distribution

stream-translator-gpt-2024.3.3.dev2.tar.gz (16.9 kB)

Uploaded Mar 3, 2024 Source

Built Distribution

stream_translator_gpt-2024.3.3.dev2-py3-none-any.whl (17.7 kB)

Uploaded Mar 3, 2024 Python 3

File details

Details for the file stream-translator-gpt-2024.3.3.dev2.tar.gz.

File metadata

Download URL: stream-translator-gpt-2024.3.3.dev2.tar.gz
Upload date: Mar 3, 2024
Size: 16.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for stream-translator-gpt-2024.3.3.dev2.tar.gz
Algorithm	Hash digest
SHA256	`bfc11fe5f27d8602d8bf62b4b9d3cf2964270f80d7e4b61317441306b4d7a005`
MD5	`b943ac58d9b95ecb3ff87e98673061b0`
BLAKE2b-256	`b370872c67dceb5b4d1f2ca889d816d0ed72d07d854759621aab344e04e1ef13`
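
A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below is one way to do it; sha256_of and verify_download are hypothetical helper names, and the example path assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_sha256.lower()


# Published SHA256 digest for the source distribution listed above.
SDIST_SHA256 = "bfc11fe5f27d8602d8bf62b4b9d3cf2964270f80d7e4b61317441306b4d7a005"
# Example (assumes the file was downloaded to the current directory):
# verify_download("stream-translator-gpt-2024.3.3.dev2.tar.gz", SDIST_SHA256)
```

Reading in chunks keeps memory use constant regardless of file size. pip can enforce the same check at install time through its hash-checking mode.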

File details

Details for the file stream_translator_gpt-2024.3.3.dev2-py3-none-any.whl.

File metadata

Download URL: stream_translator_gpt-2024.3.3.dev2-py3-none-any.whl
Upload date: Mar 3, 2024
Size: 17.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for stream_translator_gpt-2024.3.3.dev2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7de05d9c402bf285ce02d83ce7eab970e15c27509faeb544233882d64c6fadd4`
MD5	`dba7b4782f90fcc2a6a86e0756cc6f2f`
BLAKE2b-256	`989feac6dd0d198ddd48980fcd224e1565bc9c9e7261c8933b96232d9b3a3617`
