Command line tool to transcribe & translate audio from livestreams in real time

stream-translator-gpt

Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and OpenAI's whisper for transcription/translation.

This fork optimizes the VAD-based audio slicing logic, adds OpenAI's GPT API and Google's Gemini API to support translation into languages other than English, and supports capturing audio from local devices.

Sample: Open In Colab

Prerequisites

  1. Install ffmpeg and add it to your PATH.
  2. Install CUDA on your system. You can check the installed CUDA version with nvcc --version.
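
A quick way to confirm both prerequisites is to look the tools up on PATH. The following is a minimal sketch, not part of this project; the helper names `find_tool` and `cuda_compiler_version` are made up for illustration, and it only assumes that `ffmpeg` and `nvcc` are the executable names on your system.

```python
import shutil
import subprocess


def find_tool(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def cuda_compiler_version():
    """Return the output of `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = find_tool("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return result.stdout.strip() or None


if __name__ == "__main__":
    print("ffmpeg:", find_tool("ffmpeg") or "not found on PATH")
    print("nvcc:", cuda_compiler_version() or "not found on PATH")
```

If either line reports "not found on PATH", fix the installation before continuing with the setup.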

Setup

  1. Set up a virtual environment.
  2. git clone https://github.com/ionic-bond/stream-translator-gpt
  3. pip install -r requirements.txt
  4. Make sure that PyTorch is installed with CUDA support. Whisper will probably not run in real time on a CPU.
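
To check step 4, you can ask PyTorch directly whether it sees a CUDA device. This is a small sketch rather than anything shipped with the project; `torch_cuda_status` is a hypothetical helper name, and it relies only on `torch.cuda.is_available()` and `torch.cuda.get_device_name()`.

```python
import importlib.util


def torch_cuda_status():
    """Describe whether PyTorch is installed and whether it can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so this script also runs without torch

    if torch.cuda.is_available():
        return f"CUDA available: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but CUDA is not available (CPU-only build or missing driver)"


print(torch_cuda_status())
```

If the last message appears, reinstalling PyTorch with a CUDA-enabled build is usually the fix.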

Usage

  1. Translate live streaming audio:

    python translator.py {URL} {flags...}

    By default, the URL can be of the form twitch.tv/forsen and yt-dlp is used to obtain the .m3u8 link which is passed to ffmpeg.

  2. Translate PC device audio:

    python translator.py device {flags...}

    This uses the system's default audio device as input.

    To use a different audio input device, run python print_all_devices.py to get its device index, then run the CLI with --device_index.
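
The URL handling described in step 1 (yt-dlp resolves the page URL to a media link, which is then handed to ffmpeg) can be sketched as two command builders. This is an illustration under assumptions, not the project's actual code: the function names are hypothetical, and the ffmpeg output settings (raw 16-bit mono PCM at 16 kHz, the input format Whisper works with) are one plausible choice rather than the exact arguments the tool uses.

```python
def build_ytdlp_command(url, format_code="wa*", cookies=None):
    """Command that asks yt-dlp to print the direct media URL for a stream page."""
    cmd = ["yt-dlp", "--get-url", "--format", format_code]
    if cookies:
        # Path to a cookies file, e.g. for member-only streams.
        cmd += ["--cookies", cookies]
    return cmd + [url]


def build_ffmpeg_command(media_url, sample_rate=16000):
    """Command that makes ffmpeg decode the stream to raw mono PCM on stdout."""
    return [
        "ffmpeg", "-loglevel", "error",
        "-i", media_url,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ac", "1", "-ar", str(sample_rate),
        "-",  # write the decoded audio to stdout
    ]


print(" ".join(build_ytdlp_command("twitch.tv/forsen")))
print(" ".join(build_ffmpeg_command("https://example.com/stream.m3u8")))
```

Passing these lists to `subprocess.run` or `subprocess.Popen` would execute them. With --direct_url, the first step is skipped and the given URL goes straight to ffmpeg.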

Flags

| Flag | Default Value | Description |
| --- | --- | --- |
| URL | | The URL of the stream. If set to "device", audio is captured from your PC audio device. |
| --format | wa* | Stream format code. This parameter is passed directly to yt-dlp. |
| --cookies | | Used to open member-only streams. This parameter is passed directly to yt-dlp. |
| --device_index | | The index of the device to record from. If not set, the system default recording device is used. |
| --frame_duration | 0.1 | The unit, in seconds, in which live streaming data is processed. |
| --continuous_no_speech_threshold | 0.8 | Slice the audio if there is no speech for this many continuous seconds. |
| --min_audio_length | 3.0 | Minimum slice audio length in seconds. |
| --max_audio_length | 30.0 | Maximum slice audio length in seconds. |
| --prefix_retention_length | 0.8 | The length of the prefix audio retained during slicing. |
| --vad_threshold | 0.5 | The voice activity detection (VAD) threshold. If the speech probability of a frame is higher than this value, the frame is treated as speech. |
| --model | small | Select model size. See here for available models. |
| --task | translate | Whether to transcribe the audio (keep the original language) or translate it to English. |
| --language | auto | Language spoken in the stream. See here for available languages. |
| --beam_size | 5 | Number of beams in beam search. Set to 0 to use the greedy algorithm instead (faster but less accurate). |
| --best_of | 5 | Number of candidates when sampling with non-zero temperature. |
| --direct_url | | Set this flag to pass the URL directly to ffmpeg. Otherwise, yt-dlp is used to obtain the stream URL. |
| --use_faster_whisper | | Set this flag to use the faster_whisper implementation instead of the original OpenAI implementation. |
| --use_whisper_api | | Set this flag to use the OpenAI Whisper API instead of the local Whisper model. |
| --whisper_filters | emoji_filter | Filters applied to Whisper results, separated by ",". |
| --hide_whisper_result | | Hide the Whisper transcription result. |
| --openai_api_key | | OpenAI API key, required if using GPT translation or the Whisper API. |
| --google_api_key | | Google API key, required if using Gemini translation. |
| --gpt_model | gpt-3.5-turbo | GPT model name, gpt-3.5-turbo or gpt-4. (No need to change this if using Gemini.) |
| --gpt_translation_prompt | | If set, the result text is translated to the target language via the GPT / Gemini API (depending on which API key is provided). Example: "Translate from Japanese to Chinese" |
| --gpt_translation_history_size | 0 | The number of previous messages sent when calling the GPT / Gemini API. If the history size is 0, translations run in parallel. If the history size is greater than 0, translations run serially. |
| --gpt_translation_timeout | 15 | If a GPT / Gemini translation takes longer than this number of seconds, it is discarded. |
| --gpt_base_url | https://api.openai.com/v1/ | Custom API endpoint for ChatGPT. |
| --retry_if_translation_fails | | Retry when a translation times out or fails. Used to generate subtitles offline. |
| --output_timestamps | | Output the timestamp of the text along with the text. |
| --cqhttp_url | | If set, the result text is sent to the cqhttp server. |
| --cqhttp_token | | The cqhttp token. Not needed if no token is set on the server side. |
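
The slicing flags (--frame_duration, --vad_threshold, --continuous_no_speech_threshold, --min_audio_length, --max_audio_length, --prefix_retention_length) are easier to reason about with a concrete model. The sketch below is one plausible reading of how they interact, written for illustration only; it is not the project's implementation, `slice_by_vad` is a hypothetical name, it treats --prefix_retention_length as seconds (an assumption), and it takes per-frame speech probabilities as a plain list instead of running a real VAD model.

```python
def slice_by_vad(speech_probs, frame_duration=0.1, vad_threshold=0.5,
                 continuous_no_speech_threshold=0.8, min_audio_length=3.0,
                 max_audio_length=30.0, prefix_retention_length=0.8):
    """Return (start, end) frame index pairs, end exclusive, one per audio slice."""

    def to_frames(seconds):
        return max(1, round(seconds / frame_duration))

    silence_limit = to_frames(continuous_no_speech_threshold)
    min_frames = to_frames(min_audio_length)
    max_frames = to_frames(max_audio_length)
    prefix_frames = to_frames(prefix_retention_length)

    slices = []
    start = None     # first frame of the slice being built, None while idle
    silent_run = 0   # consecutive non-speech frames seen so far
    for i, prob in enumerate(speech_probs):
        is_speech = prob > vad_threshold
        silent_run = 0 if is_speech else silent_run + 1
        if start is None:
            if is_speech:
                # Keep a short stretch of audio before the first speech frame,
                # without reaching back into the previous slice.
                start = max(0, i - prefix_frames)
                if slices:
                    start = max(start, slices[-1][1])
            continue
        length = i + 1 - start
        long_pause = silent_run >= silence_limit and length >= min_frames
        if length >= max_frames or long_pause:
            slices.append((start, i + 1))
            start = None
    if start is not None:
        # Flush whatever is left when the input ends.
        slices.append((start, len(speech_probs)))
    return slices


# 2 s of silence, 4 s of speech, 2 s of silence, at 0.1 s per frame.
probs = [0.0] * 20 + [0.9] * 40 + [0.0] * 20
print(slice_by_vad(probs))  # [(12, 68)]
```

In this model, a lower --continuous_no_speech_threshold produces shorter slices and lower latency, while a higher --min_audio_length gives Whisper more context per slice at the cost of delay.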

Using faster-whisper

faster-whisper provides significant performance improvements over the original OpenAI implementation (about 4x faster with about half the memory). To use it, install cuDNN into your CUDA directory, then run the CLI with --use_faster_whisper.
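
For reference, calling faster-whisper directly looks roughly like the sketch below. It is a standalone illustration and not how this tool wires it in: the helper names are hypothetical, the audio path is a placeholder, and `device="cuda"` with `compute_type="float16"` assumes a working CUDA and cuDNN setup.

```python
import importlib.util


def faster_whisper_installed():
    """Return True if the faster_whisper package can be imported."""
    return importlib.util.find_spec("faster_whisper") is not None


def transcribe_with_faster_whisper(audio_path, model_size="small",
                                   task="translate", beam_size=5):
    """Transcribe or translate one audio file and return the joined segment text."""
    # Imported lazily so this module still loads when the package is missing.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cuda", compute_type="float16")
    # transcribe() returns a lazy generator of segments plus an info object.
    segments, _info = model.transcribe(audio_path, task=task, beam_size=beam_size)
    return " ".join(segment.text.strip() for segment in segments)


if __name__ == "__main__":
    if faster_whisper_installed():
        print(transcribe_with_faster_whisper("sample.wav"))  # placeholder path
    else:
        print("faster_whisper is not installed")
```

The defaults mirror the --model, --task, and --beam_size defaults listed under Flags.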

Contact me

Telegram: @ionic_bond

Donate

PayPal
