Command line tool to transcribe & translate audio from livestreams in real time

stream-translator-gpt

Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and Whisper / Faster-Whisper for transcription.

This fork improves the VAD-based audio slicing logic, adds GPT API / Gemini API translation so that output is not limited to English, and supports input from audio devices.

Try it on Colab.

Prerequisites

Linux or Windows:

  1. Python >= 3.8 (>= 3.10 recommended).
  2. Install CUDA on your system.
  3. Install cuDNN v8 into your CUDA directory if you want to use Faster-Whisper.
  4. Install PyTorch (with CUDA) into your Python environment.
  5. Create a Google API key if you want to use the Gemini API for translation. (The free tier allows 15 requests per minute.)
  6. Create an OpenAI API key if you want to use the Whisper API for transcription or the GPT API for translation.

If you are on Windows, you also need to:

  1. Install and add ffmpeg to your PATH.
  2. Install yt-dlp and add it to your PATH.
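
Before installing, it can help to confirm that the command line tools named above are reachable. The following is a minimal sketch and not part of the project: check_tool is a hypothetical helper name, a POSIX shell is assumed (on Windows, for example Git Bash or WSL), and the check only covers whether each executable is on PATH, not whether its version is suitable.

```shell
#!/bin/sh
# Hypothetical helper (not part of stream-translator-gpt):
# report whether a command is reachable through PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools named in the prerequisites above.
# "|| true" keeps the loop going when a tool is missing.
for tool in python3 ffmpeg yt-dlp; do
    check_tool "$tool" || true
done
```

Any line that starts with "missing:" points at a tool that still has to be installed or added to PATH.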

Installation

Install the release version from PyPI (recommended):

pip install stream-translator-gpt -U
stream-translator-gpt

or

Clone the master branch from GitHub:

git clone https://github.com/ionic-bond/stream-translator-gpt.git
pip install -r ./stream-translator-gpt/requirements.txt
python3 ./stream-translator-gpt/translator.py

Usage

  • Transcribe a livestream (uses Whisper by default):

    stream-translator-gpt {URL} --model large --language {input_language}

  • Transcribe with Faster-Whisper:

    stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper

  • Transcribe with the Whisper API:

    stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}

  • Translate to another language with Gemini:

    stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}

  • Translate to another language with GPT:

    stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}

  • Use the Whisper API and Gemini at the same time:

    stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}

  • Use a local video/audio file as input:

    stream-translator-gpt /path/to/file --model large --language {input_language}

  • Use the computer microphone as input:

    stream-translator-gpt device --model large --language {input_language}

    This uses the system's default audio device as input.

    To use another audio input device, run stream-translator-gpt device --print_all_devices to get its device index, then run the CLI with --device_index {index}.

    To use the audio output of another program as input, you need to enable stereo mix.

  • Send results to Cqhttp:

    stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}

  • Send results to Discord:

    stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}

  • Save results to a .srt subtitle file:

    stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key} --hide_transcribe_result --output_timestamps --output_file_path ./result.srt
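
The longer commands above can be wrapped in a small shell function so that the API key is read from the environment instead of being typed inline. This is only a sketch under stated assumptions: the function name translate_to_srt and the variables GOOGLE_API_KEY and RUNNER are hypothetical choices for this example and are not read by stream-translator-gpt itself. Every flag is copied from the usage list above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of stream-translator-gpt).
# $1: stream URL or file path, $2: output .srt path
translate_to_srt() {
    # GOOGLE_API_KEY is a variable name chosen for this sketch.
    if [ -z "${GOOGLE_API_KEY:-}" ]; then
        echo "GOOGLE_API_KEY is not set" >&2
        return 1
    fi
    # RUNNER defaults to the real CLI; set RUNNER=echo for a dry run
    # that prints the arguments instead of starting a transcription.
    "${RUNNER:-stream-translator-gpt}" "$1" \
        --model large --language ja \
        --gpt_translation_prompt "Translate from Japanese to Chinese" \
        --google_api_key "$GOOGLE_API_KEY" \
        --hide_transcribe_result --output_timestamps \
        --output_file_path "$2"
}

# Example (after "export GOOGLE_API_KEY=..." in the current shell):
#   translate_to_srt {URL} ./result.srt
```

Running it once with RUNNER=echo is a cheap way to inspect the arguments before starting a long transcription.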
