Command line tool to transcribe & translate audio from livestreams in real time

Project description

stream-translator-gpt

Command line utility to transcribe or translate audio from livestreams in real time. Uses yt-dlp to get livestream URLs from various services and Whisper / Faster-Whisper for transcription.

This fork optimizes the audio slicing logic using VAD (voice activity detection), adds GPT API / Gemini API support for translation into languages other than English, and supports input from audio devices.
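To make the idea of VAD-based slicing concrete, here is a minimal sketch that cuts audio at pauses instead of at fixed intervals. It is not this project's implementation: it swaps in a simple energy threshold for a real VAD model, and the function name slice_by_energy and all of its parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def slice_by_energy(
    samples: Sequence[float],
    frame_len: int = 4,
    threshold: float = 0.01,
    min_silence_frames: int = 2,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of spans judged to contain speech.

    Assumption: a frame counts as "speech" when its mean squared amplitude
    reaches `threshold`. A real VAD model would replace this test.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # sample index where the current segment began
    silent_run = 0    # consecutive silent frames seen inside a segment
    n_frames = len(samples) // frame_len

    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i * frame_len
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the first silent frame of the pause.
                end = (i - silent_run + 1) * frame_len
                segments.append((start, end))
                start = None
                silent_run = 0

    if start is not None:
        # Audio ended mid-segment; drop any trailing silent frames.
        segments.append((start, (n_frames - silent_run) * frame_len))
    return segments


audio = [0.0] * 8 + [0.5] * 8 + [0.0] * 8 + [0.5] * 4
print(slice_by_energy(audio))
```

Cutting at pauses means each chunk passed to the transcriber tends to hold whole utterances, which is the motivation for slicing on voice activity rather than on a timer.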

Prerequisites

Linux or Windows:

Python >= 3.8 (>= 3.10 recommended).
Install CUDA 11 on your system (Faster-Whisper is not compatible with CUDA 12 for now).
Install cuDNN into your CUDA directory if you want to use Faster-Whisper.
Install PyTorch (with CUDA) in your Python environment.
Create a Google API key if you want to use the Gemini API for translation (recommended; free for 60 requests / minute).
Create an OpenAI API key if you want to use the Whisper API for transcription or the GPT API for translation.

If you are on Windows, you also need to:

Install and add ffmpeg to your PATH.
Install yt-dlp and add it to your PATH.
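A quick way to confirm the PATH requirements above is to ask Python where it finds each tool. This is a minimal sketch using only the standard library; the helper name check_prerequisites is hypothetical and is not part of stream-translator-gpt.

```python
import shutil
import sys
from typing import Dict, Optional, Sequence, Tuple


def check_prerequisites(
    tools: Sequence[str] = ("ffmpeg", "yt-dlp"),
    min_python: Tuple[int, int] = (3, 8),
) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    if sys.version_info < min_python:
        raise RuntimeError(
            f"Python {min_python[0]}.{min_python[1]} or newer is required"
        )
    # shutil.which searches PATH the same way the shell would.
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in check_prerequisites().items():
        print(f"{tool}: {path or 'NOT FOUND on PATH'}")
```

Any tool reported as not found needs to be installed or added to PATH before running the CLI.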

Installation

Install the release version from PyPI (recommended):

pip install stream-translator-gpt
stream-translator-gpt

Clone the master branch from GitHub:

git clone https://github.com/ionic-bond/stream-translator-gpt.git
pip install -r ./stream-translator-gpt/requirements.txt
python3 ./stream-translator-gpt/translator.py

Usage

Transcribe a livestream (uses Whisper by default):

stream-translator-gpt {URL} --model large --language {input_language}

Transcribe with Faster-Whisper:

stream-translator-gpt {URL} --model large --language {input_language} --use_faster_whisper

Transcribe with the Whisper API:

stream-translator-gpt {URL} --language {input_language} --use_whisper_api --openai_api_key {your_openai_key}

Translate to another language with Gemini:

stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}

Translate to another language with GPT:

stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --openai_api_key {your_openai_key}

Use the Whisper API and Gemini at the same time:

stream-translator-gpt {URL} --model large --language ja --use_whisper_api --openai_api_key {your_openai_key} --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key}
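When the CLI is launched from a script, the invocations above can be assembled as an argument list rather than a shell string, which avoids quoting problems with the translation prompt. The sketch below uses only the flags shown in this section; the helper build_command and its parameter names are hypothetical.

```python
import shlex
from typing import List, Optional


def build_command(
    source: str,
    language: str,
    model: str = "large",
    use_faster_whisper: bool = False,
    use_whisper_api: bool = False,
    translation_prompt: Optional[str] = None,
    google_api_key: Optional[str] = None,
    openai_api_key: Optional[str] = None,
) -> List[str]:
    """Assemble an argv list for stream-translator-gpt."""
    cmd = ["stream-translator-gpt", source, "--model", model, "--language", language]
    if use_faster_whisper:
        cmd.append("--use_faster_whisper")
    if use_whisper_api:
        cmd.append("--use_whisper_api")
    if translation_prompt:
        cmd += ["--gpt_translation_prompt", translation_prompt]
    if google_api_key:
        cmd += ["--google_api_key", google_api_key]
    if openai_api_key:
        cmd += ["--openai_api_key", openai_api_key]
    return cmd


cmd = build_command(
    "{URL}",
    "ja",
    translation_prompt="Translate from Japanese to Chinese",
    google_api_key="{your_google_key}",
)
# shlex.join renders the list as a copy-pasteable shell command.
print(shlex.join(cmd))
```

The resulting list can be passed directly to subprocess.run(cmd, check=True) once the placeholders are replaced with real values.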

Use a local video/audio file as input:

stream-translator-gpt /path/to/file --model large --language {input_language}

Use the computer microphone as input:

stream-translator-gpt device --model large --language {input_language}

This uses the system's default audio device as input.

To use a different audio input device, run stream-translator-gpt device --print_all_devices to get its device index, then run the CLI with --device_index {index}.

To use the audio output of another program as input, you need to enable stereo mix.

Send the result to Cqhttp:

stream-translator-gpt {URL} --model large --language {input_language} --cqhttp_url {your_cqhttp_url} --cqhttp_token {your_cqhttp_token}

Send the result to Discord:

stream-translator-gpt {URL} --model large --language {input_language} --discord_webhook_url {your_discord_webhook_url}
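For context on what a Discord webhook expects, the sketch below builds the HTTP request for posting one line of text. It illustrates Discord's webhook format (a JSON body with a content field, limited to 2000 characters), not how stream-translator-gpt sends messages internally; the helper name build_discord_request is hypothetical.

```python
import json
import urllib.request


def build_discord_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying `text` to a webhook."""
    # Discord rejects message content longer than 2000 characters.
    payload = json.dumps({"content": text[:2000]}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # Assumption: an explicit User-Agent is set because some services
            # reject urllib's default one.
            "User-Agent": "webhook-sketch/0.1",
        },
        method="POST",
    )


request = build_discord_request("https://example.invalid/webhook", "hello")
print(request.get_method(), request.data)
# To actually send it: urllib.request.urlopen(request)
```

This can be handy for checking that a webhook URL works before passing it to --discord_webhook_url.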

Save the result to a .srt subtitle file:

stream-translator-gpt {URL} --model large --language ja --gpt_translation_prompt "Translate from Japanese to Chinese" --google_api_key {your_google_key} --hide_transcribe_result --output_timestamps --output_file_path ./result.srt
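For readers unfamiliar with the .srt format, the sketch below shows how timed text maps onto SubRip cues: a running index, a start --> end line with millisecond timestamps, the text, and a blank separator line. It illustrates the file format only and is not taken from this project's code; format_timestamp and to_srt are hypothetical names.

```python
from typing import Iterable, Tuple


def format_timestamp(seconds: float) -> str:
    """Render seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms_total = int(round(seconds * 1000))
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: Iterable[Tuple[float, float, str]]) -> str:
    """Convert (start_seconds, end_seconds, text) tuples to SubRip text."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)


print(to_srt([(0.0, 1.25, "hello"), (1.25, 3.0, "world")]))
```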

Project details

Release history

2024.5.4 (this version): May 3, 2024
2024.4.24: Apr 24, 2024
2024.3.26: Mar 26, 2024
2024.3.25: Mar 24, 2024
2024.3.22: Mar 22, 2024
2024.3.9: Mar 9, 2024
2024.3.9.dev2 (pre-release): Mar 9, 2024
2024.3.9.dev1 (pre-release): Mar 9, 2024
2024.3.9.dev0 (pre-release): Mar 9, 2024
2024.3.6: Mar 5, 2024
2024.3.3: Mar 3, 2024
2024.3.3.dev3 (pre-release): Mar 3, 2024
2024.3.3.dev2 (pre-release): Mar 3, 2024
2024.3.3.dev1 (pre-release): Mar 3, 2024
2024.3.3.dev0 (pre-release): Mar 3, 2024

Download files

Source Distribution

stream_translator_gpt-2024.5.4.tar.gz (1.1 MB), uploaded May 3, 2024, Source

Built Distribution

stream_translator_gpt-2024.5.4-py3-none-any.whl (1.1 MB), uploaded May 3, 2024, Python 3

Hashes for stream_translator_gpt-2024.5.4.tar.gz

Algorithm	Hash digest
SHA256	`d04236a9667040010420212c93ed8a4a547eab49423f3ff406e1677a9a77164e`
MD5	`fe1bf2cf377cdefbd703223e15261f73`
BLAKE2b-256	`4be4f5db59bd23b65256ddb7a6574d2a586614374a54c9c5de00eb0b4414a8fd`

Hashes for stream_translator_gpt-2024.5.4-py3-none-any.whl

Algorithm	Hash digest
SHA256	`f673f0370f5ebdb3f8e1411d7728935966ef962ef937743fe082fb4b892fb508`
MD5	`f180af879744c25b1c452b4b98117f4b`
BLAKE2b-256	`f8f0fa03fa15446af70f7faf034498dc81916943ebb271f950bfe23fc6dc9945`
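The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch with Python's hashlib; the helper name sha256_of_file is hypothetical, and the expected value is the SHA256 listed above for the source distribution.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 published for stream_translator_gpt-2024.5.4.tar.gz
EXPECTED = "d04236a9667040010420212c93ed8a4a547eab49423f3ff406e1677a9a77164e"

# Usage, assuming the archive was downloaded to the current directory:
# assert sha256_of_file("stream_translator_gpt-2024.5.4.tar.gz") == EXPECTED
```

Reading in chunks keeps memory use constant regardless of file size.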