Transcribe (Whisper) and translate (GPT) voice into LRC files.

Open-Lyrics

Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using OpenAI-GPT.

Installation

Please install CUDA 11.x and cuDNN 8 for CUDA 11 first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

faster-whisper also needs cuBLAS for CUDA 11 installed.

For Windows users:

Windows users can download the libraries from Purfview's repository:

Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.

Add your OpenAI API key to the environment variable OPENAI_API_KEY.
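
Since the translation step depends on this variable, checking it before a long transcription run can save time. The following is a minimal sketch and not part of openlrc: the helper name read_openai_key is hypothetical, and only the variable name OPENAI_API_KEY comes from the instructions above.

```python
import os


def read_openai_key(env=None):
    """Return the API key from the environment, or raise if it is missing.

    Hypothetical helper for illustration; openlrc does not ship this function.
    """
    env = os.environ if env is None else env
    # Treat a blank or whitespace-only value the same as an unset variable.
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; define it before running openlrc.")
    return key
```

Passing a plain dict as env makes the check easy to exercise without touching the real environment.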

Install PyTorch:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

Install the latest faster-whisper:

pip install git+https://github.com/guillaumekln/faster-whisper

(Optional) If you want to process videos, install ffmpeg and add its bin directory to your PATH.
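
One way to confirm that ffmpeg is reachable through PATH is the standard library's shutil.which. This is a small sketch under that assumption; the helper name find_ffmpeg is hypothetical and is not an openlrc function.

```python
import shutil


def find_ffmpeg(which=shutil.which):
    """Return the full path of the ffmpeg executable on PATH, or None if absent.

    The `which` parameter exists only so the lookup can be replaced in tests.
    """
    return which("ffmpeg")
```

A result of None suggests the bin directory has not been added to PATH yet, or that the shell needs to be restarted to pick up the change.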

This project can be installed from PyPI:

pip install openlrc

or install directly from GitHub:

pip install git+https://github.com/zh-plus/Open-Lyrics

Usage

from openlrc import LRCer

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file
    lrcer.run('./data/test.mp3',
              target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note we run the transcription sequentially, but run the translation concurrently for each file.

    # Path can contain video
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generate translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use context.yaml to improve translation
    lrcer.run('./data/test.mp3', target_lang='zh-cn', context_path='./data/context.yaml')

    # To skip translation process
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options, check openlrc.defaults for details
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(vad_options=vad_options)
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (takes more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)
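
The comment in the multiple-files example says that transcription runs sequentially while translation runs concurrently. The sketch below illustrates that scheduling pattern with a thread pool. It is not openlrc's actual implementation: transcribe and translate are hypothetical stand-ins that only build strings, so the example runs without a GPU or an API key.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(path):
    """Hypothetical stand-in for the speech-to-text step."""
    return f"transcript of {path}"


def translate(transcript, target_lang):
    """Hypothetical stand-in for the GPT translation step."""
    return f"[{target_lang}] {transcript}"


def run_pipeline(paths, target_lang, transcribe_fn=transcribe, translate_fn=translate):
    """Transcribe files one at a time and translate them in background threads."""
    futures = {}
    with ThreadPoolExecutor() as pool:
        for path in paths:
            # Transcription happens in the calling thread, one file after another.
            transcript = transcribe_fn(path)
            # Translation is submitted to the pool, so it can overlap with the
            # transcription of the next file.
            futures[path] = pool.submit(translate_fn, transcript, target_lang)
        # Collect results in input order; result() waits for each translation.
        return {path: future.result() for path, future in futures.items()}
```

Threads suit this sketch because a translation request mostly waits on the network, while transcription keeps the local compute device busy.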

See the documentation for more details.

Context

Provide context to improve the quality of your translation. Save it as context.yaml in the same directory as your audio file.

Note: Context is NOT guaranteed to improve translation quality.

background: "This is a multi-line background.
This is a basic example."
audio_type: Movie
description_map: {
  movie_name1 (without extension): "This
  is a multi-line description for movie1.",
  movie_name2 (without extension): "This
  is a multi-line description for movie2.",
  movie_name3 (without extension): "This is a single-line description for movie 3.",
}
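
The keys in description_map are file names without their extension. The sketch below builds such a mapping from full file names using pathlib and serializes it with the standard json module. It rests on two assumptions: that the context file is read by a standard YAML parser, and that JSON output is acceptable to that parser as YAML flow syntax. The helper names build_context and dump_context are hypothetical and not part of openlrc.

```python
import json
from pathlib import Path


def build_context(background, audio_type, descriptions):
    """Assemble the context structure, keyed by file name without extension."""
    return {
        "background": background,
        "audio_type": audio_type,
        # Path(...).stem drops the directory and the final extension.
        "description_map": {
            Path(name).stem: text for name, text in descriptions.items()
        },
    }


def dump_context(context):
    """Serialize the context as JSON text (assumed to be readable as YAML)."""
    return json.dumps(context, ensure_ascii=False, indent=2)


def write_context(directory, context):
    """Write context.yaml next to the audio files and return its path."""
    target = Path(directory) / "context.yaml"
    target.write_text(dump_context(context), encoding="utf-8")
    return target
```

Calling write_context('./data', build_context(...)) would place the file in the same directory as the inputs used in the usage examples.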

Todo

[Efficiency] Batched translate/polish for GPT request (enable contextual ability).
[Efficiency] Concurrent support for GPT request.
[Translation Quality] Make translate prompt more robust according to https://github.com/openai/openai-cookbook.
[Feature] Automatically fix json encoder error using GPT.
[Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.
[Quality] Improve batched translation/polish prompt according to gpt-subtrans.
[Feature] Input video support.
[Feature] Multiple output format support.
[Quality] Speech enhancement for input audio.
[Feature] Preprocessor: Voice-music separation.
[Feature] Align ground-truth transcription with audio.
[Quality] Use multilingual language model to assess translation quality.
[Efficiency] Add Azure OpenAI Service support.
[Quality] Use Claude for translation.
[Feature] Add local LLM support.
[Feature] Multiple translate engine (Microsoft, DeepL, Google, etc.) support.
[Feature] Build an Electron + FastAPI GUI for a cross-platform application.
Add fine-tuned whisper-large-v2 models for common languages.
[Others] Add transcribed examples.
- Song
- Podcast
- Audiobook

Credits

Release history

Version	Release date
1.5.2	Sep 10, 2024
1.5.1	Jul 1, 2024
1.5.0	Jun 26, 2024
1.4.1	Jun 17, 2024
1.4.0	Jun 5, 2024
1.3.1	May 7, 2024
1.3.0	Apr 9, 2024
1.2.0	Mar 29, 2024
1.1.0	Feb 29, 2024
1.0.5	Jan 29, 2024
1.0.4	Jan 29, 2024
1.0.3	Jan 18, 2024
1.0.2	Jan 12, 2024 (this version)
1.0.1	Dec 24, 2023
1.0.0	Dec 14, 2023
0.2.3	Aug 10, 2023
0.2.2	Jul 24, 2023
0.2.1	Jul 17, 2023
0.2.0	Jul 12, 2023
0.1.5	Jul 7, 2023
0.1.4	Jul 6, 2023
0.1.3	Jun 27, 2023
0.1.2	Jun 22, 2023
0.1.1	Jun 16, 2023
0.1.0	Jun 15, 2023
0.0.6	Jun 11, 2023
0.0.5	Jun 10, 2023
0.0.4	Jun 9, 2023
0.0.3	Jun 9, 2023
0.0.1	Jun 8, 2023

Download files

Source Distribution

openlrc-1.0.2.tar.gz (27.2 kB)

Uploaded Jan 12, 2024 Source

Built Distribution

openlrc-1.0.2-py3-none-any.whl (31.7 kB)

Uploaded Jan 12, 2024 Python 3

Hashes for openlrc-1.0.2.tar.gz
Algorithm	Hash digest
SHA256	`dbe4a409459ecf45eac109ab513d5197daeda0a007363fd1842150083ce1615b`
MD5	`4e7d6251316d8d6a1362cf73ffa37b0d`
BLAKE2b-256	`6a6ce98231f318d9bbe63ae9c4cfe4052d4264bb340e76103b4dac418792f904`

Hashes for openlrc-1.0.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1c69034474df1db15113f66f2afcf771282ed8f178c83411851a8582b0604c03`
MD5	`4a90db86b993dffbfb5f5ccfcf7c0d2a`
BLAKE2b-256	`76669cdde90894b25d7f367e53d03ec368cbe6d3bc9a25d7d4fc7ff7fbfccc2e`
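
A downloaded archive can be checked against the SHA256 digests listed above with the standard hashlib module. This is a minimal sketch: the helper names sha256_of and matches_digest are hypothetical, and the local file path is whatever location the file was saved to.

```python
import hashlib

# SHA256 digest listed above for openlrc-1.0.2.tar.gz.
SDIST_SHA256 = "dbe4a409459ecf45eac109ab513d5197daeda0a007363fd1842150083ce1615b"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected=SDIST_SHA256):
    """Return True when the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected.lower()
```

For the wheel, pass the SHA256 value from the second table as the expected argument.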