
Transcribe (Whisper) and translate (GPT) voice into an LRC file.


Open-Lyrics


Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using LLM, e.g. OpenAI-GPT, Anthropic-Claude.

Key Features:

  • Audio preprocessing to reduce hallucination (loudness normalization and optional noise suppression).
  • Context-aware translation to improve translation quality. Check prompt for details.
  • Check here for an overview of the architecture.

New 🚨

  • 2024.5.7:
    • Add custom endpoint (base_url) support for OpenAI & Anthropic:
      lrcer = LRCer(base_url_config={'openai': 'https://api.chatanywhere.tech',
                                     'anthropic': 'https://example/api'})
      
    • Generate bilingual subtitles:
      lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)
      
  • 2024.5.11: Add glossary into prompt, which is confirmed to improve domain specific translation. Check here for details.
  • 2024.5.17: You can route a model to either chatbot SDK (OpenAI or Anthropic) by setting chatbot_model to 'provider: model_name' together with base_url_config:
    lrcer = LRCer(chatbot_model='openai: claude-3-haiku-20240307',
                  base_url_config={'openai': 'https://api.g4f.icu/v1/'})
    
  • 2024.6.25: Support Gemini as the translation LLM. Try gemini-1.5-flash:
    lrcer = LRCer(chatbot_model='gemini-1.5-flash')
    
  • 2024.9.10: Now openlrc depends on a specific commit of faster-whisper, which is not published on PyPI. Install it from source:
    pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
    
  • 2024.12.19: Add ModelConfig for chat model routing, which is more flexible than a model name string. The ModelConfig can be ModelConfig(provider='', name='', base_url='', api_key='', proxy=''), e.g.:
    from openlrc import LRCer, ModelConfig, ModelProvider
    
    chatbot_model1 = ModelConfig(
        provider=ModelProvider.OPENAI, 
        name='deepseek-chat', 
        base_url='https://api.deepseek.com/beta', 
        api_key='sk-APIKEY'
    )
    chatbot_model2 = ModelConfig(
        provider=ModelProvider.OPENAI, 
        name='gpt-4o-mini', 
        api_key='sk-APIKEY'
    )
    lrcer = LRCer(chatbot_model=chatbot_model1, retry_model=chatbot_model2)
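
To illustrate the 'provider: model_name' string form from the 2024.5.17 entry, here is a minimal sketch of how such a string could be split into its two parts. ParsedModel and parse_model_string are hypothetical names used only for this example; they are not part of openlrc, and the library's own parsing may differ (for instance, for model names that themselves contain a colon).

```python
from typing import NamedTuple, Optional


class ParsedModel(NamedTuple):
    """Hypothetical container for this sketch; not openlrc's ModelConfig."""
    provider: Optional[str]  # e.g. 'openai'; None when no prefix is given
    name: str                # e.g. 'claude-3-haiku-20240307'


def parse_model_string(spec: str) -> ParsedModel:
    """Split 'provider: model_name' at the first colon.

    A string without a colon is treated as a bare model name.
    """
    provider, sep, name = spec.partition(':')
    if not sep:
        return ParsedModel(provider=None, name=spec.strip())
    return ParsedModel(provider=provider.strip().lower(), name=name.strip())
```

A string such as 'openai: claude-3-haiku-20240307' yields the provider 'openai' and the model name 'claude-3-haiku-20240307', while 'gemini-1.5-flash' yields no provider at all.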
    

Installation ⚙️

  1. Please install CUDA 11.x and cuDNN 8 for CUDA 11 first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

    faster-whisper also needs cuBLAS for CUDA 11 installed.

    For Windows users: download the libraries from Purfview's repository:

    Purfview's whisper-standalone-win provides the required NVIDIA libraries for Windows in a single archive. Decompress the archive and place the libraries in a directory included in the PATH.

  2. Add LLM API keys, you can either:

  3. Install ffmpeg and add bin directory to your PATH.

  4. This project can be installed from PyPI:

    pip install openlrc
    

    or install directly from GitHub:

    pip install git+https://github.com/zh-plus/openlrc
    
  5. Install the latest faster-whisper from source:

    pip install "faster-whisper @ https://github.com/SYSTRAN/faster-whisper/archive/8327d8cc647266ed66f6cd878cf97eccface7351.tar.gz"
    
  6. Install PyTorch:

    pip install --force-reinstall torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
    
  7. Fix the typing-extensions issue:

    pip install typing-extensions -U
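
After working through the steps above, a quick sanity check can confirm that the pieces are discoverable. The following is a minimal sketch using only the standard library; check_environment is a hypothetical helper, not something openlrc provides. It only tests that ffmpeg is on PATH and that the Python packages can be located, not that CUDA or cuDNN work.

```python
import importlib.util
import shutil


def check_environment() -> dict:
    """Report whether each prerequisite from the installation steps is discoverable."""
    # Import names of the packages installed in the steps above.
    modules = ['openlrc', 'faster_whisper', 'torch', 'typing_extensions']
    report = {'ffmpeg': shutil.which('ffmpeg') is not None}
    for module in modules:
        # find_spec locates a top-level module without importing it.
        report[module] = importlib.util.find_spec(module) is not None
    return report


if __name__ == '__main__':
    for name, ok in check_environment().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```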
    

Usage 🐍

Python code

from openlrc import LRCer

if __name__ == '__main__':
    lrcer = LRCer()

    # Single file
    lrcer.run('./data/test.mp3',
              target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

    # Multiple files
    lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
    # Note: transcription runs sequentially, while translation runs concurrently across files.

    # Path can contain video
    lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
    # Generate translated ./data/test_audio.lrc and ./data/test_video.srt

    # Use glossary to improve translation
    lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')

    # To skip translation process
    lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

    # Change asr_options or vad_options, check openlrc.defaults for details
    vad_options = {"threshold": 0.1}
    lrcer = LRCer(vad_options=vad_options)
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Enhance the audio using noise suppression (takes more time).
    lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)

    # Change the LLM model for translation
    lrcer = LRCer(chatbot_model='claude-3-sonnet-20240229')
    lrcer.run('./data/test.mp3', target_lang='zh-cn')

    # Clear temp folder after processing done
    lrcer.run('./data/test.mp3', target_lang='zh-cn', clear_temp=True)

    # Change base_url
    lrcer = LRCer(base_url_config={'openai': 'https://api.g4f.icu/v1',
                                   'anthropic': 'https://example/api'})

    # Route model to arbitrary Chatbot SDK
    lrcer = LRCer(chatbot_model='openai: claude-3-sonnet-20240229',
                  base_url_config={'openai': 'https://api.g4f.icu/v1/'})

    # Bilingual subtitle
    lrcer.run('./data/test.mp3', target_lang='zh-cn', bilingual_sub=True)

Check more details in Documentation.
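
For readers unfamiliar with the output format, an .lrc file is plain text in which each line starts with a [mm:ss.xx] time tag. The sketch below shows how timed segments could be rendered in that common layout. lrc_timestamp and to_lrc are hypothetical helpers for illustration; the exact lines openlrc writes (metadata tags, bilingual layout) may differ.

```python
def lrc_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an LRC time tag: [minutes:seconds.centiseconds]."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)  # 6000 centiseconds per minute
    secs, centis = divmod(centis, 100)
    return f'[{minutes:02d}:{secs:02d}.{centis:02d}]'


def to_lrc(segments):
    """Render (start_seconds, text) pairs as LRC lines, ordered by start time."""
    return '\n'.join(f'{lrc_timestamp(start)}{text}' for start, text in sorted(segments))
```

For example, a segment starting at 75.5 seconds gets the tag [01:15.50].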

Glossary

Add a glossary to improve domain-specific translation. For example, aoe4-glossary.yaml:

{
  "aoe4": "帝国时代4",
  "feudal": "封建时代",
  "2TC": "双TC",
  "English": "英格兰文明",
  "scout": "侦察兵"
}

lrcer = LRCer(glossary='./data/aoe4-glossary.yaml')
lrcer.run('./data/test.mp3', target_lang='zh-cn')

Or pass the glossary directly as a dictionary:

lrcer = LRCer(glossary={"aoe4": "帝国时代4", "feudal": "封建时代"})
lrcer.run('./data/test.mp3', target_lang='zh-cn')
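
To give a rough idea of how a glossary can end up in a translation prompt, here is a minimal sketch. It is not openlrc's actual prompt: load_glossary, glossary_prompt, and the prompt wording are assumptions made for illustration. Because the example file above uses JSON-style syntax, the sketch parses it with the standard json module rather than a YAML library.

```python
import json


def load_glossary(text: str) -> dict:
    """Parse a glossary written as a JSON-style object, like the example file above."""
    glossary = json.loads(text)
    if not isinstance(glossary, dict):
        raise ValueError('glossary must map source terms to translations')
    return glossary


def glossary_prompt(glossary: dict) -> str:
    """Build a hypothetical prompt section listing each term with its translation."""
    lines = [f'- {source}: {target}' for source, target in glossary.items()]
    return 'Use the following glossary when translating:\n' + '\n'.join(lines)
```

Keeping the glossary as a plain term-to-translation mapping means the same data can come from a file or from an inline dictionary, as in the two usages above.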

Pricing 💰

Pricing data from OpenAI and Anthropic.

| Model Name | Pricing for 1M Tokens (Input/Output) (USD) | Cost for 1 Hour Audio (USD) |
|---|---|---|
| gpt-3.5-turbo | 0.5, 1.5 | 0.01 |
| gpt-4o-mini | 0.5, 1.5 | 0.01 |
| gpt-4-0125-preview | 10, 30 | 0.5 |
| gpt-4-turbo-preview | 10, 30 | 0.5 |
| gpt-4o | 5, 15 | 0.25 |
| claude-3-haiku-20240307 | 0.25, 1.25 | 0.015 |
| claude-3-sonnet-20240229 | 3, 15 | 0.2 |
| claude-3-opus-20240229 | 15, 75 | 1 |
| claude-3-5-sonnet-20240620 | 3, 15 | 0.2 |
| gemini-1.5-flash | 0.175, 2.1 | 0.01 |
| gemini-1.0-pro | 0.5, 1.5 | 0.01 |
| gemini-1.5-pro | 1.75, 21 | 0.1 |
| deepseek-chat | 0.18, 2.2 | 0.01 |

Note that the cost is estimated from the token counts of the input and output text. The actual cost may vary with the language and the audio speed.
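
The per-token arithmetic behind such an estimate can be sketched as follows. The prices are copied from the table; estimate_cost is a hypothetical helper, and the token counts in the usage note are made-up illustrative values, not measurements from openlrc.

```python
# Prices in USD per 1M tokens as (input, output), copied from the pricing table.
PRICES = {
    'gpt-4o': (5.0, 15.0),
    'claude-3-haiku-20240307': (0.25, 1.25),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost for the given token counts."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

With an illustrative 30,000 input tokens and 10,000 output tokens, gpt-4o comes to (30,000 × 5 + 10,000 × 15) / 1,000,000 = 0.30 USD.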

Recommended translation model

For English audio, we recommend using deepseek-chat, gpt-4o-mini or gemini-1.5-flash.

For non-English audio, we recommend using claude-3-5-sonnet-20240620.

How it works

To maintain context between translation segments, the process is sequential for each audio file.
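
The scheduling idea (sequential within a file so each segment can see earlier context, concurrent across files) can be sketched like this. It is an illustration under assumptions, not openlrc's implementation: translate_file, translate_files, and the translate_chunk callback signature are hypothetical, and the "context" here is simply the previous translated chunk.

```python
from concurrent.futures import ThreadPoolExecutor


def translate_file(chunks, translate_chunk):
    """Translate one file's chunks in order, passing the previous result as context."""
    context, results = '', []
    for chunk in chunks:
        translated = translate_chunk(chunk, context)
        results.append(translated)
        context = translated  # the next chunk sees what was just produced
    return results


def translate_files(files, translate_chunk, max_workers=4):
    """Translate several files concurrently; each file stays sequential internally."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(translate_file, chunks, translate_chunk)
                   for name, chunks in files.items()}
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this pattern because translation requests are mostly spent waiting on a remote API rather than on local computation.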

Development Guide

This project uses uv for package management. Install uv with its standalone installer:

On macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

On Windows:

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Install dependencies:

uv venv
uv sync

Todo

  • [Efficiency] Batched translate/polish for GPT request (enable contextual ability).
  • [Efficiency] Concurrent support for GPT request.
  • [Translation Quality] Make translate prompt more robust according to https://github.com/openai/openai-cookbook.
  • [Feature] Automatically fix json encoder error using GPT.
  • [Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.
  • [Quality] Improve batched translation/polish prompt according to gpt-subtrans.
  • [Feature] Input video support.
  • [Feature] Multiple output format support.
  • [Quality] Speech enhancement for input audio.
  • [Feature] Preprocessor: Voice-music separation.
  • [Feature] Align ground-truth transcription with audio.
  • [Quality] Use multilingual language model to assess translation quality.
  • [Efficiency] Add Azure OpenAI Service support.
  • [Quality] Use claude for translation.
  • [Feature] Add local LLM support.
  • [Feature] Multiple translate engine (Anthropic, Microsoft, DeepL, Google, etc.) support.
  • [Feature] Build an Electron + FastAPI GUI for cross-platform application.
  • [Feature] Web-based streamlit GUI.
  • Add fine-tuned whisper-large-v2 models for common languages.
  • [Feature] Add custom OpenAI & Anthropic endpoint support.
  • [Feature] Add local translation model support (e.g. SakuraLLM).
  • [Quality] Construct translation quality benchmark test for each patch.
  • [Quality] Split subtitles using LLM (ref).
  • [Quality] Trim extra long subtitle using LLM (ref).
  • [Others] Add transcribed examples.
    • Song
    • Podcast
    • Audiobook

Credits

Citation

@book{openlrc2024zh,
	title = {zh-plus/openlrc},
	url = {https://github.com/zh-plus/openlrc},
	author = {Hao, Zheng},
	date = {2024-09-10},
	year = {2024},
	month = {9},
	day = {10},
}
