
Transcribe (Whisper) and translate (GPT) voice into LRC files.


Open-Lyrics


Open-Lyrics is a Python library that transcribes voice files using faster-whisper, and translates/polishes the resulting text into .lrc files in the desired language using OpenAI-GPT.

Installation

  1. Please install CUDA and cuDNN first according to https://opennmt.net/CTranslate2/installation.html to enable faster-whisper.

  2. Add your OpenAI API key to the environment variable OPENAI_API_KEY.

  3. Install PyTorch:

    pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
    
  4. Install the latest faster-whisper:

    pip install git+https://github.com/guillaumekln/faster-whisper
    
  5. (Optional) If you want to process videos, install ffmpeg and add its bin directory to your PATH.

  6. This project can be installed from PyPI:

    pip install openlrc
    

    or install directly from GitHub:

    pip install git+https://github.com/zh-plus/Open-Lyrics
    
  7. Install the spaCy model for your source language using spacy download xxx. For example, run spacy download ja_core_news_sm if the source language is Japanese.
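
Before the first run, it can help to confirm that the prerequisites from the steps above are discoverable. The following is a minimal sketch using only the standard library. `check_environment` is a hypothetical helper, not part of openlrc, and it only checks that packages and tools can be found; it does not verify that CUDA or cuDNN actually work.

```python
import importlib.util
import os
import shutil


def check_environment(spacy_model="ja_core_news_sm", need_video=False):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper for illustration, not part of openlrc.
    """
    problems = []

    # Step 2: the OpenAI API key must be present in the environment.
    if not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    # Steps 3, 4 and 6: the Python packages must be importable.
    for module in ("torch", "faster_whisper", "openlrc"):
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python package '{module}' is not installed")

    # Step 5: ffmpeg is only needed when the input contains video.
    if need_video and shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")

    # Step 7: spaCy models are installed as importable Python packages.
    if importlib.util.find_spec(spacy_model) is None:
        problems.append(f"spaCy model '{spacy_model}' is not installed")

    return problems


if __name__ == "__main__":
    for problem in check_environment(need_video=True):
        print("missing:", problem)
```

Pass the spaCy model name that matches your source language; `ja_core_news_sm` is used here only because it is the example given in step 7.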

Usage

from openlrc import LRCer

lrcer = LRCer()

# Single file
lrcer.run('./data/test.mp3', target_lang='zh-cn')  # Generate translated ./data/test.lrc with default translate prompt.

# Multiple files
lrcer.run(['./data/test1.mp3', './data/test2.mp3'], target_lang='zh-cn')
# Note we run the transcription sequentially, but run the translation concurrently for each file.

# Path can contain video
lrcer.run(['./data/test_audio.mp3', './data/test_video.mp4'], target_lang='zh-cn')
# Generate translated ./data/test_audio.lrc and ./data/test_video.srt

# Use context.yaml to improve translation
lrcer.run('./data/test.mp3', target_lang='zh-cn', context_path='./data/context.yaml')

# To skip translation process
lrcer.run('./data/test.mp3', target_lang='en', skip_trans=True)

# Change asr_options or vad_options, check openlrc.defaults for details
vad_options = {"threshold": 0.1}
lrcer = LRCer(vad_options=vad_options)
lrcer.run('./data/test.mp3', target_lang='zh-cn')

# Enhance the audio using noise suppression (takes more time).
lrcer.run('./data/test.mp3', target_lang='zh-cn', noise_suppress=True)
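
The note about multiple files (sequential transcription, concurrent translation) describes a common pipeline pattern. The sketch below illustrates that pattern with the standard library only; it is not openlrc's actual implementation, and `transcribe`, `translate`, and `process` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def transcribe(path):
    # Stand-in for the GPU-bound transcription step (hypothetical).
    return f"transcript of {path}"


def translate(transcript, target_lang):
    # Stand-in for the network-bound translation step (hypothetical).
    return f"[{target_lang}] {transcript}"


def process(paths, target_lang):
    """Transcribe files one at a time, translating each in a worker thread."""
    with ThreadPoolExecutor() as pool:
        futures = []
        for path in paths:
            transcript = transcribe(path)  # sequential: one file at a time
            # Hand translation to the pool so the next transcription can start.
            futures.append(pool.submit(translate, transcript, target_lang))
        # Collect the results in input order.
        return [future.result() for future in futures]
```

Transcription stays sequential because it typically competes for a single GPU, while translation mostly waits on API responses and therefore overlaps well in threads.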

Context

Provide any available context to improve translation quality. Save it as context.yaml in the same directory as your audio file.

background: "This is a multi-line background.
This is a basic example."
audio_type: Movie
description_map: {
  movie_name1 (without extension): "This
  is a multi-line description for movie1.",
  movie_name2 (without extension): "This
  is a multi-line description for movie2.",
  movie_name3 (without extension): "This is a single-line description for movie 3.",
}
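
When there are many files, the context file can be generated instead of written by hand. The following is a minimal sketch under stated assumptions: `build_context_yaml` is a hypothetical helper, not part of openlrc, and it quotes strings with `json.dumps`, relying on JSON strings being valid YAML double-quoted scalars. It emits the three fields shown in the example above, using a block mapping for description_map.

```python
import json
from pathlib import Path


def build_context_yaml(background, audio_type, description_map):
    """Render the three context fields as YAML text.

    Hypothetical helper for illustration, not part of openlrc.
    """
    def quote(text):
        # JSON string syntax is also valid YAML double-quoted syntax.
        return json.dumps(text, ensure_ascii=False)

    lines = [
        f"background: {quote(background)}",
        f"audio_type: {quote(audio_type)}",
    ]
    if not description_map:
        lines.append("description_map: {}")
    else:
        lines.append("description_map:")
        for audio_path, description in description_map.items():
            # Keys are file names without their extension.
            lines.append(f"  {quote(Path(audio_path).stem)}: {quote(description)}")
    return "\n".join(lines) + "\n"
```

The returned text can then be written next to the audio files, for example with `Path("./data/context.yaml").write_text(text, encoding="utf-8")`.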

Todo

  • [Efficiency] Batched translate/polish for GPT request (enable contextual ability).
  • [Efficiency] Concurrent support for GPT request.
  • [Translation Quality] Make translate prompt more robust according to https://github.com/openai/openai-cookbook.
  • [Feature] Automatically fix json encoder error using GPT.
  • [Efficiency] Asynchronously perform transcription and translation for multiple audio inputs.
  • [Quality] Improve batched translation/polish prompt according to gpt-subtrans.
  • [Feature] Input video support.
  • [Feature] Multiple output format support.
  • [Quality] Speech enhancement for input audio.
  • [Feature] Align ground-truth transcription with audio.
  • [Quality] Use multilingual language model to assess translation quality.
  • [Efficiency] Add Azure OpenAI Service support.
  • [Quality] Use Claude for translation.
  • [Feature] Add local LLM support.
  • [Feature] Multiple translate engine (Microsoft, DeepL, Google, etc.) support.
  • [Feature] Build an Electron + FastAPI GUI for cross-platform application.
  • Add fine-tuned whisper-large-v2 models for common languages.
  • [Others] Add transcribed examples.
    • Song
    • Podcast
    • Audiobook

Credits
