Skip to main content

Replaces and translates voices in youtube videos

Project description

TurnVoice

A command-line tool (currently in beta) to transform voices in YouTube videos with additional translation capabilities.

What to expect

TurnVoice replaces the entire audio track of the video with a newly generated voice. As a result, all original sounds, including music and ambient noises, are replaced by silence, leaving only the new voice track. This approach ensures clarity in the newly generated voice but means that other audio elements from the original video will be absent in the output.

  • might not always achieve perfect lip synchronization, especially when translating to a different language
  • translation feature is currently in very alpha state (powered by Meta's nllb-200-distilled-600m) and still produces very imperfect results
  • occasionally, the synthesis might introduce unexpected noises or distortions in the audio (we got WAY better reducing artifacts with the new v0.0.30 algo)

Source Quality

  • delivers best results with YouTube videos featuring clear spoken content (podcasts, educational videos)
  • requires a high-quality, clean source WAV file for effective voice cloning

Prerequisites

You need to have Rubberband command-line utility installed (needed for pitchpreserve timestretching audios)

Installation

pip install turnvoice

For faster rendering prepare your CUDA environment:

CUDA 11.8:

pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118

CUDA v12.1:

pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu211 --index-url https://download.pytorch.org/whl/cu211

Installation in a virtual environment is recommended.

Usage

turnvoice [-u] <YouTube Video URL|ID> [-l] <Translation Language> -r <Reference WAV File> -o <Output Video Filename>

For example, this is musk's voice replaced with the male default voice:

turnvoice RK91Ji6GCZ8

Same translated to spanish:

turnvoice RK91Ji6GCZ8 es

Parameters Explained:

  • -u, --url: (required) The YouTube video ID or URL you want to transform
  • -l, --language: Language to translate to (supported: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, ja, hu, ko) leaving this out keeps the source video language
  • -d, --download_directory: Where to save the video downloads (default: 'downloads')
  • -s, --synthesis_directory: Where to save the text to speech audio files (default: 'synthesis')
  • -e, --extractoff: Use with -e to disable extract audio directly from the video (may lead to higher quality while also increasing likelihood of errors)
  • -r, --reference_wav: Your chosen voice in wav format (24kHz, 16 bit, mono, ~10-30s)
  • -o, --output_video: The grand finale video file name (default: 'final_cut.mp4')

You can leave out -u and -l as first parameters.

Example Command:

Ever wanted Arthur Morgan to narrate a cooking tutorial? Here's how:

turnvoice AmC9SmCBUj4 -r arthur.wav -o cooking_with_arthur.mp4

This example needs a arthur.wav (or.json) file in the same directory. Works when executed from the tests directory.

Pro Tips

The Art of Choosing a Reference Wav

  • A 24000, 44100 or 22050 Hz 16-bit mono wav file of 10-30 seconds is your golden ticket.
  • 24k mono 16 is my default, but I also had voices where I found 44100 32-bit to yield best results
  • I test voices with this tool before rendering
  • Audacity is your friend for adjusting sample rates. Experiment with frame rates for best results!

Fixed TTS Model Download Folder

Keep your models organized! Set COQUI_MODEL_PATH to your preferred folder.

Windows example:

setx COQUI_MODEL_PATH "C:\Downloads\CoquiModels"

Future Improvements

  • Diarization: Replace voices of multiple speakers.
  • TTS Voice variety: Add OpenAI TTS, Azure and Elevenlabs as voice sources.
  • Tranlation quality: Add option to translate with OpenAI, DeepL API, other models.
  • Voice Cloning from YouTube: Cloning voices directly from other videos!
  • Speed that thing up: Imagine feeding a youtube stream and getting a "realtime" translation stream (with strong GPU we can aim at ~10-20s latency)
  • Open up the CLI: Allow local Videos, Audios and even Textfiles as Input: turnvoice "Hello World"

License

TurnVoice is proudly under the Coqui Public Model License 1.0.0 and NLLB-200 CC-BY-NC License (these are OpenSource NonCommercial licenses).

Let's Make It Fun! 🎉

Share your funniest or most creative TurnVoice creations with me!

And if you've got a cool feature idea or just want to say hi, drop me a line on

🌟 if you like this library please star the repo 🌟

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

TurnVoice-0.0.33.tar.gz (3.2 MB view hashes)

Uploaded Source

Built Distribution

TurnVoice-0.0.33-py3-none-any.whl (3.3 MB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page