Replaces and translates voices in YouTube videos

TurnVoice

A command-line tool to transform voices in YouTube videos with additional translation capabilities.[^1]

New Features

Elevenlabs, OpenAI TTS, Azure, Coqui TTS and System voices for redubbing
replace a specific speaker's voice or multiple speakers' voices (work in progress)
process local files
preserve original background audio

More info 👉 release notes

Prerequisites

Rubberband command-line utility installed [^2]
Deezer's Spleeter command-line utility installed [^3]
Huggingface conditions accepted for Speaker Diarization and Segmentation
Huggingface access token in env variable HF_ACCESS_TOKEN [^4]

[!TIP]

For Deezer's Spleeter CLI, install Python 3.8, then run `pipx install spleeter --python /path/to/python3.8` (install pipx first with `pip install pipx`).

Set your HF token with `setx HF_ACCESS_TOKEN "your_token_here"`
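
Before a first run it can help to confirm that the prerequisites are actually reachable. The following is a minimal sketch, not part of TurnVoice: the helper name is hypothetical, and the executable names `rubberband` and `spleeter` are assumptions that may differ on your system.

```python
import os
import shutil

# Executable names are assumptions; adjust them to match your installation.
REQUIRED_TOOLS = ("rubberband", "spleeter")
REQUIRED_ENV = ("HF_ACCESS_TOKEN",)


def missing_prerequisites(env=None, which=shutil.which):
    """Return human-readable descriptions of prerequisites that were not found."""
    env = os.environ if env is None else env
    missing = [
        f"executable not on PATH: {tool}"
        for tool in REQUIRED_TOOLS
        if which(tool) is None
    ]
    missing += [
        f"environment variable not set: {name}"
        for name in REQUIRED_ENV
        if not env.get(name)
    ]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    print("\n".join(problems) if problems else "All prerequisites found.")
```

An empty list means both tools were found on PATH and the token variable is set; it does not check that the token itself is valid.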

Installation

pip install turnvoice

[!TIP] For faster rendering with GPU prepare your CUDA environment after installation:

For CUDA 11.8
pip install torch==2.1.1+cu118 torchaudio==2.1.1+cu118 --index-url https://download.pytorch.org/whl/cu118

For CUDA 12.1
pip install torch==2.1.1+cu121 torchaudio==2.1.1+cu121 --index-url https://download.pytorch.org/whl/cu121
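
The CUDA tag (`cu118` for CUDA 11.8, `cu121` for CUDA 12.1) has to match in three places: the torch pin, the torchaudio pin, and the index URL. As a sketch only, a small hypothetical helper (not part of TurnVoice) can derive all three from the CUDA version so they cannot drift apart:

```python
# Version pin taken from the install commands in this section.
TORCH_VERSION = "2.1.1"


def cuda_pip_command(cuda_version: str, torch_version: str = TORCH_VERSION) -> str:
    """Build the pip command for a CUDA version such as "11.8" or "12.1"."""
    tag = "cu" + cuda_version.replace(".", "")  # "12.1" -> "cu121"
    return (
        f"pip install torch=={torch_version}+{tag} "
        f"torchaudio=={torch_version}+{tag} "
        f"--index-url https://download.pytorch.org/whl/{tag}"
    )


print(cuda_pip_command("11.8"))
print(cuda_pip_command("12.1"))
```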

Usage

turnvoice [-i <YouTube URL|ID|Local Video Path>] [-l <Translation Language>] -v <Voice File(s)> -o <Output File>

Example Command:

Arthur Morgan narrating a cooking tutorial:

turnvoice -i AmC9SmCBUj4 -v arthur.wav -o cooking_with_arthur.mp4

[!NOTE] Requires a voice file (e.g., arthur.wav or .json) in the same directory (you can find one in the tests directory).

Parameters Explained:

-i, --in: Input video. Accepts a YouTube video URL or ID, or a path to a local video file.
-l, --language: Language for translation. Coqui synthesis supports: en, es, fr, de, it, pt, pl, tr, ru, nl, cs, ar, zh, ja, hu, ko. Omit to retain the original video language.
-il, --input_language: Language code for transcription, set if automatic detection fails.
-v, --voice: Voices for synthesis. Accepts multiple values to replace more than one speaker.
-o, --output_video: Filename for the final output video (default: 'final_cut.mp4').
-a, --analysis: Print transcription and speaker analysis without synthesizing or rendering the video.
-from: Time to start processing the video from.
-to: Time to stop processing the video at.
-e, --engine: Synthesis engine (options: coqui, elevenlabs, azure, openai, system; default: coqui).
-s, --speaker: Speaker number to be transformed.
-snum, --num_speakers: Exact number of speakers in the video, aids in diarization.
-smin, --min_speakers: Minimum number of speakers in the video.
-smax, --max_speakers: Maximum number of speakers in the video.
-dd, --download_directory: Directory for saving downloaded files (default: 'downloads').
-sd, --synthesis_directory: Directory for saving synthesized audio files (default: 'synthesis').
-exoff, --extractoff: Disables extraction of audio from the video file. Downloads audio and video from the internet.
-c, --clean_audio: Removes original audio from the final video, resulting in clean synthesis.
-tf, --timefile: Define timestamp file(s) for processing (functions like multiple --from/--to commands).

Note: -i and -l can be used as both positional and optional arguments.
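
If you call TurnVoice from a script, the flags above can be assembled programmatically. This is a rough sketch with a hypothetical helper; it covers only a few of the documented flags and assumes that multiple voices are passed as several values after a single `-v`.

```python
import shlex


def build_turnvoice_command(source, voices, output=None, language=None,
                            engine=None, speaker=None):
    """Assemble a turnvoice argument list from the flags documented above."""
    cmd = ["turnvoice", "-i", source]
    if language:
        cmd += ["-l", language]
    if engine:
        cmd += ["-e", engine]
    if speaker is not None:
        cmd += ["-s", str(speaker)]
    # Assumption: several voices follow one -v flag as separate values.
    cmd += ["-v", *voices]
    if output:
        cmd += ["-o", output]
    return cmd


cmd = build_turnvoice_command(
    "AmC9SmCBUj4", ["arthur.wav"], output="cooking_with_arthur.mp4"
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string avoids quoting problems with URLs and file names.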

Coqui Engine

Coqui engine is the default engine if no other engine is specified with the -e parameter.

Voices (-v parameter)

Submit the path to one or more audio files containing 16-bit 24 kHz mono source material as reference WAVs.

Example:

turnvoice https://www.youtube.com/watch?v=cOg4J1PxU0c -e coqui -v female.wav

The Art of Choosing a Reference Wav

A 24000, 44100 or 22050 Hz 16-bit mono WAV file of 10-30 seconds is your golden ticket.
24 kHz 16-bit mono is my default, but for some voices I found 44100 Hz 32-bit to yield the best results.
I test voices with this tool before rendering.
Audacity is your friend for adjusting sample rates. Experiment with frame rates for best results!
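
To see whether a candidate file matches the guidelines above before rendering, its properties can be read with Python's standard `wave` module. This is a sketch with hypothetical helper names; the thresholds simply mirror the numbers in this section and are not enforced by TurnVoice.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file using the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def reference_wav_issues(info):
    """Compare properties against the guidelines above; return a list of notes."""
    issues = []
    if info["sample_rate_hz"] not in (22050, 24000, 44100):
        issues.append("sample rate is not 22050, 24000 or 44100 Hz")
    if info["channels"] != 1:
        issues.append("file is not mono")
    if info["bits_per_sample"] != 16:
        issues.append("file is not 16-bit")
    if not 10 <= info["duration_s"] <= 30:
        issues.append("duration is outside 10-30 seconds")
    return issues
```

Usage would be `reference_wav_issues(describe_wav("arthur.wav"))`. The `wave` module reads integer PCM only, so a 32-bit float WAV will raise `wave.Error` rather than return a description.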

Fixed TTS Model Download Folder

Keep your models organized! Set COQUI_MODEL_PATH to your preferred folder.

Windows example:

setx COQUI_MODEL_PATH "C:\Downloads\CoquiModels"
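
`setx` is Windows-only and persists the variable. On other platforms, or to set the folder for a single run, one option is to pass the variable to the process that launches TurnVoice. A minimal sketch with a hypothetical helper and an example folder name:

```python
import os


def env_with_model_path(model_dir, base_env=None):
    """Return a copy of the environment with COQUI_MODEL_PATH set."""
    env = dict(os.environ if base_env is None else base_env)
    env["COQUI_MODEL_PATH"] = os.path.abspath(os.path.expanduser(model_dir))
    return env


# Example: launch turnvoice with the variable set for that run only.
# import subprocess
# subprocess.run(
#     ["turnvoice", "-i", "AmC9SmCBUj4", "-v", "arthur.wav"],
#     env=env_with_model_path("~/coqui_models"),  # folder name is an example
#     check=True,
# )
```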

Elevenlabs Engine

[!NOTE] To use Elevenlabs voices you need the API Key stored in env variable ELEVENLABS_API_KEY

All voices are synthesized with the multilingual-v1 model.

[!CAUTION] Elevenlabs is a pricy API. Focus on short videos. Don't let a work-in-progress script like this run unattended on a pay-per-use API. Bugs could be very annoying when occurring at the end of a pricy long rendering process.

Voices (-v parameter)

Submit name(s) of either a generated or predefined voice.

Example:

turnvoice https://www.youtube.com/watch?v=cOg4J1PxU0c -e elevenlabs -v Giovanni

[!TIP] Test rendering with a free engine like coqui first before using pricy ones.

OpenAI Engine

[!NOTE] To use OpenAI TTS voices you need the API Key stored in env variable OPENAI_API_KEY

Voice (-v parameter)

Submit the name of a voice. Currently only one voice is supported for OpenAI: alloy, echo, fable, onyx, nova or shimmer.

Example:

turnvoice https://www.youtube.com/watch?v=cOg4J1PxU0c -e openai -v shimmer

Azure Engine

[!NOTE] To use Azure voices you need the API Key for SpeechService resource in AZURE_SPEECH_KEY and the region identifier in AZURE_SPEECH_REGION

Voices (-v parameter)

Submit name(s) of either a generated or predefined voice.

Example:

turnvoice https://www.youtube.com/watch?v=BqnAeUoqFAM -e azure -v ChristopherNeural

System Engine

Voices (-v parameter)

Submit name(s) of voices as string.

Example:

turnvoice https://www.youtube.com/watch?v=BqnAeUoqFAM -e system -v David
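
Across the engine sections above, the paid engines each need their credentials in environment variables. A small sketch (hypothetical helper, variable names taken from the notes above) that reports what is still missing before a paid run is started:

```python
import os

# Variable names come from the notes in the engine sections above.
ENGINE_ENV_VARS = {
    "coqui": (),
    "elevenlabs": ("ELEVENLABS_API_KEY",),
    "openai": ("OPENAI_API_KEY",),
    "azure": ("AZURE_SPEECH_KEY", "AZURE_SPEECH_REGION"),
    "system": (),
}


def missing_engine_vars(engine, env=None):
    """Return the environment variables the chosen engine still needs."""
    if engine not in ENGINE_ENV_VARS:
        raise ValueError(f"unknown engine: {engine}")
    env = os.environ if env is None else env
    return [name for name in ENGINE_ENV_VARS[engine] if not env.get(name)]
```

For example, `missing_engine_vars("azure")` returns an empty list only when both the key and the region are set.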

What to expect

might not always achieve perfect lip synchronization, especially when translating to a different language
speaker detection does not work that well yet; I am probably doing something wrong, or perhaps the tech is not yet ready to be reliable
translation feature is currently in an experimental prototype state (powered by Meta's nllb-200-distilled-600m) and still produces very imperfect results
occasionally, the synthesis might introduce unexpected noises or distortions in the audio (we got much better at reducing artifacts with the new v0.0.30 algorithm)
spleeter might get confused when a spoken voice and background music with singing are present together in the source audio

Source Quality

delivers best results with YouTube videos featuring clear spoken content (podcasts, educational videos)
requires a high-quality, clean source WAV file for effective voice cloning

Pro Tips

How to exchange a single speaker

First perform a speaker analysis with -a parameter:

turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM -a

Then select a speaker from the list with the -s parameter:

turnvoice https://www.youtube.com/watch?v=2N3PsXPdkmM -s 2
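
The two steps can also be scripted. This sketch only builds the two command lines shown above; the helper names are hypothetical, and the voice argument is optional here because the example above omits it.

```python
import subprocess


def analysis_command(source):
    """Step 1: print transcription and speaker analysis only (-a)."""
    return ["turnvoice", source, "-a"]


def replace_speaker_command(source, speaker, voice=None):
    """Step 2: transform only the given speaker number (-s)."""
    cmd = ["turnvoice", source, "-s", str(speaker)]
    if voice:
        cmd += ["-v", voice]
    return cmd


def run(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Typical flow: run the analysis, read the speaker list it prints,
# then rerun with the speaker number you picked.
# run(analysis_command("https://www.youtube.com/watch?v=2N3PsXPdkmM"))
# run(replace_speaker_command("https://www.youtube.com/watch?v=2N3PsXPdkmM", 2))
```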

Future Improvements

Translation quality: Add an option to translate with OpenAI, the DeepL API, or other models. Use better logic than simply transcribing the fragments.
Voice Cloning from YouTube: Cloning voices directly from other videos.
Speed up to realtime: Feed streams and get a "realtime" (translated) stream with the voice of choice.
Open up the CLI: Allow local videos, audio files and even text files as input, all the way down to turnvoice "Hello World".
Match the spoken volume of the original voice.

License

TurnVoice is proudly licensed under the Coqui Public Model License 1.0.0 and the NLLB-200 CC-BY-NC License (both are open-source, non-commercial licenses).

Let's Make It Fun! 🎉

Share your funniest or most creative TurnVoice creations with me!

And if you've got a cool feature idea or just want to say hi, drop me a line on

If you like the repo please leave a star ✨ 🌟 ✨

[^1]: State is work-in-progress (early pre-alpha), so please expect API changes to come and sometimes things not working properly yet. Developed on Python 3.11.4 under Win 10.
[^2]: Rubberband is needed to time-stretch audio with pitch preservation so that the synthesis fits into its time window.
[^3]: Deezer's Spleeter is needed to split vocals for original audio preservation.
[^4]: Huggingface access token is needed to download the speaker diarization model for identifying speakers with pyannote.audio.

Release history

0.0.65: Dec 20, 2023
0.0.60: Dec 18, 2023
0.0.50: Dec 15, 2023
0.0.46: Dec 13, 2023
0.0.45: Dec 12, 2023
0.0.41: Dec 12, 2023
0.0.40: Dec 12, 2023 (this version)
0.0.33: Dec 8, 2023
0.0.32: Dec 8, 2023
0.0.31: Dec 8, 2023
0.0.30: Dec 8, 2023
0.0.22: Dec 5, 2023
0.0.21: Dec 5, 2023
0.0.20: Dec 5, 2023
0.0.13: Dec 5, 2023
0.0.12: Dec 5, 2023
0.0.11: Dec 5, 2023
0.0.7: Dec 5, 2023
0.0.2: Dec 5, 2023 (yanked; reason: wrong version number)
0.0.1: Dec 4, 2023

Download files

Source Distribution

TurnVoice-0.0.40.tar.gz (3.2 MB)

Uploaded Dec 12, 2023 Source

Built Distribution

TurnVoice-0.0.40-py3-none-any.whl (3.3 MB)

Uploaded Dec 12, 2023 Python 3

File details

Details for the file TurnVoice-0.0.40.tar.gz.

File metadata

Download URL: TurnVoice-0.0.40.tar.gz
Upload date: Dec 12, 2023
Size: 3.2 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for TurnVoice-0.0.40.tar.gz
Algorithm	Hash digest
SHA256	`66454018444ae8bce79b550ea0e147d6b0002461dfc8a9709c7ada248fdba956`
MD5	`21c7ca424c700629d805b065726cde09`
BLAKE2b-256	`64cd27e40b746e472385cd883b3ba99ec03adaa0c43896be641627024e91fc5a`
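
To check a downloaded source distribution against the SHA256 digest listed above, the file can be hashed with Python's standard `hashlib`. A minimal sketch; the helper names are hypothetical and the file is assumed to be in the current directory.

```python
import hashlib

# SHA256 digest listed above for TurnVoice-0.0.40.tar.gz.
EXPECTED_SHA256 = "66454018444ae8bce79b550ea0e147d6b0002461dfc8a9709c7ada248fdba956"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 equals the expected digest."""
    return sha256_of_file(path) == expected.lower()


# Example (assumes the file was downloaded to the current directory):
# print(matches_expected("TurnVoice-0.0.40.tar.gz"))
```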

File details

Details for the file TurnVoice-0.0.40-py3-none-any.whl.

File metadata

Download URL: TurnVoice-0.0.40-py3-none-any.whl
Upload date: Dec 12, 2023
Size: 3.3 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.11.4

File hashes

Hashes for TurnVoice-0.0.40-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c25913f7af7409677aac16eed860005516b2f078cfb3d60dbfde172468742021`
MD5	`b6d6ffa24cc3bc40611db670683f512a`
BLAKE2b-256	`f333e1ff21aa0d4c3c4656cba9cde46d832012599333073ba80a05ac79f7fc2b`
