Google EMEA gTech Ads Data Science Team's solution to automatically translate and dub video ads into multiple languages using AI.


gTech Ads Ariel for AI Video Ad Dubbing

Ariel is an open-source Python library that facilitates efficient and cost-effective dubbing of video ads into multiple languages.

This is not an official Google product.

Overview • Features • Benefits • Building Blocks • Requirements • Language Compatibility • Getting Started • References

Overview

Ariel is designed to extend the global reach of digital advertising by letting advertisers automate the translation and dubbing of their video ads into a wide range of languages.

Features

Automated Dubbing: Streamline the generation of high-quality dubbed versions of video ads in various target languages.
Scalability: Handle large volumes of videos and diverse languages efficiently.
User-Friendly: Offers a straightforward API and/or user interface for simplified operation.
Cost-Effective: Significantly reduce dubbing costs compared to traditional methods. The primary expenses are limited to Gemini API and Text-To-Speech API calls.

Benefits

Enhanced Ad Performance: Improve viewer engagement and potentially increase conversion rates with localized ads.
Streamlined Production: Minimize the time and cost associated with manual translation and voiceover work.
Rapid Turnaround: Quickly generate dubbed versions of ads to accelerate multilingual campaign deployment.
Expanded Global Reach: Reach broader audiences worldwide with localized advertising content.

Building Blocks

Ariel combines state-of-the-art AI models with audio processing techniques to produce accurate and efficient dubs:

Video Processing: Extracts the audio track from the input video file.
Audio Processing:
- DEMUCS: Employed for advanced audio source separation.
- pyannote: Performs speaker diarization to identify and separate individual speakers.
Speech-To-Text (STT):
- faster-whisper: A fast, high-performance reimplementation of OpenAI's Whisper speech-to-text model.
- Gemini 1.5 Flash: A powerful multimodal language model that contributes to enhanced transcription.
Translation:
- Gemini 1.5 Flash: Leverages its language understanding for accurate and contextually relevant translation.
Text-to-Speech (TTS):
- GCP's Text-To-Speech: Generates natural-sounding speech in the target language.
- [OPTIONAL] ElevenLabs: An alternative API for speech generation, recommended for the best results. WARNING: ElevenLabs is a paid service and incurs extra costs. See the pricing here.
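The video-processing step amounts to pulling the audio track out of the input video, a job FFmpeg handles well. As a minimal sketch (not Ariel's actual code), the hypothetical helper `build_audio_extraction_command` below assembles an FFmpeg invocation that drops the video stream and writes 16-bit PCM WAV; the WAV output format is an assumption.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_audio_extraction_command(video_path: str, audio_path: str) -> list[str]:
    """Builds an FFmpeg command that keeps only the audio stream.

    Hypothetical helper: Ariel's real extraction step may use other options.
    """
    return [
        "ffmpeg",
        "-y",  # Overwrite the output file if it already exists.
        "-i", video_path,  # Input video file.
        "-vn",  # Disable video; keep only the audio stream.
        "-acodec", "pcm_s16le",  # Assumed output encoding: 16-bit PCM WAV.
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> Path:
    """Runs the command above; requires FFmpeg to be on the PATH."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("FFmpeg is not installed or not on the PATH.")
    subprocess.run(
        build_audio_extraction_command(video_path, audio_path), check=True
    )
    return Path(audio_path)
```

Keeping command construction separate from execution makes the options easy to inspect or unit-test without FFmpeg installed.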

Requirements

System Requirements:
- FFmpeg: For video and audio processing. If it is not installed, you can install it on Debian/Ubuntu with:
```
sudo apt update
sudo apt install ffmpeg
```
- GPU (Recommended): For optimal performance, especially with larger videos.
Accounts and Tokens:
- Google Cloud Platform (GCP) Project: Set up a GCP project. See here for instructions.
- Enabled Text-To-Speech API: Enable the Text-To-Speech API in your GCP project. See here for instructions.
- Hugging Face Token: To access the pyannote speaker diarization model. See here for how to get the token.
- Google AI Studio Token: To access the Gemini language model. See here for how to get the token.
- [OPTIONAL] ElevenLabs API: To access the ElevenLabs API. See here.
User Agreements:
- Hugging Face Model License: You must accept the user conditions for both the pyannote speaker diarization model (here) and the segmentation model (here).
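Before a first run, some of these requirements can be checked programmatically. The sketch below is a hypothetical preflight check, not part of Ariel: the environment variable names `HUGGING_FACE_TOKEN`, `GEMINI_TOKEN` and `ELEVENLABS_TOKEN` are assumptions (Ariel itself receives the tokens through the `--hugging_face_token`, `--gemini_token` and `--elevenlabs_token` command-line flags).

```python
from __future__ import annotations

import shutil
from collections.abc import Mapping


def missing_requirements(
    env: Mapping[str, str], use_elevenlabs: bool = False
) -> list[str]:
    """Lists unmet requirements; the environment variable names are hypothetical."""
    missing = []
    # FFmpeg must be callable from the PATH.
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # Tokens needed for pyannote (Hugging Face) and Gemini (Google AI Studio).
    required = ["HUGGING_FACE_TOKEN", "GEMINI_TOKEN"]
    # The ElevenLabs token is only needed when ElevenLabs is used for TTS.
    if use_elevenlabs:
        required.append("ELEVENLABS_TOKEN")
    missing.extend(name for name in required if not env.get(name))
    return missing
```

Calling `missing_requirements(os.environ)` returns an empty list when everything it checks is in place. The GCP project, the enabled Text-To-Speech API, and the Hugging Face license acceptances are not covered by this function and still need to be verified manually.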

Language Compatibility

You can dub video ads from and to the following languages:

Arabic (ar-SA), (ar-EG)
Bengali (bn-BD), (bn-IN)
Bulgarian (bg-BG)
Chinese (Simplified) (zh-CN)
Chinese (Traditional) (zh-TW)
Croatian (hr-HR)
Czech (cs-CZ)
Danish (da-DK)
Dutch (nl-NL)
English (en-US), (en-GB), (en-CA), (en-AU)
Estonian (et-EE)
Finnish (fi-FI)
French (fr-FR), (fr-CA)
German (de-DE)
Greek (el-GR)
Gujarati (gu-IN)
Hebrew (he-IL) (Note: Not supported with ElevenLabs API)
Hindi (hi-IN)
Hungarian (hu-HU)
Indonesian (id-ID)
Italian (it-IT)
Japanese (ja-JP)
Kannada (kn-IN)
Korean (ko-KR)
Latvian (lv-LV)
Lithuanian (lt-LT)
Malayalam (ml-IN)
Marathi (mr-IN)
Norwegian (nb-NO), (nn-NO)
Polish (pl-PL)
Portuguese (pt-PT), (pt-BR)
Romanian (ro-RO)
Russian (ru-RU)
Serbian (sr-RS)
Slovak (sk-SK)
Slovenian (sl-SI)
Spanish (es-ES), (es-MX)
Swahili (sw-KE)
Swedish (sv-SE)
Tamil (ta-IN), (ta-LK)
Telugu (te-IN)
Thai (th-TH)
Turkish (tr-TR)
Ukrainian (uk-UA)
Vietnamese (vi-VN)
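To fail fast on an unsupported `--original_language` or `--target_language` value, the codes above can be checked before starting a run. The sketch below hard-codes a snapshot of this list, so it can drift as the underlying services change; the name `validate_language_pair` is hypothetical, and applying the Hebrew/ElevenLabs restriction only to the target language is an assumption based on ElevenLabs being used solely for speech generation.

```python
# Snapshot of the language list above (may change with the underlying services).
SUPPORTED_LANGUAGES = frozenset({
    "ar-SA", "ar-EG", "bn-BD", "bn-IN", "bg-BG", "zh-CN", "zh-TW", "hr-HR",
    "cs-CZ", "da-DK", "nl-NL", "en-US", "en-GB", "en-CA", "en-AU", "et-EE",
    "fi-FI", "fr-FR", "fr-CA", "de-DE", "el-GR", "gu-IN", "he-IL", "hi-IN",
    "hu-HU", "id-ID", "it-IT", "ja-JP", "kn-IN", "ko-KR", "lv-LV", "lt-LT",
    "ml-IN", "mr-IN", "nb-NO", "nn-NO", "pl-PL", "pt-PT", "pt-BR", "ro-RO",
    "ru-RU", "sr-RS", "sk-SK", "sl-SI", "es-ES", "es-MX", "sw-KE", "sv-SE",
    "ta-IN", "ta-LK", "te-IN", "th-TH", "tr-TR", "uk-UA", "vi-VN",
})
# Hebrew is listed as not supported with the ElevenLabs API.
NOT_SUPPORTED_BY_ELEVENLABS = frozenset({"he-IL"})


def validate_language_pair(
    original: str, target: str, use_elevenlabs: bool = False
) -> None:
    """Raises ValueError if either code is unsupported (hypothetical helper)."""
    for code in (original, target):
        if code not in SUPPORTED_LANGUAGES:
            raise ValueError(f"Unsupported language code: {code}")
    # Assumption: the ElevenLabs restriction matters only for the dubbed output.
    if use_elevenlabs and target in NOT_SUPPORTED_BY_ELEVENLABS:
        raise ValueError(f"{target} is not supported with the ElevenLabs API")
```

A frozenset gives constant-time membership checks and cannot be modified accidentally at runtime.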

Language coverage depends on the underlying services. Check the sources below for any changes:

Speech-to-Text (Whisper)

Ariel leverages the open-source Whisper model, which supports a wide array of languages for speech-to-text conversion. The supported languages can be found here.

Translation (Gemini)

Gemini, the language model used for translation, is proficient in many languages. For the most current list of supported languages, see here.

Text-to-Speech (GCP Text-to-Speech or ElevenLabs)

GCP Text-to-Speech offers an extensive selection of voices in various languages. For a comprehensive list of supported languages and available voices, see here. The ElevenLabs API is an alternative to GCP Text-to-Speech; see its list of supported languages here.

Getting Started

Installation:
```
pip install gtech-ariel
```

Usage:

```
python main.py \
  --input_file=<path_to_video> \
  --output_directory=<output_dir> \
  --advertiser_name=<name> \
  --original_language=<lang_code> \
  --target_language=<lang_code> \
  [--number_of_speakers=<num>] \
  [--diarization_instructions=<instructions>] \
  [--translation_instructions=<instructions>] \
  [--merge_utterances=<True/False>] \
  [--minimum_merge_threshold=<seconds>] \
  [--preferred_voices=<voice1>,<voice2>] \
  [--clean_up=<True/False>] \
  [--pyannote_model=<model_name>] \
  [--diarization_system_instructions=<instructions>] \
  [--translation_system_instructions=<instructions>] \
  [--hugging_face_token=<token>] \
  [--gemini_token=<token>] \
  [--model_name=<model_name>] \
  [--temperature=<value>] \
  [--top_p=<value>] \
  [--top_k=<value>] \
  [--max_output_tokens=<value>] \
  [--elevenlabs_token=<token>] \
  [--use_elevenlabs=<value>]
```
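When driving Ariel from another Python program (for example, to dub one ad into several target languages), the command above can be assembled as an argument list. This is a hedged sketch: the helper name `build_ariel_command` and the assumption that `main.py` sits in the current working directory are illustrative; only the flag names come from the usage line.

```python
from __future__ import annotations

import sys
from collections.abc import Mapping


def build_ariel_command(
    input_file: str,
    output_directory: str,
    advertiser_name: str,
    original_language: str,
    target_language: str,
    optional_flags: Mapping[str, object] | None = None,
) -> list[str]:
    """Builds the argument list for `python main.py` (hypothetical helper)."""
    command = [
        sys.executable,  # The current Python interpreter.
        "main.py",  # Assumed to be in the working directory.
        f"--input_file={input_file}",
        f"--output_directory={output_directory}",
        f"--advertiser_name={advertiser_name}",
        f"--original_language={original_language}",
        f"--target_language={target_language}",
    ]
    # Optional flags, e.g. {"number_of_speakers": 2, "merge_utterances": True}.
    for name, value in (optional_flags or {}).items():
        if isinstance(value, (list, tuple)):
            # Comma-separated, as in --preferred_voices=<voice1>,<voice2>.
            value = ",".join(str(v) for v in value)
        command.append(f"--{name}={value}")
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`; passing a list instead of a single shell string avoids quoting problems with file names and instruction text.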

Configuration: (Optional)
- Customize settings for speaker diarization, translation, voice selection, and more using the command-line flags.
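The `--merge_utterances` and `--minimum_merge_threshold` flags are not described further here. One plausible reading, stated purely as an assumption, is that consecutive utterances from the same speaker are merged when the silence between them is shorter than the threshold (in seconds). The sketch below implements that reading with a hypothetical `Utterance` type; it is not Ariel's implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    """Hypothetical utterance record: times in seconds, speaker label, text."""
    start: float
    end: float
    speaker: str
    text: str


def merge_utterances(
    utterances: list[Utterance], minimum_merge_threshold: float
) -> list[Utterance]:
    """Merges consecutive same-speaker utterances whose gap is below the threshold."""
    merged: list[Utterance] = []
    for current in sorted(utterances, key=lambda u: u.start):
        if merged:
            previous = merged[-1]
            gap = current.start - previous.end
            if previous.speaker == current.speaker and gap < minimum_merge_threshold:
                # Replace the previous utterance with one spanning both.
                merged[-1] = Utterance(
                    start=previous.start,
                    end=max(previous.end, current.end),
                    speaker=previous.speaker,
                    text=f"{previous.text} {current.text}",
                )
                continue
        merged.append(current)
    return merged
```

Under this reading, a larger threshold yields fewer, longer utterances, which gives the translation and TTS steps more context per segment at the cost of coarser timing.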

References

DEMUCS: https://github.com/facebookresearch/demucs
pyannote: https://github.com/pyannote/pyannote-audio
faster-whisper: https://github.com/SYSTRAN/faster-whisper
ElevenLabs: https://elevenlabs.io/docs/introduction


Release history

- 0.0.19 (Nov 4, 2024)
- 0.0.18 (Oct 28, 2024)
- 0.0.17 (Oct 25, 2024)
- 0.0.16 (Oct 24, 2024)
- 0.0.15 (Oct 23, 2024)
- 0.0.14 (Oct 17, 2024)
- 0.0.13 (Oct 17, 2024)
- 0.0.12 (Oct 14, 2024)
- 0.0.11 (Sep 19, 2024), this version
- 0.0.10 (Aug 27, 2024)
- 0.0.9 (Aug 12, 2024)
- 0.0.8 (Aug 9, 2024)
- 0.0.7 (Aug 8, 2024)
- 0.0.6 (Aug 7, 2024)
- 0.0.5 (Jul 29, 2024)
- 0.0.4 (Jul 29, 2024)
- 0.0.3 (Jul 25, 2024)
- 0.0.2 (Jul 23, 2024)
- 0.0.1 (Jul 12, 2024)

Download files


Source distribution: gtech-ariel-0.0.11.tar.gz (46.2 kB), uploaded Sep 19, 2024

Built distribution: gtech_ariel-0.0.11-py3-none-any.whl (49.7 kB, Python 3), uploaded Sep 19, 2024

Hashes for gtech-ariel-0.0.11.tar.gz
Algorithm	Hash digest
SHA256	`607a49f14490650407b1807a5016a806264b3bc43b9723a10e7b0526c7f99161`
MD5	`b51a17bf31ed2e100e3f0a6cc6d520c1`
BLAKE2b-256	`440c8e386558f1af26ebbb15c159178f0d3c44cfc78fdb82dc0799e2ce5fdfca`

Hashes for gtech_ariel-0.0.11-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d1703eee2b5b304d7dc431ea320300b52ca8c04db86fd5e39004a75fcd8beba2`
MD5	`4fb7af9d064f45207a29abfc4924fcdf`
BLAKE2b-256	`f03991836cf364e15cf5096b9cb7da57d70c45b63b15e2098543485ed837da4d`