Google EMEA gTech Ads Data Science Team's solution to automatically translate and dub video ads into multiple languages using AI.

gTech Ads Ariel for AI Video Ad Dubbing

Ariel is an open-source Python library that facilitates efficient and cost-effective dubbing of video ads into multiple languages.

This is not an official Google product.

Overview

Ariel is a cutting-edge solution designed to enhance the global reach of digital advertising. It enables advertisers to automate the translation and dubbing of their video ads into a wide range of languages.

Features

  • Automated Dubbing: Streamline the generation of high-quality dubbed versions of video ads in various target languages.
  • Scalability: Handle large volumes of videos and diverse languages efficiently.
  • User-Friendly: Offers a straightforward API and/or user interface for simplified operation.
  • Cost-Effective: Significantly reduce dubbing costs compared to traditional methods. The primary expenses are limited to Gemini API and Text-To-Speech API calls.

Benefits

  • Enhanced Ad Performance: Improve viewer engagement and potentially increase conversion rates with localized ads.
  • Streamlined Production: Minimize the time and cost associated with manual translation and voiceover work.
  • Rapid Turnaround: Quickly generate dubbed versions of ads to accelerate multilingual campaign deployment.
  • Expanded Global Reach: Reach broader audiences worldwide with localized advertising content.

Language Compatibility

You can dub video ads from and to many languages.

The full list of supported languages:
  • Arabic (ar-SA), (ar-EG)
  • Bengali (bn-BD), (bn-IN)
  • Bulgarian (bg-BG)
  • Chinese (Simplified) (zh-CN)
  • Chinese (Traditional) (zh-TW)
  • Croatian (hr-HR)
  • Czech (cs-CZ)
  • Danish (da-DK)
  • Dutch (nl-NL)
  • English (en-US), (en-GB), (en-CA), (en-AU)
  • Estonian (et-EE)
  • Finnish (fi-FI)
  • French (fr-FR), (fr-CA)
  • German (de-DE)
  • Greek (el-GR)
  • Gujarati (gu-IN)
  • Hebrew (he-IL) (Note: Not supported with ElevenLabs API)
  • Hindi (hi-IN)
  • Hungarian (hu-HU)
  • Indonesian (id-ID)
  • Italian (it-IT)
  • Japanese (ja-JP)
  • Kannada (kn-IN)
  • Korean (ko-KR)
  • Latvian (lv-LV)
  • Lithuanian (lt-LT)
  • Malayalam (ml-IN)
  • Marathi (mr-IN)
  • Norwegian (nb-NO), (nn-NO)
  • Polish (pl-PL)
  • Portuguese (pt-PT), (pt-BR)
  • Romanian (ro-RO)
  • Russian (ru-RU)
  • Serbian (sr-RS)
  • Slovak (sk-SK)
  • Slovenian (sl-SI)
  • Spanish (es-ES), (es-MX)
  • Swahili (sw-KE)
  • Swedish (sv-SE)
  • Tamil (ta-IN), (ta-LK)
  • Telugu (te-IN)
  • Thai (th-TH)
  • Turkish (tr-TR)
  • Ukrainian (uk-UA)
  • Vietnamese (vi-VN)
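
As a rough illustration, the list above can be turned into a lookup that validates a language code before a dubbing run. This is a minimal sketch, not part of Ariel: `SUPPORTED_LANGUAGES`, `NOT_SUPPORTED_BY_ELEVENLABS` and `check_language` are hypothetical names, and the codes are simply copied from the list.

```python
# Codes copied from the supported-languages list; the names below are
# hypothetical helpers, not part of the Ariel API.
SUPPORTED_LANGUAGES = frozenset("""
ar-SA ar-EG bn-BD bn-IN bg-BG zh-CN zh-TW hr-HR cs-CZ da-DK nl-NL
en-US en-GB en-CA en-AU et-EE fi-FI fr-FR fr-CA de-DE el-GR gu-IN
he-IL hi-IN hu-HU id-ID it-IT ja-JP kn-IN ko-KR lv-LV lt-LT ml-IN
mr-IN nb-NO nn-NO pl-PL pt-PT pt-BR ro-RO ru-RU sr-RS sk-SK sl-SI
es-ES es-MX sw-KE sv-SE ta-IN ta-LK te-IN th-TH tr-TR uk-UA vi-VN
""".split())

# The list notes that Hebrew is not supported with the ElevenLabs API.
NOT_SUPPORTED_BY_ELEVENLABS = frozenset({"he-IL"})


def check_language(code: str, use_elevenlabs: bool = False) -> str:
    """Return the code normalised to `xx-YY`, or raise ValueError."""
    language, _, region = code.strip().replace("_", "-").partition("-")
    normalised = f"{language.lower()}-{region.upper()}"
    if normalised not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    if use_elevenlabs and normalised in NOT_SUPPORTED_BY_ELEVENLABS:
        raise ValueError(f"{normalised} is not supported with ElevenLabs")
    return normalised


print(check_language("pt_br"))
```

Normalising the separator and letter case first means that inputs such as `pt_br` and `PT-BR` resolve to the same entry.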

Before You Begin

  • System Requirements:
    • Installed FFmpeg: For video and audio processing. You don't need to install this if you're running from Google Colab.
    • GPU (Recommended): For optimal performance, especially with larger videos. It's available for free in Google Colab.
  • Accounts and Tokens:
    • Google Cloud Platform (GCP) Project: Set up a GCP project. See here for instructions.
      • Enabled Vertex AI API: Enable the Vertex AI API in your GCP project. See here for instructions.
      • Enabled Cloud Storage API: Enable the Cloud Storage API in your GCP project. See here for instructions.
        • You need access to create and delete Google Cloud Storage (GCS) buckets.
      • Enabled Text-To-Speech API: Enable the Text-To-Speech API in your GCP project if you choose it for the Text-To-Speech part of the process. See here for instructions.
      • Google Drive API: Enable the Google Drive API if you use the demo notebook dubbing_workflow.ipynb in Google Colab.
      • Google Sheets API: Enable the Google Sheets API if you use the demo notebook dubbing_workflow.ipynb in Google Colab and want to pass script voice metadata from Google Sheets.
    • Hugging Face Token: To access the PyAnnote speaker diarization model. See here on how to get the token.
      • Hugging Face Model License: You must accept the user conditions for the PyAnnote speaker diarization here and segmentation models here.
    • [OPTIONAL] ElevenLabs API: To access the ElevenLabs API. See here.
      • Commercial Use: You are responsible for selecting the right ElevenLabs license if you decide to use the outputs from Ariel in a commercial setting. See the pricing here. Otherwise, ElevenLabs is free to use.
  • Data handling:
    • Input files: Ariel works with both video and audio ads, which must be in the MP4 or MP3 file format respectively. See here for the limits on ad duration enforced by Gemini models.
    • Storage: Ariel reads, processes and saves all files in your environment, e.g. Colab. In one case only, it creates a temporary Google Cloud Storage (GCS) bucket and uploads the input file there so that the Gemini model can identify unique speakers. The bucket and all its contents are removed immediately afterwards. You can set the gcp_region argument to choose the best location for this operation.
    • Voice cloning: If you decide to use ElevenLabs, you can clone voices from the input file to make the dubbing sound close to the original. It is your responsibility to ensure that you are legally allowed to do so.
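
Two of the requirements above, an installed FFmpeg and MP4/MP3 input files, are easy to check before starting a run. The following is a minimal sketch using only the Python standard library; `input_kind` and `ffmpeg_available` are hypothetical helper names and are not part of Ariel.

```python
import shutil
from pathlib import Path

# File formats named in the data-handling notes, mapped to the kind of ad.
INPUT_KINDS = {".mp4": "video", ".mp3": "audio"}


def input_kind(path: str) -> str:
    """Return "video" or "audio" for a supported input file, else raise."""
    suffix = Path(path).suffix.lower()
    if suffix not in INPUT_KINDS:
        raise ValueError(
            f"Unsupported input format {suffix!r}; expected .mp4 or .mp3"
        )
    return INPUT_KINDS[suffix]


def ffmpeg_available() -> bool:
    """True if an `ffmpeg` executable can be found on the PATH."""
    return shutil.which("ffmpeg") is not None


print(input_kind("summer_campaign.mp4"))
if not ffmpeg_available():
    print("FFmpeg not found: install it, or run from Google Colab.")
```

The extension check only looks at the file name; it does not verify that the file contents are valid MP4 or MP3 data.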

Getting Started

To start using Ariel, open the demo notebook in Google Colab using the Open in Colab button.

Building Blocks

Ariel leverages a powerful combination of state-of-the-art AI and audio processing techniques to deliver accurate and efficient dubbing results:

  1. Video Processing: Extracts the audio track from the input video file.
  2. Audio Processing:
    • DEMUCS: Employed for advanced audio source separation.
    • pyannote: Performs speaker diarization to identify and separate individual speakers.
  3. Speech-To-Text (STT):
    • faster-whisper: A high-performance speech-to-text model.
    • Gemini 1.5 Flash: A powerful multimodal language model that contributes to enhanced transcription.
  4. Translation:
    • Gemini 1.5 Flash: Leverages its language understanding for accurate and contextually relevant translation.
  5. Text-to-Speech (TTS):
    • GCP's Text-To-Speech: Generates natural-sounding speech in the target language.
    • [OPTIONAL] ElevenLabs: An alternative API to generate speech. It's recommended for the best results. WARNING: ElevenLabs is a paid solution and will generate extra costs. See the pricing here.
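
The order of the building blocks above can be sketched as a small pipeline in which each utterance is filled in stage by stage. This is only an illustration of the data flow: `Utterance`, `Steps` and `dub` are hypothetical names, the stand-in steps do no real processing, and none of this reflects Ariel's actual API or implementation.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Utterance:
    """One speaker turn, filled in as it moves through the stages."""
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str = ""
    translated_text: str = ""
    dubbed_audio: str = ""  # path to the synthesized speech


@dataclass
class Steps:
    """Hypothetical hooks, one per building block (not Ariel's real API)."""
    extract_audio: Callable[[str], str]
    separate: Callable[[str], tuple[str, str]]
    diarize: Callable[[str], list[Utterance]]
    transcribe: Callable[[str, Utterance], str]
    translate: Callable[[str, str], str]
    synthesize: Callable[[Utterance, str], str]


def dub(input_file: str, target_language: str, steps: Steps):
    audio = steps.extract_audio(input_file)        # 1. video processing
    vocals, background = steps.separate(audio)     # 2. source separation
    utterances = steps.diarize(vocals)             # 2. speaker diarization
    for utterance in utterances:
        utterance.text = steps.transcribe(vocals, utterance)        # 3. STT
        utterance.translated_text = steps.translate(                # 4.
            utterance.text, target_language)
        utterance.dubbed_audio = steps.synthesize(                  # 5. TTS
            utterance, target_language)
    return utterances, background


# Stand-in steps that only label their input, to show the data flow.
demo_steps = Steps(
    extract_audio=lambda video: video + ".audio",
    separate=lambda audio: (audio + ".vocals", audio + ".background"),
    diarize=lambda vocals: [Utterance("speaker_01", 0.0, 2.5)],
    transcribe=lambda vocals, utterance: "hello",
    translate=lambda text, language: f"[{language}] {text}",
    synthesize=lambda utterance, language: f"{utterance.speaker}_{language}.mp3",
)

utterances, background = dub("ad.mp4", "pl-PL", demo_steps)
print(utterances[0].translated_text, utterances[0].dubbed_audio)
```

Passing the steps in as callables keeps the text-to-speech backend swappable, which mirrors the choice between GCP's Text-To-Speech and ElevenLabs described above.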

🆕 Ariel User Interface

For a more user-friendly experience, we're also providing an Ariel version that can be deployed onto Google Cloud Platform with a GUI.

Requirements

In order to use Ariel UI, you need the following:

  1. Google Cloud Platform Project to host the backend. The following components are created during deployment:
    • Cloud Storage Bucket to store input & output videos, video dubbing artifacts and all other metadata files. This bucket is also used as an interaction point with the GUI (files created or removed there trigger the backend processing).
    • Cloud Run instance that processes all the steps. This part is implemented as a Python Docker container (which is also built during installation).
    • Pub/Sub infrastructure based on Eventarc, which notifies the container of new files.
  2. Cloud Run with GPU Support is needed for the Ariel backend to run quickly. You can apply for a quota increase in a supported region using the link here. Approval can take up to two days, but it is generally much quicker than that.
  3. AppsScript Project to host the frontend, an Angular web-app. For this, you need Google Workspace access.
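
To make the bucket-as-interaction-point idea more concrete, here is a minimal sketch of how a container might route Cloud Storage events delivered through Eventarc. The two event type strings are the standard Eventarc types for object creation and deletion, and `bucket` and `name` are fields of the Cloud Storage event payload. Everything else is an assumption: `route_event`, the returned action strings and the file-extension filter are hypothetical and do not describe what the Ariel backend actually does.

```python
# Standard Eventarc event types for Cloud Storage objects.
FINALIZED = "google.cloud.storage.object.v1.finalized"
DELETED = "google.cloud.storage.object.v1.deleted"

# Assumption: only media inputs start processing; Ariel's real trigger
# files are not documented here.
MEDIA_SUFFIXES = (".mp4", ".mp3")


def route_event(event_type: str, data: dict) -> str:
    """Map a storage event to a (hypothetical) backend action."""
    bucket = data.get("bucket", "")
    name = data.get("name", "")
    uri = f"gs://{bucket}/{name}"
    if event_type == FINALIZED and name.lower().endswith(MEDIA_SUFFIXES):
        return f"process {uri}"
    if event_type == DELETED:
        return f"cleanup {uri}"
    return "ignore"


print(route_event(FINALIZED, {"bucket": "my-ariel-bucket", "name": "ads/spot.mp4"}))
```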

Deployment

Please make sure you have fulfilled all prerequisites mentioned under Requirements first.

  1. Make sure your system has an up-to-date installation of Node.js and npm.
  2. Install clasp by running npm install @google/clasp -g, then log in via clasp login.
  3. Navigate to the Apps Script Settings page and enable the Apps Script API.
  4. Make sure your system has an up-to-date installation of the gcloud CLI, then log in via gcloud auth login.
  5. Make sure your system has an up-to-date installation of git and use it to clone this repository: git clone https://github.com/google-marketing-solutions/ariel.
  6. Navigate to the directory where the source code lives: cd ariel.
  7. Run npm start. This will prompt you for configuration values and suggest reasonable defaults.
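
Before running the steps above, it can help to confirm that the command-line tools they mention are on the PATH. The sketch below is a hypothetical helper, not part of the Ariel repository; it only checks that each executable can be found and does not verify versions or login state.

```python
import shutil

# Executables used in the deployment steps.
REQUIRED_TOOLS = ("node", "npm", "clasp", "gcloud", "git")


def missing_tools(which=shutil.which) -> list[str]:
    """Return the required tools that cannot be found on the PATH."""
    return [tool for tool in REQUIRED_TOOLS if which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("All deployment tools found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; by default it uses `shutil.which`.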

References
