Subtitle to Audio, generate audio or speech from any subtitle file

Subtitle to Audio

Subtitle to Audio generates audio/speech from any subtitle file using Coqui-ai TTS and synchronizes the audio timing with the subtitle timestamps.

Demo : Open In Colab

Dependencies

ffmpeg, pydub, librosa, coqui-ai TTS, ffmpeg-python

Installation

# install from GitHub
pip install TTS
pip install git+https://github.com/bnsantoso/sub-to-audio

# or install from PyPI
pip install TTS
pip install subtoaudio

Install ffmpeg on Linux

apt-get install ffmpeg
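
Since ffmpeg is an external dependency, it can help to confirm it is on your PATH before converting anything. The check below is a minimal sketch using only the Python standard library; the helper name ffmpeg_available is an illustration and not part of subtoaudio.

```python
import shutil


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if not ffmpeg_available():
    print("ffmpeg was not found on PATH; install it before running subtoaudio")
```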

Example usage

Basic usage is very similar to Coqui-ai TTS. You can check their documentation and the list of <lang-iso_code> values.

!Note: For best results, use non-overlapping subtitles with a reasonable characters-per-second (CPS) rate.

!Note: Use software like Aegisub to edit your subtitles.
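
To make the two notes above concrete, here is a rough sketch of how you might check a subtitle for overlapping cues and compute each cue's CPS before converting it. The Cue class and both helper functions are hypothetical names for illustration; subtoaudio does not expose them, and the timings are assumed to be already parsed into milliseconds.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cue:
    start_ms: int  # cue start time in milliseconds
    end_ms: int    # cue end time in milliseconds
    text: str


def chars_per_second(cue: Cue) -> float:
    """Characters per second of one cue (whitespace included)."""
    duration_s = (cue.end_ms - cue.start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("cue must have a positive duration")
    return len(cue.text) / duration_s


def overlapping_pairs(cues: List[Cue]) -> List[Tuple[int, int]]:
    """Return index pairs of time-adjacent cues where the first ends after the second starts."""
    ordered = sorted(range(len(cues)), key=lambda i: cues[i].start_ms)
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if cues[a].end_ms > cues[b].start_ms
    ]


cues = [
    Cue(0, 2000, "Hello there."),
    Cue(1500, 4000, "This cue starts too early."),
    Cue(4000, 5000, "Fine."),
]
print([round(chars_per_second(c), 1) for c in cues])
print(overlapping_pairs(cues))
```

Cues reported by overlapping_pairs, or cues with an unusually high CPS, are good candidates to fix in a subtitle editor first.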

from subtoaudio import SubToAudio

# list all available models
SubToAudio().coqui_model()

# pick a model by its index in the list
model = SubToAudio().coqui_model()[1]

# The code will output 'yoursubtitle.wav' in the current directory.
sub = SubToAudio(model_name=model)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle)

# you can choose from about 1100 different languages using the fairseq models
sub = SubToAudio(fairseq_language='<lang-iso_code>')
subtitle = sub.subtitle("yoursubtitle.ass")
sub.convert_to_audio(sub_data=subtitle) 

# specify model name
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav")

# specify model and config path
sub = SubToAudio(model_path="path/to/your/model.pth", config_path="config/path.json")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle)

# multi-speaker / multilingual models:
# pass speaker=... and language=..., or leave them as None if the model
# doesn't have multiple speakers or multiple languages
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")

# list speakers and pick one
sub.speakers()
speaker1 = sub.speakers()[1]

# list languages and pick one
sub.languages()
langu = sub.languages()[0]

subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, language=langu, speaker=speaker1, output_path="subtitle.wav")

# Save temporary audio to current folder
sub = SubToAudio(model_name="tts_models/multilingual/multi-dataset/your_tts")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav", save_temp=True)

Voice Conversion

To use voice conversion, pass the voice_conversion (bool) and speaker_wav (str) parameters to convert_to_audio. Voice conversion cannot run if your model has multiple speakers.

from subtoaudio import SubToAudio

sub = SubToAudio(fairseq_language="eng")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, voice_conversion=True, speaker_wav="voice.wav", language="en")

Coqui Studio Api

To use Coqui Studio Api you'll need to configure the COQUI_STUDIO_TOKEN environment variable.

import os

os.environ['COQUI_STUDIO_TOKEN'] = "your_api_token"  # replace with your own API token

Once your token is set, you can use Coqui Studio models. Model names follow the convention coqui_studio/en/<studio_speaker_name>/coqui_studio

from subtoaudio import SubToAudio

sub = SubToAudio(model_name="coqui_studio/en/Torcull Diarmuid/coqui_studio", progress_bar=False)
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav", save_temp=True)

# use the emotion and speed parameters
sub.convert_to_audio(sub_data=subtitle, output_path="subtitle.wav", emotion="Happy", speed=1.5)
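
The naming convention in this section can be wrapped in a small helper so the token check and the model name live in one place. This is only a sketch: studio_model_name and require_studio_token are hypothetical helpers, not functions provided by subtoaudio or Coqui, and the "en" default simply mirrors the example above.

```python
import os


def require_studio_token() -> str:
    """Return the Coqui Studio token, or raise if the variable is unset or empty."""
    token = os.environ.get("COQUI_STUDIO_TOKEN", "")
    if not token:
        raise RuntimeError("COQUI_STUDIO_TOKEN is not set")
    return token


def studio_model_name(speaker_name: str, language: str = "en") -> str:
    """Build a name of the form coqui_studio/<language>/<speaker>/coqui_studio."""
    if not speaker_name.strip():
        raise ValueError("speaker_name must not be empty")
    return f"coqui_studio/{language}/{speaker_name}/coqui_studio"


print(studio_model_name("Torcull Diarmuid"))
```

The resulting string can then be passed as model_name, as in the example above.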

Tempo Mode

Use the tempo_mode parameter to speed up the audio. There are three tempo modes:

  • tempo_mode="all" : This accelerates all audio. Use tempo_speed=float to specify the speed.
  • tempo_mode="overflow" : This accelerates the audio to match the total subtitle duration plus the blank duration before the next subtitle appears. 'tempo_limit' will limit the speed increase during overflow.
  • tempo_mode="precise" : This accelerates the audio to match the duration the subtitle appears.

from subtoaudio import SubToAudio

# Speed up tempo or speech rate
sub = SubToAudio(model_name="tts_models/de/thorsten/tacotron2-DDC")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="all", tempo_speed=1.3)

# Change the tempo or speech rate of all audio files (default is 1.2)
sub = SubToAudio("tts_models/multilingual/multi-dataset/xtts_v1")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="all", tempo_speed=1.3)

# Change the tempo or speech rate of audio that doesn't fit the subtitle duration
sub = SubToAudio(fairseq_language="ind")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow")

# Limit the tempo speed in overflow mode
sub = SubToAudio(fairseq_language="ind")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", tempo_limit=1.2)

# Match audio length to subtitle duration
sub = SubToAudio(fairseq_language="ind")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="precise")
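
One way to picture the three tempo modes is as a speed-up factor computed per audio clip. The function below is a simplified sketch under stated assumptions, not subtoaudio's actual implementation: tempo_factor is a hypothetical name, durations are in milliseconds, and it assumes that "overflow" and "precise" only ever speed audio up (a factor of at least 1.0) and never slow it down.

```python
def tempo_factor(audio_ms, subtitle_ms, gap_ms=0, mode="overflow",
                 tempo_speed=1.2, tempo_limit=None):
    """Return the speed-up factor for one clip (1.0 means unchanged)."""
    if mode == "all":
        # every clip gets the same fixed factor
        return tempo_speed
    if mode == "overflow":
        # the clip may use the subtitle duration plus the blank time after it
        window_ms = subtitle_ms + gap_ms
    elif mode == "precise":
        # the clip must fit inside the subtitle duration itself
        window_ms = subtitle_ms
    else:
        raise ValueError(f"unknown tempo mode: {mode!r}")
    factor = max(1.0, audio_ms / window_ms)
    if mode == "overflow" and tempo_limit is not None:
        # cap the speed-up so speech stays intelligible
        factor = min(factor, tempo_limit)
    return factor


# a 3 s clip for a 2 s subtitle followed by 0.5 s of silence
print(tempo_factor(3000, 2000, gap_ms=500, mode="overflow"))
print(tempo_factor(3000, 2000, mode="precise"))
```

With a limit in place, a clip that still does not fit after the capped speed-up is where the shift modes become useful.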

Shift Mode

The shift_mode parameter shifts audio that doesn't match the subtitle duration.

  • shift_mode="right" : Shift audio time to the right and prevent audio overlapping.
  • shift_mode="left" : Shift audio to the left and prevent audio overlap, but be cautious of limited space on the left side, as some audio may disappear.
  • shift_mode="interpose" : Shift audio to the middle position and prevent overlap with the audio on both the right and the left. (Note: This mode can be clunky, so use it cautiously.)
  • shift_mode="left-overlap" : Shift audio time to the left, allowing overlap.
  • shift_mode="interpose-overlap" : Shift audio to the middle position, allowing overlap.
  • shift_limit=int or "str" : Limit the audio shift. Use an integer for milliseconds or a string like "2.5s" for seconds.

from subtoaudio import SubToAudio

# shift to the right, with a limit of 2 seconds
sub = SubToAudio(fairseq_language="vie")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", shift_mode="right", shift_limit="2s")

# shift audio to the left, allowing overlap
sub = SubToAudio(fairseq_language="fra")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, shift_mode="left-overlap")

# shift to the left, and limit the shift to 1 second
sub = SubToAudio(fairseq_language="ind")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, shift_mode="left", shift_limit=1000) # 1000 = 1s
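
The shift limit accepts either an integer (milliseconds) or a string such as "2.5s" (seconds). The sketch below shows one way such a value could be normalized to milliseconds and used to cap a shift. Both helpers are hypothetical and only illustrate the two documented input forms; they are not how subtoaudio parses the parameter internally.

```python
def shift_limit_to_ms(limit):
    """Convert an int (milliseconds) or a string like "2.5s" (seconds) to milliseconds."""
    if isinstance(limit, bool):
        raise TypeError("limit must be an int or a string, not a bool")
    if isinstance(limit, int):
        return limit
    if isinstance(limit, str):
        text = limit.strip().lower()
        if text.endswith("s"):
            # round() avoids float artifacts such as 1100.0000000000002
            return round(float(text[:-1]) * 1000)
    raise ValueError(f"unsupported shift limit: {limit!r}")


def clamp_shift(requested_ms, limit=None):
    """Cap a requested shift (in milliseconds) at the limit, if one is given."""
    if limit is None:
        return requested_ms
    return min(requested_ms, shift_limit_to_ms(limit))


print(shift_limit_to_ms("2.5s"))
print(clamp_shift(3200, "2s"))
print(clamp_shift(800, 1000))
```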

Bark and Tortoise example

from subtoaudio import SubToAudio

# A random speaker will give weird results when using the bark model with SubToAudio

# Bark random
sub = SubToAudio("tts_models/multilingual/multi-dataset/bark")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow")

# Tortoise random
sub = SubToAudio("tts_models/en/multi-dataset/tortoise-v2")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", preset="fast")

# To use voice cloning you need the voice_dir and speaker parameters
# Voice cloning expects a .wav or .npz file inside the folder speaker_1:
# voice/speaker_1/hana.wav or voice/speaker_1/hana.npz
# If your speaker folder only has a .wav file, a .npz file will be generated after you run it.

sub = SubToAudio("tts_models/multilingual/multi-dataset/bark")
subtitle = sub.subtitle("yoursubtitle.srt")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", voice_dir="voice/", speaker="speaker_1")

# same as bark; the folder structure looks like 'voice/speaker2/ron.wav'
sub = SubToAudio("tts_models/en/multi-dataset/tortoise-v2")
subtitle = sub.subtitle("yoursubtitle.ass")
sub.convert_to_audio(sub_data=subtitle, tempo_mode="overflow", voice_dir="voice/", speaker="speaker2")
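
Because voice cloning depends on the folder layout described in the comments above (voice_dir/speaker/ containing .wav or .npz files), a quick pre-flight check can catch a misplaced file before a long model load. This is a minimal sketch using only the standard library; voice_files is a hypothetical helper and not part of subtoaudio.

```python
import tempfile
from pathlib import Path


def voice_files(voice_dir, speaker):
    """Return the .wav/.npz files found in voice_dir/speaker, sorted by name."""
    folder = Path(voice_dir) / speaker
    if not folder.is_dir():
        raise FileNotFoundError(f"speaker folder not found: {folder}")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".wav", ".npz"}
    )


# demo with a throwaway directory that mimics voice/speaker_1/hana.wav
with tempfile.TemporaryDirectory() as root:
    speaker_dir = Path(root) / "speaker_1"
    speaker_dir.mkdir()
    (speaker_dir / "hana.wav").touch()
    print([p.name for p in voice_files(root, "speaker_1")])
```

An empty result means the speaker folder exists but holds no usable reference audio.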

Citation

Eren, G., & The Coqui TTS Team. (2021). Coqui TTS (Version 1.4) [Computer software]. https://doi.org/10.5281/zenodo.6334862
