Screen play / drama text to multi-voice audio play converter
About
dramaTTS parses scripts (plain text files) for theatre/screen plays and converts them into multi-voice audio plays (wave files).
While the script parsing functionality is provided by the dramaTTS program itself, it relies on external tools for the audio processing:
The Festival Speech Synthesis System [1] (herein referred to as Festival) for speech synthesis
Sound eXchange (SoX) [2] for audio post-processing.
SoX and Festival, as well as voices and lexicons for Festival, have to be installed in order to create audio output with dramaTTS (see Prerequisites).
Licenses
dramaTTS is free software released under the GPLv3 license (see LICENSE [3] file), Copyright (c) 2020 Thies Hecker
The GUI of dramaTTS is made with PyQt [4] and setuptools_scm [5] is used to align version numbering with git.
While dramaTTS is a standalone application, it is of limited use without Festival and SoX, which provide the audio rendering: without them, only script parsing (including syntax highlighting, etc.) is available.
While the Festival application itself and SoX are released under free software licenses as well (see details below), specific components that are commonly bundled with Festival (i.e. certain lexicons and voices) may be released under non-free licenses.
For instance, the festlex-OALD lexicon, which can be found among other files (incl. the source code of the latest Festival release) on the Festvox 2.5 release page [7], is restricted to non-commercial use only.
The "Installing Festival without non-free components" section provides an example of a Festival distribution based on free components only.
Please see COPYING [6] for details on licenses and copyright disclaimers of the individual components.
Features
dramaTTS provides two main components: a script parser and an audio renderer.
The script parser features:
input files with minimum formatting (see Basic script format)
syntax highlighting (identifies different content like new scenes, dialogue lines, narrative descriptions,…)
text string substitutions supporting regular expressions
some utility functions like sorting speakers according to their number of text lines
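The substitution and speaker-sorting utilities listed above can be illustrated with a short Python sketch. It is not dramaTTS's own code: the substitution table, the function names and the `(speaker, text)` tuple format for parsed lines are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical substitution table: (regex pattern, replacement) pairs,
# applied in order. Backreferences like \1 work as in re.sub.
SUBSTITUTIONS = [
    (r"\bDr\.", "Doctor"),
    (r"(\d+)%", r"\1 percent"),
]


def apply_substitutions(text, substitutions=SUBSTITUTIONS):
    """Apply each regex substitution to the text, one after another."""
    for pattern, replacement in substitutions:
        text = re.sub(pattern, replacement, text)
    return text


def speakers_by_line_count(parsed_lines):
    """Return speaker names ordered by number of lines (most first).

    parsed_lines is assumed to be a list of (speaker, text) tuples.
    Speakers with equal counts keep their order of first appearance.
    """
    counts = Counter(speaker for speaker, _ in parsed_lines)
    return [speaker for speaker, _ in counts.most_common()]
```

Applying substitutions before synthesis lets abbreviations or symbols be spelled out the way the voice should pronounce them.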
The audio-renderer is basically a front-end to Festival and SoX. Each line of script text will be synthesized by Festival and saved to a wave-file, which is then post-processed by SoX, allowing:
altering Festival voices (pitch, tempo and volume)
support for multiple CPU cores to accelerate audio rendering (dispatches parallel processes for individual lines)
rendering via a Festival server
some post-processing: normalize all voices, combine audio files (lines -> scenes -> single project file)
(re-)rendering of individual scenes or speakers
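The Festival-then-SoX pipeline described above can be sketched in Python as follows. This is an illustration, not dramaTTS's actual implementation: the job tuple layout and function names are hypothetical, the voice name is assumed to map to a Festival `voice_<name>` Scheme function (e.g. `voice_cmu_us_slt_cg`), and `text2wave` and `sox` are assumed to be on the PATH. It uses Festival's `text2wave` options `-o`/`-eval` and the SoX effects `pitch` (in cents), `tempo` (factor) and `vol` (gain factor), plus the global `--norm` option.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def festival_cmd(text_file, wav_file, voice):
    """text2wave call synthesizing one text file with the given voice."""
    # -eval runs a Scheme expression before synthesis to select the voice.
    return ["text2wave", "-o", wav_file, "-eval", f"(voice_{voice})", text_file]


def sox_cmd(in_wav, out_wav, pitch_cents=0, tempo=1.0, volume=1.0):
    """SoX call altering pitch (cents), tempo (factor) and volume (factor)."""
    return ["sox", in_wav, out_wav,
            "pitch", str(pitch_cents),
            "tempo", str(tempo),
            "vol", str(volume)]


def concat_cmd(wav_files, out_wav, normalize=True):
    """SoX call concatenating files in order (e.g. lines -> scene)."""
    cmd = ["sox"]
    if normalize:
        cmd.append("--norm")  # normalize the combined output
    return cmd + list(wav_files) + [out_wav]


def render_line(job):
    """Synthesize one script line, then post-process it with SoX."""
    text_file, raw_wav, final_wav, voice, pitch, tempo, volume = job
    subprocess.run(festival_cmd(text_file, raw_wav, voice), check=True)
    subprocess.run(sox_cmd(raw_wav, final_wav, pitch, tempo, volume), check=True)
    return final_wav


def render_all(jobs, workers=4):
    """Render lines in parallel: each thread waits on its own external processes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(render_line, jobs))
```

Threads are sufficient here because the heavy work happens in the external Festival and SoX processes, so the Python side only dispatches and waits.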
Prerequisites
python
You will need a python3 distribution installed and, for convenience, you should have either the pip or conda package manager installed.
On Linux you will most likely have python and pip installed already; if not, you should be able to install them with your distribution's package manager.
E.g. on Debian-based systems like Ubuntu, run:
sudo apt-get install python3-pip
or on Arch-based systems:
sudo pacman -S python-pip
For Windows users I would recommend installing Anaconda [9] or miniconda [10], which will provide the conda package manager (make sure to get the python3, not the python2, version!).
To install dramaTTS with pip:
pip install dramatts
Note that on some distributions python2 and python3 may be installed in parallel. In such cases, make sure that you are not using the pip of your python2 environment to install dramaTTS; you may need to use pip3 as the command instead. You can check whether you are using the correct pip by calling:
pip --version
To install dramaTTS with conda:
conda install -c thecker dramatts
In both cases, pip or conda should download all required dependencies, after which you should be able to launch the program. To do that, just type:
python -m dramatts.dramatts_gui
The GUI should pop up and you can import text files, define roles, etc., but you will not be able to render audio unless you have installed Festival (and its components) and SoX.
Installing Festival without non-free components
While many Linux distributions provide pre-built packages for Festival, these often include non-free components like festlex-OALD. Therefore, the safest way to create a free Festival distribution is to compile it from source. To form a free distribution, the following components can be used:
Festival 2.5 (main application)
Edinburgh Speech Tools (EST) - required to compile Festival
festlex_CMU (lexicon)
festlex_POSLEX (lexicon)
festvox_cmu_us_slt_cg (female voice)
festvox_cmu_us_rms_cg (male voice)
All components can be downloaded at CMU’s (Carnegie Mellon University) Festvox 2.5 release page [7]. The source code of Festival and EST can also be cloned from the Festvox github page [8].
To compile the code follow the instructions in the INSTALL file included in Festival.
Note that more voices can be found on the Festvox page (although some may require, e.g., additional lexicons and thus won’t work with the components selected above). Additionally, voices can be altered in tempo and pitch in dramaTTS (by post-processing with SoX) to create more than one speaker per voice.
Building Festival from source is based on the autotools toolchain, so it shouldn’t be a problem on GNU/Linux, but may be complicated on MS Windows.
Fortunately the eGuideDog team has created compile-instructions for Windows and even provides a Festival 2.5 version including precompiled binaries for Windows [11] (which does not include the problematic festlex-OALD lexicon).
In order to use Festival under Windows with dramaTTS you will need to copy the text2wave.bat (see the /utils folder [12]) to your Festival installation.
Make sure to adjust the paths in text2wave.bat, if you did not install Festival in C:\Festival.
Installing SoX
Under Linux you will most likely have a pre-built package for SoX available, so building from source is probably not required.
Binaries for Windows can be found on the SoX sourceforge page [13].
Specifying location of external tools
dramaTTS will try to determine the install locations of Festival and SoX automatically. This should work under Linux if you installed the tools from the official packages (or put the location of the binaries in your PATH).
Under Windows you will most likely have to define the tool locations manually.
To do that, just go to the preferences tab in the dramaTTS GUI and specify the file locations.
If you used the Festival version provided by the eGuideDog team, the pre-compiled binaries are located in:
..Festival\src\main
After specifying a new tool location, save the preferences and restart dramaTTS for the changes to take effect.
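The lookup logic described in this section (a manually configured location first, otherwise the PATH) can be approximated with the following Python sketch. It is an assumption about how such a lookup might work, not dramaTTS's own code; `find_tool`, `missing_tools` and the `configured` dictionary are hypothetical names.

```python
import shutil
from pathlib import Path


def find_tool(name, configured_path=None):
    """Return a usable path for an external tool, or None if not found.

    A user-configured path (e.g. from a preferences tab) takes precedence;
    otherwise the executable is looked up on the PATH.
    """
    if configured_path:
        candidate = Path(configured_path)
        if candidate.is_file():
            return str(candidate)
    return shutil.which(name)


def missing_tools(configured=None):
    """List the tools needed for audio rendering that could not be located."""
    configured = configured or {}
    required = ["text2wave", "sox"]
    return [tool for tool in required
            if find_tool(tool, configured.get(tool)) is None]
```

An empty result from `missing_tools()` would mean both Festival's `text2wave` and SoX were found; anything else names the tools whose locations still need to be set manually.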
Basic script format
dramaTTS’s script parser works with simple text files with minimum formatting.
General
Empty lines are ignored
In case of doubt, a line will be assigned as narrative description
You can check how lines have been parsed by switching the text rendering mode to “Parsed lines” in the “script” tab.
New scene
A new scene is indicated by a line starting with a number followed by a dot.
23. A new scene
The narrator will read the scene number and scene title.
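The scene rule above (a line starting with a number followed by a dot) maps naturally onto a regular expression. The Python sketch below is only an illustration; the pattern and the `parse_scene` name are assumptions, not the parser's actual implementation.

```python
import re

# Assumed rule: optional leading whitespace, digits, a dot, then the title.
SCENE_RE = re.compile(r"^\s*(\d+)\.\s*(.*)$")


def parse_scene(line):
    """Return (scene_number, title) for a scene heading, else None."""
    match = SCENE_RE.match(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()
```

For the example above, `parse_scene("23. A new scene")` returns `(23, "A new scene")`, i.e. the number and title the narrator reads out.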
Dialogue
A dialogue is indicated by a line giving only the speaker name in UPPER CASE letters - e.g.
BOB
Hi, I am Bob and this line is my dialogue.
Bob's dialogue was quite short.
The next (non-empty) line after the dialogue indicator (BOB) is interpreted as Bob’s dialogue text. A line break ends Bob’s dialogue, and the following line is interpreted as narrative description.
The narrator will say the speaker’s name and take over again after the speaker’s dialogue line is finished. You can easily check who speaks which lines by switching the text rendering mode to “Parsed lines” in the “script” tab. The example above would be shown as:
Narrator: Bob
BOB: Hi, I am Bob and this line is my dialogue.
Narrator: Bob's dialogue was quite short.
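The rules described so far (empty lines ignored, numbered scene headings, an UPPER CASE speaker line followed by one dialogue line, everything else narrative) can be combined into a small line classifier. This Python sketch is a simplified approximation under those assumptions, not the dramaTTS parser; in particular, treating any all-upper-case line as a speaker indicator is a heuristic chosen for the example.

```python
import re

SCENE_RE = re.compile(r"^\s*\d+\.")


def is_speaker_line(line):
    """Heuristic: a speaker indicator contains letters, all of them upper case."""
    return line.strip().isupper()


def parse_script(text):
    """Turn raw script text into a list of (speaker, text) tuples."""
    parsed = []
    pending_speaker = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines are ignored
        if pending_speaker is not None:
            # The next non-empty line after the indicator is the dialogue.
            parsed.append((pending_speaker, line))
            pending_speaker = None
        elif SCENE_RE.match(line):
            parsed.append(("Narrator", line))  # narrator reads number and title
        elif is_speaker_line(line):
            parsed.append(("Narrator", line.title()))  # narrator names the speaker
            pending_speaker = line
        else:
            parsed.append(("Narrator", line))  # default: narrative description
    return parsed
```

Fed the BOB example, this yields the same three entries as the “Parsed lines” view shown above.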
It is also possible to add narrative comments using parentheses within a dialogue line:
BOB
I told you not to pull this lever! (shakes his head) Let's get the hell out of here!
In the example above, “shakes his head” will be spoken by the narrator, i.e. it would be rendered in “Parsed lines” mode as:
Narrator: Bob
BOB: I told you not to pull this lever!
Narrator: (shakes his head)
BOB: Let's get the hell out of here!
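Splitting a dialogue line at its parenthesised comments can be done with `re.split` and a capturing group, which keeps the comments in the result. The Python sketch below reproduces the rendering shown above; the function name and pattern are assumptions for illustration only.

```python
import re

# Capturing group so re.split keeps the parenthesised asides in its output.
ASIDE_RE = re.compile(r"(\([^)]*\))")


def split_dialogue(speaker, text):
    """Split one dialogue line into (speaker, segment) pieces.

    Parenthesised comments are assigned to the narrator, the rest to the speaker.
    """
    pieces = []
    for part in ASIDE_RE.split(text):
        part = part.strip()
        if not part:
            continue
        if ASIDE_RE.fullmatch(part):
            pieces.append(("Narrator", part))
        else:
            pieces.append((speaker, part))
    return pieces
```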