aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.
    1 => [00:00:00.000, 00:00:02.640]
    From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]
    That thereby beauty's rose might never die, => [00:00:05.880, 00:00:09.240]
    But as the riper should by time decease, => [00:00:09.240, 00:00:11.920]
    His tender heir might bear his memory: => [00:00:11.920, 00:00:15.280]
    But thou contracted to thine own bright eyes, => [00:00:15.280, 00:00:18.800]
    Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
    Making a famine where abundance lies, => [00:00:22.760, 00:00:25.680]
    Thy self thy foe, to thy sweet self too cruel: => [00:00:25.680, 00:00:31.240]
    Thou that art now the world's fresh ornament, => [00:00:31.240, 00:00:34.400]
    And only herald to the gaudy spring, => [00:00:34.400, 00:00:36.920]
    Within thine own bud buriest thy content, => [00:00:36.920, 00:00:40.640]
    And tender churl mak'st waste in niggarding: => [00:00:40.640, 00:00:43.640]
    Pity the world, or else this glutton be, => [00:00:43.640, 00:00:48.080]
    To eat the world's due, by the grave and thee. => [00:00:48.080, 00:00:53.240]
Waveform with aligned labels, detail
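To make the structure of a synchronization map concrete, here is a minimal Python sketch that parses the illustrative `text => [begin, end]` listing into tuples with times in seconds. It assumes one entry per line. The function names `to_seconds` and `parse_sync_map` are hypothetical helpers, not part of aeneas, and this listing is only the README's illustration, not one of the aeneas output formats.

```python
import re

# One "text => [HH:MM:SS.mmm, HH:MM:SS.mmm]" entry per line (illustrative layout).
LINE_RE = re.compile(
    r"^(?P<text>.*?)\s*=>\s*\[(?P<begin>[\d:.]+),\s*(?P<end>[\d:.]+)\]\s*$"
)


def to_seconds(stamp):
    """Convert an 'HH:MM:SS.mmm' time stamp to seconds (float)."""
    hours, minutes, seconds = stamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def parse_sync_map(listing):
    """Return a list of (text, begin_seconds, end_seconds) tuples.

    Lines that do not match the expected layout are skipped.
    """
    fragments = []
    for line in listing.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            fragments.append((
                match.group("text"),
                to_seconds(match.group("begin")),
                to_seconds(match.group("end")),
            ))
    return fragments


if __name__ == "__main__":
    listing = (
        "1 => [00:00:00.000, 00:00:02.640]\n"
        "From fairest creatures we desire increase, => [00:00:02.640, 00:00:05.880]\n"
    )
    for text, begin, end in parse_sync_map(listing):
        print("{:7.3f} {:7.3f}  {}".format(begin, end, text))
```

Each fragment ends where the next one begins, so the intervals tile the audio file without gaps.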
This synchronization map can be output to file in several formats, depending on its application:
aeneas has been developed and tested on Debian 64bit, with Python 2.7 and Python 3.5, which are the only supported platforms at the moment. Nevertheless, aeneas has been confirmed to work on other Linux distributions, Mac OS X, and Windows. See the PLATFORMS file for details.
If installing aeneas natively on your OS proves difficult, you are strongly encouraged to use aeneas-vagrant, which provides aeneas inside a virtualized Debian image running under VirtualBox and Vagrant, both of which can be installed on any modern OS (Linux, Mac OS X, Windows).
All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.
The generic OS-independent procedure is simple:
Make sure the following executables can be called from your shell: espeak, ffmpeg, ffprobe, pip, and python
First install numpy with pip and then aeneas (this order is important):
    pip install numpy
    pip install aeneas
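Before installing, the first step (checking that the required executables are callable) can be sketched in Python. This is only an illustration using the standard library's `shutil.which`, which requires Python 3.3 or later. The helper name `find_executables` is hypothetical and not part of aeneas.

```python
import shutil

# Executables listed in the generic installation procedure.
REQUIRED = ("espeak", "ffmpeg", "ffprobe", "pip", "python")


def find_executables(names=REQUIRED):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_executables().items():
        print("{:8s} {}".format(name, path or "NOT FOUND"))
```

Any entry reported as not found should be installed, or added to your PATH, before running pip.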
To check whether you installed aeneas correctly, run:
    python -m aeneas.diagnostics
Run without arguments to get the usage message:
    python -m aeneas.tools.execute_task
    python -m aeneas.tools.execute_job
You can also get a list of live examples that you can immediately run on your machine thanks to the included files:
    python -m aeneas.tools.execute_task --examples
    python -m aeneas.tools.execute_task --examples-all
To compute a synchronization map map.json for a pair (audio.mp3, text.txt in plain text format), you can run:
    python -m aeneas.tools.execute_task \
        audio.mp3 \
        text.txt \
        "task_language=eng|os_task_file_format=json|is_text_type=plain" \
        map.json
(The command has been split into lines with \ for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)
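Once map.json exists, it can be read back with a few lines of Python. The sketch below assumes that the JSON output has a top-level `fragments` list whose items carry `id`, `begin`, `end` (seconds, as strings), and `lines` (the fragment text). Treat this layout as an assumption and check it against a file produced on your machine. The helper `load_fragments` is hypothetical.

```python
import json


def load_fragments(json_text):
    """Return (id, begin_seconds, end_seconds, text) tuples from a JSON sync map.

    Assumed layout: {"fragments": [{"id": ..., "begin": "0.000",
    "end": "2.640", "lines": ["..."]}, ...]}
    """
    data = json.loads(json_text)
    rows = []
    for fragment in data.get("fragments", []):
        rows.append((
            fragment.get("id"),
            float(fragment["begin"]),
            float(fragment["end"]),
            " ".join(fragment.get("lines", [])),
        ))
    return rows


if __name__ == "__main__":
    # Replace "map.json" with the path of your own output file.
    with open("map.json", encoding="utf-8") as handle:
        for row in load_fragments(handle.read()):
            print(row)
```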
To compute a synchronization map map.smil for a pair (audio.mp3, page.xhtml containing fragments marked by id attributes like f001), you can run:
```bash
python -m aeneas.tools.execute_task \
    audio.mp3 \
    page.xhtml \
    "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
    map.smil
```
As you can see, the third argument (the configuration string) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the documentation for details.
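Judging from the examples above, the configuration string is a sequence of `key=value` pairs separated by `|`. Long strings are easier to maintain when built from a dictionary. The following sketch shows one way to do that. The helpers `build_config_string` and `parse_config_string` are hypothetical, and the sketch does not handle values that themselves contain `|`.

```python
def build_config_string(parameters):
    """Join a dict of parameters into 'key=value|key=value' form."""
    return "|".join(
        "{}={}".format(key, value) for key, value in parameters.items()
    )


def parse_config_string(config_string):
    """Split 'key=value|key=value' back into a dict, ignoring malformed items."""
    pairs = (
        item.split("=", 1)
        for item in config_string.split("|")
        if "=" in item
    )
    return {key: value for key, value in pairs}


if __name__ == "__main__":
    config = build_config_string({
        "task_language": "eng",
        "os_task_file_format": "json",
        "is_text_type": "plain",
    })
    print(config)
```

Splitting on the first `=` only keeps values such as `f[0-9]+` or file names intact.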
If you have several tasks to process, you can create a job container to batch process them:
    python -m aeneas.tools.execute_job job.zip output_directory
File job.zip should contain a config.txt or config.xml configuration file, providing aeneas with all the information needed to parse the input assets and format the output sync map files. Consult the documentation for details.
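As an illustration, a job container can be assembled with Python's standard `zipfile` module. The sketch below writes the configuration text as config.txt and stores each asset at the archive root. This flat layout is an assumption: the real layout must match whatever your configuration file declares. The helper `build_job_zip` is hypothetical, and the contents of the configuration file are left to the caller (see the documentation for the keys).

```python
import os
import zipfile


def build_job_zip(zip_path, config_text, asset_paths, config_name="config.txt"):
    """Create a ZIP holding a configuration file plus the given asset files.

    Assets are stored flat at the archive root under their base names.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(config_name, config_text)
        for path in asset_paths:
            archive.write(path, arcname=os.path.basename(path))
    return zip_path
```

A call such as `build_job_zip("job.zip", config_text, ["audio.mp3", "page.xhtml"])` would produce an archive that can then be passed to `aeneas.tools.execute_job`.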
A significant number of users run aeneas to align audio and text at word level (i.e., each fragment is a word). Although aeneas was not designed with word-level alignment in mind and the results might be inferior to ASR-based forced aligners for languages with good ASR models, aeneas offers some options to improve the quality of word-level alignment:
If you use the aeneas.tools.execute_task command line tool, you can add the --presets-word switch to enable MFCC nonspeech masking, for example:
    python -m aeneas.tools.execute_task --example-words --presets-word
    python -m aeneas.tools.execute_task --example-words-multilevel --presets-word
If you use aeneas as a library, just set the appropriate RuntimeConfiguration parameters. Please see the command line tutorial for details.
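If you drive the command line tool from a script, the switch can be added programmatically. The sketch below builds the argument list for `aeneas.tools.execute_task` and optionally appends `--presets-word`. Placing the switch after the positional arguments is an assumption, and `build_command` and `run_task` are hypothetical helpers rather than aeneas APIs.

```python
import subprocess
import sys


def build_command(audio, text, config_string, output, presets_word=False):
    """Return the argv list for one execute_task invocation."""
    command = [
        sys.executable, "-m", "aeneas.tools.execute_task",
        audio, text, config_string, output,
    ]
    if presets_word:
        # Assumption: switches are accepted after the positional arguments.
        command.append("--presets-word")
    return command


def run_task(*args, **kwargs):
    """Run the task and return the exit code of the tool."""
    return subprocess.call(build_command(*args, **kwargs))
```

Passing the arguments as a list avoids shell quoting problems with the `|` characters in the configuration string.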
aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.
Licenses for third party code and files included in aeneas can be found in the licenses directory.
No copy rights were harmed in the making of this project.
Would you like to support the development of aeneas?
I accept sponsorships to
Feel free to get in touch.
If you think you found a bug or you have a feature request, please use the GitHub issue tracker to submit it.
If you want to ask a question about using aeneas, your best option is to send an email to the mailing list.
Finally, code contributions are welcome! Please refer to the Code Contribution Guide for details about the branch policies and the code style to follow.
Many thanks to Nicola Montecchio, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.
Paolo Bertasi, who developed the APIs and Web application for ReadBeyond Sync, helped shape the structure of this package for its asynchronous usage.
Chris Hubbard prepared the files for packaging aeneas as a Debian/Ubuntu .deb.
Daniel Bair prepared the brew formula for installing aeneas and its dependencies on Mac OS X.
Daniel Bair, Chris Hubbard, and Richard Margetts packaged the installers for Mac OS X and Windows.
Firat Ozdemir contributed the finetuneas HTML/JS code for fine tuning sync maps in the browser.
Willem van der Walt contributed the code snippet to output a sync map in TextGrid format.
Chris Vaughn contributed the MacOS TTS wrapper.