This is a pre-production deployment of Warehouse, however changes made here WILL affect the production instance of PyPI.
Latest Version Dependencies status unknown Test status unknown Test coverage unknown
Project Description

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment).

Goal

aeneas automatically generates a synchronization map between a list of text fragments and an audio file containing the narration of the text. In computer science this task is known as (automatically computing a) forced alignment.

For example, given this text file and this audio file, aeneas determines, for each fragment, the corresponding time interval in the audio file:

1                                                     => [00:00:00.000, 00:00:02.640]
From fairest creatures we desire increase,            => [00:00:02.640, 00:00:05.880]
That thereby beauty's rose might never die,           => [00:00:05.880, 00:00:09.240]
But as the riper should by time decease,              => [00:00:09.240, 00:00:11.920]
His tender heir might bear his memory:                => [00:00:11.920, 00:00:15.280]
But thou contracted to thine own bright eyes,         => [00:00:15.280, 00:00:18.800]
Feed'st thy light's flame with self-substantial fuel, => [00:00:18.800, 00:00:22.760]
Making a famine where abundance lies,                 => [00:00:22.760, 00:00:25.680]
Thy self thy foe, to thy sweet self too cruel:        => [00:00:25.680, 00:00:31.240]
Thou that art now the world's fresh ornament,         => [00:00:31.240, 00:00:34.400]
And only herald to the gaudy spring,                  => [00:00:34.400, 00:00:36.920]
Within thine own bud buriest thy content,             => [00:00:36.920, 00:00:40.640]
And tender churl mak'st waste in niggarding:          => [00:00:40.640, 00:00:43.640]
Pity the world, or else this glutton be,              => [00:00:43.640, 00:00:48.080]
To eat the world's due, by the grave and thee.        => [00:00:48.080, 00:00:53.240]

Waveform with aligned labels, detail

This synchronization map can be output to file in several formats: EAF for research purposes, SMIL for EPUB 3, SBV/SRT/SUB/TTML/VTT for closed captioning, JSON for Web usage, or raw AUD/CSV/SSV/TSV/TXT/XML for further processing.

System Requirements, Supported Platforms and Installation

System Requirements

  1. a reasonably recent machine (recommended 4 GB RAM, 2 GHz 64bit CPU)
  2. Python 2.7 (Linux, OS X, Windows) or 3.4 or later (Linux, OS X)
  3. FFmpeg
  4. eSpeak
  5. Python packages BeautifulSoup4, lxml, and numpy
  6. Python headers to compile the Python C/C++ extensions (optional but strongly recommended)
  7. A shell supporting UTF-8 (optional but strongly recommended)

Supported Platforms

aeneas has been developed and tested on Debian 64bit, which is the only supported OS at the moment. Nevertheless, aeneas has been confirmed to work on other Linux distributions, OS X, and Windows. See the PLATFORMS file for the details.

If installing aeneas natively on your OS proves difficult, you are strongly encouraged to use aeneas-vagrant, which provides aeneas inside a virtualized Debian image running under VirtualBox and Vagrant, which can be installed on any modern OS (Linux, Mac OS X, Windows).

Installation

All-in-one installers are available for Mac OS X and Windows, and a Bash script for deb-based Linux distributions (Debian, Ubuntu) is provided in this repository. It is also possible to download a VirtualBox+Vagrant virtual machine. Please see the INSTALL file for detailed, step-by-step installation procedures for different operating systems.

The generic OS-independent procedure is simple:

  1. Install Python (2.7.x preferred), FFmpeg, and eSpeak

  2. Make sure the following executables can be called from your shell: espeak, ffmpeg, ffprobe, pip, and python

  3. First install numpy with pip and then aeneas:

    pip install numpy
    pip install aeneas
    
  4. To check whether you installed aeneas correctly, run:

bash     python -m aeneas.diagnostics

Usage

  1. Run without arguments to get the usage message:

    python -m aeneas.tools.execute_task
    python -m aeneas.tools.execute_job
    

    You can also get a list of live examples that you can immediately run on your machine thanks to the included files:

    python -m aeneas.tools.execute_task --examples
    python -m aeneas.tools.execute_task --examples-all
    
  2. To compute a synchronization map map.json for a pair (audio.mp3, text.txt in plain text format), you can run:

    python -m aeneas.tools.execute_task \
        audio.mp3 \
        text.txt \
        "task_language=eng|os_task_file_format=json|is_text_type=plain" \
        map.json
    

(The command has been split into lines with \ for visual clarity; in production you can have the entire command on a single line and/or you can use shell variables.)

To compute a synchronization map map.smil for a pair (audio.mp3, page.xhtml containing fragments marked by id attributes like f001), you can run:

```bash
python -m aeneas.tools.execute_task \
    audio.mp3 \
    page.xhtml \
    "task_language=eng|os_task_file_format=smil|os_task_file_smil_audio_ref=audio.mp3|os_task_file_smil_page_ref=page.xhtml|is_text_type=unparsed|is_text_unparsed_id_regex=f[0-9]+|is_text_unparsed_id_sort=numeric" \
    map.smil
```

As you can see, the third argument (the configuration string) specifies the parameters controlling the I/O formats and the processing options for the task. Consult the documentation for details.

  1. If you have several tasks to process, you can create a job container to batch process them:

    python -m aeneas.tools.execute_job job.zip output_directory
    

File job.zip should contain a config.txt or config.xml configuration file, providing aeneas with all the information needed to parse the input assets and format the output sync map files. Consult the documentation for details.

The documentation contains a highly suggested tutorial which explains how to use the built-in command line tools.

Supported Features

  • Input text files in parsed, plain, subtitles, or unparsed (XML) format
  • Multilevel input text files in mplain and munparsed (XML) format
  • Text extraction from XML (e.g., XHTML) files using id and class attributes
  • Arbitrary text fragment granularity (single word, subphrase, phrase, paragraph, etc.)
  • Input audio file formats: all those readable by ffmpeg
  • Output sync map formats: AUD, CSV, EAF, JSON, SMIL, SRT, SSV, SUB, TSV, TTML, TXT, VTT, XML
  • Confirmed working on 37 languages: ARA, BUL, CAT, CYM, CES, DAN, DEU, ELL, ENG, EPO, EST, FAS, FIN, FRA, GLE, GRC, HRV, HUN, ISL, ITA, JPN, LAT, LAV, LIT, NLD, NOR, RON, RUS, POL, POR, SLK, SPA, SRP, SWA, SWE, TUR, UKR
  • MFCC and DTW computed via Python C extensions to reduce the processing time
  • Several built-in TTS engine wrappers: eSpeak (default), eSpeak-ng, Festival, Nuance TTS API
  • Default TTS (eSpeak) called via a Python C extension for fast audio synthesis
  • Possibility of running a custom, user-provided TTS engine Python wrapper (e.g., included example for speect)
  • Batch processing of multiple audio/text pairs
  • Download audio from a YouTube video
  • In multilevel mode, recursive alignment from paragraph to sentence to word level
  • In multilevel mode, time resolution and/or TTS engine can be specified for each level independently
  • Robust against misspelled/mispronounced words, local rearrangements of words, background noise/sporadic spikes
  • Adjustable splitting times, including a max character/second constraint for CC applications
  • Automated detection of audio head/tail
  • Output an HTML file for fine tuning the sync map manually (finetuneas project)
  • Execution parameters tunable at runtime
  • Code suitable for Web app deployment (e.g., on-demand cloud computing)
  • Extensive test suite including 800+ unit/integration/performance tests, that run and must pass before each release

Limitations and Missing Features

  • Audio should match the text: large portions of spurious text or audio might produce a wrong sync map
  • Audio is assumed to be spoken: not suitable for song captioning, YMMV for CC applications
  • No protection against memory trashing if you feed extremely long audio files (>1.5h per single audio file)
  • Open issues

License

aeneas is released under the terms of the GNU Affero General Public License Version 3. See the LICENSE file for details.

Licenses for third party code and files included in aeneas can be found in the licenses directory.

No copy rights were harmed in the making of this project.

Supporting and Contributing

Sponsors

  • July 2015: Michele Gianella generously supported the development of the boundary adjustment code (v1.0.4)
  • August 2015: Michele Gianella partially sponsored the port of the MFCC/DTW code to C (v1.1.0)
  • September 2015: friends in West Africa partially sponsored the development of the head/tail detection code (v1.2.0)
  • October 2015: an anonymous donation sponsored the development of the “YouTube downloader” option (v1.3.0)
  • April 2016: the Fruch Foundation kindly sponsored the development and documentation of v1.5.0

Supporting

Would you like supporting the development of aeneas?

I accept sponsorships to

  • fix bugs,
  • add new features,
  • improve the quality and the performance of the code,
  • port the code to other languages/platforms,
  • support of third party installations, and
  • improve the documentation.

Feel free to get in touch.

Contributing

If you think you found a bug or you have a feature request, please use the GitHub issue tracker to submit it.

If you want to ask a question about using aeneas, your best option consists in sending an email to the mailing list.

Finally, code contributions are welcome! Please refer to the Code Contribution Guide for details about the branch policies and the code style to follow.

Acknowledgments

Many thanks to Nicola Montecchio, who suggested using MFCCs and DTW, and co-developed the first experimental code for aligning audio and text.

Paolo Bertasi, who developed the APIs and Web application for ReadBeyond Sync, helped shaping the structure of this package for its asynchronous usage.

Chris Hubbard prepared the files for packaging aeneas as a Debian/Ubuntu .deb.

Daniel Bair prepared the brew formula for installing aeneas and its dependencies on Mac OS X.

Daniel Bair, Chris Hubbard, and Richard Margetts packaged the installers for Mac OS X and Windows.

Firat Ozdemir contributed the finetuneas HTML/JS code for fine tuning sync maps in the browser.

All the mighty GitHub contributors, and the members of the Google Group.

Release History

Release History

1.6.0.1

This version

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.6.0.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.5.1.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.5.0.3

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.5.0.2

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.5.0.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.4.1.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.4.0.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.3.3.0

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.3.2.8

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

1.3.2.6

History Node

TODO: Figure out how to actually get changelog content.

Changelog content for this version goes here.

Donec et mollis dolor. Praesent et diam eget libero egestas mattis sit amet vitae augue. Nam tincidunt congue enim, ut porta lorem lacinia consectetur. Donec ut libero sed arcu vehicula ultricies a non tortor. Lorem ipsum dolor sit amet, consectetur adipiscing elit.

Show More

Download Files

Download Files

TODO: Brief introduction on what you do with files - including link to relevant help section.

File Name & Checksum SHA256 Checksum Help Version File Type Upload Date
aeneas-1.6.0.1.tar.gz (5.4 MB) Copy SHA256 Checksum SHA256 Source Sep 30, 2016

Supported By

WebFaction WebFaction Technical Writing Elastic Elastic Search Pingdom Pingdom Monitoring Dyn Dyn DNS HPE HPE Development Sentry Sentry Error Logging CloudAMQP CloudAMQP RabbitMQ Heroku Heroku PaaS Kabu Creative Kabu Creative UX & Design Fastly Fastly CDN DigiCert DigiCert EV Certificate Rackspace Rackspace Cloud Servers DreamHost DreamHost Log Hosting