
CLI tool for RNN-T based text/audio forced alignment.


BFA Forced-Aligner (Text/Phoneme/Audio Alignment)

A CLI Python tool for text/audio alignment at the word and phoneme level.
It supports both textual and phonetic input, using either the IPA or Misaki phoneset.
The integrated G2P model supports both British and American English.
The final alignments are output in TextGrid format.
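For reference, a minimal word-level TextGrid in Praat's long text format looks like the following (the timings and labels are hypothetical; BFA's actual output may differ, e.g. by including a phoneme tier):

```
File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.0
tiers? <exists>
size = 1
item []:
    item [1]:
        class = "IntervalTier"
        name = "words"
        xmin = 0
        xmax = 1.0
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.48
            text = "hello"
        intervals [2]:
            xmin = 0.48
            xmax = 1.0
            text = "world"
```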

It is based on an RNN-T model (CNN/LSTM encoder + Transformer decoder) trained on 460 hours of audio from the LibriSpeech dataset.
The current architecture only supports audio clips up to about 17.5 seconds (see Contributions).

No GPU is required to run this tool, but a CPU with many cores helps.

Installation

pip install BFA

Requires Python ≥ 3.12

Usage (CLI)

To align a corpus, two directories are expected:

  • One that contains all your audio files (.wav, .mp3, .flac and .pcm files only)
  • One that contains all your annotations (.txt and .lab files only)

You can find examples of such files in the example directory of this repository. A recursive search is used, so the only constraint is that both directories use the same structure. If you use the same directory for both, each .wav/.lab pair should sit in the same sub-directory.

bfa align \
  --audio-dir /path/to/audio_dir \
  --text-dir /path/to/text_dir \
  [--out-dir /path/to/out_dir] \
  [--dtype {words, phonemes}] \
  [--ptype {IPA, Misaki}] \
  [--language {EN-GB, EN-US}] \
  [--n-jobs N] \
  [--ignore-ram-usage] \
  [--config-path /path/to/config_file]
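The pairing constraint described above can be sketched as follows. This is an illustrative helper, not BFA's actual code: it matches each audio file with the annotation that shares its relative path and stem.

```python
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".flac", ".pcm"}
TEXT_EXTS = {".txt", ".lab"}

def pair_files(audio_dir, text_dir):
    """Pair each audio file with the annotation sharing its relative
    sub-directory and stem (mirrors the recursive-search constraint)."""
    audio_dir, text_dir = Path(audio_dir), Path(text_dir)
    # Index annotations by (relative parent directory, file stem).
    texts = {
        (p.parent.relative_to(text_dir), p.stem): p
        for p in text_dir.rglob("*")
        if p.suffix.lower() in TEXT_EXTS
    }
    pairs = []
    for a in sorted(audio_dir.rglob("*")):
        if a.suffix.lower() not in AUDIO_EXTS:
            continue
        key = (a.parent.relative_to(audio_dir), a.stem)
        if key in texts:
            pairs.append((a, texts[key]))
    return pairs
```

Note that using the same directory for both arguments works, since audio and annotation extensions are disjoint.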

Performance

Aligning the 460 hours of audio from the LibriSpeech dataset took 2h30 (realtime factor: x184) on an 8-core / 16-thread CPU. Realtime factor on one core: x11.5.
About 1.5 GB of RAM is required per thread (here, 24 GB for 16 threads). By default, BFA checks your total RAM before starting jobs.
It successfully aligned more than 99% of the files.
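The exact RAM check is internal to BFA, but the arithmetic can be sketched as follows (the function name, rounding, and fallback to one job are illustrative assumptions):

```python
def max_safe_jobs(total_ram_gb, requested_jobs, ram_per_job_gb=1.5):
    """Cap the number of parallel jobs by available RAM.

    ram_per_job_gb mirrors the ~1.5 GB-per-thread figure above;
    the real check may behave differently (e.g. refuse to start).
    """
    affordable = int(total_ram_gb // ram_per_job_gb)
    # Always allow at least one job, never more than requested.
    return max(1, min(requested_jobs, affordable))
```

For example, 16 requested threads fit comfortably in 24 GB, while 8 GB only accommodates 5 of them.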

To Do:

  • Test IPA ptype
  • Test Word dtype

Contributions

All contributions are welcome, but my main goal is the following:

Currently, the main limitation of this tool is its context length (about 17.5 seconds), but RNN-T models can use a streaming implementation and thereby handle files of arbitrary length. This would require making the model causal (currently it is not, in order to maximize accuracy) and writing an inference function that supports this mode.
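As a rough illustration of the streaming idea (not part of BFA), a long recording could be fed to the model as overlapping fixed-size windows, each short enough to fit the context limit. The chunk and overlap sizes below are arbitrary placeholders:

```python
def chunk_signal(samples, chunk_s=15.0, overlap_s=1.0, sr=16000):
    """Split a long waveform into overlapping chunks that each fit a
    ~17.5 s context window (sizes here are illustrative, not BFA's)."""
    chunk = int(chunk_s * sr)
    hop = int((chunk_s - overlap_s) * sr)
    chunks = []
    start = 0
    while start < len(samples):
        chunks.append(samples[start:start + chunk])
        if start + chunk >= len(samples):
            break  # last chunk reached the end of the signal
        start += hop
    return chunks
```

A real streaming decoder would additionally carry encoder/decoder state across chunks and merge alignments in the overlap regions, which is where the causality requirement comes in.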

It would also be interesting to support .TextGrid files as annotation input.

License

This project is licensed under the MIT License. See the LICENSE file for details.
