Python forced aligner

Python forced alignment

This is a modified implementation of the Penn Phonetic Forced Aligner (P2FA) [1]. Relative to the original implementation, this repo provides the following.

Support for Python 3
Support for performing forced alignment both in Python and on the command-line
Fewer alignment failures due to, e.g., out-of-vocabulary (OOV) words or punctuation
Direct integration with pypar, a feature-rich phoneme alignment representation
Multiprocessing for quickly aligning speech datasets
Clean, documented code

Installation

Hidden Markov Model Toolkit (HTK)

pyfoal depends on HTK and has been tested on macOS and Linux using HTK version 3.4.0. There are known issues with version 3.4.1 on Linux. HTK is released under a license that prohibits redistribution, so you must install HTK yourself and verify that the commands HCopy and HVite are available as system-wide binaries. After downloading HTK, I use the following commands to install it on Linux.

sudo apt-get install -y gcc-multilib libx11-dev
./configure --disable-hslab
make all
sudo make install

For more help with HTK installation, see notes by Jaekoo Kang and Steve Rubin.
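
To check that HCopy and HVite are reachable after installation, a small script along these lines may help. This is a minimal sketch that only looks the binaries up on PATH using the standard library; the helper name missing_htk_tools is hypothetical and not part of pyfoal, and the check does not confirm which HTK version is installed.

```python
import shutil


def missing_htk_tools(tools=("HCopy", "HVite")):
    """Return the names of the given binaries that cannot be found on PATH.

    Hypothetical helper for illustration; it is not part of pyfoal.
    """
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_htk_tools()
    if missing:
        print("Missing HTK binaries:", ", ".join(missing))
    else:
        print("HCopy and HVite are both available.")
```

An empty list means both commands were found; any names returned still need to be installed or added to PATH.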

Python dependencies

Clone this repo and run pip install -e pyfoal/.

Usage

Force-align text and audio

alignment = pyfoal.align(text, audio, sample_rate)

text is a string containing the speech transcript. audio is a 1D numpy array containing the speech audio. sample_rate is the sampling rate of audio.
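
Because audio must be a 1D array, multi-channel recordings need to be reduced to a single channel first. The following is a sketch of one way to do that with numpy. The helper to_mono_1d is hypothetical, and it assumes multi-channel audio is laid out as (samples, channels); averaging the channels is one common choice, not something pyfoal prescribes.

```python
import numpy as np


def to_mono_1d(audio):
    """Return a 1D float32 array from (samples,) or (samples, channels) audio.

    Hypothetical helper for illustration; it is not part of pyfoal.
    Assumes the channel axis is the last axis.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        return audio
    if audio.ndim == 2:
        # Average the channels to get a single mono signal
        return audio.mean(axis=1)
    raise ValueError(f"Expected 1D or 2D audio, got {audio.ndim} dimensions")


# Usage with the call documented above (requires pyfoal and HTK):
# alignment = pyfoal.align(text, to_mono_1d(audio), sample_rate)
```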

Force-align from files

# Return the resulting alignment
alignment = pyfoal.from_file(text_file, audio_file)

# Save alignment to json
pyfoal.from_file_to_file(text_file, audio_file, output_file)

If you need to align many files, use from_files_to_files, which accepts lists of files and uses multiprocessing.
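
One way to prepare the file lists is to pair transcripts and audio by file name. The sketch below uses only the standard library. The helper paired_files and the .txt, .wav, and .json extensions are assumptions for illustration, and the argument order in the commented call is assumed to mirror from_file_to_file.

```python
from pathlib import Path


def paired_files(directory, text_ext=".txt", audio_ext=".wav", output_ext=".json"):
    """Return parallel lists of text, audio, and output paths matched by stem.

    Hypothetical helper for illustration; it is not part of pyfoal.
    Transcripts without a matching audio file are skipped.
    """
    directory = Path(directory)
    text_files, audio_files, output_files = [], [], []
    for text_file in sorted(directory.glob(f"*{text_ext}")):
        audio_file = text_file.with_suffix(audio_ext)
        if not audio_file.exists():
            continue
        text_files.append(text_file)
        audio_files.append(audio_file)
        output_files.append(text_file.with_suffix(output_ext))
    return text_files, audio_files, output_files


# Usage (requires pyfoal and HTK; argument order is an assumption):
# text_files, audio_files, output_files = paired_files("data")
# pyfoal.from_files_to_files(text_files, audio_files, output_files)
```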

Command-line interface

usage: python -m pyfoal
    [-h]
    --text TEXT [TEXT ...]
    --audio AUDIO [AUDIO ...]
    --output OUTPUT [OUTPUT ...]

optional arguments:
  -h, --help            show this help message and exit
  --text TEXT [TEXT ...]
                        The speech transcript files
  --audio AUDIO [AUDIO ...]
                        The speech audio files
  --output OUTPUT [OUTPUT ...]
                        The json files to save the alignments
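
The command-line interface can also be driven from a script. The sketch below builds the argument list from the flags shown above. The helper build_pyfoal_command is hypothetical, and the check that all three lists have the same length is an assumption about how the files are paired.

```python
import sys


def build_pyfoal_command(text_files, audio_files, output_files):
    """Build the argument list for running `python -m pyfoal`.

    Hypothetical helper for illustration; it is not part of pyfoal.
    """
    # Assumption: the i-th text, audio, and output files belong together
    if not (len(text_files) == len(audio_files) == len(output_files)):
        raise ValueError("Expected the same number of text, audio, and output files")
    return [
        sys.executable, "-m", "pyfoal",
        "--text", *map(str, text_files),
        "--audio", *map(str, audio_files),
        "--output", *map(str, output_files),
    ]


# Usage (requires pyfoal and HTK):
# import subprocess
# command = build_pyfoal_command(["a.txt"], ["a.wav"], ["a.json"])
# subprocess.run(command, check=True)
```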

Tests

Tests can be run as follows.

pip install pytest
pytest

References

[1] J. Yuan and M. Liberman, "Speaker identification on the SCOTUS corpus," Journal of the Acoustical Society of America, vol. 123, p. 3878, 2008.

Release history

1.0.1 (Apr 12, 2024)
1.0.0 (Sep 14, 2023)
0.0.4 (Dec 6, 2021)
0.0.3 (Apr 3, 2021)
0.0.2 (Feb 23, 2021)
0.0.1 (Feb 23, 2021), this version

Download files

Source distribution: pyfoal-0.0.1.tar.gz (2.8 MB), uploaded Feb 23, 2021
Built distribution: pyfoal-0.0.1-py3-none-any.whl (2.9 MB), uploaded Feb 23, 2021, Python 3

Hashes for pyfoal-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`685e06fa897c50e89e682346798b0ee16742bc5b5a55752dfcba70c335e7b0fa`
MD5	`b3c8b3f8f80eed2169727ef8ec55888d`
BLAKE2b-256	`06ceabfe2eb9c4e304452fc1583173c35bff38dabd6c0057a3b0a73c468592a0`

Hashes for pyfoal-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9a7afb24582c0790de52b0e7392632fa2bb3355c8800829a1da564e0d5051e59`
MD5	`e7e875057e17f89f63eba9d5ba81791e`
BLAKE2b-256	`a7cf3adc9961dba238910dac77bab5827dd9f5c47ff2d4891300b5b09884f8fb`