A Python library for forced alignment of English text to English audio.

ForceAlign

ForceAlign is a Python library for forced alignment of English text to English audio. Use it to get word-level or phoneme-level alignments of a transcript to its audio. Forced alignment is the process of identifying the time at which each word (or phoneme) of a known transcript was spoken in an audio recording. ForceAlign supports the .mp3 and .wav audio file formats.

For phoneme-level alignments, ForceAlign currently supports only the ARPABET phonetic transcription encoding.

ForceAlign uses PyTorch's WAV2VEC2 pretrained model for acoustic feature extraction and can be run on both CPU and CUDA GPU devices.

Features

  • Fast and accurate word-level and phoneme-level forced alignment of text to audio.
  • Optimized for both CPU and GPU.
  • OS independent: use ForceAlign on Mac, Windows, and Linux.

Installation and Dependencies

  1. Pip Install ForceAlign
    • pip3 install forcealign
  2. Install ffmpeg
    • Mac: brew install ffmpeg
    • Linux: sudo apt install ffmpeg
    • Windows: Install from ffmpeg.org
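Before running an alignment, it can help to confirm that the ffmpeg install from step 2 succeeded. The sketch below assumes ffmpeg is expected to be reachable through the system PATH, which this page does not state explicitly; tool_on_path is a hypothetical helper name, not part of ForceAlign.

```python
import shutil


def tool_on_path(name: str = "ffmpeg") -> bool:
    """Return True if an executable called `name` is found on the PATH.

    Hypothetical helper for illustration; not part of the ForceAlign API.
    """
    return shutil.which(name) is not None


if not tool_on_path():
    print("ffmpeg was not found on PATH; see the install step above.")
```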

Usage Examples

To use ForceAlign, create a ForceAlign instance with your audio file and its corresponding text transcript.

Example 1: Getting Word-Level Text Alignments

from forcealign import ForceAlign

# Text spoken in the audio file (placeholder; replace with your own transcript)
transcript = "The quick brown fox jumps over the lazy dog."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Show predicted word-level alignments
for word in words:
    print(word.word)        # The word spoken in audio at associated time
    print(word.time_start)  # Time (seconds) the word starts in speech.mp3
    print(word.time_end)    # Time (seconds) the word ends in speech.mp3
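Once word-level alignments are available, a common follow-up is to ask which word is being spoken at a given moment. The following is a minimal sketch, not part of ForceAlign: the Word class is a stand-in that only mirrors the word, time_start, and time_end attributes shown above, and word_at is a hypothetical helper. It assumes the words are sorted by start time and do not overlap.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Word(NamedTuple):
    """Stand-in for an aligned word (assumed attribute names from the example)."""
    word: str
    time_start: float
    time_end: float


def word_at(words: Sequence[Word], t: float) -> Optional[Word]:
    """Return the word spoken at time t (seconds), or None during silence."""
    starts = [w.time_start for w in words]
    i = bisect_right(starts, t) - 1  # index of the last word starting at or before t
    if i >= 0 and t < words[i].time_end:
        return words[i]
    return None


# Made-up timings for illustration only
sample = [Word("hello", 0.50, 0.90), Word("world", 1.10, 1.60)]
print(word_at(sample, 0.7))
print(word_at(sample, 1.0))
```

The binary search keeps each lookup logarithmic in the number of words, which matters mainly for long recordings; a linear scan would work equally well for short clips.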

Example 2: Getting Phoneme-Level Text Alignments

from forcealign import ForceAlign

# Text spoken in the audio file (placeholder; replace with your own transcript)
transcript = "The quick brown fox jumps over the lazy dog."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Access predicted phoneme-level alignments
for word in words:
    print(word.word)
    for phoneme in word.phonemes:
        print(phoneme.phoneme)     # ARPABET phoneme spoken in audio at associated time
        print(phoneme.time_start)  # Time (seconds) the phoneme starts in speech.mp3
        print(phoneme.time_end)    # Time (seconds) the phoneme ends in speech.mp3
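For downstream tools, the nested word and phoneme results are often easier to handle as one flat timeline. The sketch below shows one way to do that. The Phoneme and Word classes are stand-ins that only mirror the attribute names used above, phoneme_timeline is a hypothetical helper, and the timings are made up.

```python
from typing import List, NamedTuple, Tuple


class Phoneme(NamedTuple):
    """Stand-in for an aligned phoneme (assumed attribute names)."""
    phoneme: str
    time_start: float
    time_end: float


class Word(NamedTuple):
    """Stand-in for an aligned word that carries its phonemes."""
    word: str
    phonemes: List[Phoneme]


def phoneme_timeline(words: List[Word]) -> List[Tuple[str, str, float, float]]:
    """Flatten words into (word, phoneme, start, duration) rows, in order."""
    rows = []
    for w in words:
        for p in w.phonemes:
            duration = round(p.time_end - p.time_start, 3)  # seconds, to the millisecond
            rows.append((w.word, p.phoneme, p.time_start, duration))
    return rows


# Made-up timings for illustration only
sample = [Word("hi", [Phoneme("HH", 0.10, 0.18), Phoneme("AY", 0.18, 0.40)])]
for row in phoneme_timeline(sample):
    print(row)
```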

Example 3: Reviewing Word-Level Alignments

You can use the review_alignment() method to check the quality of your alignment in real time. It plays the audio file and prints each word at its predicted time, which is useful for heuristically checking the accuracy of the word-level alignment predictions.

from forcealign import ForceAlign

# Text spoken in the audio file (placeholder; replace with your own transcript)
transcript = "The quick brown fox jumps over the lazy dog."

# Provide path to audio_file and corresponding transcript
align = ForceAlign(audio_file='./speech.mp3', transcript=transcript)

# Runs prediction and returns alignment results
words = align.inference()

# Plays audio and prints each word in real time at its predicted alignment time
align.review_alignment()

Use Cases

Forced alignment can be useful for generating subtitles for video and, with phoneme-level alignments, for automated lip-syncing of animated characters.
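As an illustration of the subtitle use case, the sketch below groups aligned words into fixed-size cues and renders them in the SubRip (.srt) format. It is not part of ForceAlign: the Word class is a stand-in that mirrors the word, time_start, and time_end attributes from the usage examples, and srt_timestamp, words_to_srt, and words_per_cue are hypothetical names. Grouping by a fixed word count is a simplification; real subtitles usually break on pauses or punctuation.

```python
from typing import NamedTuple, Sequence


class Word(NamedTuple):
    """Stand-in for an aligned word (assumed attribute names)."""
    word: str
    time_start: float
    time_end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: Sequence[Word], words_per_cue: int = 4) -> str:
    """Group words into cues of up to words_per_cue words and render SRT text."""
    cues = []
    for n, i in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[i:i + words_per_cue]
        start = srt_timestamp(chunk[0].time_start)   # cue starts with its first word
        end = srt_timestamp(chunk[-1].time_end)      # and ends with its last word
        text = " ".join(w.word for w in chunk)
        cues.append(f"{n}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


# Made-up timings for illustration only
sample = [Word("hello", 0.5, 0.9), Word("world", 1.1, 1.6), Word("again", 2.0, 2.4)]
print(words_to_srt(sample, words_per_cue=2))
```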

FAQ

1. Does ForceAlign have speech-to-text capabilities? No. This is a feature that I plan on adding soon when I have time.

2. Can ForceAlign be used with both CPU and GPU? Yes. Running with CPU is surprisingly fast, and it will be even faster with GPU.

Acknowledgements

This project is heavily based on a PyTorch demo by Moto Hira: Forced Alignment with Wav2Vec2.

Download files

Source Distribution

forcealign-1.1.7.tar.gz (6.3 kB)

Uploaded Source

Built Distribution

forcealign-1.1.7-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file forcealign-1.1.7.tar.gz.

File metadata

  • Download URL: forcealign-1.1.7.tar.gz
  • Upload date:
  • Size: 6.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for forcealign-1.1.7.tar.gz
Algorithm Hash digest
SHA256 331fb562045dd0b074562863c36c03a65704ebdfa7a82286cf8b426c25c44a49
MD5 7c4200f08e5594467ba2841485f102c9
BLAKE2b-256 d7a21b144a4958c0050868d7bed0e208159ff5be00487ef598b4a88992e79871

File details

Details for the file forcealign-1.1.7-py3-none-any.whl.

File metadata

  • Download URL: forcealign-1.1.7-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for forcealign-1.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 599d6c7484bb9ad6395e6c258b43605adbd9a82c5bde112ff8a62ea69daf2c38
MD5 5de0d85815dd7cdab46db4aa0a18c23e
BLAKE2b-256 4c6363e55204ac1c88fbb3dfd5c3c2edef3d853f78eed84ea3b3d5042b4c667f
