Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus

KFA

A fast Khmer Forced Aligner powered by Wav2Vec2CTC and Phonetisaurus.

  • Built-in Speech Enhancement
  • Word-level Alignment

pip install kfa

CLI

Note: The input audio (audio.wav) should have a sample rate of 16 kHz. Use ffmpeg or any other tool to resample the audio before processing. The command below also downmixes the audio to a single channel (-ac 1).

ffmpeg -i audio_orig.wav -ac 1 -ar 16000 audio.wav
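
To confirm that a file is already in the expected format before aligning, a minimal check with Python's standard `wave` module might look like the sketch below. The helper name `is_16k_mono_wav` is hypothetical and not part of kfa, and the `wave` module only reads uncompressed PCM WAV files.

```python
import wave


def is_16k_mono_wav(path):
    """Return True if the PCM WAV file at `path` is 16 kHz and single channel.

    Hypothetical helper for illustration; it is not part of the kfa package.
    """
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == 16000 and wav_file.getnchannels() == 1
```

Calling `is_16k_mono_wav("audio.wav")` after the ffmpeg step should return True; if it returns False, resample the file first.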

kfa -a audio.wav -t text.txt -o alignments.jsonl

# Output in Whisper-style JSON format
kfa -a audio.wav -t text.txt --format whisper -o alignments.json
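
The alignments.jsonl file written by the first command can be loaded back into Python for further processing. The sketch below assumes the file holds one JSON value per line, as the .jsonl extension suggests. The fields inside each record are not documented here, so the code does not assume any, and `read_jsonl` is a hypothetical helper rather than part of kfa.

```python
import json


def read_jsonl(path):
    """Parse a JSON Lines file into a list, one item per non-empty line.

    Hypothetical helper for illustration; it is not part of the kfa package.
    """
    records = []
    with open(path, encoding="utf-8") as infile:
        for line in infile:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl("alignments.jsonl")` returns the parsed records as a list, which can then be printed to see what each alignment contains.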

Python

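import librosa
from kfa import align, create_session

# Read the transcript to align (Khmer text, so read it as UTF-8).
with open("text.txt", encoding="utf-8") as infile:
    text = infile.read()

# Load the audio as 16 kHz mono, the sample rate the aligner expects.
y, sr = librosa.load("audio.wav", sr=16000, mono=True)
session = create_session()

# Print each alignment produced by align().
for alignment in align(y, sr, text, session=session):
    print(alignment)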

License

Apache-2.0
