
a simple neural forced aligner for phoneme to audio alignment


snfa

snfa (Simple Neural Forced Aligner) is a phoneme-to-audio forced aligner built for embedded use in Python programs. Its only inference dependency is NumPy, and it requires Python 3.7 or later.

  • Tiny model size (2 MB)
  • NumPy as the only inference dependency
  • Alignment quality comparable to MFA (Montreal Forced Aligner)

Note: You still need PyTorch and a few other libraries if you want to train a model.

Inference

pip install snfa

A pre-trained model weight, jp.npz, is included.

jp.npz was trained on the Japanese Common Voice Corpus 14.0 (June 28, 2023). The model weights are released into the public domain.

import snfa
import librosa # or soundfile, torchaudio, scipy, etc.


aligner = snfa.Aligner() # pass a model path here to use a custom model
# NOTE: the default model is uncased; it does not distinguish `U` from `u`
transcript = "k o N n i ch i w a".lower().split(" ") # so remember to lowercase the transcript

# you can also load audio with `scipy`, `soundfile`, etc., as long as the
# samples are normalized to [-1, 1] and the sample rate matches the model's
# (numpy alone cannot decode wav files)
x, sr = librosa.load("sample.wav", sr=aligner.sr)

segments = aligner(x, transcript)

print(segments)
# (phoneme label, start in milliseconds, end in milliseconds, score)
# [('pau', 0, 908, 0.9583546351318474),
#  ('k', 908, 928, 0.006900709283433312),
#  ('o', 928, 1088, 0.795996002234283),
# ...]

# NOTE: the timestamps are in milliseconds; convert one to a waveform index with
wav_index = int(timestamp * aligner.sr / 1000)
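The segment tuples above make it easy to cut the waveform into per-phoneme chunks. A minimal sketch, assuming the output format shown above (`slice_segments` is a hypothetical helper, not part of the snfa API):

```python
# Post-processing sketch: slice the waveform into per-phoneme chunks using
# the (label, start_ms, end_ms, score) tuples returned by the aligner.
def slice_segments(x, segments, sr):
    """Return a list of (label, chunk) pairs, one per aligned segment."""
    chunks = []
    for label, start_ms, end_ms, _score in segments:
        start = int(start_ms * sr / 1000)  # milliseconds -> sample index
        end = int(end_ms * sr / 1000)
        chunks.append((label, x[start:end]))
    return chunks
```

For example, `slice_segments(x, segments, aligner.sr)` gives one audio chunk per phoneme, ready for per-phoneme analysis or export.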

Development

We use uv to manage dependencies.

Install them with:

uv sync

Training

Download Common Voice Dataset and extract it somewhere.

uv run -c config.yaml -d /path/to/common/voice/

Checkpoints will be saved to logs/lightning_logs/

Note that -d should point to the directory containing the *.tsv files. In the Japanese CV dataset, that is the ja subdirectory.
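For reference, the expected layout for the Japanese dataset would look roughly like this (the exact *.tsv names depend on the Common Voice release):

```text
/path/to/common/voice/
└── ja/               # pass this directory to -d
    ├── train.tsv
    ├── dev.tsv
    ├── test.tsv
    └── clips/        # audio files
```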

Bundle

When bundling an app with PyInstaller, add the following so the model weights are bundled properly:

from PyInstaller.utils.hooks import collect_data_files

data = collect_data_files('snfa')

# pass `data` to the `Analysis` object in your spec file

I'd appreciate suggestions for a better way.
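For context, a minimal spec-file fragment consuming `data` might look like this; `my_app.py` and the surrounding `Analysis` arguments are placeholders, not part of snfa:

```python
# my_app.spec -- sketch of a PyInstaller spec file. `Analysis` is provided by
# PyInstaller when it executes the spec; this is not a standalone script.
from PyInstaller.utils.hooks import collect_data_files

data = collect_data_files('snfa')  # (model_file_path, dest_dir) pairs

a = Analysis(
    ['my_app.py'],     # hypothetical entry point
    datas=data,        # ship snfa's bundled model weights with the app
)
```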

Todos

  • Rust crate
  • Multi-language support

Licence

snfa is released under the ISC Licence, as shown here.

The file snfa/stft.py contains code adapted from librosa, which is also under the ISC Licence with a different copyright notice. A copy of librosa's licence can be found in librosa's repo.

The file snfa/viterbi.py contains code adapted from torchaudio, which is under the BSD 2-Clause "Simplified" License. A copy of torchaudio's licence can be found in torchaudio's repo.

Credit

The neural network used in snfa is essentially a PyTorch implementation of the CTC* architecture described in Evaluating Speech—Phoneme Alignment and Its Impact on Neural Text-To-Speech Synthesis.

Download files


Source Distribution

snfa-0.1.0.tar.gz (1.2 MB)

Uploaded Source

Built Distribution


snfa-0.1.0-py3-none-any.whl (1.2 MB)

Uploaded Python 3

File details

Details for the file snfa-0.1.0.tar.gz.

File metadata

  • Download URL: snfa-0.1.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.19

File hashes

Hashes for snfa-0.1.0.tar.gz

  • SHA256: 93ec0664111b402c0445fad8f4b13311bc2f2ec7561b3250f992a28630b1e5bc
  • MD5: ea3fa12169747cc60357338ce7b779f9
  • BLAKE2b-256: a7be98ac15a5a35dc31f1f36cf34396b622af387321c450536a62b89f1c930b4


File details

Details for the file snfa-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: snfa-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.7.19

File hashes

Hashes for snfa-0.1.0-py3-none-any.whl

  • SHA256: d045d86d0acb89a86bef5c966e295b3026a422cca55c1cabb929fb3110e8f347
  • MD5: 5f5b0df5c680efc4da98b106c1cc1c3c
  • BLAKE2b-256: 66ecc0c909c4a84f187633977d3583b48d1fd00021ea5697b04ca75cbaf94ee3

