
a simple neural forced aligner for phoneme to audio alignment

Reason this release was yanked:

Not better than 0.3.0 yet slower

Project description

snfa

snfa (Simple Neural Forced Aligner) is a phoneme-to-audio forced aligner built for embedded use in Python programs; its only inference dependencies are NumPy and Python 3.7 or later.

  • Tiny model size (~1 MB)
  • NumPy as the only dependency
  • Alignment quality comparable to MFA

Note: You still need PyTorch and a few other libraries if you want to do training.

Inference

pip install snfa

A pre-trained model weight, jp.npz, is included.

jp.npz was trained on the Japanese Common Voice Corpus 20.0 (06/12/2024). The model weights are released into the Public Domain.

import snfa
from snfa import Segment
import librosa  # or soundfile, torchaudio, scipy, etc.


aligner = snfa.Aligner()  # pass a path here to use a custom model
# NOTE: the default model is uncased; it does not distinguish `U` from `u`
transcript = "k o N n i ch i w a".lower().split(" ")  # remember to lowercase it here

# You can also load audio with `scipy` or `wavfile`, as long as it is:
# 1. a mono-channel numpy array with shape (T,), dtype=np.float32
# 2. normalized to [-1, 1]
# 3. at a sample rate matching the model's `sr`
x, sr = librosa.load("sample.wav", sr=aligner.sr)
# trimming leading/trailing silence may improve alignment quality
x, _ = librosa.effects.trim(x, top_db=20)
# alternatively, use the bundled utility; it's basically ripped off
# from librosa so you don't have to install it
x, _ = snfa.trim_audio(x, top_db=20)

segments: list[Segment] = aligner(x, transcript)

print(segments)
# (phoneme label, start in ms, end in ms)
# [('pau', 0, 900),
#  ('k', 900, 920),
#  ('o', 920, 1080),
# ...]

# NOTE: timestamps are in milliseconds; for any timestamp (a start or end value
# in ms), convert it to an index on the waveform with
wav_index = int(timestamp * aligner.sr / 1000)
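For instance, the segment tuples printed above can be mapped onto sample ranges with plain arithmetic. A minimal sketch, assuming the (label, start_ms, end_ms) layout shown above; 16000 here is a stand-in for the model's actual `aligner.sr`:

```python
# Map (label, start_ms, end_ms) segments to sample index ranges.
# The sample rate 16000 is an illustrative stand-in for aligner.sr.
def segment_to_samples(segment, sr):
    label, start_ms, end_ms = segment
    return label, int(start_ms * sr / 1000), int(end_ms * sr / 1000)

segments = [("pau", 0, 900), ("k", 900, 920), ("o", 920, 1080)]
print([segment_to_samples(s, 16000) for s in segments])
# → [('pau', 0, 14400), ('k', 14400, 14720), ('o', 14720, 17280)]
```

Each resulting range can be used directly to slice the loaded waveform `x`.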

Development

We use uv to manage dependencies.

The following command will install them.

uv sync

Training

Download Common Voice Dataset and extract it somewhere.

We train on the whole validated.tsv split, with the dev and test splits filtered out.

Filter the dataset:

uv run filter_dataset.py -d /path/to/common/voice/
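The filtering idea can be sketched as follows. This is illustrative only; the real logic lives in filter_dataset.py:

```python
# Sketch of the split-filtering idea: keep validated clips that do not
# appear in the held-out dev or test splits.
def filter_validated(validated_paths, dev_paths, test_paths):
    held_out = set(dev_paths) | set(test_paths)
    return [p for p in validated_paths if p not in held_out]

train = filter_validated(["a.mp3", "b.mp3", "c.mp3"], ["b.mp3"], ["c.mp3"])
print(train)  # → ['a.mp3']
```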

Start training:

uv run -c config.yaml -d /path/to/common/voice/

Checkpoints and tensorboard logs will be saved to logs/lightning_logs/

Note that the -d parameter should point to the directory containing the *.tsv files. In the Japanese CV dataset, that is the ja subdirectory.
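For reference, the layout roughly follows the standard Common Voice archive structure (directory names here are illustrative):

```
/path/to/common/voice/
└── ja/                 # pass this directory via -d
    ├── validated.tsv
    ├── dev.tsv
    ├── test.tsv
    └── clips/
```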

Exporting

To use the model in numpy, export the checkpoint with

uv run export.py -c config.yaml --ckpt /path/to/checkpoint -o output.npz
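The exported archive can be sanity-checked by opening it with NumPy. The key name below is purely illustrative; the real .npz layout is whatever export.py writes:

```python
import io
import numpy as np

# Round-trip a tiny .npz archive in memory; the key name is illustrative.
buf = io.BytesIO()
np.savez(buf, conv_weight=np.zeros((4, 1, 3), dtype=np.float32))
buf.seek(0)

archive = np.load(buf)
print(sorted(archive.files))         # → ['conv_weight']
print(archive["conv_weight"].shape)  # → (4, 1, 3)
```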

Publishing

Usually I am responsible for publishing the package to PyPI; this section serves as a reminder to myself.

  1. copy the exported jp.npz to src/snfa/models/
  2. uv build
  3. uv publish

Bundle

When bundling an app with PyInstaller, add

from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files('snfa')

# pass `datas` to Analysis(...) in your .spec file, e.g.
# a = Analysis(['main.py'], datas=datas, ...)

to bundle the model weights properly. I'd appreciate it if you offer a better way.

Todos

  • Rust crate
  • multi-language

Licence

snfa is released under the ISC Licence, as shown here.

The files snfa/stft.py and snfa/util.py contain code adapted from librosa, which is under the ISC Licence with a different copyright claim. A copy of librosa's licence can be found in librosa's repo.

The file snfa/viterbi.py contains code adapted from torchaudio, which is under the BSD 2-Clause "Simplified" License. A copy of torchaudio's licence can be found in torchaudio's repo.

The test audio file is taken from the Japanese Common Voice Corpus 14.0 (6/28/2023), Public Domain.

Credit

The neural network used in snfa is essentially a PyTorch implementation of the CTC* architecture described in Evaluating Speech—Phoneme Alignment and Its Impact on Neural Text-To-Speech Synthesis.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snfa-0.3.1.tar.gz (1.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

snfa-0.3.1-py3-none-any.whl (1.2 MB view details)

Uploaded Python 3

File details

Details for the file snfa-0.3.1.tar.gz.

File metadata

  • Download URL: snfa-0.3.1.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.4

File hashes

Hashes for snfa-0.3.1.tar.gz
Algorithm Hash digest
SHA256 64fca0069d0d7786ef057e976f6235dc08f9f48a18454eec619363e496b3de0d
MD5 72db0bfb96ca28db89a54ec517c348d3
BLAKE2b-256 698752fd0346a24cce712c9646e788979323e68e81db885371140836e89c0268

See more details on using hashes here.

File details

Details for the file snfa-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: snfa-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.4

File hashes

Hashes for snfa-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1d2d56428fe10844d5b830f22db48e9d58e72060a5f954c9da03892087aba3f1
MD5 9c93ce73c322c06bf65ccad0bf0b28dc
BLAKE2b-256 152fd4f6c97c714cce319c1d9dff8a6b29db682eed601376cd055b73e65c7ec2

See more details on using hashes here.
