Standalone forced alignment for Scottish Gaelic — no Kaldi/PyKaldi dependency

sk-align

Python 3.10+ | PyPI | License: MIT | Hugging Face Model | Tests

sk-align reimplements Kaldi's nnet3 forced-alignment pipeline entirely in Python/NumPy/PyTorch, reading Kaldi model files directly. It produces word-level timestamps at parity with PyKaldi while being easier to install and deploy.


Features

  • Zero Kaldi dependency — pure Python reads Kaldi binary formats (final.mdl, tree, L.fst, etc.)
  • from_pretrained() — one-line model download from Hugging Face Hub
  • MFCC extraction — vectorised NumPy implementation matching Kaldi output
  • TDNN-F nnet3 inference — full PyTorch reimplementation of the forward pass
  • k2 Viterbi decoder — fast FSA-based decoding via intersect_dense + shortest_path
  • Word-level timestamps: [{"word": "hello", "start": 0.12, "end": 0.45}, ...]
  • Parity-tested — 55 tests verify numerical match against PyKaldi reference
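
To give a sense of the frame grid that the MFCC stage produces, the sketch below counts frames the way Kaldi does when snip_edges is true. The 25 ms window and 10 ms shift are Kaldi's defaults and are assumed here; the feature configuration shipped with the model may differ, and num_frames is a hypothetical helper, not part of sk-align.

```python
def num_frames(num_samples: int, sample_rate: int = 16000,
               frame_length_ms: float = 25.0, frame_shift_ms: float = 10.0) -> int:
    """Frame count for Kaldi-style framing with snip_edges=true (assumed defaults)."""
    frame_length = int(sample_rate * frame_length_ms / 1000)  # 400 samples at 16 kHz
    frame_shift = int(sample_rate * frame_shift_ms / 1000)    # 160 samples at 16 kHz
    if num_samples < frame_length:
        return 0  # not enough audio for a single full window
    return 1 + (num_samples - frame_length) // frame_shift


print(num_frames(5 * 16000))  # 5-second utterance -> 498
```

Under these assumptions each frame index maps to a start time of index × 10 ms, which is the grid the later stages operate on.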

Installation

pip install sk-align              # core (numpy + scipy + torch)
pip install "sk-align[all]"       # + huggingface_hub for from_pretrained(); quoted so shells like zsh do not expand the brackets

k2 is required at runtime but must be installed separately because the PyPI k2 package pins old torch versions. Install from the k2-fsa project wheels:

# CPU-only
pip install k2 -f https://k2-fsa.github.io/k2/cpu.html

# CUDA (match your CUDA version)
pip install k2 -f https://k2-fsa.github.io/k2/cuda.html

See the k2 installation guide for details.
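
Because k2 is installed separately from sk-align, it can be useful to confirm that it is importable before running an alignment. The sketch below uses only the standard library; missing_modules is a hypothetical helper, not part of sk-align.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# torch and k2 are the runtime dependencies named above.
missing = missing_modules(["torch", "k2"])
if missing:
    print("Missing runtime dependencies:", ", ".join(missing))
else:
    print("torch and k2 are importable")
```

This only checks that the packages can be located; it does not verify that the installed k2 build matches the installed torch version.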

Or install from source:

git clone https://github.com/your-org/sk-align.git
cd sk-align/sk-align
pip install -e ".[all]"           # editable with all extras

Optional extras

Extra   Installs                    Needed for
hub     huggingface_hub>=0.20       Aligner.from_pretrained()
all     huggingface_hub             Full end-to-end pipeline
test    pytest + huggingface_hub    Running the test suite
dev     test extras + ruff          Development

Quick start

from sk_align import Aligner

# Download model from Hugging Face and load (cached after first call)
aligner = Aligner.from_pretrained()

# audio: float32 numpy array, 16 kHz, mono
timestamps = aligner.align(audio, ["cumaidh", "sinn", "a'", "dol"])
# [{"word": "cumaidh", "start": 0.33, "end": 0.72},
#  {"word": "sinn",    "start": 0.72, "end": 0.99},
#  ...]
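
The quick start assumes audio is already a float32, 16 kHz, mono NumPy array. One possible way to obtain such an array from a 16-bit PCM WAV file is sketched below using the standard-library wave module. load_wav_16k_mono is a hypothetical helper, and the sample scaling is an assumption to verify: Kaldi itself works on samples in the int16 range, and this README does not say which range sk-align expects.

```python
import wave

import numpy as np


def load_wav_16k_mono(path, scale=1.0):
    """Read a 16-bit PCM, 16 kHz, mono WAV file into a float32 array.

    `scale` is an assumption to verify: scale=1.0 keeps the int16 range
    (as Kaldi does); scale=1/32768 maps samples to [-1, 1) instead.
    """
    with wave.open(str(path), "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wf.getframerate() != 16000:
            raise ValueError("expected a 16 kHz sample rate")
        if wf.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        raw = wf.readframes(wf.getnframes())
    # WAV PCM data is little-endian signed 16-bit.
    return np.frombuffer(raw, dtype="<i2").astype(np.float32) * np.float32(scale)
```

The returned array can then be passed as the audio argument of aligner.align(). Files with a different sample rate or channel count need resampling or downmixing first, which this sketch does not attempt.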

Loading a local model

from sk_align import Aligner
from sk_align.nnet3_torch import TorchNnetScorer

scorer = TorchNnetScorer.from_model_file("/path/to/model/final.mdl")
aligner = Aligner.from_model_dir("/path/to/model", nnet_scorer=scorer)

timestamps = aligner.align(audio, words)

Using pre-computed log-likelihoods

import numpy as np
from sk_align import Aligner

aligner = Aligner.from_model_dir("/path/to/model")  # no scorer needed
loglikes = np.load("loglikes.npy")  # (num_frames, num_pdfs)

timestamps = aligner.align_with_loglikes(loglikes, words)
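
Since align_with_loglikes() takes a raw (num_frames, num_pdfs) matrix, a shape check before the call can catch a transposed or mismatched array early. The sketch below is a hypothetical pre-flight check, not part of sk-align; the number of PDFs must come from the model in use (the default model described under Model has 3456).

```python
import numpy as np


def check_loglikes(loglikes, num_pdfs):
    """Validate a (num_frames, num_pdfs) log-likelihood matrix before alignment."""
    loglikes = np.asarray(loglikes)
    if loglikes.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {loglikes.ndim} dimension(s)")
    if loglikes.shape[1] != num_pdfs:
        raise ValueError(f"expected {num_pdfs} columns, got {loglikes.shape[1]}")
    if np.isnan(loglikes).any():
        raise ValueError("log-likelihoods contain NaN values")
    return loglikes


loglikes = check_loglikes(np.zeros((500, 3456), dtype=np.float32), num_pdfs=3456)
print(loglikes.shape)
```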

Architecture

The alignment pipeline reimplements each stage of Kaldi's forced alignment in pure Python:

Audio (float32, 16 kHz)
  │
  ▼
┌─────────────────────┐
│  MFCC Extraction    │  sk_align.mfcc        (NumPy, batch-vectorised)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Nnet3 Forward Pass │  sk_align.nnet3_torch  (PyTorch TDNN-F)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Graph Compilation  │  sk_align.graph        (L ∘ G, context expansion)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Viterbi Decoding   │  sk_align.k2_decoder   (k2 FSA intersection)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Word Alignment     │  sk_align.word_align   (boundary extraction)
└─────────────────────┘
          │
          ▼
  [{"word": "...", "start": 0.12, "end": 0.45}, ...]
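
The last stage of the diagram turns a frame-level alignment into timestamps. As a simplified illustration of that idea, the sketch below collapses runs of per-frame word labels into segments and converts frame indices to seconds. It is not sk_align.word_align: it assumes one label per frame, a 10 ms frame shift, and None for silence, and it would merge two adjacent occurrences of the same word, which a real implementation avoids by using phone word-boundary information.

```python
from itertools import groupby


def frames_to_words(frame_words, frame_shift=0.01):
    """Collapse per-frame word labels into [{"word", "start", "end"}, ...].

    `frame_words` holds one label per frame; None marks silence frames.
    `frame_shift` is seconds per frame (10 ms assumed here).
    """
    segments = []
    frame = 0
    for word, run in groupby(frame_words):
        length = sum(1 for _ in run)
        if word is not None:
            segments.append({
                "word": word,
                "start": round(frame * frame_shift, 3),
                "end": round((frame + length) * frame_shift, 3),
            })
        frame += length
    return segments


frames = [None] * 3 + ["sinn"] * 5 + ["a'"] * 2
print(frames_to_words(frames))
```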

Modules

Module                       Description
sk_align.aligner             High-level Aligner class (main entry point)
sk_align.mfcc                MFCC feature extraction (batch NumPy, Kaldi-compatible)
sk_align.nnet3_model         Kaldi nnet3 binary parser
sk_align.nnet3_torch         PyTorch reimplementation of TDNN-F forward pass
sk_align.fst                 OpenFst binary format reader + FST representation
sk_align.graph               Per-utterance decoding graph compiler (L ∘ G + context expansion)
sk_align.tree                Kaldi ContextDependency tree reader
sk_align.transition_model    Kaldi TransitionModel reader
sk_align.k2_decoder          k2-based Viterbi decoder
sk_align.word_align          Word boundary extraction + timestamp conversion
sk_align.kaldi_io            Low-level Kaldi binary I/O helpers
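
To illustrate the kind of work the low-level readers do, the sketch below parses three building blocks of Kaldi's binary format: the two-byte binary header (NUL followed by "B"), a space-terminated token, and an int32 stored as a size byte followed by four little-endian bytes. The function names are hypothetical and are not the sk_align.kaldi_io API; the "<Example>" token is made up for the demonstration.

```python
import io
import struct


def expect_binary_header(stream):
    """Kaldi binary-mode streams start with the two bytes NUL and 'B'."""
    if stream.read(2) != b"\x00B":
        raise ValueError("not a Kaldi binary-mode stream")


def read_token(stream):
    """Read a space-terminated ASCII token and consume the trailing space."""
    chars = bytearray()
    while True:
        c = stream.read(1)
        if not c:
            raise EOFError("stream ended inside a token")
        if c == b" ":
            return chars.decode("ascii")
        chars += c


def read_int32(stream):
    """Read a binary-mode int32: one size byte (4), then 4 little-endian bytes."""
    size = stream.read(1)
    if size != b"\x04":
        raise ValueError(f"expected size byte 4, got {size!r}")
    return struct.unpack("<i", stream.read(4))[0]


# Hand-built stream: header, a made-up token, then the integer 3456.
stream = io.BytesIO(b"\x00B<Example> \x04" + struct.pack("<i", 3456))
expect_binary_header(stream)
print(read_token(stream), read_int32(stream))
```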

Model

The default model is hosted at eist-edinburgh/nnet3_alignment_model on Hugging Face Hub. It is a TDNN-F nnet3 alignment model (3456 PDFs) trained for Scottish Gaelic.

Expected model files:

final.mdl           TransitionModel + nnet3 weights
tree                ContextDependency tree
L.fst               Lexicon FST (OpenFst binary)
words.txt           Word symbol table
disambig.int        Disambiguation symbol IDs
word_boundary.int   Phone word-boundary types
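
When pointing the aligner at a local directory, a quick check that all of the files listed above are present can save a confusing load error. This is a minimal sketch using pathlib; missing_model_files is a hypothetical helper, not part of sk-align.

```python
from pathlib import Path

# File names taken from the list of expected model files above.
EXPECTED_FILES = ("final.mdl", "tree", "L.fst", "words.txt",
                  "disambig.int", "word_boundary.int")


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


missing = missing_model_files("/path/to/model")
if missing:
    print("Model directory is incomplete, missing:", ", ".join(missing))
```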

Testing

The test suite verifies numerical parity with PyKaldi at every stage.

pip install -e ".[test]"
pytest                   # MFCC, I/O, graph, decoder, end-to-end parity

Tests include:

  • MFCC parity — feature output matches Kaldi within floating-point tolerance
  • I/O round-trip — all Kaldi binary readers produce correct data structures
  • Graph compilation — decoding graphs match expected state/arc counts
  • Decoder parity — k2 decoder alignment matches reference Viterbi output
  • End-to-end parity — word timestamps match PyKaldi within 30ms
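
The end-to-end criterion above can be expressed as a simple comparison of two timestamp lists. The sketch below shows one way such a check might look; it is not the project's actual test code, and max_boundary_diff is a hypothetical helper.

```python
def max_boundary_diff(ours, reference):
    """Largest absolute start/end difference in seconds between two alignments."""
    if [w["word"] for w in ours] != [w["word"] for w in reference]:
        raise ValueError("alignments contain different word sequences")
    diffs = [abs(a[key] - b[key])
             for a, b in zip(ours, reference)
             for key in ("start", "end")]
    return max(diffs, default=0.0)


# Illustrative values only, not real benchmark output.
ours = [{"word": "sinn", "start": 0.72, "end": 0.99}]
reference = [{"word": "sinn", "start": 0.73, "end": 0.97}]
assert max_boundary_diff(ours, reference) <= 0.030  # 30 ms tolerance
```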

Performance

Benchmark on a 5-second Scottish Gaelic utterance (25 words), CPU:

Stage            Time      % of total
MFCC             25 ms     4%
Nnet3 forward    434 ms    75%
Graph compile    46 ms     8%
k2 decode        72 ms     13%
Word align       <1 ms     <1%
Total            578 ms

End-to-end latency is roughly at parity with PyKaldi (~560 ms per utterance).
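
The benchmark figures can also be read as a real-time factor (processing time divided by audio duration), a metric the table does not state directly. The short calculation below derives it, along with each stage's share, from the numbers reported above.

```python
# Stage times in milliseconds, copied from the benchmark table.
stage_ms = {"MFCC": 25, "Nnet3 forward": 434, "Graph compile": 46, "k2 decode": 72}
total_ms = 578        # total reported in the table
audio_seconds = 5.0   # utterance length from the benchmark description

shares = {name: ms / total_ms for name, ms in stage_ms.items()}
rtf = (total_ms / 1000) / audio_seconds  # processing time / audio duration

for name, share in shares.items():
    print(f"{name:<14} {share:6.1%}")
print(f"real-time factor: {rtf:.3f}")
```

A real-time factor of about 0.12 means the utterance is aligned several times faster than its playback duration on this CPU, with the nnet3 forward pass as the dominant cost.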

License

MIT
