sk-align

Standalone forced alignment for Scottish Gaelic — no Kaldi or PyKaldi dependency.

sk-align reimplements Kaldi's nnet3 forced-alignment pipeline in Python, using NumPy, PyTorch, and k2, and reads Kaldi model files directly. It produces word-level timestamps that match PyKaldi's (within 30 ms in the end-to-end tests) while being easier to install and deploy.

Features

Zero Kaldi dependency: pure-Python readers for Kaldi binary formats (final.mdl, tree, L.fst, etc.)
from_pretrained(): one-line model download from Hugging Face Hub
MFCC extraction: vectorised NumPy implementation matching Kaldi output
TDNN-F nnet3 inference: full PyTorch reimplementation of the forward pass
k2 Viterbi decoder: fast FSA-based decoding via intersect_dense + shortest_path
Word-level timestamps: [{"word": "hello", "start": 0.12, "end": 0.45}, ...]
Parity-tested: 55 tests check numerical agreement with PyKaldi reference output
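
Because the timestamps are plain dictionaries, converting them to other formats needs only standard Python. As a sketch, the function below writes Kaldi-style CTM lines (`<utt-id> <channel> <start> <duration> <word>`). The function name, the utterance ID, and the channel value `1` are illustrative choices made by the caller, not something sk-align produces.

```python
def to_ctm(timestamps, utt_id, channel="1"):
    """Convert [{"word", "start", "end"}, ...] (in seconds) to CTM lines."""
    lines = []
    for item in timestamps:
        duration = item["end"] - item["start"]
        # Two decimals matches a 10 ms frame resolution.
        lines.append(
            f"{utt_id} {channel} {item['start']:.2f} {duration:.2f} {item['word']}"
        )
    return lines


example = [
    {"word": "cumaidh", "start": 0.33, "end": 0.72},
    {"word": "sinn", "start": 0.72, "end": 0.99},
]
print("\n".join(to_ctm(example, "utt1")))
```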

Installation

pip install sk-align                # core (numpy + scipy + torch)
pip install "sk-align[all]"         # + huggingface_hub for from_pretrained()

k2 is required at runtime but must be installed separately because the PyPI k2 package pins old torch versions. Install from the k2-fsa project wheels:

# CPU-only
pip install k2 -f https://k2-fsa.github.io/k2/cpu.html

# CUDA (pick the wheel matching your CUDA and PyTorch versions)
pip install k2 -f https://k2-fsa.github.io/k2/cuda.html

See the k2 installation guide for details.

Or install from source:

git clone https://github.com/your-org/sk-align.git
cd sk-align/sk-align
pip install -e ".[all]"           # editable with all extras
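
Since k2 is installed separately, it can help to confirm that every runtime dependency is importable before the first alignment. The generic sketch below (not part of sk-align) uses `importlib.util.find_spec`, which locates a module without importing it; the module list is taken from this README.

```python
import importlib.util

# Runtime modules named in this README; k2 comes from the k2-fsa wheel index.
RUNTIME_MODULES = ("numpy", "scipy", "torch", "k2")


def missing_modules(modules=RUNTIME_MODULES):
    """Return the names in `modules` that cannot be found on sys.path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing runtime dependencies:", ", ".join(missing))
    else:
        print("All runtime dependencies found.")
```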

Optional extras

Extra	Installs	Needed for
`hub`	`huggingface_hub>=0.20`	`Aligner.from_pretrained()`
`all`	`huggingface_hub`	Full end-to-end pipeline
`test`	`pytest` + `huggingface_hub`	Running the test suite
`dev`	`test` extras + `ruff`	Development

Quick start

from sk_align import Aligner

# Download model from Hugging Face and load (cached after first call)
aligner = Aligner.from_pretrained()

# audio: float32 numpy array, 16 kHz, mono
timestamps = aligner.align(audio, ["cumaidh", "sinn", "a'", "dol"])
# [{"word": "cumaidh", "start": 0.33, "end": 0.72},
#  {"word": "sinn",    "start": 0.72, "end": 0.99},
#  ...]
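
`align` expects 16 kHz mono float32 audio, so recordings at other rates or with several channels need converting first. The sketch below uses `scipy.signal.resample_poly` (SciPy is already a core dependency). The helper name is hypothetical, and it keeps the integer sample scale (as Kaldi's own WAV reader does) instead of normalising to [-1, 1]. That scaling choice is an assumption, so check which scale `Aligner.align` expects.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_RATE = 16_000


def to_mono_16k(samples, sample_rate):
    """Downmix to mono, resample to 16 kHz, and return a float32 array.

    Assumption: the integer sample scale is kept (no division by 32768).
    """
    x = np.asarray(samples, dtype=np.float64)
    if x.ndim == 2:  # (num_samples, num_channels), as scipy.io.wavfile returns
        x = x.mean(axis=1)
    if sample_rate != TARGET_RATE:
        g = gcd(TARGET_RATE, sample_rate)
        # Polyphase resampling by the rational factor TARGET_RATE / sample_rate.
        x = resample_poly(x, TARGET_RATE // g, sample_rate // g)
    return x.astype(np.float32)
```

Typical use would be `rate, data = scipy.io.wavfile.read("utterance.wav")` followed by `audio = to_mono_16k(data, rate)`, where the file name is a placeholder.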

Loading a local model

from sk_align import Aligner
from sk_align.nnet3_torch import TorchNnetScorer

scorer = TorchNnetScorer.from_model_file("/path/to/model/final.mdl")
aligner = Aligner.from_model_dir("/path/to/model", nnet_scorer=scorer)

timestamps = aligner.align(audio, words)  # audio and words prepared as in the quick start
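
Before aligning against a local model, it can be worth checking that every transcript word exists in the model's words.txt, a Kaldi/OpenFst symbol table with one `<symbol> <integer-id>` pair per line. How sk-align itself handles out-of-vocabulary words is not described here; the two helpers below are hypothetical and only report such words.

```python
from pathlib import Path


def read_word_symbols(path):
    """Read a Kaldi/OpenFst symbol table ("<symbol> <integer-id>" per line)."""
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split()
        if len(parts) == 2:
            table[parts[0]] = int(parts[1])
    return table


def find_oov(words, symbols):
    """Return transcript words that have no entry in the symbol table."""
    return [w for w in words if w not in symbols]
```

For example, `find_oov(words, read_word_symbols("/path/to/model/words.txt"))` lists any words the lexicon cannot cover.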

Using pre-computed log-likelihoods

import numpy as np
from sk_align import Aligner

aligner = Aligner.from_model_dir("/path/to/model")  # no scorer needed
loglikes = np.load("loglikes.npy")  # (num_frames, num_pdfs)

timestamps = aligner.align_with_loglikes(loglikes, words)  # words: transcript word list, as in the quick start
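
Log-likelihoods produced elsewhere must have one column per PDF of the model in use (3456 for the default model, per the Model section). The hypothetical validator below catches shape and NaN/inf problems before decoding. It does not cast dtypes, because the dtype `align_with_loglikes` prefers is not documented here.

```python
import numpy as np

NUM_PDFS = 3456  # PDF count of the default model


def check_loglikes(loglikes, num_pdfs=NUM_PDFS):
    """Validate a (num_frames, num_pdfs) log-likelihood matrix and return it."""
    arr = np.asarray(loglikes)
    if arr.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {arr.shape}")
    if arr.shape[1] != num_pdfs:
        raise ValueError(f"expected {num_pdfs} columns (PDFs), got {arr.shape[1]}")
    if not np.all(np.isfinite(arr)):
        raise ValueError("log-likelihoods contain NaN or infinite values")
    return arr
```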

Architecture

The alignment pipeline reimplements each stage of Kaldi's forced alignment in pure Python:

Audio (float32, 16 kHz)
  │
  ▼
┌─────────────────────┐
│  MFCC Extraction    │  sk_align.mfcc        (NumPy, batch-vectorised)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Nnet3 Forward Pass │  sk_align.nnet3_torch  (PyTorch TDNN-F)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Graph Compilation  │  sk_align.graph        (L ∘ G, context expansion)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Viterbi Decoding   │  sk_align.k2_decoder   (k2 FSA intersection)
└─────────┬───────────┘
          ▼
┌─────────────────────┐
│  Word Alignment     │  sk_align.word_align   (boundary extraction)
└─────────────────────┘
          │
          ▼
  [{"word": "...", "start": 0.12, "end": 0.45}, ...]
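
The decoding stage finds the single best sequence of HMM states through the utterance. sk-align does this with k2 FSA intersection over a full per-utterance graph. The NumPy function below is only a simplified illustration of the same Viterbi idea for a strictly left-to-right chain of states: no transition probabilities, no alternative pronunciations, and not the library's decoder.

```python
import numpy as np


def viterbi_linear(loglikes):
    """Best monotonic alignment of T frames to S states visited in order.

    loglikes: (T, S) array with loglikes[t, s] = log p(frame t | state s).
    Every state covers at least one frame; transition scores are ignored.
    Returns a length-T array of state indices.
    """
    T, S = loglikes.shape
    if T < S:
        raise ValueError("need at least as many frames as states")
    score = np.full((T, S), -np.inf)
    entered = np.zeros((T, S), dtype=bool)  # True if state s was entered from s-1 at frame t
    score[0, 0] = loglikes[0, 0]  # the path must start in the first state
    for t in range(1, T):
        stay = score[t - 1]
        advance = np.concatenate(([-np.inf], score[t - 1, :-1]))
        entered[t] = advance > stay
        score[t] = np.maximum(stay, advance) + loglikes[t]
    # Backtrack from the last state at the last frame.
    path = np.empty(T, dtype=int)
    s = S - 1
    for t in range(T - 1, -1, -1):
        path[t] = s
        if t > 0 and entered[t, s]:
            s -= 1
    return path
```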

Modules

Module	Description
`sk_align.aligner`	High-level `Aligner` class — main entry point
`sk_align.mfcc`	MFCC feature extraction (batch NumPy, Kaldi-compatible)
`sk_align.nnet3_model`	Kaldi nnet3 binary parser
`sk_align.nnet3_torch`	PyTorch reimplementation of TDNN-F forward pass
`sk_align.fst`	OpenFst binary format reader + FST representation
`sk_align.graph`	Per-utterance decoding graph compiler (L ∘ G + context expansion)
`sk_align.tree`	Kaldi `ContextDependency` tree reader
`sk_align.transition_model`	Kaldi `TransitionModel` reader
`sk_align.k2_decoder`	k2-based Viterbi decoder
`sk_align.word_align`	Word boundary extraction + timestamp conversion
`sk_align.kaldi_io`	Low-level Kaldi binary I/O helpers
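
Kaldi's MFCC front end begins by cutting the waveform into overlapping frames. With Kaldi's default options (25 ms windows, 10 ms shift, `snip_edges=true`), a 16 kHz signal of N samples yields 1 + (N - 400) // 160 frames. The sketch below shows a batch-vectorised version of that framing step, followed by per-frame DC removal, pre-emphasis (0.97), and Kaldi's "povey" window. It assumes those defaults, omits dithering, and is not the code in `sk_align.mfcc`. The FFT power spectrum, mel filterbank, log, DCT, and liftering would follow.

```python
import numpy as np

FRAME_LENGTH = 400  # 25 ms at 16 kHz (Kaldi default frame_length_ms=25)
FRAME_SHIFT = 160   # 10 ms at 16 kHz (Kaldi default frame_shift_ms=10)
PREEMPH = 0.97      # Kaldi default preemphasis_coefficient


def povey_window(n):
    """Kaldi's 'povey' window: a Hann window raised to the power 0.85."""
    a = 2 * np.pi / (n - 1)
    return (0.5 - 0.5 * np.cos(a * np.arange(n))) ** 0.85


def frame_signal(audio):
    """Return windowed frames of shape (num_frames, FRAME_LENGTH), snip_edges=true."""
    x = np.asarray(audio, dtype=np.float64)
    if x.shape[0] < FRAME_LENGTH:
        return np.empty((0, FRAME_LENGTH))
    # All frames at once: a strided view, keeping every FRAME_SHIFT-th start.
    frames = np.lib.stride_tricks.sliding_window_view(x, FRAME_LENGTH)[::FRAME_SHIFT]
    frames = frames - frames.mean(axis=1, keepdims=True)  # remove DC offset per frame
    # Pre-emphasis inside each frame; the first sample uses itself as predecessor.
    prev = np.concatenate([frames[:, :1], frames[:, :-1]], axis=1)
    frames = frames - PREEMPH * prev
    return frames * povey_window(FRAME_LENGTH)
```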

Model

The default model is hosted at eist-edinburgh/nnet3_alignment_model on Hugging Face Hub. It is a TDNN-F nnet3 alignment model (3456 PDFs) trained for Scottish Gaelic.

Expected model files:

final.mdl           TransitionModel + nnet3 weights
tree                ContextDependency tree
L.fst               Lexicon FST (OpenFst binary)
words.txt           Word symbol table
disambig.int        Disambiguation symbol IDs
word_boundary.int   Phone word-boundary types
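
word_boundary.int maps each phone ID to a position type (`nonword`, `begin`, `internal`, `end`, or `singleton`), which is enough to recover word spans from a phone-level alignment. The sketch below illustrates that idea; it is not `sk_align.word_align`'s implementation. It assumes the alignment is available as hypothetical `(phone_id, start_frame, num_frames)` segments, and that `frame_shift` matches the decoder's output frame rate (10 ms times any frame-subsampling factor, which this README does not state).

```python
def read_word_boundary(path):
    """Parse word_boundary.int lines of the form "<phone-id> <type>"."""
    types = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                types[int(parts[0])] = parts[1]
    return types


def group_words(phone_segments, boundary, words, frame_shift=0.01):
    """Turn (phone_id, start_frame, num_frames) segments into word timestamps."""
    result = []
    start = None
    for phone, first, length in phone_segments:
        kind = boundary[phone]
        if kind in ("begin", "singleton"):
            start = first  # first frame of a new word
        if kind in ("end", "singleton"):
            end = first + length  # frame just after the word's last phone
            result.append({
                "word": words[len(result)],
                "start": round(start * frame_shift, 3),
                "end": round(end * frame_shift, 3),
            })
        # "internal" phones extend the current word; "nonword" phones are skipped.
    if len(result) != len(words):
        raise ValueError("number of aligned words does not match the transcript")
    return result
```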

Testing

The test suite verifies numerical parity with PyKaldi at every stage.

pip install -e ".[test]"
pytest                   # 49 tests — MFCC, I/O, graph, decoder, end-to-end parity

Tests include:

MFCC parity — feature output matches Kaldi within floating-point tolerance
I/O round-trip — all Kaldi binary readers produce correct data structures
Graph compilation — decoding graphs match expected state/arc counts
Decoder parity — k2 decoder alignment matches reference Viterbi output
End-to-end parity — word timestamps match PyKaldi within 30ms
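
An end-to-end parity check of this kind reduces to the largest boundary difference between two timestamp lists. The helpers below are a hypothetical sketch for comparing your own outputs against a reference such as PyKaldi's, not the project's test code; the 30 ms default mirrors the tolerance stated above.

```python
def max_boundary_diff(ours, reference):
    """Largest start/end difference in seconds between two word alignments."""
    if [w["word"] for w in ours] != [w["word"] for w in reference]:
        raise ValueError("alignments cover different word sequences")
    diffs = [
        abs(a[key] - b[key])
        for a, b in zip(ours, reference)
        for key in ("start", "end")
    ]
    return max(diffs, default=0.0)


def within_tolerance(ours, reference, tol=0.030):
    """True if every word boundary differs by at most `tol` seconds."""
    # A small epsilon absorbs float noise (0.33 - 0.30 evaluates slightly above 0.03).
    return max_boundary_diff(ours, reference) <= tol + 1e-9
```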

Performance

Benchmark on a 5-second Scottish Gaelic utterance (25 words), CPU:

Stage	Time	% of total
MFCC	25 ms	4%
Nnet3 forward	434 ms	75%
Graph compile	46 ms	8%
k2 decode	72 ms	13%
Word align	<1 ms	<1%
Total	578 ms	—

End-to-end runtime is on par with PyKaldi (~560 ms per utterance).
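
To measure end-to-end latency on your own hardware, a small timing harness is enough. The generic sketch below runs warm-up calls first, so one-off start-up costs are not counted, and then reports the median of several repeats. Per-stage figures like those in the table would need hooks into the individual modules, which are not covered here.

```python
import statistics
import time


def time_call(fn, *args, repeats=5, warmup=1, **kwargs):
    """Median wall-clock time of fn(*args, **kwargs) in milliseconds."""
    for _ in range(warmup):
        fn(*args, **kwargs)
    samples = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(*args, **kwargs)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)
```

With `aligner`, `audio`, and `words` set up as in the quick start, `time_call(aligner.align, audio, words)` returns the median latency in milliseconds.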

License

MIT

Release history

0.3.1 (Apr 14, 2026)
0.3.0 (Apr 14, 2026)
0.2.3 (Apr 14, 2026)
0.2.2 (Apr 14, 2026)
0.2.1 (Apr 13, 2026), this version
0.2.0 (Apr 13, 2026)
0.1.0 (Apr 13, 2026)

Download files

Source Distribution

sk_align-0.2.1.tar.gz (53.8 kB, uploaded Apr 13, 2026)

Built Distribution

sk_align-0.2.1-py3-none-any.whl (43.6 kB, uploaded Apr 13, 2026, Python 3)

File details

Details for the file sk_align-0.2.1.tar.gz.

File metadata

Download URL: sk_align-0.2.1.tar.gz
Upload date: Apr 13, 2026
Size: 53.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sk_align-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`190e6df50a47a286c36d24f5e2d8bda71e68a04bb4b36e756604d73717dc6c89`
MD5	`39d01b604b1d6eda701f5c1c45b4212d`
BLAKE2b-256	`e8f06517fe729c35dd29c8ebe8c9738410c34f60868f5c395fe96d8a3859c015`

File details

Details for the file sk_align-0.2.1-py3-none-any.whl.

File metadata

Download URL: sk_align-0.2.1-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 43.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for sk_align-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1496cedede643e8fcd1d14a3f25dd1c75e483c0ceb63c6493b4d4850780fa3eb`
MD5	`247a5806883c10e925c975f7cdb48828`
BLAKE2b-256	`936609c3d489efea236e8d18f305b8e6358e8a25b5812295ed89d8a8fa9740bc`
