
Speech recognition with accurate word-level timestamps.


easytranscriber is an automatic speech recognition (ASR) library for transcription with precise word-level timestamps. While the transcription step itself is well-optimized in most ASR libraries, the surrounding components (data loading, emission extraction, forced alignment) often act as bottlenecks. easytranscriber optimizes these components and supports both ctranslate2 and Hugging Face transformers as inference backends. Notable features include:

  • GPU-accelerated forced alignment using PyTorch's forced alignment API, which is based on a GPU implementation of the Viterbi algorithm (Pratap et al., 2024).
  • Parallel loading and pre-fetching of audio files for efficient data loading and batch processing.
  • Flexible text normalization for improved alignment quality. Users can supply custom regex-based text normalization functions to preprocess ASR outputs before alignment. Because a mapping from the original text to the normalized text is maintained internally, all applied normalizations and transformations are non-destructive and can be reversed after alignment.
  • 35% to 102% faster inference compared to WhisperX. See the benchmarks for more details.
  • Batch inference support for wav2vec2 models (emission extraction).
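To make the forced-alignment step concrete: given a wav2vec2 model's per-frame log-probabilities over its character vocabulary and the token IDs of the (normalized) transcript, CTC forced alignment uses the Viterbi recursion to find the single most likely frame-by-frame path that spells out the transcript. In torchaudio this is exposed as `torchaudio.functional.forced_align`. The sketch below is a plain-Python, CPU-only illustration of that recursion for one utterance, not easytranscriber's code; the function names `ctc_viterbi_align` and `merge_tokens` are hypothetical, and the CTC blank is assumed to be token ID 0.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def ctc_viterbi_align(
    log_probs: Sequence[Sequence[float]], targets: Sequence[int], blank: int = 0
) -> List[int]:
    """Return the most likely frame-level CTC path (one label per frame) spelling `targets`."""
    num_frames = len(log_probs)
    # Extended label sequence with blanks between and around targets: b y1 b y2 ... yL b
    ext = [blank]
    for label in targets:
        ext.extend([label, blank])
    num_states = len(ext)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    # A valid path starts in the leading blank or in the first target label.
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed moves: stay in s, advance from s-1, or skip a blank from s-2
            # (the skip is forbidden between two identical labels).
            candidates = [(score[t - 1][s], s)]
            if s >= 1:
                candidates.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append((score[t - 1][s - 2], s - 2))
            best, prev = max(candidates)
            if best > NEG_INF:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = prev

    # A valid path ends in the last target label or in the trailing blank.
    finals = [num_states - 1] if num_states == 1 else [num_states - 1, num_states - 2]
    state = max(finals, key=lambda s: score[-1][s])
    if score[-1][state] == NEG_INF:
        raise ValueError("not enough frames to align the target sequence")

    # Backtrack from the best final state to recover one label per frame.
    path = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = ext[state]
        state = back[t][state]
    return path


def merge_tokens(path: Sequence[int], blank: int = 0) -> List[Tuple[int, int, int]]:
    """Collapse a frame path into (label, start_frame, end_frame) spans; end is exclusive."""
    spans: List[Tuple[int, int, int]] = []
    for t, label in enumerate(path):
        if label == blank:
            continue
        if t > 0 and path[t - 1] == label:
            spans[-1] = (label, spans[-1][1], t + 1)  # extend the current token
        else:
            spans.append((label, t, t + 1))  # a new token starts here
    return spans


# Toy example: vocabulary {0: blank, 1: "a", 2: "b"}, five frames, transcript "ab".
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
log_probs = [[math.log(p) for p in frame] for frame in probs]
path = ctc_viterbi_align(log_probs, [1, 2])
print(path)  # [1, 1, 0, 2, 2]
print(merge_tokens(path))  # [(1, 0, 2), (2, 3, 5)]
```

Frame indices convert to time by multiplying by the model's frame stride; for wav2vec2 models on 16 kHz audio that is 320 samples, i.e. 20 ms per frame. A GPU implementation evaluates the same recursion, but computes all states of a frame in parallel.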

Installation

With GPU support

pip install easytranscriber --extra-index-url https://download.pytorch.org/whl/cu128

[!TIP] Remove --extra-index-url if you want a CPU-only installation.

Using uv

When installing with uv, the appropriate PyTorch build is selected automatically (CPU for macOS, CUDA for Linux/Windows/ARM):

uv pip install easytranscriber

Usage

Below is an example of how to transcribe an audio file with easytranscriber. We transcribe the first chapter of an audiobook recording of "A Tale of Two Cities", sourced from LibriVox.

from pathlib import Path

from easyaligner.text import load_tokenizer
from huggingface_hub import snapshot_download

from easytranscriber.pipelines import pipeline
from easytranscriber.text.normalization import text_normalizer

# Download Tale of Two Cities book 1 chapter 1 LibriVox audiobook recording for testing
snapshot_download(
    "Lauler/easytranscriber_tutorials",
    repo_type="dataset",
    local_dir="data/tutorials",
    allow_patterns="tale-of-two-cities_short-en/*",
    # max_workers=4,
)

tokenizer = load_tokenizer("english")  # Sentence tokenizer used by the forced alignment step
# Bare file names; their directory is passed separately via audio_dir below
audio_files = [file.name for file in Path("data/tutorials/tale-of-two-cities_short-en").glob("*")]
pipeline(
    vad_model="pyannote",
    emissions_model="facebook/wav2vec2-base-960h",
    transcription_model="distil-whisper/distil-large-v3.5",
    audio_paths=audio_files,
    audio_dir="data/tutorials/tale-of-two-cities_short-en",
    language="en",
    tokenizer=tokenizer,
    text_normalizer_fn=text_normalizer,
    cache_dir="models",
)
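The `text_normalizer_fn` argument takes a normalization function; the exact signature easytranscriber expects is not shown on this page. The following is a hypothetical, standalone sketch (the names `normalize`, `apply_rule` and `to_original` are illustrative, not easytranscriber's API) of how regex-based normalization can remain reversible: every character of the normalized text carries the offsets of the original characters it came from, so an aligned span can be mapped back to the untouched transcript.

```python
import re
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) character offsets into the original text


def apply_rule(text: str, spans: List[Span], pattern: str, repl: str) -> Tuple[str, List[Span]]:
    """Apply one regex substitution while carrying original-text offsets along."""
    pieces: List[str] = []
    new_spans: List[Span] = []
    pos = 0
    for match in re.finditer(pattern, text):
        start, end = match.span()
        if start == end:
            raise ValueError(f"pattern {pattern!r} must not match the empty string")
        # Text between matches is copied unchanged together with its offsets.
        pieces.append(text[pos:start])
        new_spans.extend(spans[pos:start])
        # Every character of the replacement points at the whole matched region.
        replacement = match.expand(repl)
        covered = (spans[start][0], spans[end - 1][1])
        pieces.append(replacement)
        new_spans.extend([covered] * len(replacement))
        pos = end
    pieces.append(text[pos:])
    new_spans.extend(spans[pos:])
    return "".join(pieces), new_spans


def normalize(text: str, rules: Sequence[Tuple[str, str]]) -> Tuple[str, List[Span]]:
    """Lowercase, then apply (pattern, replacement) rules in order."""
    chars: List[str] = []
    spans: List[Span] = []
    for i, ch in enumerate(text):
        lowered = ch.lower()  # may be longer than one character, e.g. for "İ"
        chars.append(lowered)
        spans.extend([(i, i + 1)] * len(lowered))
    normalized = "".join(chars)
    for pattern, repl in rules:
        normalized, spans = apply_rule(normalized, spans, pattern, repl)
    return normalized, spans


def to_original(original: str, spans: Sequence[Span], start: int, end: int) -> str:
    """Map a [start, end) slice of the normalized text back to the original text."""
    return original[spans[start][0] : spans[end - 1][1]]


original = "Hello, World!  It was 100%."
rules = [(r"%", " percent"), (r"[^\w\s']", ""), (r"\s+", " ")]
normalized, spans = normalize(original, rules)
print(normalized)  # hello world it was 100 percent
i = normalized.index("percent")
print(to_original(original, spans, i, i + len("percent")))  # %
```

Mapping each replacement character to the whole matched region is a deliberate simplification: it keeps the mapping valid for arbitrary replacements, at the cost of not distinguishing positions inside a rewritten span.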

easysearch

easysearch is a built-in lightweight search interface for browsing and querying your transcription outputs. It indexes transcription chunks into a SQLite database with full-text search and serves a web UI with audio playback and synchronized transcript highlighting.

pip install "easytranscriber[search]"
easysearch --alignments-dir output/alignments --audio-dir data/audio

See the search documentation for details on search syntax, indexing, and configuration options.
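Full-text search over timestamped transcript chunks maps naturally onto SQLite's FTS5 extension. The following is a minimal sketch of that idea, not easysearch's actual schema or code: the table `chunks`, its columns, and the helpers `build_index` and `search` are hypothetical, and it assumes the SQLite library Python links against was built with FTS5 (true for most builds). Keeping start and end times next to each chunk is what lets a search hit be turned into an audio seek position.

```python
import sqlite3
from typing import Iterable, List, Tuple

Chunk = Tuple[str, float, float, str]  # (audio_file, start_s, end_s, transcript)


def build_index(chunks: Iterable[Chunk], path: str = ":memory:") -> sqlite3.Connection:
    """Store transcript chunks in an FTS5 table; only the transcript column is tokenized."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS chunks USING fts5("
        "audio_file UNINDEXED, start_s UNINDEXED, end_s UNINDEXED, transcript)"
    )
    con.executemany("INSERT INTO chunks VALUES (?, ?, ?, ?)", chunks)
    con.commit()
    return con


def search(con: sqlite3.Connection, query: str, limit: int = 10) -> List[Chunk]:
    """Run an FTS5 query; best matches first, using FTS5's built-in `rank` (bm25)."""
    rows = con.execute(
        "SELECT audio_file, start_s, end_s, transcript FROM chunks "
        "WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [(f, float(s), float(e), t) for f, s, e, t in rows]


# Hypothetical file names and timestamps, for illustration only.
con = build_index([
    ("chapter01.mp3", 0.0, 4.2, "It was the best of times, it was the worst of times"),
    ("chapter01.mp3", 4.2, 9.8, "it was the age of wisdom, it was the age of foolishness"),
    ("chapter02.mp3", 0.0, 3.1, "There were a king with a large jaw"),
])
for audio_file, start, end, text in search(con, "wisdom"):
    print(f"{audio_file} [{start:.1f}s - {end:.1f}s] {text}")
```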

Benchmarks

We present throughput comparisons between easytranscriber and WhisperX. See the benchmarks directory for code and details.

WhisperX relies on single-threaded data loading and CPU-based forced alignment, creating a bottleneck that is especially pronounced on hardware with slower single-core performance.
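With single-threaded loading, each file has to be read and decoded before it can be processed. The general remedy is to decode upcoming files in the background while the current one is being processed. A minimal sketch of that prefetching pattern with the standard library's `ThreadPoolExecutor` follows; it is not easytranscriber's loader, and the `prefetch` helper and the `load` callable are hypothetical placeholders. Threads are assumed to help because audio decoding typically runs in native code or a subprocess that releases the GIL; a CPU-bound pure-Python loader would need processes instead.

```python
from collections import deque
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Deque, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def prefetch(
    load: Callable[[T], R], items: Iterable[T], num_workers: int = 4, lookahead: int = 8
) -> Iterator[R]:
    """Yield load(item) for each item in input order, keeping up to `lookahead`
    loads in flight on a pool of worker threads."""
    if lookahead < 1:
        raise ValueError("lookahead must be at least 1")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending: Deque[Future] = deque()
        for item in items:
            pending.append(pool.submit(load, item))
            # Once the window is full, hand back the oldest result before submitting more.
            if len(pending) >= lookahead:
                yield pending.popleft().result()
        # Drain whatever is still in flight.
        while pending:
            yield pending.popleft().result()


# Hypothetical usage with an audio decoder of your choice:
#     for waveform in prefetch(decode_audio, audio_paths, num_workers=4):
#         run_model(waveform)
print(list(prefetch(lambda x: x * x, range(5), num_workers=2, lookahead=3)))  # [0, 1, 4, 9, 16]
```

Results come back in input order even when later files finish decoding first, because the oldest future is always awaited first.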


All easytranscriber benchmarks were run using the ctranslate2 backend for transcription.

  • PyTorch version: 2.8.0
  • CUDA: 12.8
  • WhisperX version: 3.7.6
  • Model: KBLab/kb-whisper-large
  • Language: Swedish (sv)

Documentation

The documentation is available at kb-labb.github.io/easytranscriber/.

[!TIP] Check out the easyaligner library for a user-friendly pipeline for forced alignment of text and audio.

Acknowledgements

easytranscriber draws heavy inspiration from WhisperX (Bain et al., 2023).

The forced alignment component of easytranscriber is based on PyTorch's forced alignment API, which implements a GPU-accelerated version of the Viterbi algorithm (Pratap et al., 2024).

Thanks to LibriVox for the public domain audiobooks used in the tutorial examples.

Citation

@online{rekathati2026,
  author = {Rekathati, Faton},
  title = {Easytranscriber: {Speech} Recognition with Precise
    Timestamps},
  date = {2026-02-26},
  url = {https://kb-labb.github.io/posts/2026-02-26-easytranscriber/},
  langid = {en}
}

Download files

Download the file for your platform.

Source Distribution

easytranscriber-0.2.3.tar.gz (27.0 kB)

Uploaded Source

Built Distribution

easytranscriber-0.2.3-py3-none-any.whl (31.0 kB)

Uploaded Python 3

File details

Details for the file easytranscriber-0.2.3.tar.gz.

File metadata

  • Download URL: easytranscriber-0.2.3.tar.gz
  • Size: 27.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.8

File hashes

Hashes for easytranscriber-0.2.3.tar.gz
  • SHA256: 10d3403607e01c08ccc66e6744b5dda236dc69a00b963f5bc5c04ecf100ede44
  • MD5: d1e7bcc25177859cc4963477c6bbc219
  • BLAKE2b-256: 3e510b41d5312e48874065beefa9c24a78e2981313671e07da691cd4537b82a6

File details

Details for the file easytranscriber-0.2.3-py3-none-any.whl.

File metadata

  • Download URL: easytranscriber-0.2.3-py3-none-any.whl
  • Size: 31.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.8

File hashes

Hashes for easytranscriber-0.2.3-py3-none-any.whl
  • SHA256: 2efd506c80f5ceb333fdf879cfcd228172afdd48bfe18374e88c876da5b5bc6d
  • MD5: 03b4e19b4f4326eb0b59f1be073dd20a
  • BLAKE2b-256: 9e86cbd30898ae6a03a07d63d39d78dd4cdc7b2c5d3a504db1a6d9c0545c0288
