Speech recognition with accurate word-level timestamps.

easytranscriber is an automatic speech recognition (ASR) library for transcription with precise word-level timestamps. While the transcription step itself is well-optimized in most ASR libraries, the surrounding components (data loading, emission extraction, forced alignment) often act as bottlenecks. easytranscriber optimizes these components and supports both ctranslate2 and Hugging Face transformers as inference backends. Notable features include:

  • GPU-accelerated forced alignment, using PyTorch's forced alignment API. Forced alignment is based on a GPU implementation of the Viterbi algorithm (Pratap et al., 2024).
  • Parallel loading and pre-fetching of audio files for efficient data loading and batch processing.
  • Flexible text normalization for improved alignment quality. Users can supply custom regex-based text normalization functions to preprocess ASR outputs before alignment. A mapping from the original text to the normalized text is maintained internally. All of the applied normalizations and transformations are consequently non-destructive and reversible after alignment.
  • 35% to 102% faster inference compared to WhisperX. See the benchmarks for more details.
  • Batch inference support for wav2vec2 models (emission extraction).
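
The non-destructive text normalization described above can be illustrated with a minimal sketch. This is not easytranscriber's internal implementation: `normalize_with_mapping` and `original_span` are hypothetical helpers written for illustration. The idea is to record, for every character of the normalized text, the index of the original character it came from, so that a span found on the normalized text can be projected back onto the untouched original.

```python
import re


def normalize_with_mapping(text, pattern=r"[^\w\s']"):
    """Lowercase `text` and drop single characters matching `pattern`.

    Returns the normalized string and a list in which entry i holds the
    index in the original text of normalized character i.
    (Hypothetical helper, for illustration only.)
    """
    drop = re.compile(pattern)
    chars, mapping = [], []
    for i, ch in enumerate(text):
        if drop.fullmatch(ch):
            continue  # removed by normalization, so it gets no mapping entry
        lowered = ch.lower()
        chars.append(lowered)
        # lower() can expand one character into several; map each back to i.
        mapping.extend([i] * len(lowered))
    return "".join(chars), mapping


def original_span(mapping, start, end):
    """Map a [start, end) span in the normalized text back to the original."""
    return mapping[start], mapping[end - 1] + 1


original = "It was the BEST of times, it was the worst of times!"
normalized, mapping = normalize_with_mapping(original)

# Locate a word in the normalized text, then recover it from the original.
start = normalized.index("best")
o_start, o_end = original_span(mapping, start, start + len("best"))
print(normalized)
print(original[o_start:o_end])
```

Because the original string is never modified, timestamps aligned against the normalized text can be reported alongside the original casing and punctuation.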

Installation

With GPU support

pip install easytranscriber --extra-index-url https://download.pytorch.org/whl/cu128

[!TIP]
Remove --extra-index-url if you want a CPU-only installation.

Using uv

When installing with uv, the appropriate PyTorch version is selected automatically (CPU for macOS, CUDA for Linux/Windows/ARM):

uv pip install easytranscriber
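
After installing, it can be useful to confirm which PyTorch build ended up in your environment. The snippet below is a small sketch that relies only on standard PyTorch attributes (`torch.__version__` and `torch.cuda.is_available()`). The helper name `describe_torch_install` is hypothetical and not part of easytranscriber.

```python
import importlib.util


def describe_torch_install():
    """Return a short description of the installed PyTorch build."""
    # Check for the package first so the function also works without PyTorch.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    cuda = "CUDA available" if torch.cuda.is_available() else "CUDA not available"
    return f"PyTorch {torch.__version__} ({cuda})"


print(describe_torch_install())
```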

Usage

The example below shows how to transcribe an audio file with easytranscriber. We transcribe the first chapter of an audiobook recording of "A Tale of Two Cities". The recording is sourced from LibriVox.

from pathlib import Path

from easyaligner.text import load_tokenizer
from huggingface_hub import snapshot_download

from easytranscriber.pipelines import pipeline
from easytranscriber.text.normalization import text_normalizer

# Download Tale of Two Cities book 1 chapter 1 LibriVox audiobook recording for testing
snapshot_download(
    "Lauler/easytranscriber_tutorials",
    repo_type="dataset",
    local_dir="data/tutorials",
    allow_patterns="tale-of-two-cities_short-en/*",
    # max_workers=4,
)

tokenizer = load_tokenizer("english")  # For sentence tokenization in forced alignment
audio_files = [file.name for file in Path("data/tutorials/tale-of-two-cities_short-en").glob("*")]
pipeline(
    vad_model="pyannote",
    emissions_model="facebook/wav2vec2-base-960h",
    transcription_model="distil-whisper/distil-large-v3.5",
    audio_paths=audio_files,
    audio_dir="data/tutorials/tale-of-two-cities_short-en",
    language="en",
    tokenizer=tokenizer,
    text_normalizer_fn=text_normalizer,
    cache_dir="models",
)

easysearch

easysearch is a built-in lightweight search interface for browsing and querying your transcription outputs. It indexes transcription chunks into a SQLite database with full-text search and serves a web UI with audio playback and synchronized transcript highlighting.

pip install "easytranscriber[search]"
easysearch --alignments-dir output/alignments --audio-dir data/audio

See the search documentation for details on search syntax, indexing, and configuration options.
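
To give a feel for what full-text search over transcription chunks involves, here is a minimal sketch using Python's built-in `sqlite3` module and SQLite's FTS5 extension. The table name, column names, and sample rows are assumptions made for illustration and do not reflect easysearch's actual schema. FTS5 must be compiled into your SQLite build, which is the case for most Python distributions.

```python
import sqlite3

# Hypothetical chunks: (audio file, start in seconds, end in seconds, text).
chunks = [
    ("chapter_01.mp3", 0.0, 4.2, "It was the best of times"),
    ("chapter_01.mp3", 4.2, 8.1, "it was the worst of times"),
    ("chapter_01.mp3", 8.1, 12.5, "it was the age of wisdom"),
]

conn = sqlite3.connect(":memory:")
# UNINDEXED columns are stored with each row but left out of the text index.
conn.execute(
    "CREATE VIRTUAL TABLE chunks USING fts5("
    "audio_file UNINDEXED, start_s UNINDEXED, end_s UNINDEXED, text)"
)
conn.executemany("INSERT INTO chunks VALUES (?, ?, ?, ?)", chunks)


def search(query):
    """Return (audio_file, start_s, end_s, text) rows matching an FTS5 query."""
    return conn.execute(
        "SELECT audio_file, start_s, end_s, text FROM chunks "
        "WHERE chunks MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


for row in search("worst"):
    print(row)
```

Storing the start and end times next to the text is what lets a search hit be turned into a playback position in the audio file.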

Benchmarks

We present throughput comparisons between easytranscriber and WhisperX. See the benchmarks directory for code and details.

WhisperX relies on single-threaded data loading and CPU-based forced alignment, creating a bottleneck that is especially pronounced on hardware with slower single-core performance.
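
The data-loading side of this bottleneck can be illustrated with a small sketch of parallel loading with bounded prefetching, using only the Python standard library. This is not easytranscriber's loader: `load_audio` is a hypothetical stand-in for real decoding work, and `prefetch` is an illustrative helper.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def load_audio(path):
    """Hypothetical stand-in for reading and decoding an audio file."""
    time.sleep(0.05)  # simulate I/O and decoding latency
    return f"samples:{path}"


def prefetch(paths, load_fn, num_workers=4, depth=8):
    """Yield load_fn(path) in input order while loading ahead in worker threads.

    At most `depth` loads are in flight or waiting to be consumed at a time.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for path in paths:
            pending.append(pool.submit(load_fn, path))
            if len(pending) >= depth:
                yield pending.popleft().result()
        # Drain whatever is still queued once all paths are submitted.
        while pending:
            yield pending.popleft().result()


paths = [f"file_{i}.wav" for i in range(8)]

start = time.perf_counter()
loaded = list(prefetch(paths, load_audio))
elapsed = time.perf_counter() - start
print(f"loaded {len(loaded)} files in {elapsed:.2f}s")
```

With a single-threaded loop the eight simulated loads would run back to back. Here they overlap across workers, so the consumer (for example, a GPU inference step) spends less time waiting for the next file. Threads suit I/O-bound loading; CPU-heavy decoding may benefit from worker processes instead.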

All easytranscriber benchmarks were run using the ctranslate2 backend for transcription.

  • PyTorch version: 2.8.0
  • CUDA: 12.8
  • WhisperX version: 3.7.6
  • Model: KBLab/kb-whisper-large
  • Language: Swedish (sv)

Documentation

The documentation is available at kb-labb.github.io/easytranscriber/.

[!TIP] Check out the easyaligner library for a user-friendly pipeline for forced alignment of text and audio.

Acknowledgements

easytranscriber draws heavy inspiration from WhisperX (Bain et al., 2023).

The forced alignment component of easytranscriber is based on PyTorch's forced alignment API, which implements a GPU-accelerated version of the Viterbi algorithm as described in Pratap et al., 2024.

Thanks to LibriVox for the public domain audiobooks used as tutorial examples.

Citation

@online{rekathati2026,
  author = {Rekathati, Faton},
  title = {Easytranscriber: {Speech} Recognition with Precise
    Timestamps},
  date = {2026-02-26},
  url = {https://kb-labb.github.io/posts/2026-02-26-easytranscriber/},
  langid = {en}
}
