Speech recognition with accurate word-level timestamps.

easytranscriber is an automatic speech recognition (ASR) library for transcription with precise word-level timestamps. While the transcription step itself is well-optimized in most ASR libraries, the surrounding components (data loading, emission extraction, forced alignment) often act as bottlenecks. easytranscriber optimizes these components and supports both ctranslate2 and Hugging Face transformers as inference backends. Notable features include:

  • GPU-accelerated forced alignment, using PyTorch's forced alignment API. Forced alignment is based on a GPU implementation of the Viterbi algorithm (Pratap et al., 2024).
  • Parallel loading and pre-fetching of audio files for efficient data loading and batch processing.
  • Flexible text normalization for improved alignment quality. Users can supply custom regex-based text normalization functions to preprocess ASR outputs before alignment. A mapping from the original text to the normalized text is maintained internally. All of the applied normalizations and transformations are consequently non-destructive and reversible after alignment.
  • 35% to 102% faster inference compared to WhisperX. See the benchmarks for more details.
  • Batch inference support for wav2vec2 models (emission extraction).
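
To make the forced alignment step more concrete, here is a minimal CPU sketch of Viterbi alignment over CTC emissions in plain Python. It only illustrates the idea and is not easytranscriber's or PyTorch's implementation (which runs on the GPU); the function name `ctc_forced_align` and the toy emission values are made up for this example.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the most likely per-frame label path that spells out `tokens`.

    log_probs: list of T frames, each a list of log-probabilities per class.
    tokens:    target token IDs (hypothetical vocabulary, blank excluded).
    """
    num_frames = len(log_probs)
    # Interleave blanks around the targets: [blank, t1, blank, t2, ..., blank]
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states = len(states)

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][blank]
    if num_states > 1:
        score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            candidates = [(score[t - 1][s], s)]  # stay in the same state
            if s >= 1:
                candidates.append((score[t - 1][s - 1], s - 1))  # advance by one
            # Skipping a blank is allowed only between two different tokens.
            if s >= 2 and states[s] != blank and states[s] != states[s - 2]:
                candidates.append((score[t - 1][s - 2], s - 2))
            best, prev = max(candidates)
            if best > neg_inf:
                score[t][s] = best + log_probs[t][states[s]]
                back[t][s] = prev

    # A valid path ends in the final blank or in the last token.
    s = num_states - 1
    if num_states > 1 and score[-1][num_states - 2] > score[-1][s]:
        s = num_states - 2
    if score[-1][s] == neg_inf:
        raise ValueError("Not enough frames to align all tokens")

    # Follow the back-pointers from the last frame to the first.
    path = [blank] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = states[s]
        s = back[t][s]
    return path


# Toy example: 4 frames, 3 classes (0 = blank, 1 and 2 = tokens).
frames = [[0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.1, 0.8]]
emissions = [[math.log(p) for p in frame] for frame in frames]
print(ctc_forced_align(emissions, tokens=[1, 2]))  # [1, 0, 2, 2]
```

Each frame in the returned path is labeled with the token emitted at that point in time. Grouping consecutive frames per token and multiplying the frame indices by the frame duration of the emission model (roughly 20 ms for wav2vec2 models) yields timestamps.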

Installation

With GPU support

pip install easytranscriber --extra-index-url https://download.pytorch.org/whl/cu128

[!TIP]
Remove --extra-index-url if you want a CPU-only installation.

Using uv

When installing with uv, the appropriate PyTorch version is selected automatically (CPU for macOS, CUDA for Linux/Windows/ARM):

uv pip install easytranscriber

Usage

The example below shows how to transcribe an audio file with easytranscriber. We transcribe the first chapter of an audiobook recording of "A Tale of Two Cities". The recording is sourced from LibriVox.

from pathlib import Path

from easyaligner.text import load_tokenizer
from huggingface_hub import snapshot_download

from easytranscriber.pipelines import pipeline
from easytranscriber.text.normalization import text_normalizer

# Download Tale of Two Cities book 1 chapter 1 LibriVox audiobook recording for testing
snapshot_download(
    "Lauler/easytranscriber_tutorials",
    repo_type="dataset",
    local_dir="data/tutorials",
    allow_patterns="tale-of-two-cities_short-en/*",
    # max_workers=4,
)

tokenizer = load_tokenizer("english") # For sentence tokenization in forced alignment
audio_files = [file.name for file in Path("data/tutorials/tale-of-two-cities_short-en").glob("*")]
pipeline(
    vad_model="pyannote",
    emissions_model="facebook/wav2vec2-base-960h",
    transcription_model="distil-whisper/distil-large-v3.5",
    audio_paths=audio_files,
    audio_dir="data/tutorials/tale-of-two-cities_short-en",
    language="en",
    tokenizer=tokenizer,
    text_normalizer_fn=text_normalizer,
    cache_dir="models",
)
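
The `text_normalizer_fn` argument in the call above receives the normalization function. To illustrate what non-destructive normalization means, here is a simplified sketch that normalizes tokens while remembering where each one came from in the original text. The names `NormalizedToken`, `normalize_with_mapping`, and `restore` are hypothetical; this is not the library's `text_normalizer` or its internal mapping.

```python
import re
from dataclasses import dataclass


@dataclass
class NormalizedToken:
    text: str   # normalized form that would be passed to the aligner
    start: int  # character offset of the source token in the original text
    end: int    # exclusive end offset in the original text


def normalize_with_mapping(text: str) -> list[NormalizedToken]:
    """Lowercase and strip punctuation per token, keeping original offsets."""
    tokens = []
    for match in re.finditer(r"\S+", text):
        # Keep word characters and apostrophes, drop everything else.
        normalized = re.sub(r"[^\w']", "", match.group().lower())
        if normalized:  # tokens that normalize to nothing are not aligned
            tokens.append(NormalizedToken(normalized, match.start(), match.end()))
    return tokens


def restore(original: str, token: NormalizedToken) -> str:
    """Recover the untouched original text for an aligned token."""
    return original[token.start:token.end]


original = "Hello, World! -- ok"
tokens = normalize_with_mapping(original)
print([t.text for t in tokens])               # ['hello', 'world', 'ok']
print([restore(original, t) for t in tokens])  # ['Hello,', 'World!', 'ok']
```

Because every normalized token carries its original character span, timestamps assigned to the normalized tokens during alignment can be attached back to the unmodified text afterwards.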

easysearch

easysearch is a built-in lightweight search interface for browsing and querying your transcription outputs. It indexes transcription chunks into a SQLite database with full-text search and serves a web UI with audio playback and synchronized transcript highlighting.

pip install easytranscriber[search]
easysearch --alignments-dir output/alignments --audio-dir data/audio

See the search documentation for details on search syntax, indexing, and configuration options.
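
As a rough illustration of the kind of index described above, the sketch below stores transcript chunks in an SQLite FTS5 table and queries them with Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions made for this example and are not easysearch's actual schema. It also assumes that your SQLite build includes the FTS5 extension.

```python
import sqlite3

# Hypothetical chunks: (transcript, audio_file, start_s, end_s)
SAMPLE_CHUNKS = [
    ("It was the best of times", "ch1.wav", 0.0, 2.1),
    ("it was the worst of times", "ch1.wav", 2.1, 4.3),
    ("it was the age of wisdom", "ch2.wav", 0.0, 2.4),
]


def build_index(chunks):
    """Create an in-memory full-text index over transcript chunks."""
    conn = sqlite3.connect(":memory:")
    # Only the transcript is tokenized; the other columns are stored as-is.
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5("
        "transcript, audio_file UNINDEXED, start_s UNINDEXED, end_s UNINDEXED)"
    )
    conn.executemany(
        "INSERT INTO chunks (transcript, audio_file, start_s, end_s) "
        "VALUES (?, ?, ?, ?)",
        chunks,
    )
    return conn


def search(conn, query):
    """Return matching chunks, best match first."""
    return conn.execute(
        "SELECT audio_file, start_s, end_s, transcript "
        "FROM chunks WHERE chunks MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


conn = build_index(SAMPLE_CHUNKS)
for audio_file, start_s, end_s, transcript in search(conn, "worst"):
    print(audio_file, start_s, end_s, transcript)
```

Storing the start and end times next to each chunk is what lets a UI jump to the right position in the audio when a search result is selected.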

Benchmarks

We present throughput comparisons between easytranscriber and WhisperX. See the benchmarks directory for code and details.

WhisperX relies on single-threaded data loading and CPU-based forced alignment, creating a bottleneck that is especially pronounced on hardware with slower single-core performance.
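
The effect of overlapping data loading with computation can be sketched with the standard library alone. The `prefetch` helper and `fake_load` function below are hypothetical and are not part of easytranscriber or WhisperX; they only show the general pattern of keeping several loads in flight while earlier results are being consumed.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch(paths, load_fn, num_workers=4, depth=8):
    """Yield load_fn(path) in input order while up to `depth` loads run ahead."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for path in paths:
            pending.append(pool.submit(load_fn, path))
            # Once enough loads are queued, hand the oldest one to the consumer.
            if len(pending) >= depth:
                yield pending.popleft().result()
        # Drain the loads that are still in flight.
        while pending:
            yield pending.popleft().result()


def fake_load(path):
    """Stand-in for reading and decoding an audio file."""
    return path.upper()


for audio in prefetch(["a.wav", "b.wav", "c.wav"], fake_load, num_workers=2, depth=2):
    print(audio)  # the consumer (e.g. the model) works while later files load
```

With a single-threaded loader, the consumer waits for each file in turn. Here, the next files are already being read in background threads, so slow I/O or decoding is less likely to stall the processing step.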

All easytranscriber benchmarks were run using the ctranslate2 backend for transcription.

  • PyTorch version: 2.8.0
  • CUDA: 12.8
  • WhisperX version: 3.7.6
  • Model: KBLab/kb-whisper-large
  • Language: Swedish (sv)

Documentation

The documentation is available at kb-labb.github.io/easytranscriber/.

[!TIP]
Check out the easyaligner library for a user-friendly pipeline for forced alignment of text and audio.

Acknowledgements

easytranscriber draws heavy inspiration from WhisperX (Bain et al., 2023).

The forced alignment component of easytranscriber is based on PyTorch's forced alignment API, which implements a GPU-accelerated version of the Viterbi algorithm as described in Pratap et al., 2024.

Thanks to LibriVox for the public domain audiobooks used as tutorial examples.

Citation

@online{rekathati2026,
  author = {Rekathati, Faton},
  title = {Easytranscriber: {Speech} Recognition with Precise
    Timestamps},
  date = {2026-02-26},
  url = {https://kb-labb.github.io/posts/2026-02-26-easytranscriber/},
  langid = {en}
}
