easytranscriber

Speech recognition with accurate word-level timestamps.

These details have been verified by PyPI

Project links

Repository

GitHub Statistics

Maintainers

Lauler

Project description

easytranscriber is an automatic speech recognition (ASR) library for transcription with precise word-level timestamps. While the transcription step itself is well-optimized in most ASR libraries, the surrounding components (data loading, emission extraction, forced alignment) often act as bottlenecks. easytranscriber optimizes these components and supports both ctranslate2 and Hugging Face transformers as inference backends. Notable features include:

GPU-accelerated forced alignment, using PyTorch's forced alignment API. Forced alignment is based on a GPU implementation of the Viterbi algorithm (Pratap et al., 2024).
Parallel loading and pre-fetching of audio files for efficient data loading and batch processing.
Flexible text normalization for improved alignment quality. Users can supply custom regex-based text normalization functions to preprocess ASR outputs before alignment. A mapping from the original text to the normalized text is maintained internally. All of the applied normalizations and transformations are consequently non-destructive and reversible after alignment.
35% to 102% faster inference compared to WhisperX. See the benchmarks for more details.
Batch inference support for wav2vec2 models (emission extraction).
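
The reversible normalization described above can be illustrated with a small sketch. This is not easytranscriber's internal implementation: `normalize_with_mapping` and `original_span` are hypothetical names, and the rule used here (lowercase, drop punctuation) is only an assumed example of a regex-based normalizer. The idea is to record, for every character kept in the normalized text, the index it came from in the original text, so that a span found after alignment can be mapped back.

```python
import re

# Assumption: keep word characters, whitespace and apostrophes; drop everything else.
KEEP = re.compile(r"[\w\s']")


def normalize_with_mapping(text):
    """Return (normalized_text, source_index).

    source_index[i] is the position in `text` of the i-th normalized character.
    """
    kept = []
    source_index = []
    for i, ch in enumerate(text):
        if KEEP.match(ch):
            # lower() can yield more than one character for some code points
            for out in ch.lower():
                kept.append(out)
                source_index.append(i)
    return "".join(kept), source_index


def original_span(source_index, start, end):
    """Map a [start, end) span of the normalized text back to the original text."""
    return source_index[start], source_index[end - 1] + 1


text = 'Hello, World! "Best" of times.'
normalized, source_index = normalize_with_mapping(text)
start = normalized.index("best")
orig_start, orig_end = original_span(source_index, start, start + len("best"))
print(normalized)
print(text[orig_start:orig_end])
```

Because only the index list is stored, the original text is never modified, which is what makes the normalization non-destructive in this sketch.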

Installation

With GPU support

pip install easytranscriber --extra-index-url https://download.pytorch.org/whl/cu128

[!TIP]
Remove --extra-index-url if you want a CPU-only installation.

Using uv

When you install with uv, the appropriate PyTorch version is selected automatically (CPU for macOS, CUDA for Linux/Windows/ARM):

uv pip install easytranscriber

Usage

Below is an example of how to transcribe an audio file with easytranscriber. We transcribe the first chapter of an audiobook recording of "A Tale of Two Cities", sourced from LibriVox.

from pathlib import Path

from easyaligner.text import load_tokenizer
from huggingface_hub import snapshot_download

from easytranscriber.pipelines import pipeline
from easytranscriber.text.normalization import text_normalizer

# Download Tale of Two Cities book 1 chapter 1 LibriVox audiobook recording for testing
snapshot_download(
    "Lauler/easytranscriber_tutorials",
    repo_type="dataset",
    local_dir="data/tutorials",
    allow_patterns="tale-of-two-cities_short-en/*",
    # max_workers=4,
)

tokenizer = load_tokenizer("english") # For sentence tokenization in forced alignment
audio_files = [file.name for file in Path("data/tutorials/tale-of-two-cities_short-en").glob("*")]
pipeline(
    vad_model="pyannote",
    emissions_model="facebook/wav2vec2-base-960h",
    transcription_model="distil-whisper/distil-large-v3.5",
    audio_paths=audio_files,
    audio_dir="data/tutorials/tale-of-two-cities_short-en",
    language="en",
    tokenizer=tokenizer,
    text_normalizer_fn=text_normalizer,
    cache_dir="models",
)
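
In the example above, `audio_paths` holds file names relative to `audio_dir`, and `glob("*")` returns every entry in that directory. If the directory may contain other files, a small filter can help. The following is a minimal sketch: `list_audio_files` and `AUDIO_SUFFIXES` are hypothetical helpers, not part of easytranscriber, and the suffix set is an illustrative assumption rather than a list of formats the library supports.

```python
from pathlib import Path

# Assumed set of audio suffixes; adjust to the formats you actually use.
AUDIO_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a", ".ogg"}


def list_audio_files(audio_dir):
    """Return sorted file names (relative to audio_dir) that look like audio files."""
    root = Path(audio_dir)
    return sorted(
        path.name
        for path in root.glob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
```

The result can then be passed as `audio_paths`, with the same directory given as `audio_dir`.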

easysearch

easysearch is a built-in lightweight search interface for browsing and querying your transcription outputs. It indexes transcription chunks into a SQLite database with full-text search and serves a web UI with audio playback and synchronized transcript highlighting.

pip install "easytranscriber[search]"
easysearch --alignments-dir output/alignments --audio-dir data/audio

See the search documentation for details on search syntax, indexing, and configuration options.
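
To give a feel for what indexing transcription chunks into SQLite with full-text search can look like, here is a minimal sketch using Python's built-in `sqlite3` module and the FTS5 extension. The table name, column names, and the `build_index` and `search` helpers are assumptions made for illustration; they are not easysearch's actual schema or API, and FTS5 must be available in your SQLite build.

```python
import sqlite3


def build_index(chunks):
    """Create an in-memory FTS5 table from (audio_file, start_s, end_s, text) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5("
        "audio_file UNINDEXED, start_s UNINDEXED, end_s UNINDEXED, text)"
    )
    conn.executemany("INSERT INTO chunks VALUES (?, ?, ?, ?)", chunks)
    return conn


def search(conn, query):
    """Return matching chunks, best match first."""
    return conn.execute(
        "SELECT audio_file, start_s, end_s, text FROM chunks "
        "WHERE chunks MATCH ? ORDER BY rank",
        (query,),
    ).fetchall()


conn = build_index(
    [
        ("chapter1.mp3", 0.0, 4.2, "It was the best of times"),
        ("chapter1.mp3", 4.2, 8.0, "it was the worst of times"),
    ]
)
for row in search(conn, "worst"):
    print(row)
```

Storing the start and end times next to the text is what would let a UI jump to the matching position in the audio.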

Benchmarks

We present throughput comparisons between easytranscriber and WhisperX. See the benchmarks directory for code and details.

WhisperX relies on single-threaded data loading and CPU-based forced alignment, creating a bottleneck that is especially pronounced on hardware with slower single-core performance.
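
The data loading bottleneck can be illustrated with a generic prefetching pattern: while one item is being processed, the next few are already loading in background threads. This is a sketch of the general technique only, not easytranscriber's loader; `prefetch_map` and its parameters are hypothetical names.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch_map(load_fn, items, num_workers=4, prefetch=8):
    """Yield load_fn(item) in input order while later items load in the background."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        pending = deque()
        for item in items:
            pending.append(pool.submit(load_fn, item))
            # Once enough loads are in flight, hand the oldest result to the consumer.
            if len(pending) >= prefetch:
                yield pending.popleft().result()
        # Drain whatever is still in flight.
        while pending:
            yield pending.popleft().result()
```

A consumer would iterate over `prefetch_map(load_audio, paths)`, where `load_audio` stands for whatever function decodes a single file. Threads help most when loading is dominated by I/O or by decoding code that releases the GIL.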

All easytranscriber benchmarks were run using the ctranslate2 backend for transcription.

PyTorch version: 2.8.0
CUDA: 12.8
WhisperX version: 3.7.6
Model: KBLab/kb-whisper-large
Language: Swedish (sv)

Documentation

The documentation is available at kb-labb.github.io/easytranscriber/.

[!TIP] Check out the easyaligner library for a user-friendly pipeline for forced alignment of text and audio.

Acknowledgements

easytranscriber draws heavy inspiration from WhisperX (Bain et al., 2023).

The forced alignment component of easytranscriber is based on PyTorch's forced alignment API, which implements a GPU-accelerated version of the Viterbi algorithm as described in Pratap et al., 2024.

Thanks to LibriVox for the public domain audiobooks used as tutorial examples.

Citation

@online{rekathati2026,
  author = {Rekathati, Faton},
  title = {Easytranscriber: {Speech} Recognition with Precise
    Timestamps},
  date = {2026-02-26},
  url = {https://kb-labb.github.io/posts/2026-02-26-easytranscriber/},
  langid = {en}
}

Project details

Release history

0.2.2 (Apr 19, 2026), this version
0.2.1 (Apr 14, 2026)
0.2.0 (Mar 2, 2026)
0.1.0 (Feb 18, 2026)

Download files

Source Distribution

easytranscriber-0.2.2.tar.gz (27.1 kB)

Uploaded Apr 19, 2026 Source

Built Distribution

easytranscriber-0.2.2-py3-none-any.whl (31.0 kB)

Uploaded Apr 19, 2026 Python 3

File details

Details for the file easytranscriber-0.2.2.tar.gz.

File metadata

Download URL: easytranscriber-0.2.2.tar.gz
Upload date: Apr 19, 2026
Size: 27.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.7 (publish, CI, Ubuntu 24.04)

File hashes

Hashes for easytranscriber-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`e3651eea9a03dd55aa82432690963ecadd0ce74b48d49d130cd0e39686890fae`
MD5	`21f435e21c1c18465453b02aa444447e`
BLAKE2b-256	`124b94429288986b17e3f545a1cb67a428e1f6b68112ad8eb70a2f8a8cf3be61`

File details

Details for the file easytranscriber-0.2.2-py3-none-any.whl.

File metadata

Download URL: easytranscriber-0.2.2-py3-none-any.whl
Upload date: Apr 19, 2026
Size: 31.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.11.7 (publish, CI, Ubuntu 24.04)

File hashes

Hashes for easytranscriber-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c56b7575cf1c4e28e742f467f2485ba6ce78fa6a3ba182008360595f251a69df`
MD5	`73793f8c88db081fd5ec8693640e16bc`
BLAKE2b-256	`e2fb37d60673442e47de4ae6707aa0361f39798b6597171cdd711a75a8953335`
