Add language model support to HF Transformers' Whisper models

Whisper-LM-Transformers

KenLM and large language model (LLM) integration for Whisper ASR models implemented in the Hugging Face Transformers library.

Installation

Install the package from PyPI:

pip install whisper-lm-transformers

Or clone and install locally:

git clone https://github.com/hitz-zentroa/whisper-lm-transformers.git
cd whisper-lm-transformers
pip install .

In addition, a recent version of [KenLM](https://github.com/kpu/kenlm) is required to use n-gram language models:

pip install https://github.com/kpu/kenlm/archive/master.zip

Usage Examples

1) Using Hugging Face Pipeline

There is a new pipeline task called "whisper-with-lm". Importing the package registers it, after which you can do:

>>> from transformers import pipeline
>>> from huggingface_hub import hf_hub_download
>>> import whisper_lm_transformers  # Required to register the new pipeline

>>> # Download the n-gram model
>>> lm_model = hf_hub_download(repo_id="HiTZ/whisper-lm-ngrams", filename="5gram-eu.bin")

>>> # Example: KenLM-based decoding
>>> pipe = pipeline(
...     "whisper-with-lm",
...     model="zuazo/whisper-tiny-eu",
...     lm_model=lm_model, # Provide a kenlm model path
...     lm_alpha=0.33582369,
...     lm_beta=0.68825565,
...     language="eu",
... )

>>> # Transcribe an audio file or array
>>> pipe("tests/data/audio.wav")["text"]
'Talka diskoetxearekin grabatzen ditut beti abestien maketak.'

Note: The example above uses our Basque KenLM model. For best results with your own models, optimize lm_alpha, lm_beta, and the other decoding parameters.
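
The lm_alpha and lm_beta weights follow the usual shallow-fusion formulation for KenLM-style decoding: the acoustic log-probability is combined with the LM log-probability scaled by lm_alpha, plus a length bonus scaled by lm_beta. A minimal sketch of that scoring rule (the function name and inputs are illustrative, not this package's internal API):

```python
def fused_score(ac_logprob, lm_logprob, n_tokens, lm_alpha, lm_beta):
    """Shallow fusion: log P_acoustic + alpha * log P_lm + beta * length."""
    return ac_logprob + lm_alpha * lm_logprob + lm_beta * n_tokens

# Two candidates with the same acoustic score: the LM prefers
# the second hypothesis, so it wins after fusion.
a = fused_score(-4.0, -12.0, n_tokens=5, lm_alpha=0.34, lm_beta=0.69)
b = fused_score(-4.0, -6.0, n_tokens=5, lm_alpha=0.34, lm_beta=0.69)
assert b > a
```

Because lm_alpha trades acoustic evidence against LM fluency and lm_beta counteracts the LM's bias toward short outputs, both are model- and language-dependent, which is why re-optimizing them per setup matters.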

Integrating a Large Language Model

If you prefer to use a large language model (LLM):

>>> # Load the pipeline
>>> pipe = pipeline(
...     "whisper-with-lm",
...     model="zuazo/whisper-tiny-eu",
...     llm_model="HiTZ/latxa-7b-v1.2", # Hugging Face LLM name or path
...     lm_alpha=2.73329396,
...     lm_beta=0.00178595,
...     language="eu",
... )

>>> # Transcribe an audio file or array
>>> pipe("tests/data/audio.wav")["text"]
'Talka diskoetxearekin grabatzen ditut beti abestien maketak.'

Caution: Running large LMs side-by-side with Whisper requires sufficient GPU memory.
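
As a rough rule of thumb, the extra GPU memory the LLM's weights alone require can be estimated from its parameter count and precision; a 7B-parameter model such as the Latxa checkpoint above needs on the order of 13 GB in float16, before activations, the KV cache, and the Whisper model itself. A back-of-the-envelope helper (these are approximations, not measured requirements):

```python
def llm_weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone (float16 = 2 bytes/param)."""
    return n_params * bytes_per_param / 1024**3

# ~7e9 parameters in float16:
print(round(llm_weight_memory_gb(7e9), 1))  # 13.0 (GB, weights only)
```

If this does not fit on a single device, loading the LLM in 8-bit or 4-bit precision roughly halves or quarters the weight footprint.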

2) Using the WhisperWithLM Class Directly

If you prefer manual control, you can use the WhisperWithLM class:

>>> from datasets import Audio, load_dataset
>>> from transformers import WhisperProcessor
>>> from whisper.audio import load_audio

>>> from whisper_lm_transformers import WhisperWithLM

>>> # Load the model
>>> model_name = "zuazo/whisper-tiny-eu"
>>> processor = WhisperProcessor.from_pretrained(model_name)
>>> model = WhisperWithLM.from_pretrained(model_name)

>>> # Load an audio example
>>> ds = load_dataset("openslr", "SLR76", split="train", trust_remote_code=True)
>>> audio = load_audio(ds[28]["audio"]["path"])

>>> # Process the audio and generate the output
>>> inputs = processor(audio=audio, sampling_rate=16000, return_tensors="pt")
>>> generated = model.generate(
...     input_features=inputs["input_features"],
...     tokenizer=processor.tokenizer,
...     lm_model="tests/5gram-eu.bin", # Provide a kenlm model path
...     lm_alpha=0.33582369,
...     lm_beta=0.68825565,
...     num_beams=5,
...     language="eu",
... )
>>> processor.decode(generated[0], skip_special_tokens=True)
'Talka diskoetxearekin grabatzen ditut beti abestien maketak.'

Audio Processing Note

In the last example, we used OpenAI's load_audio() function for reproducibility. You can also use standard HF audio processing methods, e.g. ds.cast_column("audio", Audio(sampling_rate=16000)). However, keep sample rates and preprocessing consistent: different audio preprocessing can yield different internal logits and thus alter the final LM-fused results. In particular, if you optimized your language model weights with our whisper-lm repository (based on OpenAI's Whisper implementation), we recommend re-running the optimization with the scripts provided here for the best results.
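
If your dataset's audio is not already at 16 kHz, the simplest consistent option is to let datasets resample it via the cast_column call above. For illustration only, a basic linear-interpolation resampler looks like the sketch below; in practice, prefer a library resampler (e.g. the one datasets/torchaudio use), which applies proper anti-aliasing filtering:

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono waveform with naive linear interpolation."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# A 1-second 48 kHz signal becomes 16000 samples at 16 kHz.
wave = np.random.default_rng(0).standard_normal(48000)
print(resample_linear(wave, 48000).shape)  # (16000,)
```

Whichever method you choose, use the same one during LM-weight optimization and at inference time.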

Included Scripts

The package includes the following scripts:

  • whisper_evaluate_with_hf: Evaluates a Whisper model on a dataset.
  • whisper_lm_optimizer_with_hf: Optimizes the n-gram or large language model weights.

Run them with --help to see how to use them.

Contributing

Contributions, bug reports, and feature requests are welcome! Please check out CONTRIBUTING.md for details on how to set up your environment and run tests before submitting changes.

Citation

If you find this helpful in your research, please cite:

@misc{dezuazo2025whisperlmimprovingasrmodels,
      title={Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages},
      author={Xabier de Zuazo and Eva Navas and Ibon Saratxaga and Inma Hernáez Rioja},
      year={2025},
      eprint={2503.23542},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.23542},
}

Please check the related preprint, arXiv:2503.23542, for more details.
