Add language model support to HF Transformers' Whisper models

Whisper-LM-Transformers

KenLM and large language model (LLM) integration for Whisper ASR models implemented in the Hugging Face Transformers library.

Installation

Install the package from PyPI:

pip install whisper-lm-transformers

Or clone and install locally:

git clone https://github.com/hitz-zentroa/whisper-lm-transformers.git
cd whisper-lm-transformers
pip install .

In addition, a recent version of [KenLM](https://github.com/kpu/kenlm) is required to use n-gram language models:

pip install https://github.com/kpu/kenlm/archive/master.zip

Usage Examples

1) Using Hugging Face Pipeline

There is a new pipeline task called "whisper-with-lm". Importing the package registers it, after which you can do:

>>> from transformers import pipeline
>>> from huggingface_hub import hf_hub_download
>>> import whisper_lm_transformers  # Required to register the new pipeline

>>> # Download the n-gram model
>>> lm_model = hf_hub_download(repo_id="HiTZ/whisper-lm-ngrams", filename="5gram-eu.bin")

>>> # Example: KenLM-based decoding
>>> pipe = pipeline(
...     "whisper-with-lm",
...     model="zuazo/whisper-tiny-eu",
...     lm_model=lm_model, # Provide a kenlm model path
...     lm_alpha=0.33582369,
...     lm_beta=0.68825565,
...     language="eu",
... )

>>> # Transcribe an audio file or array
>>> pipe("tests/data/audio.wav")["text"]
'Talka diskoetxearekin grabatzen ditut beti abestien maketak.'

Note: The example above uses our Basque KenLM model. For best results with your own models, optimize lm_alpha, lm_beta, and the other decoding parameters.
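
The lm_alpha and lm_beta weights follow the usual shallow-fusion formulation for KenLM-style decoding: the acoustic log-probability is combined with the LM log-probability scaled by lm_alpha, plus a length bonus scaled by lm_beta. A minimal sketch of that scoring rule (the function name and inputs are illustrative, not this package's internal API):

```python
def fused_score(ac_logprob, lm_logprob, n_tokens, lm_alpha, lm_beta):
    """Shallow fusion: log P_acoustic + alpha * log P_lm + beta * length."""
    return ac_logprob + lm_alpha * lm_logprob + lm_beta * n_tokens

# Two candidates with the same acoustic score: the LM prefers
# the second hypothesis, so it wins after fusion.
a = fused_score(-4.0, -12.0, n_tokens=5, lm_alpha=0.34, lm_beta=0.69)
b = fused_score(-4.0, -6.0, n_tokens=5, lm_alpha=0.34, lm_beta=0.69)
assert b > a
```

Because lm_alpha trades acoustic evidence against LM fluency and lm_beta counteracts the LM's bias toward short outputs, both are model- and language-dependent, which is why re-optimizing them per setup matters.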

Integrating a Large Language Model

If you prefer to use a large language model (LLM):

>>> # Load the pipeline
>>> pipe = pipeline(
...     "whisper-with-lm",
...     model="zuazo/whisper-tiny-eu",
...     llm_model="HiTZ/latxa-7b-v1.2", # Hugging Face LLM name or path
...     lm_alpha=2.73329396,
...     lm_beta=0.00178595,
...     language="eu",
... )

>>> # Transcribe an audio file or array
>>> pipe("tests/data/audio.wav")["text"]
'Talka diskoetxearekin grabatzen ditut beti abestien maketak.'

Caution: Running large LMs side-by-side with Whisper requires sufficient GPU memory.
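
As a rough rule of thumb, the extra GPU memory the LLM's weights alone require can be estimated from its parameter count and precision; a 7B-parameter model such as the Latxa checkpoint above needs on the order of 13 GB in float16, before activations, the KV cache, and the Whisper model itself. A back-of-the-envelope helper (these are approximations, not measured requirements):

```python
def llm_weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone (float16 = 2 bytes/param)."""
    return n_params * bytes_per_param / 1024**3

# ~7e9 parameters in float16:
print(round(llm_weight_memory_gb(7e9), 1))  # 13.0 (GB, weights only)
```

If this does not fit on a single device, loading the LLM in 8-bit or 4-bit precision roughly halves or quarters the weight footprint.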

2) Using the WhisperWithLM Class Directly

If you prefer manual control, you can use the WhisperWithLM class:

>>> from datasets import Audio, load_dataset
>>> from transformers import WhisperProcessor
>>> from whisper.audio import load_audio

>>> from whisper_lm_transformers import WhisperWithLM

>>> # Load the model
>>> model_name = "zuazo/whisper-tiny-eu"
>>> processor = WhisperProcessor.from_pretrained(model_name)
>>> model = WhisperWithLM.from_pretrained(model_name)

>>> # Load an audio example
>>> ds = load_dataset("openslr", "SLR76", split="train", trust_remote_code=True)
>>> audio = load_audio(ds[28]["audio"]["path"])

>>> # Process the audio and generate the output
>>> inputs = processor(audio=audio, sampling_rate=16000, return_tensors="pt")
>>> generated = model.generate(
...     input_features=inputs["input_features"],
...     tokenizer=processor.tokenizer,
...     lm_model="tests/5gram-eu.bin", # Provide a kenlm model path
...     lm_alpha=0.33582369,
...     lm_beta=0.68825565,
...     num_beams=5,
...     language="eu",
... )
>>> processor.decode(generated[0], skip_special_tokens=True)
'Talka diskoetxearekin grabatzen ditut beti abestien maketak.'

Audio Processing Note

In the last example, we used OpenAI's load_audio() function for reproducibility. You can also use standard HF audio processing methods, e.g. ds.cast_column("audio", Audio(sampling_rate=16000)). However, keep sample rates and preprocessing consistent: different audio preprocessing can yield different internal logits and thus alter the final LM-fused results. In particular, if you optimized your language model weights with our whisper-lm repository (based on OpenAI's Whisper implementation), we recommend re-running the optimization with the scripts provided here for the best results.
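
If your dataset's audio is not already at 16 kHz, the simplest consistent option is to let datasets resample it via the cast_column call above. For illustration only, a basic linear-interpolation resampler looks like the sketch below; in practice, prefer a library resampler (e.g. the one datasets/torchaudio use), which applies proper anti-aliasing filtering:

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Resample a mono waveform with naive linear interpolation."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    old_t = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    new_t = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(new_t, old_t, audio)

# A 1-second 48 kHz signal becomes 16000 samples at 16 kHz.
wave = np.random.default_rng(0).standard_normal(48000)
print(resample_linear(wave, 48000).shape)  # (16000,)
```

Whichever method you choose, use the same one during LM-weight optimization and at inference time.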

Included Scripts

The package includes the following scripts:

  • whisper_evaluate_with_hf: Evaluates a Whisper model on a dataset.
  • whisper_lm_optimizer_with_hf: Optimizes the n-gram or large language model weights.

Run them with --help to see how to use them.

Contributing

Contributions, bug reports, and feature requests are welcome! Please check out CONTRIBUTING.md for details on how to set up your environment and run tests before submitting changes.

Citation

If you find this helpful in your research, please cite:

@misc{dezuazo2025whisperlmimprovingasrmodels,
      title={Whisper-LM: Improving ASR Models with Language Models for Low-Resource Languages},
      author={Xabier de Zuazo and Eva Navas and Ibon Saratxaga and Inma Hernáez Rioja},
      year={2025},
      eprint={2503.23542},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2503.23542},
}

Please check the related preprint, arXiv:2503.23542, for more details.
