Omnilingual ASR Modeling Library

Project description

Photographs captured during corpus creation efforts in Pakistan and Liberia.

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR is an open-source speech recognition system supporting over 1,600 languages — including hundreds never previously covered by any ASR technology. Designed for broad accessibility, it enables new languages to be added with just a few paired examples without requiring specialized expertise or large datasets. By combining scalable zero-shot learning with a flexible model family, Omnilingual ASR aims to make speech technology more inclusive and adaptable for communities and researchers worldwide.

Performance results table

Our 7B-LLM-ASR system achieves state-of-the-art performance across 1,600+ languages, with character error rates (CER) below 10 for 78% of those languages.

December 2025 Update

We release two suites of models:

  • Checkpoints with improved accuracy (lower CER) for the CTC and LLM-ASR models, compared to the existing LLM-ASR models (omniASR_{CTC,LLM}_{300M,1B,3B,7B}_v2).
  • A new LLM-ASR variant that supports decoding audio of unlimited length (omniASR_LLM_Unlimited_{300M,1B,3B,7B}_v2). The unlimited-audio-length models are briefly described in the architecture overview section. Their accuracy is comparable to the limited-audio-length models; however, finetuning recipes for them are not yet supported.

Documentation

Quick Start

Models & Architecture

Training & Data Pipeline

  • Data Preparation - End-to-end guide for multilingual dataset preparation, HuggingFace integration, and parquet processing
  • Training Recipes - Pre-configured workflows for CTC and LLM model training

Installation

The models were developed using fairseq2, a research-focused sequence modeling toolkit. We provide a reference inference pipeline that works across platforms; audio support requires libsndfile (macOS: brew install libsndfile; Windows may need additional setup).

# using pip
pip install omnilingual-asr

# using uv
uv add omnilingual-asr

Inference

from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

pipeline = ASRInferencePipeline(model_card="omniASR_LLM_Unlimited_7B_v2")
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)

More details on running specific models can be found in the src/omnilingual_asr/models/inference directory.

⚠️ Important: Currently, only audio files shorter than 40 seconds are accepted for inference with the CTC and LLM model suites.
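Given this limit, it can help to screen files by duration before calling the pipeline. A minimal sketch using only the standard-library wave module (so it handles .wav input only; FLAC or other formats would need a library such as soundfile — the helper name is ours, not part of the package):

```python
import wave

MAX_SECONDS = 40.0  # inference limit for the CTC/LLM model suites

def within_limit(path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Return True if a WAV file is short enough for inference.

    Reads only the header via the stdlib `wave` module; duration is
    the frame count divided by the sample rate.
    """
    with wave.open(path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return duration < max_seconds
```

For example, `usable = [f for f in audio_files if within_limit(f)]` before calling `pipeline.transcribe`.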

Supported Languages

The full list of 1,600+ supported languages can be accessed programmatically:

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)

# Check if a specific language is supported
if "eng_Latn" in supported_langs:
    print("English (Latin script) is supported!")

Languages follow the format {language_code}_{script}, for example eng_Latn - English (Latin script), cmn_Hans - Mandarin Chinese (Simplified), ...
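Because every tag follows the {language_code}_{script} pattern, the codes split cleanly on the final underscore. A small sketch (using a hypothetical sample list in place of the real supported_langs import shown above):

```python
from collections import Counter

# Hypothetical sample; in practice, import supported_langs as shown above
langs = ["eng_Latn", "deu_Latn", "cmn_Hans", "cmn_Hant", "arb_Arab"]

def split_code(code: str) -> tuple[str, str]:
    """Split a {language_code}_{script} tag into its two parts."""
    language, script = code.rsplit("_", 1)
    return language, script

# Count how many entries use each writing script
script_counts = Counter(split_code(c)[1] for c in langs)
```

The same pattern works for filtering, e.g. all Latin-script entries with `[c for c in langs if c.endswith("_Latn")]`.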

Using the HuggingFace Dataset 🤗

We provide a large-scale multilingual speech dataset on HuggingFace under CC-BY-4.0 License: facebook/omnilingual-asr-corpus. This dataset can be directly used with our inference pipeline for evaluation or testing:

pip install "omnilingual-asr[data]"

from datasets import load_dataset
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Load dataset for a specific language (e.g., Ligurian)
omni_dataset = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn", split="train", streaming=True)
batch = next(omni_dataset.iter(5))

# Convert to pipeline input format
audio_data = [{"waveform": x["array"], "sample_rate": x["sampling_rate"]}
              for x in batch["audio"]]

# Run inference
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B_v2")
transcriptions = pipeline.transcribe(audio_data, batch_size=2)

# Display results
for i, (transcription, original_text) in enumerate(zip(transcriptions, batch["raw_text"]), 1):
    print(f"\n Sample {i}:")
    print(f"   Ground Truth: {original_text}")
    print(f"   Predicted:    {transcription}")
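To score predictions against the ground truth, a character error rate can be computed as edit distance over reference length. A minimal Levenshtein-based sketch (not the project's official scorer):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: edit distance / reference length."""
    r, h = list(reference), list(hypothesis)
    # Standard dynamic-programming Levenshtein distance
    prev = list(range(len(h) + 1))
    for i, rc in enumerate(r, 1):
        curr = [i]
        for j, hc in enumerate(h, 1):
            cost = 0 if rc == hc else 1
            curr.append(min(prev[j] + 1,       # deletion
                            curr[j - 1] + 1,   # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(r), 1)

print(cer("kitten", "sitting"))  # 3 edits / 6 reference chars = 0.5
```

In practice, scores from `pipeline.transcribe` would be averaged over `zip(batch["raw_text"], transcriptions)`.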

Model Architectures

| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
| omniASR_W2V_300M | SSL | 317_390_592 | 1.2 GiB | - | - |
| omniASR_W2V_1B | SSL | 965_514_752 | 3.6 GiB | - | - |
| omniASR_W2V_3B | SSL | 3_064_124_672 | 12.0 GiB | - | - |
| omniASR_W2V_7B | SSL | 6_488_487_168 | 25.0 GiB | - | - |
| omniASR_CTC_300M | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| omniASR_CTC_1B | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| omniASR_CTC_3B | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| omniASR_CTC_7B | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| omniASR_CTC_300M_v2 | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| omniASR_CTC_1B_v2 | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| omniASR_CTC_3B_v2 | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| omniASR_CTC_7B_v2 | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| omniASR_LLM_300M | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| omniASR_LLM_1B | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| omniASR_LLM_3B | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| omniASR_LLM_7B | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| omniASR_LLM_300M_v2 | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| omniASR_LLM_1B_v2 | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| omniASR_LLM_3B_v2 | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| omniASR_LLM_7B_v2 | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| omniASR_LLM_Unlimited_300M_v2 | omniASR_LLM_300M + unlimited audio length | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.092 (~1x) (0.206)³ |
| omniASR_LLM_Unlimited_1B_v2 | omniASR_LLM_1B + unlimited audio length | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.097 (~1x) (0.207)³ |
| omniASR_LLM_Unlimited_3B_v2 | omniASR_LLM_3B + unlimited audio length | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.095 (~1x) (0.208)³ |
| omniASR_LLM_Unlimited_7B_v2 | omniASR_LLM_7B + unlimited audio length | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.097 (~1x) (0.208)³ |
| omniASR_LLM_7B_ZS | Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| omniASR_tokenizer_v1 | Tokenizer for all non-v2 models except omniASR_LLM_7B | - | 100 KiB | - | - |
| omniASR_tokenizer_v1_variant7 | Tokenizer for the omniASR_LLM_7B architecture | - | 100 KiB | - | - |
| omniASR_tokenizer_written_v2 | Tokenizer for all v2 architectures | - | 100 KiB | - | - |

¹ (batch=1, audio_len=30s, BF16, A100)

² Relative speed to omniASR_LLM_7B

³ (batch=1, audio_len=15min, BF16, A100)
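The real-time factor is processing time divided by audio duration, so expected decoding time is simply RTF × audio length. A quick back-of-the-envelope sketch (figures taken from the table above):

```python
def estimated_decode_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock decoding time implied by a real-time factor."""
    return audio_seconds * rtf

# omniASR_LLM_7B at RTF 0.092: one minute of audio takes about 5.5 s
print(estimated_decode_seconds(60.0, 0.092))
# omniASR_CTC_300M at RTF 0.001: the same minute takes about 0.06 s
print(estimated_decode_seconds(60.0, 0.001))
```

Note these RTFs were measured at batch=1 with 30 s audio in BF16 on an A100; other hardware and batch sizes will differ.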

Model Download & Storage

  • Automatic Download: Models are automatically downloaded on first use during training or inference
  • Storage Location: Models are saved to ~/.cache/fairseq2/assets/

Architecture Documentation

We provide a high-level model architecture overview in the model directory (src/omnilingual_asr/models), with individual configurations for each model family in the respective subdirectories.

Training

To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.

License

Omnilingual ASR code and models are released under the Apache 2.0 license.

Citation

If you use the Omnilingual ASR model suite in your research, please cite us with the following BibTeX entry:

@misc{omnilingualasrteam2025omnilingualasropensourcemultilingual,
      title={Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages},
      author={Omnilingual ASR team and Gil Keren and Artyom Kozhevnikov and Yen Meng and Christophe Ropers and Matthew Setzler and Skyler Wang and Ife Adebara and Michael Auli and Can Balioglu and Kevin Chan and Chierh Cheng and Joe Chuang and Caley Droof and Mark Duppenthaler and Paul-Ambroise Duquenne and Alexander Erben and Cynthia Gao and Gabriel Mejia Gonzalez and Kehan Lyu and Sagar Miglani and Vineel Pratap and Kaushik Ram Sadagopan and Safiyyah Saleem and Arina Turkatenko and Albert Ventayol-Boada and Zheng-Xin Yong and Yu-An Chung and Jean Maillard and Rashel Moritz and Alexandre Mourachko and Mary Williamson and Shireen Yates},
      year={2025},
      eprint={2511.09690},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.09690},
}
