Omnilingual ASR Modeling Library

Photographs captured during corpus creation efforts in Pakistan and Liberia.

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Omnilingual ASR is an open-source speech recognition system supporting over 1,600 languages — including hundreds never previously covered by any ASR technology. Designed for broad accessibility, it enables new languages to be added with just a few paired examples without requiring specialized expertise or large datasets. By combining scalable zero-shot learning with a flexible model family, Omnilingual ASR aims to make speech technology more inclusive and adaptable for communities and researchers worldwide.

Our 7B-LLM-ASR system achieves state-of-the-art performance across 1,600+ languages, with character error rates (CER) below 10% for 78% of those languages.

Documentation

  • Quick Start
  • Models & Architecture
  • Training & Data Pipeline:
      • Data Preparation - End-to-end guide for multilingual dataset preparation, HuggingFace integration, and parquet processing
      • Training Recipes - Pre-configured workflows for CTC and LLM model training

Installation

The models were developed using fairseq2, a research-focused sequence modeling toolkit. While we provide a reference inference pipeline that works across platforms, audio support requires libsndfile (macOS: brew install libsndfile; Windows may need additional setup).

# using pip
pip install omnilingual-asr

# using uv
uv add omnilingual-asr

Inference

from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Load the 7B LLM-ASR model (downloaded automatically on first use)
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")

# Language codes follow the {language_code}_{script} convention
audio_files = ["/path/to/eng_audio1.flac", "/path/to/deu_audio2.wav"]
lang = ["eng_Latn", "deu_Latn"]
transcriptions = pipeline.transcribe(audio_files, lang=lang, batch_size=2)
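
The returned list lines up with the input order; a minimal way to print each result next to its source file (assuming transcribe returns one string per input):

for path, text in zip(audio_files, transcriptions):
    print(f"{path}: {text}")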

More details on running specific models can be found in the src/omnilingual_asr/models/inference directory.

⚠️ Important: Inference currently accepts only audio files shorter than 40 seconds. Support for transcribing audio of unlimited length is planned.
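
Until that lands, longer recordings have to be split before transcription. Below is a minimal sketch (not part of the library) that cuts a mono recording into 30-second chunks with soundfile and passes them to the pipeline as waveform dictionaries, the same input format used in the HuggingFace example further down; the fixed-length cuts are naive and may split words:

import soundfile as sf

def transcribe_long_file(pipeline, path, lang, chunk_seconds=30, batch_size=2):
    # Read the full waveform and its sample rate (assumes a mono file)
    waveform, sample_rate = sf.read(path)
    chunk_len = int(chunk_seconds * sample_rate)
    # Naive fixed-length chunking; a real splitter would cut on silence
    chunks = [
        {"waveform": waveform[start:start + chunk_len], "sample_rate": sample_rate}
        for start in range(0, len(waveform), chunk_len)
    ]
    # Transcribe every chunk and stitch the text back together
    texts = pipeline.transcribe(chunks, lang=[lang] * len(chunks), batch_size=batch_size)
    return " ".join(texts)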

Supported Languages

The full list of 1,600+ supported languages can be accessed programmatically:

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# Print all supported languages
print(f"Total supported languages: {len(supported_langs)}")
print(supported_langs)

# Check if a specific language is supported
if "eng_Latn" in supported_langs:
    print("English (Latin script) is supported!")

Language identifiers follow the format {language_code}_{script}, for example eng_Latn for English (Latin script) and cmn_Hans for Mandarin Chinese (Simplified script).
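
As a quick illustration of that naming scheme, the list can be grouped by script, assuming every entry follows the {language_code}_{script} pattern:

from collections import Counter

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs

# Count how many supported languages are written in each script
script_counts = Counter(lang.rsplit("_", 1)[-1] for lang in supported_langs)
for script, count in script_counts.most_common(5):
    print(f"{script}: {count} languages")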

Using the HuggingFace Dataset 🤗

We provide a large-scale multilingual speech dataset on HuggingFace under the CC-BY-4.0 license: facebook/omnilingual-asr-corpus. The dataset can be used directly with our inference pipeline for evaluation or testing:

# install the optional data dependencies
pip install "omnilingual-asr[data]"

from datasets import load_dataset
from omnilingual_asr.models.inference.pipeline import ASRInferencePipeline

# Load dataset for a specific language (e.g., Ligurian)
omni_dataset = load_dataset("facebook/omnilingual-asr-corpus", "lij_Latn", split="train", streaming=True)
batch = next(omni_dataset.iter(5))

# Convert to pipeline input format
audio_data = [{"waveform": x["array"], "sample_rate": x["sampling_rate"]}
              for x in batch["audio"]]

# Run inference
pipeline = ASRInferencePipeline(model_card="omniASR_LLM_7B")
transcriptions = pipeline.transcribe(audio_data, batch_size=2)

# Display results
for i, (transcription, original_text) in enumerate(zip(transcriptions, batch["raw_text"]), 1):
    print(f"\n Sample {i}:")
    print(f"   Ground Truth: {original_text}")
    print(f"   Predicted:    {transcription}")

Model Architectures

| Model Name | Features | Parameters | Download Size (FP32) | Inference VRAM¹ | Real-Time Factor¹ (relative speed)² |
|---|---|---|---|---|---|
| omniASR_W2V_300M | SSL | 317_390_592 | 1.2 GiB | – | – |
| omniASR_W2V_1B | SSL | 965_514_752 | 3.6 GiB | – | – |
| omniASR_W2V_3B | SSL | 3_064_124_672 | 12.0 GiB | – | – |
| omniASR_W2V_7B | SSL | 6_488_487_168 | 25.0 GiB | – | – |
| omniASR_CTC_300M | ASR | 325_494_996 | 1.3 GiB | ~2 GiB | 0.001 (96x) |
| omniASR_CTC_1B | ASR | 975_065_300 | 3.7 GiB | ~3 GiB | 0.002 (48x) |
| omniASR_CTC_3B | ASR | 3_080_423_636 | 12.0 GiB | ~8 GiB | 0.003 (32x) |
| omniASR_CTC_7B | ASR | 6_504_786_132 | 25.0 GiB | ~15 GiB | 0.006 (16x) |
| omniASR_LLM_300M | ASR with optional language conditioning | 1_627_603_584 | 6.1 GiB | ~5 GiB | 0.090 (~1x) |
| omniASR_LLM_1B | ASR with optional language conditioning | 2_275_710_592 | 8.5 GiB | ~6 GiB | 0.091 (~1x) |
| omniASR_LLM_3B | ASR with optional language conditioning | 4_376_679_040 | 17.0 GiB | ~10 GiB | 0.093 (~1x) |
| omniASR_LLM_7B | ASR with optional language conditioning | 7_801_041_536 | 30.0 GiB | ~17 GiB | 0.092 (~1x) |
| omniASR_LLM_7B_ZS | Zero-Shot ASR | 7_810_900_608 | 30.0 GiB | ~20 GiB | 0.194 (~0.5x) |
| omniASR_tokenizer | Tokenizer for most architectures (except omniASR_LLM_7B) | – | 100 KiB | – | – |
| omniASR_tokenizer_v7 | Tokenizer for the omniASR_LLM_7B model | – | 100 KiB | – | – |

¹ (batch=1, audio_len=30s, BF16, A100)

² Relative speed to omniASR_LLM_7B
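
As a worked example of those numbers: the real-time factor is processing time divided by audio duration, so an RTF of 0.092 on a 30-second clip corresponds to roughly 0.092 × 30 s ≈ 2.8 s of compute, while an RTF of 0.001 for omniASR_CTC_300M means the same clip is transcribed in about 30 ms.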

Model Download & Storage

  • Automatic Download: Models are automatically downloaded on first use during training or inference
  • Storage Location: Models are saved to ~/.cache/fairseq2/assets/
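
A quick way to check what has already been fetched is to list that cache directory; this is a small hypothetical helper, assuming the default location has not been overridden:

from pathlib import Path

cache_dir = Path.home() / ".cache" / "fairseq2" / "assets"
if cache_dir.exists():
    # Print each cached asset file and its size in MiB
    for item in sorted(cache_dir.rglob("*")):
        if item.is_file():
            size_mib = item.stat().st_size / 2**20
            print(f"{item.relative_to(cache_dir)}  ({size_mib:.1f} MiB)")
else:
    print("No models downloaded yet.")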

Architecture Documentation

We provide a high-level model architecture overview in the model directory (src/omnilingual_asr/models), with individual configurations for each model family in the respective directories.

Training

To further finetune the released checkpoints on your own data, use our data preparation guide followed by the finetuning recipe guide.

License

Omnilingual ASR code and models are released under the Apache 2.0 license.

Citation

If you use the Omnilingual ASR model suite in your research, please cite us using the following BibTeX entry (an arXiv version will be added soon):

@misc{omnilingualasr2025,
    title={{Omnilingual ASR}: Open-Source Multilingual Speech Recognition for 1600+ Languages},
    author={{Omnilingual ASR Team} and Keren, Gil and Kozhevnikov, Artyom and Meng, Yen and Ropers, Christophe and Setzler, Matthew and Wang, Skyler and Adebara, Ife and Auli, Michael and Chan, Kevin and Cheng, Chierh and Chuang, Joe and Droof, Caley and Duppenthaler, Mark and Duquenne, Paul-Ambroise and Erben, Alexander and Gao, Cynthia and Mejia Gonzalez, Gabriel and Lyu, Kehan and Miglani, Sagar and Pratap, Vineel and Sadagopan, Kaushik Ram and Saleem, Safiyyah and Turkatenko, Arina and Ventayol-Boada, Albert and Yong, Zheng-Xin and Chung, Yu-An and Maillard, Jean and Moritz, Rashel and Mourachko, Alexandre and Williamson, Mary and Yates, Shireen},
    year={2025},
    url={https://ai.meta.com/research/publications/omnilingual-asr-open-source-multilingual-speech-recognition-for-1600-languages/},
}

