
Automatic Speech Recognition in Python using ONNX models


ONNX ASR


onnx-asr is a Python package for Automatic Speech Recognition using ONNX models. It's written in pure Python with minimal dependencies (no PyTorch, Transformers, or FFmpeg required):

numpy onnxruntime huggingface-hub

[!TIP] Supports Parakeet v2 (En) / v3 (Multilingual), Canary v2 (Multilingual) and GigaAM v2/v3 (Ru) models!

The onnx-asr package supports many modern ASR models and the following features:

  • Runs on Windows, Linux, and macOS on a variety of devices, from IoT devices with Arm CPUs to servers with Nvidia GPUs (benchmarks)
  • Loading models from Hugging Face or local folders (including quantized versions)
  • Accepts wav files or NumPy arrays (built-in support for file reading and resampling)
  • Batch processing
  • (experimental) Longform recognition with VAD (Voice Activity Detection)
  • (experimental) Returns token timestamps
  • Simple CLI
  • Online demo in HF Spaces

Supported model architectures

The package supports the following modern ASR model architectures (comparison with original implementations):

  • Nvidia NeMo Conformer/FastConformer/Parakeet/Canary (with CTC, RNN-T, TDT and Transformer decoders)
  • Kaldi Icefall Zipformer (with stateless RNN-T decoder) including Alpha Cephei Vosk 0.52+
  • Sber GigaAM v2/v3 (with CTC and RNN-T decoders, including E2E versions)
  • T-Tech T-one (with CTC decoder, no streaming support yet)
  • OpenAI Whisper

When these models are saved in ONNX format, usually only the encoder and decoder are exported. Running them also requires the corresponding preprocessing and decoding, so the package provides these implementations for all supported models:

  • Log-mel spectrogram preprocessors
  • Greedy search decoding
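Greedy search over CTC outputs is simple enough to sketch in a few lines of NumPy. This is an illustration of the algorithm only (take the argmax per frame, collapse repeats, drop blanks), not the package's internal API:

```python
import numpy as np

def ctc_greedy_decode(log_probs: np.ndarray, vocab: list[str], blank_id: int) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    best = log_probs.argmax(axis=-1)  # best token per frame, shape (frames,)
    prev = blank_id
    tokens = []
    for t in best:
        if t != prev and t != blank_id:
            tokens.append(vocab[t])
        prev = t
    return "".join(tokens)

# Toy example: 3 tokens plus a blank, 4 frames of (log-)probabilities
vocab = ["a", "b", "c"]
blank_id = 3
log_probs = np.log(np.array([
    [0.90, 0.05, 0.03, 0.02],  # "a"
    [0.90, 0.05, 0.03, 0.02],  # "a" again -> collapsed
    [0.02, 0.03, 0.05, 0.90],  # blank -> dropped
    [0.05, 0.90, 0.03, 0.02],  # "b"
]))
print(ctc_greedy_decode(log_probs, vocab, blank_id))  # ab
```

The blank token between repeats is what allows genuine doubled characters to survive the collapse step.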

Installation

The package can be installed from PyPI:

  1. With CPU onnxruntime and huggingface-hub:
pip install onnx-asr[cpu,hub]
  2. With GPU onnxruntime and huggingface-hub:

[!IMPORTANT] First, you need to install the required version of CUDA.

pip install onnx-asr[gpu,hub]
  3. Without onnxruntime and huggingface-hub (if you already have some version of onnxruntime installed and prefer to download the models yourself):
pip install onnx-asr
  4. To build onnx-asr from source, install pdm, then run:
pdm build

Usage examples

Load ONNX model from Hugging Face

Load ONNX model from Hugging Face and recognize wav file:

import onnx_asr
model = onnx_asr.load_model("gigaam-v2-rnnt")
print(model.recognize("test.wav"))

[!IMPORTANT] Supported wav file formats: PCM_U8, PCM_16, PCM_24 and PCM_32. For other formats, either convert them first or use a library that can read them into a NumPy array.

Supported model names:

  • gigaam-v2-ctc for Sber GigaAM v2 CTC (origin, onnx)
  • gigaam-v2-rnnt for Sber GigaAM v2 RNN-T (origin, onnx)
  • gigaam-v3-ctc for Sber GigaAM v3 CTC (origin, onnx)
  • gigaam-v3-rnnt for Sber GigaAM v3 RNN-T (origin, onnx)
  • gigaam-v3-e2e-ctc for Sber GigaAM v3 E2E CTC (origin, onnx)
  • gigaam-v3-e2e-rnnt for Sber GigaAM v3 E2E RNN-T (origin, onnx)
  • nemo-fastconformer-ru-ctc for Nvidia FastConformer-Hybrid Large (ru) with CTC decoder (origin, onnx)
  • nemo-fastconformer-ru-rnnt for Nvidia FastConformer-Hybrid Large (ru) with RNN-T decoder (origin, onnx)
  • nemo-parakeet-ctc-0.6b for Nvidia Parakeet CTC 0.6B (en) (origin, onnx)
  • nemo-parakeet-rnnt-0.6b for Nvidia Parakeet RNNT 0.6B (en) (origin, onnx)
  • nemo-parakeet-tdt-0.6b-v2 for Nvidia Parakeet TDT 0.6B V2 (en) (origin, onnx)
  • nemo-parakeet-tdt-0.6b-v3 for Nvidia Parakeet TDT 0.6B V3 (multilingual) (origin, onnx)
  • nemo-canary-1b-v2 for Nvidia Canary 1B V2 (multilingual) (origin, onnx)
  • whisper-base for OpenAI Whisper Base exported with onnxruntime (origin, onnx)
  • alphacep/vosk-model-ru for Alpha Cephei Vosk 0.54-ru (origin)
  • alphacep/vosk-model-small-ru for Alpha Cephei Vosk 0.52-small-ru (origin)
  • t-tech/t-one for T-Tech T-one (origin)
  • onnx-community/whisper-tiny, onnx-community/whisper-base, onnx-community/whisper-small, onnx-community/whisper-large-v3-turbo, etc. for OpenAI Whisper exported with Hugging Face optimum (onnx-community)

[!IMPORTANT] Some onnx-community models converted long ago have a broken fp16 precision version.

[!IMPORTANT] Canary models do not work with the CoreML provider.

Example with soundfile:

import onnx_asr
import soundfile as sf

model = onnx_asr.load_model("whisper-base")

waveform, sample_rate = sf.read("test.wav", dtype="float32")
model.recognize(waveform, sample_rate=sample_rate)

Batch processing is also supported:

import onnx_asr
model = onnx_asr.load_model("nemo-fastconformer-ru-ctc")
print(model.recognize(["test1.wav", "test2.wav", "test3.wav", "test4.wav"]))

Some models have quantized versions:

import onnx_asr
model = onnx_asr.load_model("alphacep/vosk-model-ru", quantization="int8")
print(model.recognize("test.wav"))

Return tokens and timestamps:

import onnx_asr
model = onnx_asr.load_model("alphacep/vosk-model-ru").with_timestamps()
print(model.recognize("test1.wav"))

VAD

Load VAD ONNX model from Hugging Face and recognize wav file:

import onnx_asr
vad = onnx_asr.load_vad("silero")
model = onnx_asr.load_model("gigaam-v2-rnnt").with_vad(vad)
for res in model.recognize("test.wav"):
    print(res)

[!NOTE]
You will most likely need to adjust VAD parameters to get the correct results.

Supported VAD names:

  • silero for Silero VAD

CLI

The package has a simple CLI interface:

onnx-asr nemo-fastconformer-ru-ctc test.wav

For full usage parameters, see help:

onnx-asr -h

Gradio

Create a simple web interface with Gradio:

import onnx_asr
import gradio as gr

model = onnx_asr.load_model("gigaam-v2-rnnt")

def recognize(audio):
    if audio:
        sample_rate, waveform = audio
        waveform = waveform / 2**15
        if waveform.ndim == 2:
            waveform = waveform.mean(axis=1)
        return model.recognize(waveform, sample_rate=sample_rate)

demo = gr.Interface(fn=recognize, inputs=gr.Audio(min_length=1, max_length=30), outputs="text")
demo.launch()
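The normalization in the callback above (divide int16 PCM by 2**15, average stereo channels) can be checked in isolation with plain NumPy; this helper is only an illustration of that conversion, not part of the package:

```python
import numpy as np

def to_mono_float(waveform: np.ndarray) -> np.ndarray:
    """Convert int16 PCM (mono or stereo) to a mono float32 array in [-1, 1)."""
    out = waveform.astype(np.float32) / 2**15
    if out.ndim == 2:              # (samples, channels) -> average the channels
        out = out.mean(axis=1)
    return out

# Two stereo samples: one that cancels to silence, one near full scale
stereo = np.array([[16384, -16384], [32767, 32767]], dtype=np.int16)
mono = to_mono_float(stereo)
print(mono.dtype, mono.shape)  # float32 (2,)
```

The first sample averages to 0.0 and the second to 32767/32768, i.e. just under 1.0, which is the range ASR models expect for float input.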

Load ONNX model from local directory

Load ONNX model from local directory and recognize wav file:

import onnx_asr
model = onnx_asr.load_model("gigaam-v2-ctc", "models/gigaam-onnx")
print(model.recognize("test.wav"))

Supported model types:

  • All models from supported model names
  • nemo-conformer-ctc for NeMo Conformer/FastConformer/Parakeet with CTC decoder
  • nemo-conformer-rnnt for NeMo Conformer/FastConformer/Parakeet with RNN-T decoder
  • nemo-conformer-tdt for NeMo Conformer/FastConformer/Parakeet with TDT decoder
  • nemo-conformer-aed for NeMo Canary with Transformer decoder
  • kaldi-rnnt or vosk for Kaldi Icefall Zipformer with stateless RNN-T decoder
  • whisper-ort for Whisper (exported with onnxruntime)
  • whisper for Whisper (exported with optimum)

Comparison with original implementations

Packages with original implementations:

  • gigaam for GigaAM models (github)
  • nemo-toolkit for NeMo models (github)
  • openai-whisper for Whisper models (github)
  • sherpa-onnx for Vosk models (github, docs)
  • T-one for T-Tech T-one model (github)

Hardware:

  1. CPU tests were run on a laptop with an Intel i7-7700HQ processor.
  2. GPU tests were run in Google Colab on an Nvidia T4.

Tests of Russian ASR models were performed on a test subset of the Russian LibriSpeech dataset.

| Model | Package / decoding | CER | WER | RTFx (CPU) | RTFx (GPU) |
|-------|--------------------|-----|-----|------------|------------|
| GigaAM v2 CTC | default | 1.06% | 5.23% | 7.2 | 44.2 |
| GigaAM v2 CTC | onnx-asr | 1.06% | 5.23% | 11.6 | 64.3 |
| GigaAM v2 RNN-T | default | 1.10% | 5.22% | 5.5 | 23.3 |
| GigaAM v2 RNN-T | onnx-asr | 1.10% | 5.22% | 10.7 | 38.7 |
| GigaAM v3 CTC | default | 0.98% | 4.72% | 12.2 | 73.3 |
| GigaAM v3 CTC | onnx-asr | 0.98% | 4.72% | 14.5 | 68.3 |
| GigaAM v3 RNN-T | default | 0.93% | 4.39% | 8.2 | 41.6 |
| GigaAM v3 RNN-T | onnx-asr | 0.93% | 4.39% | 13.3 | 39.9 |
| GigaAM v3 E2E CTC | default | 1.50% | 7.10% | N/A | 178.0 |
| GigaAM v3 E2E CTC | onnx-asr | 1.56% | 7.80% | N/A | 65.6 |
| GigaAM v3 E2E RNN-T | default | 1.61% | 6.94% | N/A | 47.6 |
| GigaAM v3 E2E RNN-T | onnx-asr | 1.67% | 7.60% | N/A | 42.8 |
| Nemo FastConformer CTC | default | 3.11% | 13.12% | 29.1 | 143.0 |
| Nemo FastConformer CTC | onnx-asr | 3.11% | 13.12% | 45.8 | 103.3 |
| Nemo FastConformer RNN-T | default | 2.63% | 11.62% | 17.4 | 111.6 |
| Nemo FastConformer RNN-T | onnx-asr | 2.63% | 11.62% | 27.2 | 53.4 |
| Nemo Parakeet TDT 0.6B V3 | default | 2.34% | 10.95% | 5.6 | 75.4 |
| Nemo Parakeet TDT 0.6B V3 | onnx-asr | 2.38% | 10.95% | 9.7 | 59.7 |
| Nemo Canary 1B V2 | default | 4.89% | 20.00% | N/A | 14.0 |
| Nemo Canary 1B V2 | onnx-asr | 5.00% | 20.03% | N/A | 17.4 |
| T-Tech T-one | default | 1.28% | 6.56% | 11.9 | N/A |
| T-Tech T-one | onnx-asr | 1.28% | 6.57% | 11.7 | 16.5 |
| Vosk 0.52 small | greedy_search | 3.64% | 14.53% | 48.2 | 71.4 |
| Vosk 0.52 small | modified_beam_search | 3.50% | 14.25% | 29.0 | 24.7 |
| Vosk 0.52 small | onnx-asr | 3.64% | 14.53% | 45.5 | 75.2 |
| Vosk 0.54 | greedy_search | 2.21% | 9.89% | 34.8 | 64.2 |
| Vosk 0.54 | modified_beam_search | 2.21% | 9.85% | 23.9 | 24 |
| Vosk 0.54 | onnx-asr | 2.21% | 9.89% | 33.6 | 69.6 |
| Whisper base | default | 10.61% | 38.89% | 5.4 | 17.3 |
| Whisper base | onnx-asr* | 10.64% | 38.33% | 6.6 | 20.1 |
| Whisper large-v3-turbo | default | 2.96% | 10.27% | N/A | 13.6 |
| Whisper large-v3-turbo | onnx-asr** | 2.63% | 10.13% | N/A | 12.4 |

Tests of English ASR models were performed on a test subset of the Voxpopuli dataset.

| Model | Package / decoding | CER | WER | RTFx (CPU) | RTFx (GPU) |
|-------|--------------------|-----|-----|------------|------------|
| Nemo Parakeet CTC 0.6B | default | 4.09% | 7.20% | 8.3 | 107.7 |
| Nemo Parakeet CTC 0.6B | onnx-asr | 4.09% | 7.20% | 11.5 | 89.0 |
| Nemo Parakeet RNN-T 0.6B | default | 3.64% | 6.32% | 6.7 | 85.0 |
| Nemo Parakeet RNN-T 0.6B | onnx-asr | 3.64% | 6.32% | 8.7 | 48.0 |
| Nemo Parakeet TDT 0.6B V2 | default | 3.88% | 6.52% | 6.5 | 87.6 |
| Nemo Parakeet TDT 0.6B V2 | onnx-asr | 3.88% | 6.52% | 10.5 | 70.1 |
| Nemo Parakeet TDT 0.6B V3 | default | 3.97% | 6.76% | 6.1 | 90.0 |
| Nemo Parakeet TDT 0.6B V3 | onnx-asr | 3.97% | 6.75% | 9.5 | 68.2 |
| Nemo Canary 1B V2 | default | 4.62% | 7.42% | N/A | 17.5 |
| Nemo Canary 1B V2 | onnx-asr | 4.67% | 7.47% | N/A | 20.8 |
| Whisper base | default | 7.81% | 13.24% | 8.4 | 27.7 |
| Whisper base | onnx-asr* | 7.52% | 12.76% | 9.2 | 28.9 |
| Whisper large-v3-turbo | default | 6.85% | 11.16% | N/A | 20.4 |
| Whisper large-v3-turbo | onnx-asr** | 10.31% | 14.65% | N/A | 17.9 |

[!NOTE]

  1. * whisper-ort model (model types).
  2. ** whisper model (model types) with fp16 precision.
  3. All other models were run with the default precision: fp32 on CPU and fp32 or fp16 (for some of the original models) on GPU.

Benchmarks

Hardware:

  1. Arm tests were run on an Orange Pi Zero 3 with a Cortex-A53 processor.
  2. x64 tests were run on a laptop with an Intel i7-7700HQ processor.
  3. T4 tests were run in Google Colab on an Nvidia T4.

Russian ASR models

Notebook with benchmark code - benchmark-ru


| Model | RTFx (Arm) | RTFx (x64) | RTFx (T4) |
|-------|------------|------------|-----------|
| GigaAM v2 CTC | 0.8 | 11.6 | 64.3 |
| GigaAM v2 RNN-T | 0.8 | 10.7 | 38.7 |
| GigaAM v3 CTC | N/A | 14.5 | 68.3 |
| GigaAM v3 RNN-T | N/A | 13.3 | 39.9 |
| Nemo FastConformer CTC | 4.0 | 45.8 | 103.3 |
| Nemo FastConformer RNN-T | 3.2 | 27.2 | 53.4 |
| Nemo Parakeet TDT 0.6B V3 | N/A | 9.7 | 59.7 |
| Nemo Canary 1B V2 | N/A | N/A | 17.4 |
| T-Tech T-one | N/A | 11.7 | 16.5 |
| Vosk 0.52 small | 5.1 | 45.5 | 75.2 |
| Vosk 0.54 | 3.8 | 33.6 | 69.6 |
| Whisper base | 0.8 | 6.6 | 20.1 |
| Whisper large-v3-turbo | N/A | N/A | 12.4 |

English ASR models

Notebook with benchmark code - benchmark-en


| Model | RTFx (Arm) | RTFx (x64) | RTFx (T4) |
|-------|------------|------------|-----------|
| Nemo Parakeet CTC 0.6B | 1.1 | 11.5 | 89.0 |
| Nemo Parakeet RNN-T 0.6B | 1.0 | 8.7 | 48.0 |
| Nemo Parakeet TDT 0.6B V2 | 1.1 | 10.5 | 70.1 |
| Nemo Parakeet TDT 0.6B V3 | N/A | 9.5 | 68.2 |
| Nemo Canary 1B V2 | N/A | N/A | 20.8 |
| Whisper base | 1.2 | 9.2 | 28.9 |
| Whisper large-v3-turbo | N/A | N/A | 17.9 |

Convert model to ONNX

Save the model according to the instructions below and add config.json:

{
    "model_type": "nemo-conformer-rnnt", // See "Supported model types"
    "features_size": 80, // Size of preprocessor features for Whisper or Nemo models, supported 80 and 128
    "subsampling_factor": 8, // Subsampling factor - 4 for conformer models and 8 for fastconformer and parakeet models
    "max_tokens_per_step": 10 // Max tokens per step for RNN-T decoder
}
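Note that strict JSON does not allow comments; the // annotations above are explanatory only. A minimal sketch that writes a valid config.json with the same fields (the nemo-onnx directory name is just an example):

```python
import json
from pathlib import Path

# Same fields as the annotated example above, without the comments
config = {
    "model_type": "nemo-conformer-rnnt",  # see "Supported model types"
    "features_size": 80,                  # 80 or 128 preprocessor features
    "subsampling_factor": 8,              # 4 for Conformer, 8 for FastConformer/Parakeet
    "max_tokens_per_step": 10,            # limit for the RNN-T decoder
}

onnx_dir = Path("nemo-onnx")
onnx_dir.mkdir(exist_ok=True)
(onnx_dir / "config.json").write_text(json.dumps(config, indent=4))
```

The resulting file sits next to the exported model.onnx and vocab.txt so that load_model can pick everything up from one directory.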

Then you can upload the model to Hugging Face and use load_model to download it.

Nvidia NeMo Conformer/FastConformer/Parakeet

Install NeMo Toolkit

pip install nemo_toolkit['asr']

Download model and export to ONNX format

import nemo.collections.asr as nemo_asr
from pathlib import Path

model = nemo_asr.models.ASRModel.from_pretrained("nvidia/stt_ru_fastconformer_hybrid_large_pc")

# To export Hybrid models with the CTC decoder:
# model.set_export_config({"decoder_type": "ctc"})

onnx_dir = Path("nemo-onnx")
onnx_dir.mkdir(exist_ok=True)
model.export(str(Path(onnx_dir, "model.onnx")))

with Path(onnx_dir, "vocab.txt").open("wt") as f:
    for i, token in enumerate([*model.tokenizer.vocab, "<blk>"]):
        f.write(f"{token} {i}\n")

Sber GigaAM v2/v3

Install GigaAM

git clone https://github.com/salute-developers/GigaAM.git
pip install ./GigaAM --extra-index-url https://download.pytorch.org/whl/cpu

Download model and export to ONNX format

import gigaam
from pathlib import Path

onnx_dir = "gigaam-onnx"
model_type = "rnnt"  # or "ctc"

model = gigaam.load_model(
    model_type,
    fp16_encoder=False,  # only fp32 tensors
    use_flash=False,  # disable flash attention
)
model.to_onnx(dir_path=onnx_dir)

with Path(onnx_dir, "v2_vocab.txt").open("wt") as f:
    for i, token in enumerate(["\u2581", *(chr(ord("а") + i) for i in range(32)), "<blk>"]):
        f.write(f"{token} {i}\n")
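The generator expression above enumerates the 32 lowercase Cyrillic letters а through я (the contiguous range U+0430–U+044F; ё at U+0451 lies outside it), plus the word-boundary marker ▁ and the blank token. A quick sanity check of the token list being written:

```python
# Rebuild the token list from the export snippet and verify its contents
tokens = ["\u2581", *(chr(ord("а") + i) for i in range(32)), "<blk>"]

assert len(tokens) == 34                 # marker + 32 letters + blank
assert tokens[1] == "а" and tokens[32] == "я"
assert "ё" not in tokens                 # ё (U+0451) is outside U+0430..U+044F
print("vocab ok")
```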

OpenAI Whisper (with onnxruntime export)

Read the onnxruntime instructions for converting Whisper to ONNX.

Download model and export with Beam Search and Forced Decoder Input Ids:

python3 -m onnxruntime.transformers.models.whisper.convert_to_onnx -m openai/whisper-base --output ./whisper-onnx --use_forced_decoder_ids --optimize_onnx --precision fp32

Save the tokenizer config:

from transformers import WhisperTokenizer

processor = WhisperTokenizer.from_pretrained("openai/whisper-base")
processor.save_pretrained("whisper-onnx")

OpenAI Whisper (with optimum export)

Export the model to ONNX with the Hugging Face optimum-cli:

optimum-cli export onnx --model openai/whisper-base ./whisper-onnx
