# nemotron-asr-mlx

NVIDIA Nemotron Speech Streaming ASR on Apple Silicon via MLX. 94x realtime. Pure MLX.
93 minutes of audio transcribed in 59 seconds on an M-series Mac. No GPU drivers, no CUDA, no Docker. Just pip install and go.
This is a native MLX port of NVIDIA's Nemotron-ASR 0.6B — the cache-aware streaming conformer that processes each audio frame exactly once. No sliding windows, no recomputation, no rewinding. State lives in fixed-size ring buffers so latency stays flat no matter how long you talk.
```bash
pip install nemotron-asr-mlx
```

```python
from nemotron_asr_mlx import from_pretrained

model = from_pretrained("dboris/nemotron-asr-mlx")
result = model.transcribe("meeting.wav")
print(result.text)
```

That's it. The model downloads on first run (~1.2 GB).
## Benchmark
Tested on Apple Silicon. All times are wall-clock inference only (no I/O).
| Content | Duration | Inference | Speed | Tokens |
|---|---|---|---|---|
| Short conversation | 5s | 0.09s | 55x RT | 35 |
| Technical explainer | 98s | 1.04s | 95x RT | 474 |
| Audiobook excerpt | 9s | 0.15s | 58x RT | 57 |
| Long-form analysis | 25.6 min | 17.0s | 91x RT | 10,572 |
| Lecture recording | 36.1 min | 23.5s | 92x RT | 14,688 |
| Meeting recording | 29.4 min | 17.6s | 101x RT | 7,796 |
| **Total** | **93.0 min** | **59.3s** | **94x RT** | **33,622** |
618.5M parameters. 3.4 GB peak GPU memory. Model loads in 0.1s after first download.
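The realtime factor in the table is just audio duration divided by inference wall-clock time; the totals check out:

```python
# Realtime factor = audio duration / inference time, using the totals row above.
audio_s = 93.0 * 60   # 93.0 minutes of audio, in seconds
infer_s = 59.3        # total inference wall-clock time
rtf = audio_s / infer_s
print(f"{rtf:.0f}x realtime")
```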
Run your own:

```bash
python benchmark.py /path/to/audio/files
```
## Why this exists
Most "streaming" ASR on Mac is either (a) Whisper with overlapping windows reprocessing the same audio over and over, or (b) cloud APIs adding network latency to every utterance. Nemotron's cache-aware conformer is architecturally different:
- Each frame processed once — state carried forward in fixed-size ring buffers, not recomputed
- Constant memory — no growing KV caches, no memory spikes on long recordings
- Native Metal — no PyTorch, no ONNX, no bridge layers. Direct MLX on Apple GPU
- 94x realtime — an hour of audio in under a minute
The model achieves 2.43% WER on LibriSpeech test-clean, competitive with much larger models.
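The fixed-size cache idea is easy to see in miniature. This toy ring buffer (not the library's actual cache implementation) shows why memory stays constant no matter how long the stream runs:

```python
from collections import deque

class RingCache:
    """Toy fixed-size attention cache: keeps only the last `size` frames."""
    def __init__(self, size: int):
        self.frames = deque(maxlen=size)  # oldest frames fall off automatically

    def push(self, frame):
        self.frames.append(frame)

    def context(self):
        return list(self.frames)  # what attention would look back over

cache = RingCache(size=70)   # 70 frames of left context, matching the model config
for t in range(10_000):      # stream arbitrarily long...
    cache.push(t)
print(len(cache.context()))  # ...memory stays pinned at 70 frames
```

A sliding-window Whisper setup, by contrast, re-encodes overlapping audio on every step; here each frame is pushed exactly once and then only its cached representation survives.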
## Install

```bash
pip install nemotron-asr-mlx
```

Requires Python 3.10+ and an Apple Silicon Mac.
## Usage

### CLI

```bash
nemotron-asr transcribe meeting.wav   # batch transcribe a file
nemotron-asr listen                   # stream from microphone
nemotron-asr listen --chunk-ms 80     # lowest-latency streaming
```
### Python API

```python
from nemotron_asr_mlx import from_pretrained

model = from_pretrained("dboris/nemotron-asr-mlx")

# Batch — transcribe a file or numpy array
result = model.transcribe("audio.wav")
print(result.text)
print(result.tokens)  # BPE token IDs

# Streaming — push audio chunks, get text back incrementally
session = model.create_stream(chunk_ms=160)
event = session.push(pcm_chunk)   # StreamEvent with text_delta
print(event.text_delta, end="")
final = session.flush()           # final result
session.reset()                   # reuse for next utterance

# Live mic streaming
with model.listen(chunk_ms=160) as stream:
    for event in stream:
        print(event.text_delta, end="", flush=True)
```
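To stream a pre-recorded file through the push API, you split the PCM buffer into fixed-duration chunks yourself. A minimal chunker (the chunk math is ours; only `session.push` comes from the API above):

```python
def chunk_pcm(pcm, sample_rate=16_000, chunk_ms=160):
    """Yield successive fixed-duration chunks of a 1-D PCM buffer."""
    samples_per_chunk = sample_rate * chunk_ms // 1000  # 2560 samples at 16 kHz / 160 ms
    for start in range(0, len(pcm), samples_per_chunk):
        yield pcm[start:start + samples_per_chunk]

# Feed each chunk to a streaming session:
#   for chunk in chunk_pcm(pcm):
#       event = session.push(chunk)
chunks = list(chunk_pcm([0.0] * 16_000))  # one second of silence at 16 kHz
print(len(chunks))  # six full 160 ms chunks plus a 40 ms tail
```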
### StreamEvent

Every `push()` and `flush()` returns a `StreamEvent`:

| Field | Type | Description |
|---|---|---|
| `text_delta` | `str` | New text since last event |
| `text` | `str` | Full accumulated text |
| `is_final` | `bool` | `True` only from `flush()` |
| `tokens` | `list[int]` | All accumulated BPE token IDs |
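The delta/full-text contract is easy to mirror on the consumer side. This hypothetical accumulator (ours, not part of the library) shows how the fields relate:

```python
from dataclasses import dataclass, field

@dataclass
class Transcript:
    """Consumer-side mirror of the StreamEvent contract."""
    text: str = ""
    tokens: list = field(default_factory=list)

    def apply(self, text_delta: str, new_tokens: list):
        self.text += text_delta      # event.text is the concatenation of all deltas
        self.tokens.extend(new_tokens)

t = Transcript()
t.apply("hello ", [17, 42])
t.apply("world", [99])
print(t.text)  # hello world
```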
## Architecture
FastConformer encoder (24 layers, 1024-dim) with 8x depthwise striding subsampling. RNNT decoder with 2-layer LSTM prediction network and joint network. Per-layer-group attention context windows [[70,13], [70,6], [70,1], [70,0]] for progressive causal restriction. Greedy decoding with blank suppression.
Based on Cache-aware Streaming Conformer and the NeMo toolkit.
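A back-of-envelope latency sketch, assuming the standard FastConformer 10 ms feature hop (an assumption; not stated above): with 8x subsampling, each encoder frame covers 80 ms, so the per-group left/right context windows translate to time as follows:

```python
hop_ms = 10                        # assumed mel-frame hop (typical for FastConformer)
subsampling = 8                    # from the architecture description
frame_ms = hop_ms * subsampling    # 80 ms per encoder frame

# Per-layer-group [left, right] attention context from the config above
for left, right in [(70, 13), (70, 6), (70, 1), (70, 0)]:
    print(f"left context {left * frame_ms / 1000:.1f} s, "
          f"lookahead {right * frame_ms} ms")
```

Under these assumptions the lookahead shrinks from roughly a second in the first layer group to zero in the last, which is what "progressive causal restriction" buys: early layers see a little future context while the top of the stack stays strictly causal.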
## Live Demo

A browser-based demo with live mic transcription:

```bash
pip install websockets
python demo/server.py
```

Open http://localhost:8765, click Record, and start speaking. Transcription updates in real time with inference stats.
## Weight conversion

If you have a `.nemo` checkpoint and want to convert it yourself:

```bash
pip install torch safetensors pyyaml   # conversion deps only
nemotron-asr convert model.nemo ./output_dir
```

This produces `config.json` + `model.safetensors`. The conversion dependencies are not needed for inference.
## Dependencies

Deliberately minimal:

- `mlx` — Apple's ML framework
- `huggingface-hub` — model download
- `numpy` — mel spectrogram
- `sounddevice` — mic access (optional)
- `websockets` — live demo server (optional)
- `typer` — CLI
## License
Apache 2.0