# nano-parakeet
Pure-PyTorch inference for NVIDIA Parakeet TDT — no NeMo required.
```python
from nano_parakeet import from_pretrained

model = from_pretrained()
print(model.transcribe("audio.wav"))
```
## Why?
The official NeMo inference stack pulls in ~180 packages — PyTorch Lightning, Hydra, OmegaConf, apex, distributed training scaffolding — none of which are needed at inference time. This makes it painful to integrate Parakeet into existing projects: version conflicts, long installs, and a 30-second cold-start on every process launch.
nano-parakeet reimplements the full inference pipeline in plain PyTorch. The only dependencies are things you probably already have:
| | nano-parakeet | NeMo |
|---|---|---|
| Dependencies | 5 (torch, numpy, soundfile, sentencepiece, huggingface-hub) | ~180 |
| Cold start | ~3s (weights only) | ~30s (framework init + CUDA kernel compile) |
| Warm RTF (Jetson AGX Orin) | 93× | 73× |
Transcriptions are byte-identical to NeMo's output.
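To check this yourself, here is a small parity sketch. It assumes NeMo's `ASRModel` API; the pretrained model name is illustrative, and `transcribe` may return plain strings or hypothesis objects depending on the NeMo version:

```python
import nemo.collections.asr as nemo_asr
from nano_parakeet import from_pretrained

# Model name is a placeholder; use the checkpoint you are comparing against
nemo_model = nemo_asr.models.ASRModel.from_pretrained("nvidia/parakeet-tdt-1.1b")
hyp = nemo_model.transcribe(["audio.wav"])[0]
nemo_text = hyp.text if hasattr(hyp, "text") else hyp

assert from_pretrained().transcribe("audio.wav") == nemo_text
```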
## Install
```bash
pip install nano-parakeet
```
Requires Python 3.10+, PyTorch with CUDA, and ffmpeg.
## Usage

### Python API
```python
from nano_parakeet import from_pretrained

model = from_pretrained()             # downloads ~1.1GB on first run
text = model.transcribe("audio.wav")  # path, numpy array, or tensor
print(text)
```
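Since `transcribe` also accepts in-memory audio, here is a short sketch using `soundfile` (already a dependency). It assumes the array is 16 kHz mono, matching what the model expects:

```python
import soundfile as sf
from nano_parakeet import from_pretrained

model = from_pretrained()

# Decode the file to a float32 numpy array; the model expects 16 kHz mono
audio, sample_rate = sf.read("audio.wav", dtype="float32")
assert sample_rate == 16000
print(model.transcribe(audio))
```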
### CLI
```bash
nano-parakeet audio.wav
# or
python -m nano_parakeet audio.wav
```
Accepts OGG, WAV, M4A, or any format ffmpeg can read.
## Benchmark
RTF (real-time factor) = audio duration ÷ processing time, so RTF > 1.0 means faster than real time. Each figure is the best of 5 timed runs after a warm-up.
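A minimal sketch of that measurement loop (the file name and audio length are placeholders):

```python
import time
import torch
from nano_parakeet import from_pretrained

model = from_pretrained()
AUDIO_SECONDS = 12.0                    # length of sample.wav, assumed here

model.transcribe("sample.wav")          # warm-up run (not timed)
times = []
for _ in range(5):
    torch.cuda.synchronize()            # don't time queued async work
    start = time.perf_counter()
    model.transcribe("sample.wav")
    torch.cuda.synchronize()
    times.append(time.perf_counter() - start)

print(f"RTF: {AUDIO_SECONDS / min(times):.0f}x")
```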
### Warm throughput
| GPU | Audio | NeMo RTF | nano-parakeet RTF | Speedup |
|---|---|---|---|---|
| RTX 4090 | 12s | ~207× | ~519× | 2.5× |
| Jetson AGX Orin 64GB | 12s | ~73× | ~92× | 1.3× |
**Note (RTX 4090):** NeMo is run with `strategy='greedy'` (single-item, not batch). The default `greedy_batch` strategy uses TDT label-looping CUDA graphs that fail to compile on NeMo 2.6.2 + cuda-python 12.9 (NVRTC is not permitted inside a graph capture context); `strategy='greedy'` uses a different CUDA graph path that works fine.
### Cold start (first inference, including framework load)
| GPU | NeMo | nano-parakeet |
|---|---|---|
| RTX 4090 | ~30s | ~3s |
| Jetson AGX Orin 64GB | ~30s | ~3s |
Run both yourself:
```bash
git clone https://github.com/andimarafioti/nano-parakeet
cd nano-parakeet
./benchmark.sh sample.wav
```
## How It Works
The full pipeline in plain PyTorch — no NeMo at runtime:
```
Audio (16 kHz, mono)
 │
 ▼ pre-emphasis (α=0.97) → STFT (n_fft=512, hop=160, win=400)
   → Mel filterbank (128 bins) → log → per-feature normalisation
 │
 ▼ FastConformer Encoder (24 layers, d_model=1024, 8 heads)
   └─ ConvSubsampling (3× stride-2 → 8× time reduction)
   └─ RelPositionalEncoding (Transformer-XL style)
   └─ 24 × FastConformerLayer:
        FF₁ (×0.5) → Self-Attn (rel-pos) → Conv (k=9) → FF₂ (×0.5) → LN
 │
 ▼ TDT Decoder
   └─ RNNT Prediction: Embed(8193, 640) + 2-layer LSTM(640)
   └─ Joint: Linear(1024→640) + Linear(640→640) → ReLU → Linear(640→8198)
   └─ TDT greedy decode (durations [0,1,2,3,4], blank_id=8192)
 │
 ▼ SentencePiece decode → text
```
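The TDT decode loop is the main departure from vanilla RNN-T: alongside each label, the joint network predicts how many encoder frames to skip. A minimal sketch, assuming `predict` (prediction-network step) and `joint` callables; the names and state handling are illustrative, not the package's actual internals:

```python
import torch

BLANK_ID = 8192
DURATIONS = [0, 1, 2, 3, 4]

def tdt_greedy_decode(enc: torch.Tensor) -> list[int]:
    """enc: (T, 1024) encoder output. Returns SentencePiece token ids.

    `predict` and `joint` are stand-ins for the prediction and joint networks.
    """
    tokens: list[int] = []
    state = None                            # LSTM hidden state
    last = torch.tensor([BLANK_ID])         # start symbol
    t = 0
    while t < enc.shape[0]:
        dec, new_state = predict(last, state)
        logits = joint(enc[t], dec)         # (8198,): 8193 labels + 5 durations
        label = int(logits[:8193].argmax())
        skip = DURATIONS[int(logits[8193:].argmax())]
        if label != BLANK_ID:
            tokens.append(label)
            last = torch.tensor([label])
            state = new_state               # prediction net advances on non-blank only
            t += skip                       # duration 0 allows another label on this frame
        else:
            t += max(skip, 1)               # always advance past a blank
    return tokens
```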
Weights are loaded directly from the .nemo file (a ZIP archive) without importing any NeMo module.
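A rough sketch of that loading path; the member names inside the archive vary by model and NeMo version, so treat the ones here as placeholders:

```python
import io
import zipfile
import torch

# A .nemo file is a plain ZIP: config YAML, checkpoint, SentencePiece model, ...
with zipfile.ZipFile("parakeet-tdt-1.1b.nemo") as archive:
    print(archive.namelist())
    with archive.open("model_weights.ckpt") as f:
        state_dict = torch.load(io.BytesIO(f.read()), map_location="cpu")
# The tensors can then be mapped onto the plain-PyTorch modules above.
```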
## Optimisations
| | Encoder | Decoder | Effect |
|---|---|---|---|
| fp16 autocast | ✓ | ✗ | tensor cores for 1024→4096→1024 FFN matmuls × 24 layers |
| CUDA graph | ✗ | ✓ | ~20 kernel launches per decode step → 1 graph replay |
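The decoder-side trick follows the standard `torch.cuda.CUDAGraph` capture-and-replay pattern. A minimal sketch, assuming a `decode_step` callable with fixed-shape inputs and `next_frame`/`prev_token` tensors supplied by the decode loop; all names are illustrative:

```python
import torch

# Static input buffers: graph replay reuses the same memory addresses
static_frame = torch.zeros(1, 1024, device="cuda")
static_token = torch.zeros(1, dtype=torch.long, device="cuda")

# Warm up on a side stream so lazy kernel init happens outside capture
side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(side):
    out = decode_step(static_frame, static_token)
torch.cuda.current_stream().wait_stream(side)

# Capture one decode step into a graph
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    out = decode_step(static_frame, static_token)

# Per decode step: copy fresh inputs into the static buffers, then replay;
# the whole step becomes one launch instead of ~20
static_frame.copy_(next_frame)
static_token.copy_(prev_token)
graph.replay()
```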
## Jetson Setup
The PyPI wheel works on standard x86 CUDA machines. For Jetson (JetPack 6), PyTorch needs to be installed from NVIDIA's distribution first:
```bash
# Install CUDA-enabled PyTorch for JetPack 6
UV_SKIP_WHEEL_FILENAME_CHECK=1 uv pip install \
  https://developer.download.nvidia.com/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

# Then install nano-parakeet (skipping torch since it's already installed)
pip install nano-parakeet --no-deps
pip install numpy soundfile sentencepiece huggingface-hub
```