
nano-parakeet

Pure-PyTorch inference for NVIDIA Parakeet TDT — no NeMo required.

from nano_parakeet import from_pretrained
model = from_pretrained()
print(model.transcribe("audio.wav"))

Why?

The official NeMo inference stack pulls in ~180 packages — PyTorch Lightning, Hydra, OmegaConf, apex, distributed training scaffolding — none of which are needed at inference time. This makes it painful to integrate Parakeet into existing projects: version conflicts, long installs, and a 30-second cold-start on every process launch.

nano-parakeet reimplements the full inference pipeline in plain PyTorch. The only dependencies are things you probably already have:

|                            | nano-parakeet                                             | NeMo                                         |
|----------------------------|-----------------------------------------------------------|----------------------------------------------|
| Dependencies               | 5 (torch, numpy, soundfile, sentencepiece, huggingface-hub) | ~180                                       |
| Cold start                 | ~3 s (weights only)                                       | ~30 s (framework init + CUDA kernel compile) |
| Warm RTF (Jetson AGX Orin) | 93×                                                       | 73×                                          |

Transcriptions are byte-identical to NeMo's output.

Install

pip install nano-parakeet

Requires Python 3.10+, PyTorch with CUDA, and ffmpeg.

Usage

Python API

from nano_parakeet import from_pretrained

model = from_pretrained()                    # downloads ~1.1GB on first run
text = model.transcribe("audio.wav")        # path, numpy array, or tensor
print(text)

CLI

nano-parakeet audio.wav
# or
python -m nano_parakeet audio.wav

Accepts OGG, WAV, M4A, or any format ffmpeg can read.
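Whatever the container, the audio ultimately has to reach the model as 16 kHz mono. A sketch of the kind of ffmpeg invocation that does this conversion (the exact arguments nano-parakeet uses internally may differ; `ffmpeg_to_16k_mono_cmd` is illustrative, not part of the library's API):

```python
def ffmpeg_to_16k_mono_cmd(src: str, dst: str = "out.wav") -> list[str]:
    """Build an ffmpeg command line that converts any input format
    to 16 kHz mono WAV: -ar sets the sample rate, -ac the channel count."""
    return ["ffmpeg", "-y", "-i", src, "-ar", "16000", "-ac", "1", dst]

# run with: subprocess.run(ffmpeg_to_16k_mono_cmd("talk.m4a", "talk.wav"), check=True)
```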

Benchmark

RTF > 1.0 means faster than real time. Each figure is the best of 5 timed runs after a warm-up.

Warm throughput

| GPU                  | Audio | NeMo RTF | nano-parakeet RTF | Speedup |
|----------------------|-------|----------|-------------------|---------|
| RTX 4090             | 12 s  | ~207×    | ~519×             | 2.5×    |
| Jetson AGX Orin 64GB | 12 s  | ~84×     | ~112×             | 1.3×    |
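The RTF figures above are just audio duration divided by wall-clock processing time:

```python
def rtf(audio_seconds: float, wall_seconds: float) -> float:
    """Real-time factor: seconds of audio processed per second of
    wall-clock time. RTF > 1 means faster than real time."""
    return audio_seconds / wall_seconds

print(rtf(12.0, 0.15))  # a 12 s clip transcribed in 0.15 s → 80.0
```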

Note (RTX 4090): NeMo is run with strategy='greedy' (single-item, not batch). The default greedy_batch strategy uses TDT label-looping CUDA graphs that fail to compile on NeMo 2.6.2 + cuda-python 12.9 (NVRTC is not permitted inside a graph capture context). strategy='greedy' uses a different CUDA graph path that works fine.

Cold start (first inference, including framework load)

| GPU                  | NeMo  | nano-parakeet |
|----------------------|-------|---------------|
| RTX 4090             | ~30 s | ~3 s          |
| Jetson AGX Orin 64GB | ~30 s | ~3 s          |

Run both yourself:

git clone https://github.com/andimarafioti/nano-parakeet
cd nano-parakeet
./benchmark.sh sample.wav

How It Works

The full pipeline in plain PyTorch — no NeMo at runtime:

Audio (16 kHz, mono)
  │
  ▼  pre-emphasis (α=0.97) → STFT (n_fft=512, hop=160, win=400)
     → Mel filterbank (128 bins) → log → per-feature normalisation
  │
  ▼  FastConformer Encoder  (24 layers, d_model=1024, 8 heads)
     └─ ConvSubsampling (3× stride-2 → 8× time reduction)
     └─ RelPositionalEncoding (Transformer-XL style)
     └─ 24 × FastConformerLayer:
           FF₁ (×0.5) → Self-Attn (rel-pos) → Conv (k=9) → FF₂ (×0.5) → LN
  │
  ▼  TDT Decoder
     └─ RNNT Prediction: Embed(8193, 640) + 2-layer LSTM(640)
     └─ Joint: Linear(1024→640) + Linear(640→640) → ReLU → Linear(640→8198)
     └─ TDT greedy decode (durations [0,1,2,3,4], blank_id=8192)
  │
  ▼  SentencePiece decode → text
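The TDT greedy loop is the unusual part of the pipeline: each step emits a (token, duration) pair, and the time pointer jumps ahead by the predicted duration instead of always advancing one frame. A minimal sketch with a stubbed joint — the real joint produces logits over 8193 tokens plus 5 duration bins; `tdt_greedy_decode` and the stub are illustrative, not the library's API:

```python
def tdt_greedy_decode(joint, num_frames, blank_id=8192,
                      durations=(0, 1, 2, 3, 4), max_symbols_per_frame=10):
    """Greedy TDT decode over a stubbed joint.

    `joint(t, prev_token)` stands in for argmax over the real joint's
    token and duration heads, returning (token_id, duration_index).
    """
    tokens = []
    t = 0
    prev = blank_id          # start-of-sequence state
    emitted_here = 0         # symbols emitted without advancing time
    while t < num_frames:
        token, dur_idx = joint(t, prev)
        jump = durations[dur_idx]
        if token != blank_id:
            tokens.append(token)
            prev = token
        # guard against an infinite loop of duration-0 predictions
        if jump == 0:
            emitted_here += 1
            if emitted_here >= max_symbols_per_frame:
                jump = 1
        if jump > 0:
            emitted_here = 0
        t += jump            # skip ahead by the predicted duration
    return tokens
```

Because the decoder can skip frames, it takes far fewer steps than a classic RNN-T greedy loop on the same input.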

Weights are loaded directly from the .nemo file (a ZIP archive) without importing any NeMo module.
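Since a .nemo file is just a ZIP archive, pulling a member out needs nothing beyond the standard library. A sketch — the member name `model_weights.ckpt` is an assumption here; inspect `namelist()` on a real checkpoint to find the actual names:

```python
import zipfile

def read_nemo_member(nemo_path: str, member: str) -> bytes:
    """Read one file out of a .nemo archive (a plain ZIP) without NeMo."""
    with zipfile.ZipFile(nemo_path) as zf:
        return zf.read(member)

# e.g. torch.load(io.BytesIO(read_nemo_member("model.nemo", "model_weights.ckpt")))
```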

Optimisations

| Optimisation             | Encoder | Decoder | Effect                                                                         |
|--------------------------|---------|---------|--------------------------------------------------------------------------------|
| bfloat16 (auto, Ampere+) | ✓       | ✓       | native low-precision on modern GPUs; fp16 autocast fallback on older devices   |
| CUDA graph               |         | ✓       | ~20 kernel launches per decode step → 1 graph replay                           |

Jetson Setup

The PyPI wheel works on standard x86 CUDA machines. For Jetson (JetPack 6), PyTorch needs to be installed from NVIDIA's distribution first:

# Install CUDA-enabled PyTorch for JetPack 6
UV_SKIP_WHEEL_FILENAME_CHECK=1 uv pip install \
  https://developer.download.nvidia.com/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

# Then install nano-parakeet (skipping torch since it's already installed)
pip install nano-parakeet --no-deps
pip install numpy soundfile sentencepiece huggingface-hub
