
nano-parakeet

Pure-PyTorch inference for NVIDIA Parakeet TDT — no NeMo required.

from nano_parakeet import from_pretrained
model = from_pretrained()
print(model.transcribe("audio.wav"))

Why?

The official NeMo inference stack pulls in ~180 packages — PyTorch Lightning, Hydra, OmegaConf, apex, distributed training scaffolding — none of which are needed at inference time. This makes it painful to integrate Parakeet into existing projects: version conflicts, long installs, and a 30-second cold-start on every process launch.

nano-parakeet reimplements the full inference pipeline in plain PyTorch. The only dependencies are things you probably already have:

|                            | nano-parakeet | NeMo |
|----------------------------|---------------|------|
| Dependencies               | 5 (torch, numpy, soundfile, sentencepiece, huggingface-hub) | ~180 |
| Cold start                 | ~3 s (weights only) | ~30 s (framework init + CUDA kernel compile) |
| Warm RTF (Jetson AGX Orin) | ~92× | ~73× |

Transcriptions are byte-identical to NeMo's output.

Install

pip install nano-parakeet

Requires Python 3.10+, PyTorch with CUDA, and ffmpeg.

Usage

Python API

from nano_parakeet import from_pretrained

model = from_pretrained()                    # downloads ~1.1GB on first run
text = model.transcribe("audio.wav")        # path, numpy array, or tensor
print(text)
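As the comment above notes, `transcribe` also accepts in-memory audio. A minimal sketch, assuming a 1-D float32 array sampled at 16 kHz mono (the model's input rate); the synthetic tone is just a stand-in for real speech:

```python
import numpy as np

# build 1 s of 16 kHz mono audio in memory (a 440 Hz test tone)
sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
audio = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

# model = from_pretrained()
# model.transcribe(audio)   # same call as with a file path
print(audio.shape, audio.dtype)  # → (16000,) float32
```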

CLI

nano-parakeet audio.wav
# or
python -m nano_parakeet audio.wav

Accepts OGG, WAV, M4A, or any format ffmpeg can read.

Benchmark

RTF > 1.0 means faster than real-time. Each figure is the best of 5 timed runs taken after a warm-up run.
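The measurement protocol can be sketched as follows; `transcribe` here is a hypothetical stand-in callable, not the real model:

```python
import time

def benchmark_rtf(audio_seconds, transcribe, runs=5):
    # real-time factor = seconds of audio / seconds of compute;
    # RTF > 1 means faster than real-time
    transcribe()  # warm-up: first call pays one-time costs and is not timed
    times = []
    for _ in range(runs):
        t0 = time.perf_counter()
        transcribe()
        times.append(time.perf_counter() - t0)
    return audio_seconds / min(times)  # best of `runs`, as in the tables below

# stand-in for a real model call: pretend 12 s of audio takes ~50 ms to decode
rtf = benchmark_rtf(12.0, lambda: time.sleep(0.05))
```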

Warm throughput

| GPU | Audio | NeMo RTF | nano-parakeet RTF | Speedup |
|-----|-------|----------|-------------------|---------|
| Jetson AGX Orin 64GB | 12 s | ~73× | ~92× | 1.3× |

Cold start (first inference, including framework load)

| GPU | NeMo | nano-parakeet |
|-----|------|---------------|
| Jetson AGX Orin 64GB | ~30 s | ~3 s |

Run both yourself:

git clone https://github.com/andimarafioti/parakeet-stt
cd parakeet-stt
./benchmark.sh sample.wav

How It Works

The full pipeline in plain PyTorch — no NeMo at runtime:

Audio (16 kHz, mono)
  │
  ▼  pre-emphasis (α=0.97) → STFT (n_fft=512, hop=160, win=400)
     → Mel filterbank (128 bins) → log → per-feature normalisation
  │
  ▼  FastConformer Encoder  (24 layers, d_model=1024, 8 heads)
     └─ ConvSubsampling (3× stride-2 → 8× time reduction)
     └─ RelPositionalEncoding (Transformer-XL style)
     └─ 24 × FastConformerLayer:
           FF₁ (×0.5) → Self-Attn (rel-pos) → Conv (k=9) → FF₂ (×0.5) → LN
  │
  ▼  TDT Decoder
     └─ RNNT Prediction: Embed(8193, 640) + 2-layer LSTM(640)
     └─ Joint: Linear(1024→640) + Linear(640→640) → ReLU → Linear(640→8198)
     └─ TDT greedy decode (durations [0,1,2,3,4], blank_id=8192)
  │
  ▼  SentencePiece decode → text
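The feature frontend at the top of the diagram can be approximated in a few lines. This is a generic NumPy sketch using the stated parameters; the window function, power-vs-magnitude spectrum, mel-scale variant, and normalisation details are assumptions and will not bit-match the actual preprocessor:

```python
import numpy as np

SR, N_FFT, HOP, WIN, N_MELS = 16000, 512, 160, 400, 128

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank():
    # triangular filters spaced evenly on the mel scale, 0 Hz .. Nyquist
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(SR / 2), N_MELS + 2))
    bins = np.fft.rfftfreq(N_FFT, 1.0 / SR)
    fb = np.zeros((N_MELS, bins.size))
    for i in range(N_MELS):
        lo, ctr, hi = edges[i], edges[i + 1], edges[i + 2]
        fb[i] = np.clip(np.minimum((bins - lo) / (ctr - lo),
                                   (hi - bins) / (hi - ctr)), 0.0, None)
    return fb

def log_mel(audio):
    # pre-emphasis: y[t] = x[t] - 0.97 * x[t-1]
    y = np.append(audio[0], audio[1:] - 0.97 * audio[:-1])
    # frame (win=400, hop=160), window, zero-padded rFFT: a plain STFT
    n_frames = 1 + (len(y) - WIN) // HOP
    idx = np.arange(WIN)[None, :] + HOP * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hanning(WIN)
    power = np.abs(np.fft.rfft(frames, n=N_FFT)) ** 2
    feats = np.log(power @ mel_filterbank().T + 1e-9)
    # per-feature normalisation: zero mean / unit variance per mel bin
    return (feats - feats.mean(0)) / (feats.std(0) + 1e-9)

feats = log_mel(np.random.default_rng(0).standard_normal(SR).astype(np.float32))
print(feats.shape)  # 1 s of audio → (98, 128)
```

At 16 kHz, hop 160 means one mel frame per 10 ms; after the encoder's 8× ConvSubsampling, each encoder frame covers 80 ms of audio.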

Weights are loaded directly from the .nemo file (a ZIP archive) without importing any NeMo module.
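The TDT greedy loop in the diagram differs from standard RNN-T greedy decoding in that the joint also predicts how many encoder frames to skip. A toy sketch of that control flow, with a hypothetical stub in place of the real joint network (which takes encoder/predictor activations):

```python
import numpy as np

BLANK_ID = 8192
DURATIONS = [0, 1, 2, 3, 4]

def tdt_greedy_decode(joint, enc, max_symbols=10):
    # `joint(frame, last_token)` -> (token_logits, duration_logits);
    # unlike plain RNN-T, the predicted duration says how many encoder
    # frames to advance, so blanks can jump several frames at once
    t, last, tokens = 0, BLANK_ID, []
    while t < len(enc):
        emitted = 0
        while True:
            tok_logits, dur_logits = joint(enc[t], last)
            tok = int(np.argmax(tok_logits))
            dur = DURATIONS[int(np.argmax(dur_logits))]
            if tok != BLANK_ID:
                tokens.append(tok)
                last = tok
                emitted += 1
            if dur == 0 and (tok == BLANK_ID or emitted >= max_symbols):
                dur = 1  # safety guard against looping on one frame
            t += dur
            if dur > 0 or t >= len(enc):
                break  # dur == 0 with a non-blank keeps emitting at frame t
    return tokens

def fake_joint(frame, last):
    # stub: emit token 7 once on frame 0, then blanks that skip 2 frames
    tok = np.zeros(BLANK_ID + 1)
    dur = np.zeros(len(DURATIONS))
    if frame == 0 and last == BLANK_ID:
        tok[7], dur[1] = 1.0, 1.0          # token 7, advance 1 frame
    else:
        tok[BLANK_ID], dur[2] = 1.0, 1.0   # blank, skip 2 frames
    return tok, dur

print(tdt_greedy_decode(fake_joint, np.arange(4)))  # → [7]
```

The duration set [0, 1, 2, 3, 4] and blank_id 8192 match the configuration listed above.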

Optimisations

| | Optimisation | Effect |
|---|---|---|
| Encoder | fp16 autocast | tensor cores for 1024→4096→1024 FFN matmuls × 24 layers |
| Decoder | CUDA graph | ~20 kernel launches per decode step → 1 graph replay |

Jetson Setup

The PyPI wheel works on standard x86 CUDA machines. For Jetson (JetPack 6), PyTorch needs to be installed from NVIDIA's distribution first:

# Install CUDA-enabled PyTorch for JetPack 6
UV_SKIP_WHEEL_FILENAME_CHECK=1 uv pip install \
  https://developer.download.nvidia.com/compute/redist/jp/v61/pytorch/torch-2.5.0a0+872d972e41.nv24.08.17622132-cp310-cp310-linux_aarch64.whl

# Then install nano-parakeet (skipping torch since it's already installed)
pip install nano-parakeet --no-deps
pip install numpy soundfile sentencepiece huggingface-hub
