Rust-powered Speech-to-Text toolkit with Whisper, Distil-Whisper, and streaming transcription
Reason this release was yanked:
discontinued
Antenna
Rust-powered multi-model Speech-to-Text toolkit for Python.
Supported Models
| Model | Backend | GPU | Description |
|---|---|---|---|
| Whisper | Candle | CUDA, Metal | OpenAI's encoder-decoder model (tiny → large-v3) |
| Distil-Whisper | Candle | CUDA, Metal | 2x faster distilled Whisper variants |
| Wav2Vec2 | ONNX | CUDA, TensorRT | Meta's CTC-based model |
| Parakeet | sherpa-rs | CUDA, DirectML | NVIDIA's FastConformer-TDT (600M params, blazing fast) |
Installation
CPU Version
pip install antenna-stt
GPU Version (CUDA)
# Option A: Pre-built wheel
pip install "antenna-stt[cuda]"
# Option B: Build from source (requires CUDA toolkit)
git clone https://github.com/fiction-ai-studios/antenna.git
cd antenna
pip install maturin
maturin build --release --features cuda
pip install target/wheels/antenna_stt-*.whl
Development Installation
git clone https://github.com/fiction-ai-studios/antenna.git
cd antenna
uv venv && source .venv/bin/activate
uv add --dev maturin pytest pytest-asyncio
# Build options:
uv run maturin develop --release # CPU only
uv run maturin develop --release --features cuda # + Whisper GPU
uv run maturin develop --release --features onnx # + Wav2Vec2
uv run maturin develop --release --features sherpa # + Parakeet
uv run maturin develop --release --features sherpa-cuda # + Parakeet GPU
Quick Start
Basic Transcription
import antenna
# One-liner
result = antenna.transcribe("speech.wav", model_size="base")
print(result.text)
# With more control
audio = antenna.load_audio("speech.wav")
audio = antenna.preprocess_for_whisper(audio)
model = antenna.WhisperModel.from_size("base", device="cuda")
result = model.transcribe(audio)
for segment in result.segments:
    print(f"[{segment.start:.2f}s] {segment.text}")
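The segment loop above prints raw offsets in seconds. As a minimal sketch, the helper below formats them as `HH:MM:SS.mmm` timestamps instead. The `Segment` dataclass is a hypothetical stand-in that mirrors only the `start` and `text` attributes used above; it is not antenna's actual class, and both helper functions are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in exposing only the attributes used in the Quick Start."""
    start: float  # segment start time in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format a non-negative offset in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def format_segments(segments) -> list[str]:
    """Return one '[timestamp] text' line per segment."""
    return [f"[{format_timestamp(s.start)}] {s.text.strip()}" for s in segments]


if __name__ == "__main__":
    demo = [Segment(0.0, " Hello there."), Segment(65.25, " Second line.")]
    print("\n".join(format_segments(demo)))
```

Because `format_segments` only reads `start` and `text`, it should also work on the objects in `result.segments`, provided they expose those two attributes as the loop above suggests.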
Unified Model Registry
import antenna
# Load ANY model through unified API
model = antenna.load_model("whisper/base", device="cuda")
model = antenna.load_model("distil-whisper/distil-small.en", device="cpu")
model = antenna.load_model("wav2vec2/base-960h", device="cpu") # Requires ONNX
model = antenna.load_model("parakeet/tdt-0.6b-v2", device="cuda") # Requires sherpa
# List all available models
for m in antenna.list_models():
    print(f"{m.id}: {m.description}")
Streaming Transcription
import antenna
# Real-time chunk-by-chunk processing
transcriber = antenna.StreamingTranscriber.from_model_id(
    "whisper/tiny",
    device="cpu",
    config=antenna.StreamingConfig.realtime(),
)
for chunk in audio_chunks:
    events = transcriber.process_chunk(chunk)
    for event in events:
        if event.is_final():
            print(event.text())
transcriber.flush()
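The loop above iterates over `audio_chunks` without showing where it comes from. One way to produce it is a small generator, sketched here under the assumption that a chunk is a fixed-length slice of mono samples. The chunk type that `process_chunk` actually accepts is not stated in this README, and `iter_chunks` is a hypothetical helper, not part of antenna.

```python
from typing import Iterator, Sequence


def iter_chunks(samples: Sequence[float], sample_rate: int,
                chunk_seconds: float) -> Iterator[Sequence[float]]:
    """Yield consecutive slices of `samples`, each about `chunk_seconds` long.

    The last chunk may be shorter than the others.
    """
    chunk_size = round(sample_rate * chunk_seconds)
    if chunk_size <= 0:
        raise ValueError("chunk_seconds is too small for this sample rate")
    for start in range(0, len(samples), chunk_size):
        yield samples[start:start + chunk_size]


if __name__ == "__main__":
    one_second = [0.0] * 16000  # 1 s of silence at 16 kHz
    sizes = [len(c) for c in iter_chunks(one_second, 16000, 0.5)]
    print(sizes)
```

Smaller chunks reduce latency at the cost of more calls into the transcriber; the right size depends on the streaming configuration in use.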
Async Streaming
import asyncio
import antenna
async def transcribe():
    transcriber = antenna.AsyncStreamingTranscriber.from_model_id(
        "whisper/tiny", device="cuda"
    )
    for chunk in audio_chunks:
        events = await transcriber.process_chunk_async(chunk)
        for event in events:
            if event.is_final():
                print(event.text())
    await transcriber.flush_async()

asyncio.run(transcribe())
Feature Flags
| Feature | Description | Models Enabled |
|---|---|---|
| cuda | Candle CUDA GPU | Whisper, Distil-Whisper |
| metal | Candle Metal GPU (macOS) | Whisper, Distil-Whisper |
| onnx | ONNX Runtime | Wav2Vec2 |
| onnx-cuda | ONNX with CUDA | Wav2Vec2 (GPU) |
| sherpa | sherpa-rs backend | Parakeet |
| sherpa-cuda | sherpa with CUDA | Parakeet (GPU) |
Build with multiple features:
uv run maturin develop --release --features "cuda,onnx,sherpa-cuda"
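The Feature Flags table maps each model family to a CPU feature and a GPU feature. As an illustration, a small helper can assemble the `--features` value from the models you need. The helper and its `FEATURES` mapping are hypothetical and not part of antenna; the sketch also assumes that CPU-only Whisper needs no extra feature, as the `# CPU only` build line under Development Installation suggests.

```python
# Feature names are copied from the Feature Flags table; the helper itself is
# a hypothetical convenience and not part of antenna.
FEATURES = {
    "whisper": {"cpu": None, "cuda": "cuda", "metal": "metal"},
    "distil-whisper": {"cpu": None, "cuda": "cuda", "metal": "metal"},
    "wav2vec2": {"cpu": "onnx", "cuda": "onnx-cuda"},
    "parakeet": {"cpu": "sherpa", "cuda": "sherpa-cuda"},
}


def build_features(models, accelerator="cpu"):
    """Return a comma-separated feature string for `maturin --features`."""
    selected = []
    for model in models:
        options = FEATURES[model]  # KeyError for an unknown model family
        if accelerator not in options:
            raise ValueError(f"{model} has no {accelerator} feature listed")
        feature = options[accelerator]
        if feature is not None and feature not in selected:
            selected.append(feature)
    return ",".join(selected)


if __name__ == "__main__":
    flags = build_features(["whisper", "wav2vec2", "parakeet"], "cuda")
    print(f'uv run maturin develop --release --features "{flags}"')
```

Mixing CPU and GPU features per model, as in the command above, is also possible; this sketch keeps to a single accelerator for simplicity.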
Parakeet Models (NEW)
NVIDIA Parakeet is extremely fast: it can transcribe about 60 minutes of audio in roughly 1 second.
Setup:
# 1. Build with sherpa feature
uv run maturin develop --release --features sherpa-cuda
# 2. Download model
mkdir -p ~/.cache/antenna/parakeet
wget https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2
tar xvf sherpa-onnx-nemo-parakeet-tdt-0.6b-v2-int8.tar.bz2 -C ~/.cache/antenna/parakeet/
Usage:
import antenna
audio = antenna.load_audio("speech.wav")
model = antenna.load_model("parakeet/tdt-0.6b-v2", device="cuda")
result = model.transcribe(audio)
print(result.text)
Variants:
- parakeet/tdt-0.6b-v2: English (recommended)
- parakeet/tdt-0.6b-v3: Multilingual (25 languages)
Audio Processing
import antenna
# Load any format (WAV, MP3, FLAC, OGG, M4A)
audio = antenna.load_audio("podcast.mp3")
# Analyze
stats = antenna.analyze_audio(audio)
print(f"RMS: {stats.rms_db:.1f} dB, Peak: {stats.peak_db:.1f} dB")
# Process
audio = antenna.trim_silence(audio, threshold_db=-40)
audio = antenna.normalize_audio(audio, method="rms", target_db=-20)
audio = antenna.preprocess_audio(audio, target_sample_rate=16000, mono=True)
# Save
antenna.save_audio(audio, "processed.wav")
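To make the `rms_db`, `peak_db`, and `target_db` values above concrete, here is a sketch of the usual decibel arithmetic. It assumes samples are floats in [-1.0, 1.0] and that levels are measured relative to full scale (dBFS). This README does not specify antenna's exact definitions, so treat this as the conventional math rather than the library's implementation; all three function names are illustrative.

```python
import math


def rms_db(samples):
    """RMS level in dB relative to full scale (1.0)."""
    if not samples:
        raise ValueError("samples must not be empty")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)  # same as 20*log10(rms)


def peak_db(samples):
    """Peak level in dB relative to full scale (1.0)."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20.0 * math.log10(peak)


def gain_for_target(current_db, target_db):
    """Linear gain that moves a signal from current_db to target_db."""
    return 10.0 ** ((target_db - current_db) / 20.0)


if __name__ == "__main__":
    samples = [0.5, -0.5, 0.5, -0.5]
    level = rms_db(samples)
    gain = gain_for_target(level, -20.0)
    normalized = [s * gain for s in samples]
    print(f"RMS before: {level:.1f} dB, after: {rms_db(normalized):.1f} dB")
```

Under these assumptions, RMS normalization to a target of -20 dB amounts to multiplying every sample by the gain computed from the current level.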
Model Selection Guide
| Model | Parameters | Speed | Quality | Use Case |
|---|---|---|---|---|
| Whisper tiny | 39M | ★★★★★ | ★★ | Quick tests |
| Whisper base | 74M | ★★★★ | ★★★ | General use |
| Whisper large-v3 | 1.5B | ★ | ★★★★★ | Best quality |
| Distil-Whisper | ~350M | ★★★★ | ★★★★ | Fast + accurate |
| Wav2Vec2 | 95-317M | ★★★ | ★★★ | CTC decoding |
| Parakeet | 600M | ★★★★★ | ★★★★★ | Fastest + accurate |
GPU Availability Check
import antenna
print(f"CUDA: {antenna.is_cuda_available()} ({antenna.cuda_device_count()} devices)")
print(f"Metal: {antenna.is_metal_available()}")
print(f"ONNX: {antenna.is_onnx_available()}")
print(f"ONNX CUDA: {antenna.is_onnx_cuda_available()}")
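These checks can drive a simple device fallback. The sketch below is a hypothetical helper, not part of antenna. It takes the check results as plain arguments so it does not depend on how antenna was built, and it returns only the `"cuda"` and `"cpu"` device strings that appear in the examples above.

```python
def pick_device(cuda_available: bool, cuda_devices: int) -> str:
    """Return "cuda" when a CUDA build sees at least one device, else "cpu"."""
    return "cuda" if cuda_available and cuda_devices > 0 else "cpu"


if __name__ == "__main__":
    # With antenna installed, the arguments would come from
    # antenna.is_cuda_available() and antenna.cuda_device_count().
    print(pick_device(cuda_available=False, cuda_devices=0))
```

The result can then be passed as the `device` argument when loading a model.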
Testing
cargo test --features sherpa # Rust tests (129)
uv run pytest tests/ -v # Python tests (226+)
Roadmap
- ✅ Whisper/Distil-Whisper (Candle)
- ✅ Wav2Vec2 (ONNX)
- ✅ Parakeet (sherpa-rs)
- ✅ Streaming API with VAD
- ✅ Async streaming
- ✅ GPU support (CUDA, Metal, DirectML)
- 🔲 Canary (NeMo format)
- 🔲 Conformer (sherpa-rs)
- 🔲 Production HTTP/WebSocket server
License
MIT
Acknowledgments
- PyO3 - Rust-Python bindings
- Candle - ML framework
- sherpa-onnx - Parakeet inference
- ONNX Runtime - Cross-platform inference
- OpenAI Whisper - Original Whisper model
File details
Details for the file antenna_stt-0.4.0.tar.gz.
File metadata
- Download URL: antenna_stt-0.4.0.tar.gz
- Upload date:
- Size: 395.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7e45cf8df4050ad24590f10f5042f238a2206133633ab1073780d9efc3ff2568 |
| MD5 | 4ad5e6dc20b3ad07abd3974e56b77e4b |
| BLAKE2b-256 | ec132e558fc328284b3fcb2d97520bc855a2df4106917972cdbb47f381cac116 |
File details
Details for the file antenna_stt-0.4.0-cp313-cp313-manylinux_2_39_x86_64.whl.
File metadata
- Download URL: antenna_stt-0.4.0-cp313-cp313-manylinux_2_39_x86_64.whl
- Upload date:
- Size: 6.7 MB
- Tags: CPython 3.13, manylinux: glibc 2.39+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.10.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e5595ff075e206260d9abea6ce86fa0883a436499586d7ed317dfce0d20fedb |
| MD5 | bf6cec07199e87bbaf1a9aa3cb137ba4 |
| BLAKE2b-256 | 497534537d45091264d81257a8f418e4136741c718b09317edb16002689d70d3 |