EigenWave-ASR: High-Performance Speech Recognition with Multi-Scale Robin Features
An ASR model that reaches 6.36% WER on LibriSpeech test-clean (beam search with a KenLM language model) using only 27.8M parameters.
🚀 Quick Start
from eigenwave import EigenWaveASR
# Load model
model = EigenWaveASR.from_pretrained("./") # local directory
# or
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr") # from Hub
# Transcribe audio
text = model.transcribe("speech.wav")
print(text)
# Output: "hello this is a test of the speech recognition system"
📦 Installation
pip install eigenwave-asr
# For best accuracy (KenLM language model); quotes keep shells such as zsh from expanding the brackets:
pip install "eigenwave-asr[lm]"
# For HuggingFace Hub support:
pip install "eigenwave-asr[all]"
📋 Usage Examples
Basic Transcription
from eigenwave import EigenWaveASR
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")
# From file
text = model.transcribe("audio.wav")
# Batch
texts = model.transcribe(["audio1.wav", "audio2.wav", "audio3.wav"])
# From tensor (16 kHz mono)
import torch
audio_tensor = torch.randn(16000 * 5) # 5 seconds of placeholder noise at 16 kHz
text = model.transcribe(audio_tensor)
Detailed Output
result = model.transcribe_with_details("audio.wav")
print(result)
# {
# "text": "hello world",
# "duration": 2.5,
# "processing_time": 0.320,
# "rtf": 0.128, # Real-Time Factor = processing_time / duration (< 1.0 means faster than real time)
# "real_time": True
# }
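The `rtf` value in the output above is consistent with dividing `processing_time` by `duration` (0.320 / 2.5 = 0.128). A minimal sketch of that arithmetic follows; the helper name `real_time_factor` is hypothetical and is not part of the package's API.

```python
def real_time_factor(processing_time: float, duration: float) -> float:
    """Return processing time divided by audio duration (both in seconds)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return processing_time / duration


# Values taken from the example output above.
rtf = real_time_factor(processing_time=0.320, duration=2.5)
print(round(rtf, 3))  # 0.128
print(rtf < 1.0)      # True, i.e. faster than real time
```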
CPU Inference
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", device="cpu")
text = model.transcribe("audio.wav", beam_width=10) # smaller beam for speed
Without Language Model (faster, less accurate)
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", use_lm=False)
text = model.transcribe("audio.wav")
📊 Performance
| LibriSpeech split | Greedy WER | Beam+LM WER |
|---|---|---|
| test-clean | ~8.5% | 6.36% |
| test-other | ~20% | ~15% |
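For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the reference and the hypothesis, divided by the number of reference words. The sketch below illustrates that standard definition only; the function name `word_error_rate` is hypothetical, and the scoring script actually used for the table is not described here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One of five reference words is dropped, so WER is 1/5.
print(word_error_rate("hello this is a test", "hello this is test"))  # 0.2
```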
🏗️ Model Details
- Architecture: Conformer-style encoder with Multi-Scale Robin Features
- Parameters: 27.8M
- Input: 16 kHz mono audio
- Output: English text (lowercase)
- Training: LibriSpeech 960h, 182k steps
- Features: Novel Robin differential operator feature extraction at scales [1, 3, 5]
⚡ Optimal Hyperparameters (Optuna-tuned)
alpha = 0.9268 (LM weight)
beta = 0.061 (word insertion bonus)
temperature = 0.536 (softmax sharpness)
beam_width = 50 (beam search width)
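How these values enter decoding is not spelled out here. A common convention in CTC beam search with an n-gram LM is shallow fusion: the temperature rescales the logits before the softmax, and each hypothesis is scored as acoustic log-probability plus `alpha` times the LM log-probability plus `beta` times the word count. The sketch below assumes that convention; the function names are hypothetical and this is not necessarily how EigenWave-ASR implements it.

```python
import math

# Values from the list above.
ALPHA = 0.9268       # LM weight
BETA = 0.061         # word insertion bonus
TEMPERATURE = 0.536  # softmax sharpness


def log_softmax_with_temperature(logits, temperature=TEMPERATURE):
    """Log-softmax of logits / temperature; values below 1 sharpen the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def fused_score(acoustic_logp, lm_logp, num_words, alpha=ALPHA, beta=BETA):
    """Assumed shallow-fusion score used to rank beam hypotheses."""
    return acoustic_logp + alpha * lm_logp + beta * num_words


log_probs = log_softmax_with_temperature([2.0, 1.0, 0.1])
print(abs(sum(math.exp(lp) for lp in log_probs) - 1.0) < 1e-9)  # True

# Score a hypothetical 3-word hypothesis.
print(fused_score(acoustic_logp=-4.0, lm_logp=-6.0, num_words=3))
```

Under this assumed scoring, a larger `alpha` trusts the language model more, and a positive `beta` offsets the LM's tendency to favor shorter word sequences.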
📄 License
MIT License
👤 Author
Sakib Hasan (@sakibhasanml)