EigenWave-ASR: High-Performance Speech Recognition with Multi-Scale Robin Features
A compact ASR model with 27.8M parameters that reaches 6.36% WER on LibriSpeech test-clean when decoding with beam search and a language model (about 8.5% WER with greedy decoding).
🚀 Quick Start
from eigenwave import EigenWaveASR
# Load model
model = EigenWaveASR.from_pretrained("./") # local directory
# or
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr") # from Hub
# Transcribe audio
text = model.transcribe("speech.wav")
print(text)
# Output: "hello this is a test of the speech recognition system"
📦 Installation
pip install eigenwave-asr
# For best accuracy (KenLM language model).
# The quotes keep shells such as zsh from treating the brackets as a glob pattern.
pip install "eigenwave-asr[lm]"
# For HuggingFace Hub support:
pip install "eigenwave-asr[all]"
📋 Usage Examples
Basic Transcription
from eigenwave import EigenWaveASR
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")
# From file
text = model.transcribe("audio.wav")
# Batch
texts = model.transcribe(["audio1.wav", "audio2.wav", "audio3.wav"])
# From a tensor (16 kHz mono waveform)
import torch
audio_tensor = torch.randn(16000 * 5) # 5 seconds of random noise, used here as a placeholder waveform
text = model.transcribe(audio_tensor)
Detailed Output
result = model.transcribe_with_details("audio.wav")
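print(result)
# {
#     "text": "hello world",
#     "duration": 2.5,            # audio length in seconds
#     "processing_time": 0.320,   # seconds spent transcribing
#     "rtf": 0.128,               # Real-Time Factor = processing_time / duration
#     "real_time": True           # True when rtf < 1.0 (faster than real time)
# }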
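The `rtf` field follows from the other two numbers in the sample output. A minimal sketch of that arithmetic, where `real_time_factor` is a hypothetical helper for illustration and not part of the package:

```python
def real_time_factor(processing_time: float, duration: float) -> float:
    """Return processing time divided by audio duration (both in seconds)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return processing_time / duration


# Values taken from the sample output above.
rtf = real_time_factor(processing_time=0.320, duration=2.5)
print(f"rtf={rtf:.3f}, real_time={rtf < 1.0}")
# prints: rtf=0.128, real_time=True
```

An RTF of 0.128 means the 2.5 s clip was transcribed in roughly one eighth of its playing time.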
CPU Inference
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", device="cpu")
text = model.transcribe("audio.wav", beam_width=10) # smaller than the tuned beam_width of 50: faster, possibly less accurate
Without Language Model (faster, less accurate)
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", use_lm=False)
text = model.transcribe("audio.wav")
📊 Performance
| LibriSpeech split | Greedy decoding WER | Beam search + LM WER |
|---|---|---|
| test-clean | ~8.5% | 6.36% |
| test-other | ~20% | ~15% |
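For readers new to the metric: WER is the word-level edit distance (substitutions, deletions, insertions) between the reference transcript and the model output, divided by the number of reference words. The sketch below shows the standard definition; it is not the evaluation script used to produce the numbers above, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of five reference words.
print(word_error_rate("hello this is a test", "hello this is test"))
# prints: 0.2
```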
🏗️ Model Details
- Architecture: Conformer-style encoder with Multi-Scale Robin Features
- Parameters: 27.8M
- Input: 16kHz mono audio
- Output: English text (lowercase)
- Training: LibriSpeech 960h, 182k steps
- Features: Novel Robin differential operator feature extraction at scales [1, 3, 5]
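The model expects 16 kHz mono input. Whether `transcribe` resamples or downmixes other formats automatically is not stated here, so a quick pre-check can be useful. The sketch below uses Python's standard `wave` module, which reads PCM WAV files only; `check_wav_format` is a hypothetical helper and not part of the eigenwave package.

```python
import os
import tempfile
import wave

EXPECTED_SAMPLE_RATE = 16_000  # Hz, from the "Input" line above
EXPECTED_CHANNELS = 1          # mono


def check_wav_format(path: str) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != EXPECTED_SAMPLE_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_SAMPLE_RATE} Hz")
    if channels != EXPECTED_CHANNELS:
        problems.append(f"file has {channels} channels, expected {EXPECTED_CHANNELS}")
    return problems


# Demo: write one second of 16 kHz mono silence, then check it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(16_000)
        wav.writeframes(b"\x00\x00" * 16_000)
    print(check_wav_format(demo_path))  # an empty list means the format matches
```

If the check reports a mismatch, convert the file to 16 kHz mono with your audio tool of choice before transcribing.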
⚡ Optimal Hyperparameters (Optuna-tuned)
alpha = 0.9268 (LM weight)
beta = 0.061 (word insertion bonus)
temperature = 0.536 (softmax sharpness)
beam_width = 50 (beam search width)
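To give a feel for what these values control, here is a sketch of how such parameters are commonly combined in beam search with an external language model: the acoustic logits are divided by `temperature` before the softmax, and each hypothesis is scored as acoustic log-probability plus `alpha` times the LM log-probability plus `beta` times the word count. This is the usual convention and an assumption on my part; the exact scoring used inside EigenWave-ASR is not documented here, and both function names are hypothetical.

```python
import math

ALPHA = 0.9268        # LM weight
BETA = 0.061          # word insertion bonus
TEMPERATURE = 0.536   # softmax sharpness


def log_softmax_with_temperature(logits, temperature):
    """Numerically stable log-softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def fused_score(acoustic_log_prob, lm_log_prob, word_count, alpha=ALPHA, beta=BETA):
    """Assumed hypothesis score: acoustic + alpha * LM + beta * number of words."""
    return acoustic_log_prob + alpha * lm_log_prob + beta * word_count


# A temperature below 1.0 sharpens the distribution toward the top token.
logits = [2.0, 1.0, 0.0]
plain = [math.exp(v) for v in log_softmax_with_temperature(logits, 1.0)]
sharp = [math.exp(v) for v in log_softmax_with_temperature(logits, TEMPERATURE)]
print("top-token probability:", round(plain[0], 3), "->", round(sharp[0], 3))

# Score one hypothetical 3-word hypothesis (the log-probabilities are made up).
print("fused score:", fused_score(acoustic_log_prob=-10.0, lm_log_prob=-5.0, word_count=3))
```

Under this convention a larger `alpha` trusts the language model more, and a positive `beta` offsets the LM's tendency to favor shorter hypotheses.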
📄 License
MIT License
👤 Author
Sakib Hasan (@sakibhasanml)
File details
Details for the file eigenwave_asr-1.0.3.tar.gz.
File metadata
- Download URL: eigenwave_asr-1.0.3.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44def180d0b414b584df8dec39214f5dc28a5370c4c614c50af0ee2434e8ef52 |
| MD5 | 83dc322d3c3bd79581535a4981b8a57b |
| BLAKE2b-256 | 77bdec2b4ceae91701691469e91084bf775e35cf65a5206479dbcfe7359b1c10 |
File details
Details for the file eigenwave_asr-1.0.3-py3-none-any.whl.
File metadata
- Download URL: eigenwave_asr-1.0.3-py3-none-any.whl
- Upload date:
- Size: 9.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5215cc4474b34ff3dd5b7e92b42a449c43f880193981d1c1db8ad1d881088837 |
| MD5 | e564498d4e72c92a27f7d88f65fcf060 |
| BLAKE2b-256 | 0b48d2df3f42fe7ae079d4f755ad76879e55099de28929ad2ffd777c0c5b4742 |