EigenWave-ASR: High-Performance Speech Recognition with Multi-Scale Robin Features
EigenWave-ASR 🎤
High-Performance Speech Recognition with Multi-Scale Robin Features
A novel ASR model achieving 6.36% WER on LibriSpeech test-clean (beam search with language model) with only 27.8M parameters.
🚀 Quick Start
```python
from eigenwave import EigenWaveASR

# Load model
model = EigenWaveASR.from_pretrained("./")  # local directory
# or
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")  # from Hub

# Transcribe audio
text = model.transcribe("speech.wav")
print(text)
# Output: "hello this is a test of the speech recognition system"
```
📦 Installation
```shell
pip install eigenwave-asr

# For best accuracy (KenLM language model):
# (extras are quoted so shells such as zsh do not expand the brackets)
pip install "eigenwave-asr[lm]"

# For HuggingFace Hub support:
pip install "eigenwave-asr[all]"
```
📋 Usage Examples
Basic Transcription
```python
import torch
from eigenwave import EigenWaveASR

model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")

# From file
text = model.transcribe("audio.wav")

# Batch
texts = model.transcribe(["audio1.wav", "audio2.wav", "audio3.wav"])

# From tensor (16kHz mono)
audio_tensor = torch.randn(16000 * 5)  # 5 seconds
text = model.transcribe(audio_tensor)
```
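Since the model expects 16kHz mono input, it can help to check a WAV file before transcribing it. The following is a minimal sketch using only Python's standard `wave` module; the helper name `is_16k_mono` is hypothetical and not part of the `eigenwave` package, and whether `transcribe` resamples files on its own is not stated here.

```python
import wave

EXPECTED_RATE = 16000   # Hz, from the model's stated input format
EXPECTED_CHANNELS = 1   # mono


def is_16k_mono(path):
    """Return True if the WAV file at `path` is 16 kHz mono.

    Hypothetical helper for illustration; not part of eigenwave.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == EXPECTED_RATE
            and wav.getnchannels() == EXPECTED_CHANNELS
        )
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before being passed in as a tensor.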
Detailed Output
```python
result = model.transcribe_with_details("audio.wav")
print(result)
# {
#     "text": "hello world",
#     "duration": 2.5,
#     "processing_time": 0.320,
#     "rtf": 0.128,  # Real-Time Factor (< 1.0 = real-time)
#     "real_time": True
# }
```
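The `rtf` value in the sample output is consistent with the usual definition of Real-Time Factor: processing time divided by audio duration. A small sketch of that arithmetic follows; the function name `real_time_factor` is an assumption for illustration and is not taken from the package.

```python
def real_time_factor(processing_time, duration):
    """Real-Time Factor: seconds of compute per second of audio.

    Hypothetical helper; values below 1.0 mean faster than real time.
    """
    if duration <= 0:
        raise ValueError("duration must be positive")
    return processing_time / duration


# Using the numbers from the sample output above.
rtf = real_time_factor(processing_time=0.320, duration=2.5)
print(round(rtf, 3))  # 0.128
print(rtf < 1.0)      # True
```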
CPU Inference
```python
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", device="cpu")
text = model.transcribe("audio.wav", beam_width=10)  # smaller beam for speed
```
Without Language Model (faster, less accurate)
```python
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", use_lm=False)
text = model.transcribe("audio.wav")
```
📊 Performance
| Dataset | Greedy WER | Beam+LM WER |
|---|---|---|
| test-clean | ~8.5% | 6.36% |
| test-other | ~20% | ~15% |
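For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The sketch below illustrates that standard definition; it is not the evaluation script used to produce the numbers in the table, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words.
print(word_error_rate("hello this is a test", "hello this is test"))  # 0.2
```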
🏗️ Model Details
- Architecture: Conformer-style encoder with Multi-Scale Robin Features
- Parameters: 27.8M
- Input: 16kHz mono audio
- Output: English text (lowercase)
- Training: LibriSpeech 960h, 182k steps
- Features: Novel Robin differential operator feature extraction at scales [1, 3, 5]
⚡ Optimal Hyperparameters (Optuna-tuned)
- alpha = 0.9268 (LM weight)
- beta = 0.061 (word insertion bonus)
- temperature = 0.536 (softmax sharpness)
- beam_width = 50 (beam search width)
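How these values enter decoding is not spelled out here. As a rough illustration only, the sketch below assumes the common shallow-fusion convention: logits are divided by the temperature before the softmax, and each beam hypothesis is scored as acoustic log-probability plus `alpha` times LM log-probability plus `beta` times the word count. The function names and the scoring formula are assumptions, not the package's confirmed implementation.

```python
import math

ALPHA = 0.9268        # LM weight
BETA = 0.061          # word insertion bonus
TEMPERATURE = 0.536   # softmax sharpness


def log_softmax_with_temperature(logits, temperature=TEMPERATURE):
    """Log-softmax of logits / temperature (assumed convention).

    A temperature below 1.0 sharpens the distribution.
    """
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_z for s in scaled]


def fused_score(acoustic_logp, lm_logp, num_words, alpha=ALPHA, beta=BETA):
    """Assumed shallow-fusion score for one beam hypothesis."""
    return acoustic_logp + alpha * lm_logp + beta * num_words
```

Under this assumed formula, a larger `alpha` lets the language model override the acoustic model more often, while a positive `beta` offsets the LM's tendency to favor shorter hypotheses.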
📄 License
MIT License
👤 Author
Sakib Hasan (@sakibhasanml)