EigenWave-ASR: High-Performance Speech Recognition with Multi-Scale Robin Features
An ASR model that reaches 6.36% WER on LibriSpeech test-clean (beam search with a language model) using only 27.8M parameters.
🚀 Quick Start
from eigenwave import EigenWaveASR
# Load the model from a local directory...
model = EigenWaveASR.from_pretrained("./")
# ...or from the Hub (use one or the other)
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")
# Transcribe audio
text = model.transcribe("speech.wav")
print(text)
# Output: "hello this is a test of the speech recognition system"
📦 Installation
pip install eigenwave-asr
# For best accuracy (KenLM language model); quotes keep shells such as zsh from expanding the brackets:
pip install "eigenwave-asr[lm]"
# For Hugging Face Hub support:
pip install "eigenwave-asr[all]"
📋 Usage Examples
Basic Transcription
from eigenwave import EigenWaveASR
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr")
# From file
text = model.transcribe("audio.wav")
# Batch
texts = model.transcribe(["audio1.wav", "audio2.wav", "audio3.wav"])
# From a tensor (16 kHz mono)
import torch
audio_tensor = torch.randn(16000 * 5) # placeholder input: 5 seconds of random noise
text = model.transcribe(audio_tensor)
Detailed Output
result = model.transcribe_with_details("audio.wav")
print(result)
# {
# "text": "hello world",
# "duration": 2.5,
# "processing_time": 0.320,
# "rtf": 0.128, # Real-Time Factor = processing_time / duration (< 1.0 = faster than real time)
# "real_time": True
# }
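The `rtf` value in the sample output matches the usual definition of real-time factor: processing time divided by audio duration. The sketch below reproduces that arithmetic with the numbers shown above. The helper name `real_time_factor` is hypothetical and is not part of the `eigenwave` API.

```python
def real_time_factor(processing_time: float, duration: float) -> float:
    """Return processing time divided by audio duration (both in seconds)."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    return processing_time / duration


# Numbers taken from the sample output above.
rtf = real_time_factor(processing_time=0.320, duration=2.5)
print(f"rtf={rtf:.3f}, real_time={rtf < 1.0}")
# prints: rtf=0.128, real_time=True
```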
CPU Inference
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", device="cpu")
text = model.transcribe("audio.wav", beam_width=10) # smaller beam for speed
Without Language Model (faster, less accurate)
model = EigenWaveASR.from_pretrained("sakibhasan/eigenwave-asr", use_lm=False)
text = model.transcribe("audio.wav")
📊 Performance
| Dataset | Greedy WER | Beam+LM WER |
|---|---|---|
| test-clean | ~8.5% | 6.36% |
| test-other | ~20% | ~15% |
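For readers unfamiliar with the metric, WER (word error rate) is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition. It is not the evaluation script behind the numbers in the table, and the function name `word_error_rate` is an assumption for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# One deleted word out of five reference words.
print(word_error_rate("hello this is a test", "hello this is test"))
# prints: 0.2
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.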
🏗️ Model Details
- Architecture: Conformer-style encoder with Multi-Scale Robin Features
- Parameters: 27.8M
- Input: 16kHz mono audio
- Output: English text (lowercase)
- Training: LibriSpeech 960h, 182k steps
- Features: Novel Robin differential operator feature extraction at scales [1, 3, 5]
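Since the model expects 16 kHz mono input, it can help to check a file's format before transcribing it. The sketch below uses only the standard-library `wave` module, so it handles uncompressed PCM WAV files only. Whether `transcribe` resamples or downmixes automatically is not stated here, and `check_wav_format` is a hypothetical helper, not part of the package.

```python
import wave


def check_wav_format(path: str, expected_rate: int = 16000,
                     expected_channels: int = 1) -> list[str]:
    """Return a list of format problems; an empty list means the file matches."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()

    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate is {rate} Hz, expected {expected_rate} Hz")
    if channels != expected_channels:
        problems.append(f"{channels} channel(s), expected {expected_channels}")
    return problems
```

A caller could run `check_wav_format("audio.wav")` and resample or downmix with an audio tool of their choice when the returned list is not empty.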
⚡ Optimal Hyperparameters (Optuna-tuned)
alpha = 0.9268 (LM weight)
beta = 0.061 (word insertion bonus)
temperature = 0.536 (softmax sharpness)
beam_width = 50 (beam search width)
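To show how parameters with these roles usually interact, here is a minimal sketch that assumes the conventional shallow-fusion formulation for CTC beam search with an n-gram language model: the softmax is computed on logits divided by the temperature, and each hypothesis is scored as acoustic log-probability plus `alpha` times the LM log-probability plus `beta` times the word count. The exact formula used by EigenWave-ASR is not given here, so both function names and the scoring rule are assumptions.

```python
import math

ALPHA = 0.9268       # LM weight
BETA = 0.061         # word insertion bonus
TEMPERATURE = 0.536  # softmax sharpness (< 1 sharpens the distribution)


def log_softmax_with_temperature(logits, temperature=TEMPERATURE):
    """Numerically stable log-softmax of logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_z for x in scaled]


def fused_score(acoustic_logp, lm_logp, num_words, alpha=ALPHA, beta=BETA):
    """Assumed hypothesis score: acoustic + alpha * LM + beta * word count."""
    return acoustic_logp + alpha * lm_logp + beta * num_words


# Toy comparison: a temperature below 1 concentrates probability on the top token.
logits = [2.0, 1.0, 0.0]
sharp = [math.exp(v) for v in log_softmax_with_temperature(logits)]
plain = [math.exp(v) for v in log_softmax_with_temperature(logits, 1.0)]
print(max(sharp) > max(plain))
# prints: True
```

Under this formulation, a larger `alpha` makes the decoder trust the language model more, while a positive `beta` offsets the LM's tendency to favor shorter word sequences.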
📄 License
MIT License
👤 Author
Sakib Hasan (@sakibhasanml)
Download files
Source Distribution
eigenwave_asr-1.0.2.tar.gz (9.8 kB)
Built Distribution
File details
Details for the file eigenwave_asr-1.0.2.tar.gz.
File metadata
- Download URL: eigenwave_asr-1.0.2.tar.gz
- Upload date:
- Size: 9.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a80d955eaf4885944a6ef3736d1c4844d20d78c682aaaebb1b67b81b6e2707fb |
| MD5 | 85b8ab78e82ff9ce91a115e41d6897e8 |
| BLAKE2b-256 | 76aaaea8198c1ac944b3ebb448ca658b1f64bab9520ebd5bdd26a52a12c24853 |
File details
Details for the file eigenwave_asr-1.0.2-py3-none-any.whl.
File metadata
- Download URL: eigenwave_asr-1.0.2-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 84d813c12823027dee54de2df2bb3622e5539364f313775a1bfa26549f8f2a57 |
| MD5 | 45faa5361025dcfd2ea49cf483bf49f5 |
| BLAKE2b-256 | 71dcaf61e9dd3929aaa626bcf8de495055214f182c6a6fbdfece283cad479b87 |