Objective vocal fatigue scoring from speech using ECAPA-TDNN-VHE embeddings
Auralis VFS (Vocal Fatigue Scoring Library)
Overview
Auralis VFS is a research-grade Python library for objective vocal fatigue assessment from speech audio. It uses ECAPA-TDNN-based embeddings trained with supervised contrastive learning to compute a Vocal Fatigue Score (0–100) from short audio recordings.
This library is designed for:
- Research studies in voice health, occupational voice monitoring, and speech pathology.
- Integration into speech analysis pipelines.
- Reproducible and standardized scoring across datasets.
Cite our research:
Ahmad, M. K. (2026). Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs (0.1.0). Zenodo. https://doi.org/10.5281/zenodo.18305757
Key Features
- Compute a Vocal Fatigue Score from raw audio (.wav, .mp3, .m4a).
- Fast waveform-based scoring using pretrained ECAPA-TDNN embeddings.
- Reference-based scoring using curated embeddings from healthy speakers.
- Production-ready API with score_audio() and score_waveform() functions.
- Configurable parameters for audio sampling rate, duration, and mel-spectrogram features.
- Designed for research reproducibility.
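The library scores a recording by how far its embedding deviates from curated healthy-speaker references, and the package ships reference embeddings and an "axis" in data/. The exact mapping is not documented in this README, so the following is only a minimal sketch of one way such a deviation score could be computed, assuming: L2-normalized embeddings, a healthy-reference centroid, a single fatigue direction (`axis`), and a linear clip onto 0–100. The function name `deviation_score` and its `scale` parameter are hypothetical and not part of the auralis API.

```python
import numpy as np


def deviation_score(embedding, reference_embeddings, axis, scale=3.0):
    """Hypothetical 0-100 score from an embedding's deviation along a fatigue axis.

    Assumptions (not taken from auralis): reference_embeddings is an (n, d) array
    of healthy-speaker embeddings and axis is a d-dimensional fatigue direction.
    """
    emb = np.asarray(embedding, dtype=np.float64)
    refs = np.asarray(reference_embeddings, dtype=np.float64)
    direction = np.asarray(axis, dtype=np.float64)
    direction = direction / np.linalg.norm(direction)  # unit-length axis

    # L2-normalize the sample and every reference embedding (assumption)
    emb = emb / np.linalg.norm(emb)
    refs = refs / np.linalg.norm(refs, axis=1, keepdims=True)

    centroid = refs.mean(axis=0)
    # Signed offset from the healthy centroid, projected onto the axis
    proj = float((emb - centroid) @ direction)
    ref_proj = (refs - centroid) @ direction
    spread = float(ref_proj.std()) or 1.0  # avoid dividing by zero spread

    z = proj / spread  # deviation in units of healthy-reference spread
    # Map [0, scale] spreads linearly onto [0, 100]; healthy-side samples give 0
    return float(np.clip(z / scale, 0.0, 1.0) * 100.0)
```

With `scale=3.0`, a sample three reference standard deviations out along the axis saturates at 100, while any sample on the healthy side of the centroid scores 0. A real implementation might instead use a smooth (e.g. logistic) mapping; the linear clip is chosen here only for readability.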
Installation
pip install auralis-vfs
Dependencies:
- Python >= 3.10
- torch >= 2.1.1
- torchaudio >= 2.1.1
- speechbrain >= 1.0.3
- numpy >= 1.23
- soundfile
- scipy
- pydub
- PyYAML
Optional: GPU acceleration works automatically if PyTorch detects a CUDA-enabled device.
Usage
1. Scoring a waveform
import numpy as np
from auralis.scorer import score_waveform
# Synthetic input: 1 second of random noise at 16 kHz (replace with real speech)
waveform = np.random.randn(16000).astype("float32")
score = score_waveform(waveform)
print(f"Vocal Fatigue Score: {score:.2f}")
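The example above passes a 16 kHz NumPy array to score_waveform. Assuming the model expects mono float32 samples at 16 kHz (inferred from that example, not from a documented contract), recordings with other sample rates or several channels could be converted first. This sketch uses scipy.signal.resample_poly, since SciPy is already a dependency; `prepare_waveform` and `TARGET_SR` are hypothetical names.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # assumption: taken from the 16 kHz example above


def prepare_waveform(samples, sample_rate: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Return mono float32 audio resampled to target_sr (illustrative helper)."""
    x = np.asarray(samples, dtype=np.float32)
    if x.ndim == 2:
        # soundfile.read returns (frames, channels) for multichannel audio; average to mono
        x = x.mean(axis=1)
    elif x.ndim != 1:
        raise ValueError(f"expected 1D or 2D audio, got shape {x.shape}")
    if sample_rate != target_sr:
        g = gcd(int(sample_rate), int(target_sr))
        # Polyphase resampling with a built-in anti-aliasing low-pass filter
        x = resample_poly(x, int(target_sr) // g, int(sample_rate) // g)
    return x.astype(np.float32)
```

For example, with soundfile: `data, sr = sf.read("path/to/speech_sample.wav", dtype="float32")`, then `score_waveform(prepare_waveform(data, sr))`.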
2. Scoring an audio file
from auralis.scorer import score_audio
audio_path = "path/to/speech_sample.wav"
score = score_audio(audio_path)
print(f"Vocal Fatigue Score: {score:.2f}")
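To score a folder of recordings, score_audio can simply be called in a loop. The sketch below takes the scoring function as a parameter so it can be exercised without loading the model; `score_folder` is a hypothetical helper. Because the exception types score_audio raises for invalid files are not documented here, it catches `Exception` broadly and skips those files.

```python
from pathlib import Path
from typing import Callable, Dict

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def score_folder(folder, scorer: Callable[[str], float]) -> Dict[str, float]:
    """Score every supported audio file directly inside `folder`.

    `scorer` is expected to behave like score_audio: take a path, return a float.
    """
    results: Dict[str, float] = {}
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            continue
        try:
            results[path.name] = float(scorer(str(path)))
        except Exception as exc:  # score_audio's exception types are not documented here
            print(f"skipped {path.name}: {exc}")
    return results
```

Usage: `scores = score_folder("path/to/recordings", score_audio)` returns a mapping from file name to score.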
Audio Validation
- Supported formats: .wav, .mp3, .m4a
- Duration: 5–10 seconds recommended
Scores range from 0 (no fatigue) to 100 (severe fatigue).
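According to the API reference, score_audio validates the file extension and duration itself. If you want to flag problems before loading the model, a lightweight pre-check could mirror the rules listed above. This is a sketch of what seems worth checking, not a copy of auralis/validators.py, and whether the library rejects clips outside 5–10 seconds or merely recommends that range is not stated here. It reads durations only for PCM .wav files (via the standard-library wave module); for .mp3/.m4a, pass the duration in, e.g. `len(AudioSegment.from_file(path)) / 1000` with pydub. `precheck` and `wav_duration_seconds` are hypothetical names.

```python
import wave
from pathlib import Path

# Values taken from the README; strictness of enforcement in auralis is an assumption.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}
MIN_SECONDS, MAX_SECONDS = 5.0, 10.0


def wav_duration_seconds(path) -> float:
    """Duration of a PCM .wav file via the standard-library wave module."""
    with wave.open(str(path), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def precheck(path, duration_s=None) -> list:
    """Return human-readable problems; an empty list means the clip looks OK."""
    problems = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension {suffix!r}")
    # Only .wav durations are read here; pass duration_s explicitly for .mp3/.m4a.
    if duration_s is None and suffix == ".wav":
        duration_s = wav_duration_seconds(path)
    if duration_s is not None and not (MIN_SECONDS <= duration_s <= MAX_SECONDS):
        problems.append(
            f"duration {duration_s:.2f}s outside recommended {MIN_SECONDS:g}-{MAX_SECONDS:g}s"
        )
    return problems
```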
File & Directory Structure
auralis-vfs/
├─ src/auralis/
│ ├─ __init__.py
│ ├─ scorer.py # Public API functions
│ ├─ validators.py
│ ├─ ecapa.py # Model wrapper
│ ├─ processing.py # Audio & feature processing
│ ├─ config.py # Paths & constants
│ ├─ data/ # Reference embeddings & axis
│ └─ models/ # Pretrained ECAPA-TDNN-VHE model & config.yaml
├─ tests/
│ └─ test_scoring.py
├─ pyproject.toml
├─ setup.cfg
├─ CITATIONS.cff
├─ MANIFEST.in
├─ .gitignore
├─ README.md
├─ requirements.txt
└─ LICENSE
API Reference
score_waveform(waveform: np.ndarray) -> float
- waveform: 1D NumPy array of audio samples.
- Returns: Vocal Fatigue Score (float, 0–100).
score_audio(file_path: str) -> float
- file_path: Path to an audio file (.wav, .mp3, .m4a).
- Validates the file extension and duration.
- Returns: Vocal Fatigue Score (float, 0–100).
Future Work
Planned improvements to enhance auralis_vfs:
- Prosody Feature Integration – Analyze pitch, energy, and speaking rate to enrich scoring.
- Clinical Report Generation – Provide automatic reports resembling clinical assessments, including:
  - Fatigue trends over time
  - Prosody-based analysis
  - Summary interpretation for voice health monitoring
- Web/API Interface – Seamless integration with Gradio or FastAPI for cloud deployments.
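Prosody integration is still planned. As a rough illustration of two of the named features, frame-level energy and pitch can be estimated with plain NumPy: RMS energy per frame, and F0 from the autocorrelation peak within a 75–400 Hz search range. This is a textbook baseline chosen for the sketch, not the method planned for auralis; the function names, frame sizes, and the `voicing=0.3` threshold are illustrative assumptions, and speaking rate is not covered.

```python
import numpy as np


def frame_signal(x, frame_len, hop):
    """Split a 1D signal into overlapping frames, shape (n_frames, frame_len)."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def rms_energy(x, sr, frame_ms=25.0, hop_ms=10.0):
    """Frame-level root-mean-square energy."""
    frames = frame_signal(x, int(sr * frame_ms / 1000), int(sr * hop_ms / 1000))
    return np.sqrt(np.mean(frames ** 2, axis=1))


def autocorr_f0(frame, sr, fmin=75.0, fmax=400.0, voicing=0.3):
    """Crude F0 for one frame from the autocorrelation peak; 0.0 means unvoiced."""
    frame = np.asarray(frame, dtype=np.float64)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    if ac[0] <= 0:
        return 0.0  # silent frame
    lo = int(sr / fmax)
    hi = min(int(sr / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    # Treat the frame as voiced only if the peak is a large fraction of the energy
    return sr / lag if ac[lag] / ac[0] > voicing else 0.0


def prosody_summary(x, sr, frame_ms=40.0, hop_ms=10.0):
    """Median F0 over voiced frames and mean RMS energy."""
    frames = frame_signal(x, int(sr * frame_ms / 1000), int(sr * hop_ms / 1000))
    f0 = np.array([autocorr_f0(f, sr) for f in frames])
    voiced = f0[f0 > 0]
    return {
        "median_f0_hz": float(np.median(voiced)) if voiced.size else 0.0,
        "mean_rms": float(rms_energy(x, sr).mean()),
    }
```

The F0 frames are longer (40 ms) than the energy frames so that at least two periods of a 75 Hz voice fit in each window. Integer-lag autocorrelation has limited resolution, so estimates can be off by a few Hz; dedicated pitch trackers refine the peak further.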
Contributors & Credits
Authors / Maintainers:
- Muhammad Khubaib Ahmad – AI/ML Architect, Vocal Fatigue Modeling
Contributors:
- Faiez Ahmad (Data Manager) – Dataset collection and preprocessing
- Muhammad Anas Tariq (Data Collector) – Dataset organization and verification
License
This project is licensed under the MIT License – see the LICENSE file for details.
Notes for Researchers
- Designed for short audio clips (5–10 seconds).
- Scores are relative to healthy reference embeddings.
- Reproducibility is guaranteed by fixed model weights and configuration files.
- Compatible with both CPU and GPU setups.
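Since the model is designed for 5–10 second clips, one possible way to handle longer recordings is to cut them into fixed windows, score each window, and aggregate. Nothing in this README says auralis does this; the window length, the median aggregation, and the 16 kHz rate (taken from the usage example) are assumptions, and `score_long_recording` is a hypothetical helper that takes the scorer as a parameter.

```python
from typing import Callable, List, Tuple

import numpy as np


def score_long_recording(
    waveform,
    scorer: Callable[[np.ndarray], float],
    sr: int = 16_000,
    win_s: float = 8.0,
    hop_s: float = 8.0,
    min_s: float = 5.0,
) -> Tuple[List[float], float]:
    """Score consecutive windows of a recording; return (per-window scores, median)."""
    x = np.asarray(waveform, dtype=np.float32)
    win, hop, min_len = int(win_s * sr), int(hop_s * sr), int(min_s * sr)
    scores: List[float] = []
    # Window starts are limited so every window holds at least min_s seconds;
    # a trailing piece shorter than that is dropped.
    for start in range(0, max(len(x) - min_len, 0) + 1, hop):
        segment = x[start:start + win]
        if len(segment) < min_len:
            break  # only happens when the whole recording is shorter than min_s
        scores.append(float(scorer(segment)))
    if not scores:
        raise ValueError(f"recording is shorter than {min_s:g} s")
    return scores, float(np.median(scores))
```

Usage: `per_window, overall = score_long_recording(x, score_waveform)`. The median is used so that a single noisy window does not dominate the overall value; keeping the per-window scores also allows inspecting how fatigue changes within a session.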
Contact
- Email: muhammadkhubaibahmad854@gmail.com
- GitHub: Khubaib8281/auralis-vfs
Download files
File details
Details for the file auralis_vfs-1.0.0.tar.gz.
File metadata
- Download URL: auralis_vfs-1.0.0.tar.gz
- Upload date:
- Size: 8.5 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a7c72cd87970708988aeb9dbe259584afb69263bb9426951277f7d98d9efa66c |
| MD5 | 2a01b6473fe554eb9e68f7a863170a80 |
| BLAKE2b-256 | 2605f85123a0cb0bc7b0c668f92feabc8eb3e0eb5d554ab8504df399aed3c4d0 |
File details
Details for the file auralis_vfs-1.0.0-py3-none-any.whl.
File metadata
- Download URL: auralis_vfs-1.0.0-py3-none-any.whl
- Upload date:
- Size: 8.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40dccb7d3d9a5b4d06c225bee1d8dc16822886336088e44775538096beb77e4a |
| MD5 | 1d721975f6b34c8b95047478a78b0d9f |
| BLAKE2b-256 | f7c2e9f04a7a21506a84ffcf2a759c045fada2668d3c5d9a48566b74d766077f |
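A downloaded archive can be checked against the published SHA256 digests with the standard-library hashlib module. The sketch below hard-codes the two SHA256 values from the tables above; `sha256_of` and `verify_download` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "auralis_vfs-1.0.0.tar.gz": "a7c72cd87970708988aeb9dbe259584afb69263bb9426951277f7d98d9efa66c",
    "auralis_vfs-1.0.0-py3-none-any.whl": "40dccb7d3d9a5b4d06c225bee1d8dc16822886336088e44775538096beb77e4a",
}


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """True if the file's SHA256 matches the published digest for its filename."""
    name = Path(path).name
    if name not in EXPECTED_SHA256:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == EXPECTED_SHA256[name]
```

pip can enforce the same check at install time in hash-checking mode: list `auralis-vfs==1.0.0 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.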