Objective vocal fatigue scoring from speech using ECAPA-TDNN-VHE embeddings

Auralis VFS (Vocal Fatigue Scoring Library)


Overview

Auralis VFS is a research-grade Python library for objective vocal fatigue assessment from speech audio. It uses ECAPA-TDNN-based embeddings and supervised contrastive learning to compute a Vocal Fatigue Score (0–100) from short audio recordings.

This library is designed for:

  • Research studies in voice health, occupational voice monitoring, and speech pathology.
  • Integration into speech analysis pipelines.
  • Reproducible and standardized scoring across datasets.

Cite our research:

Ahmad, M. K. (2026). Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs (0.1.0). Zenodo. https://doi.org/10.5281/zenodo.18305757


Key Features

  • Compute Vocal Fatigue Score from raw audio (.wav, .mp3, .m4a).
  • Fast waveform-based scoring using pretrained ECAPA-TDNN embeddings.
  • Reference-based scoring using curated embeddings from healthy speakers.
  • Production-ready API with score_audio() and score_waveform() functions.
  • Configurable parameters for audio sampling rate, duration, and mel-spectrogram features.
  • Designed for research reproducibility.

Installation

pip install auralis-vfs

Dependencies:

  • Python >= 3.10
  • torch >= 2.1.1
  • torchaudio >= 2.1.1
  • speechbrain >= 1.0.3
  • numpy >= 1.23
  • soundfile
  • scipy
  • pydub
  • PyYAML

Optional: GPU acceleration works automatically if PyTorch detects a CUDA-enabled device.


Usage

1. Scoring a waveform

import numpy as np
from auralis.scorer import score_waveform

# Generate a synthetic waveform (1 second of noise at 16 kHz)
waveform = np.random.randn(16000).astype("float32")

score = score_waveform(waveform)
print(f"Vocal Fatigue Score: {score:.2f}")

2. Scoring an audio file

from auralis.scorer import score_audio

audio_path = "path/to/speech_sample.wav"
score = score_audio(audio_path)
print(f"Vocal Fatigue Score: {score:.2f}")
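
When several recordings need scoring, a small loop around the file-based call can help. The following is only a minimal sketch: `score_folder` is a hypothetical helper, not part of Auralis VFS, and it takes the scoring function as an argument so that `score_audio` (or a stub, for testing) can be passed in. It assumes the scorer accepts a path string and returns a number, as described in the API Reference.

```python
from pathlib import Path
from typing import Callable, Dict

# Formats listed in this README as supported.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}


def score_folder(folder: str, scorer: Callable[[str], float]) -> Dict[str, float]:
    """Score every supported audio file directly inside `folder`.

    Hypothetical helper: `scorer` is any callable that maps a file path
    to a score, for example auralis.scorer.score_audio.
    """
    results: Dict[str, float] = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subdirectories and files with unsupported extensions.
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS:
            results[path.name] = float(scorer(str(path)))
    return results


# Example usage (requires auralis-vfs to be installed):
# from auralis.scorer import score_audio
# scores = score_folder("recordings", score_audio)
```

Passing the scorer in keeps the helper independent of the model, so the folder-walking logic can be tested without loading any weights.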

Audio Validation

  • Supported formats: .wav, .mp3, .m4a
  • Duration: 5–10 seconds recommended

Scores range from 0 (no fatigue) to 100 (severe fatigue).
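
The format and duration guidance above can also be checked before a file is handed to the scorer. The sketch below is an illustration only and does not reproduce the library's own `validators.py`: `check_clip` and `wav_duration_seconds` are hypothetical names, the duration check covers `.wav` files only (via the standard-library `wave` module), and the 5–10 second window is treated as a recommendation rather than a hard limit.

```python
import wave
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a"}  # formats listed above
RECOMMENDED_SECONDS = (5.0, 10.0)                # recommended duration range


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM .wav file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_clip(path: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found.

    Hypothetical pre-check, not the library's validator.
    """
    problems: list[str] = []
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {suffix or '(none)'}")
        return problems
    # Duration is only inspected for .wav here; other formats need a decoder.
    if suffix == ".wav":
        duration = wav_duration_seconds(path)
        low, high = RECOMMENDED_SECONDS
        if not low <= duration <= high:
            problems.append(
                f"duration {duration:.1f}s outside recommended {low:.0f}-{high:.0f}s"
            )
    return problems
```

Returning a list of messages instead of raising lets a batch job log every issue for a recording and decide separately whether to skip it.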


File & Directory Structure

auralis-vfs/
├─ src/auralis/
│  ├─ __init__.py
│  ├─ scorer.py          # Public API functions
│  ├─ validators.py
│  ├─ ecapa.py           # Model wrapper
│  ├─ processing.py      # Audio & feature processing
│  ├─ config.py          # Paths & constants
│  ├─ data/              # Reference embeddings & axis
│  └─ models/            # Pretrained ECAPA-TDNN-VHE model & config.yaml
├─ tests/
│  └─ test_scoring.py
├─ pyproject.toml
├─ setup.cfg
├─ CITATIONS.cff
├─ MANIFEST.in
├─ .gitignore
├─ README.md
├─ requirements.txt
└─ LICENSE

API Reference

score_waveform(waveform: np.ndarray) -> float

  • waveform: 1D numpy array representing audio samples.
  • Returns: Vocal Fatigue Score (float, 0–100).

score_audio(file_path: str) -> float

  • file_path: Path to audio file (.wav, .mp3, .m4a).
  • Validates file extension and duration.
  • Returns: Vocal Fatigue Score (float, 0–100).

Future Work

Planned improvements to Auralis VFS:

  • Prosody Feature Integration – Analyze pitch, energy, and speaking rate to enrich scoring.
  • Clinical Report Generation – Provide automatic reports resembling clinical assessments, including:
    • Fatigue trends over time
    • Prosody-based analysis
    • Summary interpretation for voice health monitoring
  • Web/API Interface – Integration with Gradio or FastAPI for cloud deployments.

Contributors & Credits

Authors / Maintainers:

  • Muhammad Khubaib Ahmad – AI/ML Architect, Vocal Fatigue Modeling

Contributors:

  • Faiez Ahmad (Data Manager) – Dataset collection and preprocessing
  • Muhammad Anas Tariq (Data Collector) – Dataset organization and verification

License

This project is licensed under the MIT License – see the LICENSE file for details.


Notes for Researchers

  • Designed for short audio clips (5–10 seconds).
  • Scores are relative to healthy reference embeddings.
  • Reproducibility is supported by fixed model weights and configuration files shipped with the package.
  • Compatible with both CPU and GPU setups.
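
To make "relative to healthy reference embeddings" more concrete, the toy sketch below scores an embedding by its cosine distance from the centroid of a set of reference vectors and rescales that distance to 0–100. This is an assumption-laden illustration of the general idea of embedding-space deviation, not the scoring rule used by Auralis VFS: the function names, the centroid choice, the `max_distance` parameter, and the linear rescaling are all hypothetical.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; 0 for same direction, 2 for opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def deviation_score(
    embedding: Sequence[float],
    healthy_refs: Sequence[Sequence[float]],
    max_distance: float = 1.0,  # hypothetical distance that maps to a score of 100
) -> float:
    """Illustrative only: rescale distance from the healthy centroid to 0-100."""
    distance = cosine_distance(embedding, centroid(healthy_refs))
    # Clamp so the result always stays within the 0-100 range.
    return 100.0 * min(max(distance / max_distance, 0.0), 1.0)
```

In this toy version, an embedding pointing the same way as the reference centroid scores 0, and one at or beyond `max_distance` scores 100. The actual library may combine reference embeddings differently, for example through the stored axis mentioned in the directory listing.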
