Skip to main content

Objective vocal fatigue scoring from speech using health-centric ECAPA-TDNN-VHE embeddings with standardized preprocessing and reproducible benchmarking.

Project description

Auralis VFS (Vocal Fatigue Scoring Library)

PyPI Python License


Overview

auralis_vfs is a research-oriented Python library for objective vocal fatigue scoring from speech signals. It provides an end-to-end pipeline for standardized audio preprocessing and embedding-based inference using the proposed ECAPA-TDNN-VHE model, a contrastively trained neural speech encoder designed to quantify deviations in vocal health.

The library is designed to support both experimental research workflows and practical system integration, enabling reproducible vocal fatigue analysis from raw waveform input or audio files with minimal configuration.

auralis_vfs exposes high-level APIs for:

  • Direct scoring from raw waveforms (score_waveform)
  • Scoring from audio files (score_audio)
  • Deterministic and batch-capable audio preprocessing (preprocess_audio)

The preprocessing module enforces a strict and reproducible audio standardization protocol, ensuring compatibility with the ECAPA-TDNN-VHE inference pipeline.

This library is designed for:

  • Research studies in voice health, occupational voice monitoring, and speech pathology.
  • Integration into speech analysis pipelines.
  • Reproducible and standardized scoring across datasets.

Key Features

  • Compute Vocal Fatigue Score from raw audio (.wav, .mp3, .m4a).
  • Fast waveform-based scoring using health centric vocal health encoder ECAPA-TDNN-VHE embeddings.
  • Reference-based scoring using curated embeddings from healthy speakers.
  • Production-ready API with score_audio() and score_waveform() functions.
  • preprocess_audio function for making the audio compatible for passing into score_audio function.
  • Configurable parameters for audio sampling rate, duration, and mel-spectrogram features.
  • Designed for research reproducibility.

Research Orientation

auralis_vfs is built upon the ECAPA-TDNN-VHE model, which is trained using supervised contrastive learning to emphasize vocal health states while suppressing speaker identity information. The library is intended for:

  • Vocal health research
  • Speech pathology experiments
  • Longitudinal voice monitoring
  • Clinical decision-support prototyping
  • Embedded AI health systems

The implementation prioritizes:

  • Reproducibility
  • Deterministic preprocessing
  • Model-driven scoring rather than heuristic acoustic metrics
  • Compatibility with downstream MLOps and API services

Benchmarking Against Baseline ECAPA-TDNN

The model was evaluated on vocal health classification tasks. Results highlight ECAPA-TDNN-VHE's superiority over baseline ECAPA-TDNN:

Model Accuracy Macro F1 Healthy F1 Strained F1 Stressed F1
ECAPA-TDNN (SpeechBrain baseline) 0.36 0.31 0.50 0.22 0.22
ECAPA-TDNN-VHE (Khubaib et al., 2026) 0.78 0.77 0.85 0.78 0.70

This demonstrates state-of-the-art health-centric embedding performance within ECAPA-based architectures.

Cite our research:

Ahmad, M. K. (2026). Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs (0.1.0). Zenodo. https://doi.org/10.5281/zenodo.18305757

Installation

pip install auralis-vfs

Dependencies:

  • Python >= 3.10
  • torch >= 2.1.1
  • torchaudio >= 2.1.1
  • speechbrain >= 1.0.3
  • numpy >= 1.23
  • soundfile
  • scipy
  • pydub
  • PyYAML

Optional: GPU acceleration works automatically if PyTorch detects a CUDA-enabled device.


Usage

1. Scoring a waveform

import numpy as np
from auralis.scorer import score_waveform

# Generate fake waveform (1 second of audio at 16kHz)
waveform = np.random.randn(16000).astype("float32")

score = score_waveform(waveform)
print(f"Vocal Fatigue Score: {score:.2f}")

2. Scoring an audio file

from auralis.scorer import score_audio

audio_path = "path/to/speech_sample.wav"
score = score_audio(audio_path)
print(f"Vocal Fatigue Score: {score:.2f}")

3. Audio Preprocessing

The preprocess_audio function standardizes raw audio into a format compatible with the vocal fatigue scoring pipeline. It converts audio to:

  • WAV format
  • Mono channel
  • 16 kHz sampling rate
  • Fixed duration of 5 seconds

Audio shorter than the required minimum(5 seconds) duration is rejected with a validation error.

Requirements

Ensure ffmpeg is installed and available in your system PATH.

ffmpeg -version

Basic Usage (Single Audio File)

from auralis.processing import preprocess_audio

input_audio = "data/sample.ogg"

processed_files = preprocess_audio(input_audio)

print(processed_files)
# ['data/sample_preprocessed.wav']

Batch Processing (Directory of Audio Files)

from auralis.processing import preprocess_audio

input_dir = "data/raw_audio"
output_dir = "data/processed_audio"

processed_files = preprocess_audio(input_dir, output_dir)

for path in processed_files:
    print(path)

All supported audio files in the directory are processed and saved to the output directory.

Supported Audio Formats

The function processes files with the following extensions:

.wav, .mp3, .ogg, .flac, .m4a, .aac

Audio Validation

  • Supported formats for direct passing to score_audio function: .wav, .mp3, .m4a
  • Duration: 5–10 seconds recommended

Scores range from 0 (no fatigue) to 100 (severe fatigue).


File & Directory Structure

auralis-vfs/
├─ src/auralis/
│  ├─ __init__.py
│  ├─ scorer.py          # Public API functions
|  ├─ validators.py
│  ├─ ecapa.py           # Model wrapper
│  ├─ processing.py      # Audio & feature processing
│  ├─ config.py          # Paths & constants
│  ├─ data/              # Reference embeddings & axis
│  └─ models/            # Pretrained ECAPA-TDNN-VHE model & config.yaml
├─ tests/
│  ├─ test_scoring.py
├─ pyproject.toml
├─ setup.cfg
├─ CITATIONS.cff
├─ MANIFEST.in
├─ .gitignore
├─ README.md
├─ requirements.txt
└─ LICENSE

API Reference

score_waveform(waveform: np.ndarray) -> float

  • waveform: 1D numpy array representing audio samples.
  • Returns: Vocal Fatigue Score (float, 0–100).

score_audio(file_path: str) -> float

  • file_path: Path to audio file (.wav, .mp3, .m4a).
  • Validates file extension and duration.
  • Returns: Vocal Fatigue Score (float, 0–100).

preprocess_audio(input_path/dir: str, output_dir: str)

  • input_path to audio file to be preprocessed.
  • Preprocesses the audio file and saves in the same dir if output_dir is not provided.
  • Prints the path where the preprocessed file/files are stored.

Future Work

Planned improvements to enhance auralis_vfs:

  • Prosody Feature Integration – Analyze pitch, energy, and speaking rate to enrich scoring.

  • Clinical Report Generation – Provide automatic reports resembling clinical assessments, including:

    • Fatigue trends over time
    • Prosody-based analysis
    • Summary interpretation for voice health monitoring
  • Web/API Interface – Seamless integration with Gradio or FastAPI for cloud deployments.

Contributors & Credits

Authors / Maintainers:

  • Muhammad Khubaib Ahmad – AI/ML Architect, Vocal Fatigue Modeling

Contributors:

  • Faiez Ahmad(Data Manager) – Dataset collection and preprocessing
  • Muhammad Anas Tariq(Data Collector) – Dataset organization and verification

License

This project is licensed under the MIT License – see the LICENSE file for details.


Notes for Researchers

  • Designed for short audio clips (5–10 seconds).
  • Scores are relative to healthy reference embeddings.
  • Reproducibility is guaranteed by fixed model weights and configuration files.
  • Compatible with both CPU and GPU setups.

Contact

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

auralis_vfs-1.1.0.tar.gz (8.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

auralis_vfs-1.1.0-py3-none-any.whl (8.5 MB view details)

Uploaded Python 3

File details

Details for the file auralis_vfs-1.1.0.tar.gz.

File metadata

  • Download URL: auralis_vfs-1.1.0.tar.gz
  • Upload date:
  • Size: 8.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for auralis_vfs-1.1.0.tar.gz
Algorithm Hash digest
SHA256 979e2af92b73013ab2b99447f2257279bcb5db91c2e7a23646bf04e53f65334a
MD5 635155c3a148962a43f1583c27cc0324
BLAKE2b-256 8a478b33b92be515349573605b9fb947d329f9462541507107589dfd51309aa4

See more details on using hashes here.

File details

Details for the file auralis_vfs-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: auralis_vfs-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 8.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for auralis_vfs-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a5446b761d6cd9154a87256cf7f28b7e90665190d5846ae36db60be8a8a07885
MD5 c919b546de623b42e3476e442e4b6e6e
BLAKE2b-256 501ea710406ce18466242f07dc02722d1b24d22c993ee6912bb839c617fdea4c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page