
Real-Time Vocal Fatigue Monitoring for Continuous Speech Analytics


VoiceMonitor



VoiceMonitor is a Python library for real-time vocal fatigue monitoring built on top of the auralis_vfs vocal fatigue scoring framework. It enables continuous microphone monitoring, fatigue scoring using sliding audio windows, and session-level analytics for voice health monitoring.

The system is designed for researchers, speech engineers, and voice professionals who require automated analysis of vocal strain during prolonged speech activity.

VoiceMonitor processes live microphone input, extracts standardized audio segments, computes fatigue scores, and produces real-time alerts and session analytics.


Overview

Prolonged speaking can lead to vocal fatigue, a condition characterized by strain, reduced vocal efficiency, and potential long-term damage to vocal health.

VoiceMonitor provides a real-time monitoring pipeline that:

  • captures microphone audio streams
  • processes sliding audio windows
  • computes vocal fatigue scores
  • tracks fatigue progression over time
  • generates session analytics and warnings

The system leverages the ECAPA-TDNN-VHE vocal fatigue estimation model and the auralis_vfs scoring framework developed as part of ongoing research in speech health monitoring.


Key Features

  • Real-time microphone audio monitoring
  • Continuous vocal fatigue scoring
  • Sliding window fatigue analysis
  • Fatigue threshold warnings
  • Session-level analytics and reports
  • Chunk-based audio processing pipeline
  • JSON session export for downstream analysis
  • Lightweight command-line interface (CLI) for quick experiments

Architecture

VoiceMonitor uses a sliding window inference pipeline for continuous analysis.

Microphone Input
        │
        ▼
Audio Stream Buffer
        │
        ▼
Sliding Window Segmentation (5s)
        │
        ▼
auralis_vfs Preprocessing
        │
        ▼
Vocal Fatigue Scoring
        │
        ▼
Session Analytics Engine
        │
        ▼
Fatigue Alerts + Reports

Each processed window produces a fatigue score, enabling real-time tracking of vocal strain progression during speech sessions.
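The buffer-to-window stage of this pipeline can be sketched as a small accumulator: microphone callbacks deliver audio chunks of arbitrary size, and a window is emitted whenever enough samples have built up. This is a minimal illustration under stated assumptions, not VoiceMonitor's implementation. SlidingWindowBuffer, run_pipeline and the placeholder scorer (a plain window sum) are hypothetical; real windows would be scored by auralis_vfs, and the toy sizes (4-sample windows, 3-sample hop) stand in for the documented 5-second windows. Treating a score equal to the threshold as an alert is also an assumption.

```python
from collections import deque
from itertools import islice
from typing import Callable, Iterable, List, Tuple


class SlidingWindowBuffer:
    """Hypothetical accumulator that turns a chunked stream into overlapping windows.

    window_size and hop_size are in samples; consecutive windows share
    window_size - hop_size samples.
    """

    def __init__(self, window_size: int, hop_size: int) -> None:
        if not 0 < hop_size <= window_size:
            raise ValueError("hop_size must satisfy 0 < hop_size <= window_size")
        self.window_size = window_size
        self.hop_size = hop_size
        self._samples: deque = deque()

    def push(self, chunk: Iterable[float]) -> List[List[float]]:
        """Append one chunk and return every window that is now complete."""
        self._samples.extend(chunk)
        windows = []
        while len(self._samples) >= self.window_size:
            # Copy the oldest window_size samples without consuming them yet.
            windows.append(list(islice(self._samples, self.window_size)))
            # Advance by one hop; the leftover samples start the next window.
            for _ in range(self.hop_size):
                self._samples.popleft()
        return windows


def run_pipeline(
    chunks: Iterable[Iterable[float]],
    buffer: SlidingWindowBuffer,
    score_fn: Callable[[List[float]], float],
    threshold: float,
) -> List[Tuple[int, float, bool]]:
    """Score each completed window and flag scores at or above the threshold."""
    results = []
    for chunk in chunks:
        for window in buffer.push(chunk):
            score = score_fn(window)
            results.append((len(results), score, score >= threshold))
    return results


if __name__ == "__main__":
    # Toy stream: ten "samples" delivered in chunks of uneven size.
    stream = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    # Placeholder scorer (window sum) standing in for a fatigue model.
    for index, score, alert in run_pipeline(stream, SlidingWindowBuffer(4, 3), sum, 20):
        print(index, score, "ALERT" if alert else "")
```

Returning a list from push (rather than yielding) means a chunk is always stored even if the caller ignores the result.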


Installation

Requirements

  • Python ≥ 3.10
  • FFmpeg installed on the system
  • Microphone access

Install from PyPI

pip install voicemonitor

Install from source

git clone https://github.com/<your-username>/voicemonitor.git
cd voicemonitor
pip install -e .

Dependencies

VoiceMonitor relies on the following core libraries:

  • auralis_vfs
  • numpy
  • sounddevice
  • soundfile
  • pydub
  • tqdm

FFmpeg must be installed for audio processing.
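Before a first run, these requirements can be checked programmatically. The sketch below uses only the standard library; check_environment is a hypothetical helper, and it assumes each dependency's import name matches the package name listed above. It only confirms that packages and the ffmpeg binary can be found; it cannot verify microphone permissions, which depend on the operating system.

```python
import importlib.util
import shutil
import sys
from typing import Dict

# Import names assumed to match the dependency list in this README.
REQUIRED_MODULES = ("auralis_vfs", "numpy", "sounddevice", "soundfile", "pydub", "tqdm")


def check_environment() -> Dict[str, bool]:
    """Report whether each documented requirement appears to be satisfied."""
    status = {"python>=3.10": sys.version_info >= (3, 10)}
    # FFmpeg is an external binary, so look for it on PATH.
    status["ffmpeg"] = shutil.which("ffmpeg") is not None
    for name in REQUIRED_MODULES:
        # find_spec returns None when a top-level module cannot be located;
        # it does not import the module.
        status[name] = importlib.util.find_spec(name) is not None
    return status


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        label = "OK" if ok else "MISSING"
        print(f"{label:<8}{requirement}")
```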


Quick Start

CLI Usage

Start real-time vocal fatigue monitoring:

voicemonitor

Monitor for a fixed duration:

voicemonitor --duration 120

Set a custom fatigue warning threshold:

voicemonitor --threshold 65

Example output:

[20260312_182001] Score: 22.51
[20260312_182006] Score: 31.02
[20260312_182011] Score: 45.44
[20260312_182016] Score: 72.90

⚠ fatigue threshold crossed

After the session completes, a report is generated:

session_report.json
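If the console output is captured, it can be post-processed with a few lines of Python. This sketch assumes the line format shown in the example output ([YYYYmmdd_HHMMSS] Score: <value>) and treats a score equal to the threshold as a crossing; parse_scores and first_crossing are hypothetical helpers, not part of the voicemonitor CLI.

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Matches lines such as "[20260312_182001] Score: 22.51" from the example output.
LINE_RE = re.compile(r"^\[(\d{8}_\d{6})\] Score: (\d+(?:\.\d+)?)$")

Reading = Tuple[datetime, float]


def parse_scores(lines: Iterable[str]) -> List[Reading]:
    """Extract (timestamp, score) pairs, skipping lines that do not match."""
    readings = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            ts = datetime.strptime(match.group(1), "%Y%m%d_%H%M%S")
            readings.append((ts, float(match.group(2))))
    return readings


def first_crossing(readings: List[Reading], threshold: float) -> Optional[Reading]:
    """Return the first reading at or above the threshold, or None."""
    for ts, score in readings:
        if score >= threshold:  # ">=" rather than ">" is an assumption
            return ts, score
    return None


if __name__ == "__main__":
    captured = """\
[20260312_182001] Score: 22.51
[20260312_182006] Score: 31.02
[20260312_182011] Score: 45.44
[20260312_182016] Score: 72.90

⚠ fatigue threshold crossed"""
    readings = parse_scores(captured.splitlines())
    print(first_crossing(readings, threshold=65))
```

The warning line does not match the pattern and is skipped, so only score lines contribute readings.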

Python API

VoiceMonitor can also be used directly in Python applications.

from voicemonitor import VoiceMonitor

monitor = VoiceMonitor(threshold=70)

session = monitor.start(duration_sec=120)

session.export_json("session_report.json")

Session Analytics

Each monitoring session records:

  • average fatigue score
  • maximum fatigue score
  • timestamps of processed windows
  • processed audio chunk paths
  • fatigue warning events

Example report:

{
  "summary": {
    "average_fatigue": 38.2,
    "max_fatigue": 74.1,
    "readings": 25
  },
  "records": [
    {
      "timestamp": "20260312_182001",
      "chunk": "chunks/20260312_182001.wav",
      "score": 22.51
    }
  ]
}
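The summary block can be recomputed from the records list, which is handy for checking an exported report or merging several sessions. The sketch below builds a dictionary with the same keys as the example report; summarize is a hypothetical helper, rounding the average to two decimals is an assumption, and the sample records reuse the scores from the CLI example output with chunk paths following the chunks/<timestamp>.wav pattern.

```python
import json
from statistics import mean
from typing import Any, Dict, List


def summarize(records: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return a report dict with the same top-level layout as session_report.json."""
    scores = [float(r["score"]) for r in records]
    summary = {
        # Rounding to two decimals is an assumption, not documented behaviour.
        "average_fatigue": round(mean(scores), 2) if scores else None,
        "max_fatigue": max(scores) if scores else None,
        "readings": len(scores),
    }
    return {"summary": summary, "records": records}


if __name__ == "__main__":
    timestamps_and_scores = [
        ("20260312_182001", 22.51),
        ("20260312_182006", 31.02),
        ("20260312_182011", 45.44),
        ("20260312_182016", 72.90),
    ]
    records = [
        {"timestamp": ts, "chunk": f"chunks/{ts}.wav", "score": score}
        for ts, score in timestamps_and_scores
    ]
    print(json.dumps(summarize(records), indent=2))
```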

Configuration

  • Audio overlap: 1 second
  • Default fatigue threshold: 70
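Read together with the 5-second window in the Architecture section, these defaults fix how often a new score can appear: assuming "overlap" means the span shared by consecutive windows, a new window starts every 5 - 1 = 4 seconds. The dataclass below is a hypothetical way to hold these values and derive sample counts; WindowConfig and its field names are illustrative (not the contents of config.py), and the 16 kHz sample rate is an assumption, chosen because ECAPA-TDNN models are commonly run on 16 kHz speech.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowConfig:
    """Hypothetical container for the documented defaults.

    window_sec comes from the Architecture diagram, overlap_sec and threshold
    from the Configuration list; sample_rate is an assumption.
    """

    window_sec: float = 5.0
    overlap_sec: float = 1.0
    threshold: float = 70.0
    sample_rate: int = 16_000

    def __post_init__(self) -> None:
        if not 0 <= self.overlap_sec < self.window_sec:
            raise ValueError("overlap_sec must be in [0, window_sec)")

    @property
    def hop_sec(self) -> float:
        # A new window starts every (window - overlap) seconds.
        return self.window_sec - self.overlap_sec

    @property
    def window_samples(self) -> int:
        return round(self.window_sec * self.sample_rate)

    @property
    def hop_samples(self) -> int:
        return round(self.hop_sec * self.sample_rate)

    def windows_in(self, duration_sec: float) -> int:
        """Number of complete windows that fit in a recording of this length."""
        if duration_sec < self.window_sec:
            return 0
        return int((duration_sec - self.window_sec) // self.hop_sec) + 1


if __name__ == "__main__":
    cfg = WindowConfig()
    print(cfg.hop_sec, cfg.window_samples, cfg.hop_samples, cfg.windows_in(120))
```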

Research Background

VoiceMonitor is built upon the auralis_vfs vocal fatigue scoring framework, which was developed as part of research on automated vocal fatigue detection.

The underlying fatigue estimation model is based on an ECAPA-TDNN architecture adapted for vocal health estimation.

Research paper:

Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs

Model repository:

huggingface.co/Khubaib01/ECAPA-TDNN-VHE

The model estimates vocal fatigue levels from short speech segments and provides a continuous fatigue score representing vocal strain.
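This README does not spell out how the score is derived. As a purely illustrative reading of the paper title, one generic way to measure "embedding-space deviation" is the cosine distance between a segment's embedding and a reference centroid built from rested-voice segments. The sketch shows only that generic measure on toy vectors; centroid and cosine_deviation are hypothetical helpers, and this is not the auralis_vfs scoring function, the paper's method, or the mapping from distance to the fatigue scores shown above.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def centroid(embeddings: List[Vector]) -> List[float]:
    """Element-wise mean of a set of embeddings (e.g. rested-voice segments)."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


def cosine_deviation(embedding: Vector, reference: Vector) -> float:
    """1 - cosine similarity: 0 for the same direction, up to 2 for opposite."""
    dot = sum(a * b for a, b in zip(embedding, reference))
    norm = math.sqrt(sum(a * a for a in embedding)) * math.sqrt(sum(b * b for b in reference))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / norm


if __name__ == "__main__":
    rested = [[0.9, 0.1, 0.0], [1.0, 0.0, 0.1]]  # toy "rested voice" embeddings
    baseline = centroid(rested)
    print(round(cosine_deviation([0.95, 0.05, 0.05], baseline), 4))  # close to baseline
    print(round(cosine_deviation([0.2, 0.9, 0.3], baseline), 4))  # far from baseline
```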

VoiceMonitor extends this work by enabling real-time fatigue monitoring and session analytics.


Applications

VoiceMonitor can be used in a variety of speech-intensive environments:

  • speech research
  • voice health monitoring
  • call center voice analytics
  • teacher vocal load monitoring
  • podcast and streaming voice tracking
  • speech therapy experiments
  • human-computer interaction studies

Project Structure

voicemonitor/
├── voicemonitor/
│   ├── audio_stream.py
│   ├── analytics.py
│   ├── session.py
│   ├── utils.py
│   ├── config.py
│   └── cli.py
│
├── examples/
│   └── live.py
│
├── tests/
│   └── test_session.py
│
├── LICENSE
├── setup.cfg
├── requirements.txt
├── pyproject.toml
└── README.md

Future Development

Planned enhancements include:

  • real-time visualization dashboard
  • web API for remote monitoring
  • desktop GUI interface
  • voice activity detection integration
  • fatigue trend prediction models
  • speaker-aware monitoring

Citation

If you use VoiceMonitor in research, please cite the underlying work:

Ahmad, M. K. (2026). Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs. Zenodo. https://doi.org/10.5281/zenodo.18366305

License

This project is released under the MIT License.


Author

Muhammad Khubaib Ahmad
AI / ML Engineer
Speech Intelligence and Audio AI Systems



Download files


Source Distribution

voicemonitor-0.1.0.tar.gz (8.4 kB)

Uploaded Source

Built Distribution


voicemonitor-0.1.0-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file voicemonitor-0.1.0.tar.gz.

File metadata

  • Download URL: voicemonitor-0.1.0.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for voicemonitor-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a75170b13de74dbd6a4061f9d9b2be668c2f7879bbfacf8ab7501f871a1102a8
MD5 38f9c46c597e9e3062952f0416969185
BLAKE2b-256 604e40f1e3ae423bec882bd6e9a3e17a43a001e2484f755462d1532367ef0ccf


File details

Details for the file voicemonitor-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: voicemonitor-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for voicemonitor-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 cd4a39ee8d815ceb2b1243f33d1c2d340bc9e927886b0e1732fb328fab9697fd
MD5 c90798ee5f2a6926bfb7fd3df65fca62
BLAKE2b-256 36caac1bb48a5f09cbc730abe347d92b089379edf3ce409ebac786ec65c1032e
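To check a downloaded file against the SHA256 digests listed above, hash it locally and compare. A minimal sketch using only hashlib; sha256_of and verify are hypothetical helper names, and the expected values are copied from the SHA256 rows above. pip's hash-checking mode (--hash=sha256:... options in a requirements file) performs the same check at install time.

```python
import hashlib

# SHA256 digests published for the 0.1.0 release files listed above.
EXPECTED_SHA256 = {
    "voicemonitor-0.1.0.tar.gz": "a75170b13de74dbd6a4061f9d9b2be668c2f7879bbfacf8ab7501f871a1102a8",
    "voicemonitor-0.1.0-py3-none-any.whl": "cd4a39ee8d815ceb2b1243f33d1c2d340bc9e927886b0e1732fb328fab9697fd",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    name = "voicemonitor-0.1.0.tar.gz"
    print(name, "OK" if verify(name, EXPECTED_SHA256[name]) else "MISMATCH")
```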

