Real-Time Vocal Fatigue Monitoring for Continuous Speech Analytics

VoiceMonitor

VoiceMonitor is a Python library for real-time vocal fatigue monitoring built on top of the auralis_vfs vocal fatigue scoring framework. It enables continuous microphone monitoring, fatigue scoring using sliding audio windows, and session-level analytics for voice health monitoring.

The system is designed for researchers, speech engineers, and voice professionals who require automated analysis of vocal strain during prolonged speech activity.

VoiceMonitor processes live microphone input, extracts standardized audio segments, computes fatigue scores, and produces real-time alerts and session analytics.


Overview

Prolonged speaking can lead to vocal fatigue, a condition characterized by strain, reduced vocal efficiency, and potential long-term damage to vocal health.

VoiceMonitor provides a real-time monitoring pipeline that:

  • captures microphone audio streams
  • processes sliding audio windows
  • computes vocal fatigue scores
  • tracks fatigue progression over time
  • generates session analytics and warnings

The system leverages the ECAPA-TDNN-VHE vocal fatigue estimation model and the auralis_vfs scoring framework developed as part of ongoing research in speech health monitoring.


Key Features

  • Real-time microphone audio monitoring
  • Continuous vocal fatigue scoring
  • Sliding window fatigue analysis
  • Fatigue threshold warnings
  • Session-level analytics and reports
  • Chunk-based audio processing pipeline
  • JSON session export for downstream analysis
  • Lightweight CLI for quick experiments

Architecture

VoiceMonitor uses a sliding window inference pipeline for continuous analysis.

Microphone Input
        │
        ▼
Audio Stream Buffer
        │
        ▼
Sliding Window Segmentation (5s)
        │
        ▼
auralis_vfs Preprocessing
        │
        ▼
Vocal Fatigue Scoring
        │
        ▼
Session Analytics Engine
        │
        ▼
Fatigue Alerts + Reports

Each processed window produces a fatigue score, enabling real-time tracking of vocal strain progression during speech sessions.
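The segmentation step can be illustrated with a short NumPy sketch. This is not VoiceMonitor's internal code: the function name `sliding_windows`, the 16 kHz sample rate, and the choice to drop trailing audio shorter than one window are assumptions made for illustration. The 5-second window comes from the diagram above and the 1-second overlap from the Configuration section.

```python
import numpy as np

def sliding_windows(samples, sample_rate, window_sec=5.0, overlap_sec=1.0):
    """Yield (start_time_sec, window) pairs covering `samples`.

    Trailing audio shorter than one full window is dropped
    (an assumption of this sketch).
    """
    window = int(round(window_sec * sample_rate))
    # Consecutive windows start one hop apart: window length minus overlap.
    hop = int(round((window_sec - overlap_sec) * sample_rate))
    if hop <= 0:
        raise ValueError("overlap must be shorter than the window")
    for start in range(0, len(samples) - window + 1, hop):
        yield start / sample_rate, samples[start:start + window]

# 14 seconds of silence at an assumed 16 kHz sample rate.
audio = np.zeros(14 * 16000, dtype=np.float32)
starts = [t for t, _ in sliding_windows(audio, 16000)]
print(starts)  # [0.0, 4.0, 8.0]
```

With these settings a new window starts every 4 seconds, so each window shares its first second with the end of the previous one.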


Installation

Requirements

  • Python ≥ 3.10
  • FFmpeg installed on system
  • Microphone access

Install from PyPI

pip install voicemonitor

Install from source

git clone https://github.com/khubaib8281/voiceMonitor.git
cd voiceMonitor
pip install -e .

Dependencies

VoiceMonitor relies on the following core libraries:

  • auralis_vfs
  • numpy
  • sounddevice
  • soundfile
  • pydub
  • tqdm

FFmpeg must be installed for audio processing.
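A quick way to confirm that FFmpeg is reachable before starting a session is to look it up on the PATH from Python. The helper name `ffmpeg_available` is hypothetical and is not part of VoiceMonitor; the sketch uses only the standard library.

```python
import shutil

def ffmpeg_available(executable="ffmpeg"):
    """Return True if an executable named `executable` is found on PATH."""
    return shutil.which(executable) is not None

if not ffmpeg_available():
    print("FFmpeg not found on PATH; install it before running VoiceMonitor.")
```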


Quick Start

CLI Usage

Start real-time vocal fatigue monitoring:

voicemonitor

Monitor for a fixed duration:

voicemonitor --duration 120

Set a custom fatigue warning threshold:

voicemonitor --threshold 65

Example output:

[20260312_182001] Score: 22.51
[20260312_182006] Score: 31.02
[20260312_182011] Score: 45.44
[20260312_182016] Score: 72.90

⚠ fatigue threshold crossed

After the session completes, a report is generated:

session_report.json

Python API

VoiceMonitor can also be used directly in Python applications.

from voicemonitor import VoiceMonitor

monitor = VoiceMonitor(threshold=70)

session = monitor.start(duration_sec=120)

session.export_json("session_report.json")

Session Analytics

Each monitoring session records:

  • average fatigue score
  • maximum fatigue score
  • timestamps of processed windows
  • processed audio chunk paths
  • fatigue warning events

Example report:

{
  "summary": {
    "average_fatigue": 38.2,
    "max_fatigue": 74.1,
    "readings": 25
  },
  "records": [
    {
      "timestamp": "20260312_182001",
      "chunk": "chunks/20260312_182001.wav",
      "score": 22.51
    }
  ]
}
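For downstream analysis, a report with this shape can be read with the standard library alone. The sketch below embeds the example report so it runs on its own; the helper names `parse_timestamp` and `summarize` are hypothetical, and the `YYYYMMDD_HHMMSS` timestamp layout is inferred from the example rather than documented.

```python
import json
from datetime import datetime

# The example report shown above, embedded so the sketch is self-contained.
REPORT_TEXT = """
{
  "summary": {"average_fatigue": 38.2, "max_fatigue": 74.1, "readings": 25},
  "records": [
    {"timestamp": "20260312_182001",
     "chunk": "chunks/20260312_182001.wav",
     "score": 22.51}
  ]
}
"""

def parse_timestamp(stamp):
    """Parse a record timestamp, assuming the YYYYMMDD_HHMMSS layout."""
    return datetime.strptime(stamp, "%Y%m%d_%H%M%S")

def summarize(records):
    """Recompute mean, max, and count from the per-window records."""
    scores = [r["score"] for r in records]
    return {
        "average_fatigue": sum(scores) / len(scores),
        "max_fatigue": max(scores),
        "readings": len(scores),
    }

report = json.loads(REPORT_TEXT)
first = report["records"][0]
print(parse_timestamp(first["timestamp"]).isoformat())  # 2026-03-12T18:20:01
print(summarize(report["records"]))
```

To read an exported file instead, replace `json.loads(REPORT_TEXT)` with `json.load` on the opened `session_report.json`. The example report lists only one of its 25 records, so the recomputed summary here will not match the `summary` block.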

Configuration

  • Consecutive audio windows overlap by 1 second
  • The default fatigue warning threshold is 70
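To make the threshold concrete, the sketch below finds the first reading that exceeds it, using the scores from the example CLI output. The function `first_crossing` is hypothetical, and whether VoiceMonitor itself compares with "greater than" or "greater than or equal to" is an assumption here.

```python
def first_crossing(scores, threshold=70.0):
    """Return the index of the first score above `threshold`, or None."""
    for index, score in enumerate(scores):
        # Strict comparison is an assumption of this sketch.
        if score > threshold:
            return index
    return None

# Scores taken from the example CLI output.
scores = [22.51, 31.02, 45.44, 72.90]
print(first_crossing(scores))        # 3
print(first_crossing(scores, 80.0))  # None
```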

Research Background

VoiceMonitor is built upon the auralis_vfs vocal fatigue scoring framework, which was developed as part of research on automated vocal fatigue detection.

The underlying fatigue estimation model is based on an ECAPA-TDNN architecture adapted for vocal health estimation.

Research paper:

Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs

Model repository:

huggingface.co/Khubaib01/ECAPA-TDNN-VHE

The model estimates vocal fatigue levels from short speech segments and provides a continuous fatigue score representing vocal strain.

VoiceMonitor extends this work by enabling real-time fatigue monitoring and session analytics.


Applications

VoiceMonitor can be used in a variety of speech-intensive environments:

  • speech research
  • voice health monitoring
  • call center voice analytics
  • teacher vocal load monitoring
  • podcast and streaming voice tracking
  • speech therapy experiments
  • human-computer interaction studies

Project Structure

voicemonitor/
├── voicemonitor/
│   ├── audio_stream.py
│   ├── analytics.py
│   ├── session.py
│   ├── utils.py
│   ├── config.py
│   └── cli.py
│
├── examples/
│   └── live.py
│
├── tests/
│   └── test_session.py
│
├── LICENSE
├── setup.cfg
├── requirements.txt
├── pyproject.toml
└── README.md

Future Development

Planned enhancements include:

  • real-time visualization dashboard
  • web API for remote monitoring
  • desktop GUI
  • voice activity detection integration
  • fatigue trend prediction models
  • speaker-aware monitoring

Citation

If you use VoiceMonitor in research, please cite the underlying work:

Ahmad, M. K. (2026). Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs. Zenodo. https://doi.org/10.5281/zenodo.18366305

License

This project is released under the MIT License.


Author

Muhammad Khubaib Ahmad
AI / ML Engineer
Speech Intelligence and Audio AI Systems
