Real-Time Vocal Fatigue Monitoring for Continuous Speech Analytics
VoiceMonitor
VoiceMonitor is a Python library for real-time vocal fatigue monitoring, built on top of the auralis_vfs vocal fatigue scoring framework. It supports continuous microphone capture, fatigue scoring over sliding audio windows, and session-level analytics for tracking voice health.
The system is designed for researchers, speech engineers, and voice professionals who require automated analysis of vocal strain during prolonged speech activity.
VoiceMonitor processes live microphone input, extracts standardized audio segments, computes fatigue scores, and produces real-time alerts and session analytics.
Overview
Prolonged speaking can lead to vocal fatigue, a condition characterized by strain, reduced vocal efficiency, and potential long-term damage to vocal health.
VoiceMonitor provides a real-time monitoring pipeline that:
- captures microphone audio streams
- processes sliding audio windows
- computes vocal fatigue scores
- tracks fatigue progression over time
- generates session analytics and warnings
The system leverages the ECAPA-TDNN-VHE vocal fatigue estimation model and the auralis_vfs scoring framework developed as part of ongoing research in speech health monitoring.
Key Features
- Real-time microphone audio monitoring
- Continuous vocal fatigue scoring
- Sliding window fatigue analysis
- Fatigue threshold warnings
- Session-level analytics and reports
- Chunk-based audio processing pipeline
- JSON session export for downstream analysis
- Lightweight CLI interface for quick experiments
Architecture
VoiceMonitor uses a sliding window inference pipeline for continuous analysis.
Microphone Input
│
▼
Audio Stream Buffer
│
▼
Sliding Window Segmentation (5s)
│
▼
auralis_vfs Preprocessing
│
▼
Vocal Fatigue Scoring
│
▼
Session Analytics Engine
│
▼
Fatigue Alerts + Reports
Each processed window produces a fatigue score, enabling real-time tracking of vocal strain progression during speech sessions.
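The buffering and segmentation stages in the diagram can be pictured with a small sketch. It assumes a mono stream delivered in blocks of arbitrary size (as a microphone callback would deliver them), a 16 kHz default sample rate, the 5 s window from the diagram, and the 1 s overlap listed under Configuration. `SlidingWindowBuffer` is a hypothetical name and is not part of the VoiceMonitor API.

```python
from typing import Iterable, List


class SlidingWindowBuffer:
    """Hypothetical helper (not part of VoiceMonitor): accumulates incoming
    sample blocks and returns fixed-length windows that overlap."""

    def __init__(self, sample_rate: int = 16_000, window_sec: float = 5.0,
                 overlap_sec: float = 1.0) -> None:
        if not 0 <= overlap_sec < window_sec:
            raise ValueError("overlap must be >= 0 and shorter than the window")
        self.window = int(window_sec * sample_rate)               # samples per window
        self.hop = self.window - int(overlap_sec * sample_rate)   # samples between window starts
        self._buf: List[float] = []

    def push(self, block: Iterable[float]) -> List[List[float]]:
        """Append one block of samples; return every window completed by it."""
        self._buf.extend(block)
        ready: List[List[float]] = []
        while len(self._buf) >= self.window:
            ready.append(self._buf[: self.window])
            # Discard one hop so the next window starts with the overlap region.
            del self._buf[: self.hop]
        return ready
```

In a live loop, `push` would be called with each block the microphone delivers, and every returned window would be passed on to preprocessing and scoring. Returning a list (rather than using a generator) ensures the samples are buffered even if the caller ignores the result.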
Installation
Requirements
- Python ≥ 3.10
- FFmpeg installed on the system
- Microphone access
Install from PyPI
pip install voicemonitor
Install from source
git clone https://github.com/<your-username>/voicemonitor.git
cd voicemonitor
pip install -e .
Dependencies
VoiceMonitor relies on the following core libraries:
- auralis_vfs
- numpy
- sounddevice
- soundfile
- pydub
- tqdm
FFmpeg must be installed for audio processing.
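Because the pipeline depends on Python ≥ 3.10 and an external FFmpeg binary, a quick pre-flight check can surface a missing requirement before a session starts. The sketch below uses only the standard library; `check_environment` is a hypothetical helper that VoiceMonitor does not ship, and it does not test microphone access.

```python
import shutil
import sys
from typing import List, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 10)) -> List[str]:
    """Hypothetical pre-flight check: return a list of problems (empty if none)."""
    problems: List[str] = []
    if sys.version_info < min_python:
        wanted = ".".join(map(str, min_python))
        found = sys.version.split()[0]
        problems.append(f"Python {wanted}+ required, found {found}")
    # pydub invokes the ffmpeg executable for most formats; this check looks for it on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg executable not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("\n".join(issues) if issues else "Environment looks OK")
```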
Quick Start
CLI Usage
Start real-time vocal fatigue monitoring:
voicemonitor
Monitor for a fixed duration:
voicemonitor --duration 120
Set a custom fatigue warning threshold:
voicemonitor --threshold 65
Example output:
[20260312_182001] Score: 22.51
[20260312_182006] Score: 31.02
[20260312_182011] Score: 45.44
[20260312_182016] Score: 72.90
⚠ fatigue threshold crossed
After the session completes, a report is generated:
session_report.json
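The example output suggests that each window's score is compared with the threshold and a warning is printed when it is exceeded. A minimal sketch of that logic follows. It assumes the warning fires on a rising crossing (the score moves from below the threshold to at or above it), which the source does not specify, and it reuses the `YYYYMMDD_HHMMSS` timestamp pattern seen in the log. `format_reading` is hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def format_reading(ts: datetime, score: float, threshold: float,
                   previous: Optional[float] = None) -> List[str]:
    """Render one reading in the style of the CLI log above.

    Hypothetical rule: the warning line is emitted only when the score moves
    from below the threshold to at-or-above it (a rising crossing)."""
    lines = [f"[{ts:%Y%m%d_%H%M%S}] Score: {score:.2f}"]
    if score >= threshold and (previous is None or previous < threshold):
        lines.append("⚠ fatigue threshold crossed")
    return lines


if __name__ == "__main__":
    prev = None
    for i, s in enumerate([22.51, 31.02, 45.44, 72.90]):
        for line in format_reading(datetime(2026, 3, 12, 18, 20, 1 + 5 * i), s, 70, prev):
            print(line)
        prev = s
```

Warning only on the crossing, rather than on every reading above the threshold, keeps a long high-fatigue stretch from flooding the console; whether VoiceMonitor behaves this way is not stated.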
Python API
VoiceMonitor can also be used directly in Python applications.
from voicemonitor import VoiceMonitor
monitor = VoiceMonitor(threshold=70)
session = monitor.start(duration_sec=120)
session.export_json("session_report.json")
Session Analytics
Each monitoring session records:
- average fatigue score
- maximum fatigue score
- timestamps of processed windows
- processed audio chunk paths
- fatigue warning events
Example report:
{
"summary": {
"average_fatigue": 38.2,
"max_fatigue": 74.1,
"readings": 25
},
"records": [
{
"timestamp": "20260312_182001",
"chunk": "chunks/20260312_182001.wav",
"score": 22.51
}
]
}
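Since the report is plain JSON, it can be post-processed with the standard library. The sketch below recomputes the summary from the `records` list and lists timestamps whose score reaches a threshold. It relies only on the fields shown in the example report; the warning-event field names used by the real export are not shown, so `over_threshold` is computed here rather than read from the file. `load_report` and `summarize_report` are hypothetical helpers.

```python
import json
from statistics import mean
from typing import Any, Dict, List


def load_report(path: str) -> Dict[str, Any]:
    """Read a session report written by the exporter."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def summarize_report(report: Dict[str, Any], threshold: float = 70.0) -> Dict[str, Any]:
    """Recompute summary statistics from the "records" list and collect the
    timestamps whose score is at or above the threshold."""
    records: List[Dict[str, Any]] = report.get("records", [])
    scores = [float(r["score"]) for r in records]
    if not scores:
        return {"average_fatigue": None, "max_fatigue": None,
                "readings": 0, "over_threshold": []}
    return {
        "average_fatigue": round(mean(scores), 2),
        "max_fatigue": max(scores),
        "readings": len(scores),
        "over_threshold": [r["timestamp"] for r in records if float(r["score"]) >= threshold],
    }
```

For example, `summarize_report(load_report("session_report.json"))` returns the recomputed summary for a finished session.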
Configuration
- Window length: 5 sec
- Window overlap: 1 sec
- Default fatigue warning threshold: 70
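With a 5 s window and 1 s overlap, consecutive windows would start 5 − 1 = 4 s apart, so a 120 s session yields floor((120 − 5) / 4) + 1 = 29 complete windows. The timestamps in the CLI example are 5 s apart, so treat the 4 s hop as an assumption rather than documented behaviour. `window_schedule` is a hypothetical helper.

```python
import math
from typing import Tuple


def window_schedule(duration_sec: float, window_sec: float = 5.0,
                    overlap_sec: float = 1.0) -> Tuple[float, int]:
    """Return (hop_sec, n_windows), counting only windows that fit entirely
    inside a recording of the given length (assumes hop = window - overlap)."""
    hop = window_sec - overlap_sec
    if hop <= 0:
        raise ValueError("overlap must be shorter than the window")
    if duration_sec < window_sec:
        return hop, 0
    return hop, math.floor((duration_sec - window_sec) / hop) + 1
```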
Research Background
VoiceMonitor is built upon the auralis_vfs vocal fatigue scoring framework, which was developed as part of research on automated vocal fatigue detection.
The underlying fatigue estimation model is based on an ECAPA-TDNN architecture adapted for vocal health estimation.
Research paper:
Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs
Model repository:
The model estimates vocal fatigue levels from short speech segments and provides a continuous fatigue score representing vocal strain.
VoiceMonitor extends this work by enabling real-time fatigue monitoring and session analytics.
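The paper title frames fatigue as a deviation in embedding space. Purely to illustrate that idea, and not as the published auralis_vfs or ECAPA-TDNN-VHE scoring method, the sketch below measures how far a segment embedding points away from the centroid of baseline embeddings using cosine distance, then maps it onto a 0 to 100 scale. The embeddings, the baseline set, and the 0 to 100 mapping are all assumptions.

```python
import math
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of a non-empty set of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; ranges from 0 (same direction) to 2 (opposite)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        raise ValueError("zero-length embedding")
    return 1.0 - dot / (na * nb)


def deviation_score(embedding: Sequence[float],
                    baseline: Sequence[Sequence[float]]) -> float:
    """Illustrative score: 0 = aligned with the baseline centroid, 100 = opposite."""
    d = cosine_distance(embedding, centroid(baseline))
    return max(0.0, min(100.0, 50.0 * d))  # clamp guards against rounding drift
```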
Applications
VoiceMonitor can be used in a variety of speech-intensive environments:
- speech research
- voice health monitoring
- call center voice analytics
- teacher vocal load monitoring
- podcast and streaming voice tracking
- speech therapy experiments
- human-computer interaction studies
Project Structure
voicemonitor/
├── voicemonitor/
│ ├── audio_stream.py
│ ├── analytics.py
│ ├── session.py
│ ├── utils.py
│ ├── config.py
│ └── cli.py
│
├── examples/
│ └── live.py
│
├── tests/
│ └── test_session.py
│
├── LICENSE
├── setup.cfg
├── requirements.txt
├── pyproject.toml
└── README.md
Future Development
Planned enhancements include:
- real-time visualization dashboard
- web API for remote monitoring
- desktop GUI interface
- voice activity detection integration
- fatigue trend prediction models
- speaker-aware monitoring
Citation
If you use VoiceMonitor in research, please cite the underlying work:
Ahmad, M. K. (2026). Modeling Vocal Fatigue as Embedding-Space Deviation Using Contrastively Trained ECAPA-TDNNs. Zenodo. https://doi.org/10.5281/zenodo.18366305
License
This project is released under the MIT License.
Author
Muhammad Khubaib Ahmad
AI / ML Engineer
Speech Intelligence and Audio AI Systems
File details
Details for the file voicemonitor-0.1.0.tar.gz.
File metadata
- Download URL: voicemonitor-0.1.0.tar.gz
- Upload date:
- Size: 8.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a75170b13de74dbd6a4061f9d9b2be668c2f7879bbfacf8ab7501f871a1102a8 |
| MD5 | 38f9c46c597e9e3062952f0416969185 |
| BLAKE2b-256 | 604e40f1e3ae423bec882bd6e9a3e17a43a001e2484f755462d1532367ef0ccf |
File details
Details for the file voicemonitor-0.1.0-py3-none-any.whl.
File metadata
- Download URL: voicemonitor-0.1.0-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cd4a39ee8d815ceb2b1243f33d1c2d340bc9e927886b0e1732fb328fab9697fd |
| MD5 | c90798ee5f2a6926bfb7fd3df65fca62 |
| BLAKE2b-256 | 36caac1bb48a5f09cbc730abe347d92b089379edf3ce409ebac786ec65c1032e |