ViSQOL - Virtual Speech Quality Objective Listener (Pure Python)
ViSQOL (Python)
A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) v3.3.3 for objective audio/speech quality assessment.
ViSQOL compares a reference audio signal with a degraded version and outputs a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score on a scale of 1.0 – 5.0.
Features
- Two modes: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)
- High accuracy: 11/11 conformance tests pass against the official C++ implementation
  - Audio mode: 9/10 tests produce identical MOS scores (diff = 0.000000); the remaining test differs by 0.000117
  - Speech mode: 1 test, diff = 0.006715
- Pure Python: no C/C++ compilation required
- Minimal dependencies: only 4 pip packages (numpy, scipy, soundfile, libsvm-official)
- Faster than real-time: Audio RTF ≈ 0.71x, Speech RTF ≈ 0.38x
Installation
```bash
pip install visqol-python
```
Or install from source:
```bash
git clone https://github.com/talker93/visqol-python.git
cd visqol-python
pip install -e .
```
Quick Start
Python API
```python
from visqol import VisqolApi

# Audio mode (default) - for music and general audio
api = VisqolApi()
api.create(mode="audio")
result = api.measure("reference.wav", "degraded.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

# Speech mode - for speech signals
api = VisqolApi()
api.create(mode="speech")
result = api.measure("ref_speech.wav", "deg_speech.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")
```
Using NumPy Arrays
```python
import numpy as np
import soundfile as sf
from visqol import VisqolApi

ref, sr = sf.read("reference.wav")
deg, _ = sf.read("degraded.wav")

api = VisqolApi()
api.create(mode="audio")
result = api.measure_from_arrays(ref, deg, sample_rate=sr)
print(f"MOS-LQO: {result.moslqo:.4f}")
```
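The example above takes `sample_rate` from the reference file only, so it implicitly assumes both files share a sample rate. This README does not say whether `measure_from_arrays` accepts multi-channel input, so a defensive pre-processing step may help. The sketch below is a hypothetical helper (`to_mono_float` and `prepare_pair` are not part of the package): it downmixes `(frames, channels)` arrays from `soundfile.read()` to mono and rejects mismatched sample rates.

```python
import numpy as np


def to_mono_float(signal) -> np.ndarray:
    """Return a 1-D float64 array, averaging channels if the input is 2-D.

    soundfile.read() returns shape (frames,) for mono files and
    (frames, channels) for multi-channel files.
    """
    x = np.asarray(signal, dtype=np.float64)
    if x.ndim == 2:
        # Average across the channel axis to get a single mono track.
        x = x.mean(axis=1)
    elif x.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {x.shape}")
    return x


def prepare_pair(ref, deg, ref_sr: int, deg_sr: int):
    """Hypothetical helper: downmix both signals and require equal sample rates."""
    if ref_sr != deg_sr:
        raise ValueError(f"sample rates differ: {ref_sr} vs {deg_sr}")
    return to_mono_float(ref), to_mono_float(deg)
```

Usage would be `ref, deg = prepare_pair(ref, deg, sr, deg_sr)` after reading both files with `sf.read`, followed by the same `measure_from_arrays(ref, deg, sample_rate=sr)` call.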
Batch Evaluation
```python
from visqol import VisqolApi

api = VisqolApi()
api.create(mode="audio")

file_pairs = [
    ("ref1.wav", "deg1.wav"),
    ("ref2.wav", "deg2.wav"),
    ("ref3.wav", "deg3.wav"),
]

# Optional progress callback, called with (completed, total)
results = api.measure_batch(
    file_pairs,
    progress_callback=lambda done, total: print(f"{done}/{total}"),
)

# A failed pair yields an Exception object in `results` instead of a result
for pair, result in zip(file_pairs, results):
    if isinstance(result, Exception):
        print(f"{pair}: FAILED - {result}")
    else:
        print(f"{pair}: MOS-LQO = {result.moslqo:.4f}")
```
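Since `measure_batch` mixes successful results and `Exception` objects in one list (as the loop above checks), it can be convenient to split them before aggregating. The sketch below is a hypothetical helper (`summarize_batch` is not part of the package); it assumes only what the loop above relies on: each entry is either an `Exception` or an object with a `moslqo` attribute.

```python
import math
from typing import Any, Iterable, Sequence, Tuple


def summarize_batch(file_pairs: Sequence[Tuple[str, str]], results: Iterable[Any]):
    """Split measure_batch() output into scores, failures and a mean MOS-LQO.

    Returns (scores, failures, mean); mean is NaN when every pair failed.
    """
    scores = {}
    failures = {}
    for pair, result in zip(file_pairs, results):
        if isinstance(result, Exception):
            failures[pair] = result
        else:
            scores[pair] = float(result.moslqo)
    mean = sum(scores.values()) / len(scores) if scores else math.nan
    return scores, failures, mean
```

Keeping failures keyed by file pair makes it easy to re-run only the pairs that raised.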
Command Line
```bash
# Audio mode (default)
python -m visqol -r reference.wav -d degraded.wav

# Speech mode
python -m visqol -r reference.wav -d degraded.wav --speech_mode

# Verbose output (per-patch details)
python -m visqol -r reference.wav -d degraded.wav -v
```
CLI options:

| Flag | Description |
|---|---|
| `-r`, `--reference` | Path to reference WAV file (required) |
| `-d`, `--degraded` | Path to degraded WAV file (required) |
| `--speech_mode` | Use speech mode (16 kHz, polynomial mapping) |
| `--model` | Custom SVR model file path (audio mode only) |
| `--search_window` | Search window radius (default: 60) |
| `--verbose`, `-v` | Show detailed per-patch results |
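To drive the CLI from another Python program (for example, to isolate each run in its own process), the flags in the table can be assembled into an argument list and passed to `subprocess.run`. This is a minimal sketch using only the flags documented above; `build_visqol_argv` and `run_visqol` are hypothetical names, and because the CLI's output format is not documented here, `run_visqol` returns raw stdout rather than parsing a score.

```python
import subprocess
import sys
from typing import List, Optional


def build_visqol_argv(
    reference: str,
    degraded: str,
    speech_mode: bool = False,
    model: Optional[str] = None,
    search_window: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble a `python -m visqol` command line from the documented flags."""
    argv = [sys.executable, "-m", "visqol", "-r", reference, "-d", degraded]
    if speech_mode:
        argv.append("--speech_mode")
    if search_window is not None:
        argv += ["--search_window", str(search_window)]
    if verbose:
        argv.append("--verbose")
    if model is not None:
        # The table notes that --model applies to audio mode only.
        argv += ["--model", model]
    return argv


def run_visqol(**kwargs) -> str:
    """Run the CLI and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(
        build_visqol_argv(**kwargs), capture_output=True, text=True, check=True
    )
    return completed.stdout
```

Using `sys.executable` ensures the CLI runs under the same interpreter (and virtual environment) as the calling script.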
Output
The `measure()` method returns a `SimilarityResult` object with:

| Field | Description |
|---|---|
| `moslqo` | MOS-LQO score (1.0 – 5.0) |
| `vnsim` | Mean NSIM across all patches |
| `fvnsim` | Per-frequency-band mean NSIM |
| `fstdnsim` | Per-frequency-band std of NSIM |
| `fvdegenergy` | Per-frequency-band degraded energy |
| `patch_sims` | List of per-patch similarity details |
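The NSIM values above come from the Neurogram Similarity Index Measure, which compares a reference and a degraded spectrogram patch through an intensity term and a structure term. The sketch below is a simplified, whole-patch version for illustration only: ViSQOL computes these statistics over a small local window and averages them, and the stabilising constants `c1` and `c2` here are assumed SSIM-style values, not ones taken from this package.

```python
import numpy as np


def global_nsim(ref_patch, deg_patch, intensity_range: float = 1.0) -> float:
    """Simplified NSIM between two equally shaped spectrogram patches.

    NSIM = intensity * structure, where
      intensity = (2*mu_r*mu_d + c1) / (mu_r**2 + mu_d**2 + c1)
      structure = (sigma_rd + c2) / (sigma_r*sigma_d + c2)
    Statistics are taken over the whole patch instead of a sliding window.
    """
    r = np.asarray(ref_patch, dtype=np.float64)
    d = np.asarray(deg_patch, dtype=np.float64)
    if r.shape != d.shape:
        raise ValueError("patches must have the same shape")
    c1 = (0.01 * intensity_range) ** 2  # assumed constant
    c2 = (0.03 * intensity_range) ** 2 / 2  # assumed constant
    mu_r, mu_d = r.mean(), d.mean()
    sigma_r, sigma_d = r.std(), d.std()
    sigma_rd = ((r - mu_r) * (d - mu_d)).mean()
    intensity = (2 * mu_r * mu_d + c1) / (mu_r**2 + mu_d**2 + c1)
    structure = (sigma_rd + c2) / (sigma_r * sigma_d + c2)
    return float(intensity * structure)
```

Identical patches score 1.0, and both terms shrink as the degraded patch drifts from the reference in level or in shape.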
Modes
Audio Mode (default)
- Target sample rate: 48 kHz
- 32 Gammatone frequency bands (50 Hz – 15 000 Hz)
- Quality mapping: SVR (Support Vector Regression) model
- Best for: music, environmental audio, codecs
Speech Mode
- Target sample rate: 16 kHz
- 32 Gammatone frequency bands (50 Hz – 8 000 Hz)
- Quality mapping: exponential polynomial fit
- VAD (Voice Activity Detection) based patch selection
- Best for: speech, VoIP, telephony
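Both modes use 32 Gammatone bands whose centre frequencies are spread over the stated range on the ERB (equivalent rectangular bandwidth) scale, which places more bands at low frequencies. The sketch below follows the `ERBSpace` convention from Slaney's Auditory Toolbox (Glasberg & Moore constants), which the C++ ViSQOL filterbank is modelled on. Treat it as an assumption: `gammatone.py` may differ in constants or endpoint handling.

```python
import numpy as np

EAR_Q = 9.26449  # Glasberg & Moore constants, as used in Slaney's toolbox
MIN_BW = 24.7


def erb_space(low_hz: float, high_hz: float, num_bands: int) -> np.ndarray:
    """Centre frequencies equally spaced on the ERB scale (Slaney's ERBSpace).

    Returned in descending order, from just below high_hz down to exactly low_hz.
    """
    c = EAR_Q * MIN_BW
    i = np.arange(1, num_bands + 1)
    step = (np.log(low_hz + c) - np.log(high_hz + c)) / num_bands
    return -c + np.exp(i * step) * (high_hz + c)
```

For example, `erb_space(50.0, 15000.0, 32)` gives the audio-mode layout and `erb_space(50.0, 8000.0, 32)` the speech-mode layout.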
Performance
Measured on Apple M-series, Python 3.13:
| Mode | Avg RTF | Typical Time |
|---|---|---|
| Audio (48 kHz) | 0.71x | 7 – 12 s per file pair |
| Speech (16 kHz) | 0.38x | ~1 s per file pair |
RTF (Real-Time Factor) < 1.0 means faster than real-time.
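RTF is processing time divided by the duration of the audio processed, so an RTF of 0.71x on a 10 s file means roughly 7.1 s of computation. A minimal sketch for measuring it on your own machine (`real_time_factor` and `timed` are hypothetical helper names):

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def real_time_factor(elapsed_s: float, num_samples: int, sample_rate: int) -> float:
    """Processing time divided by the duration of the processed audio."""
    duration_s = num_samples / sample_rate
    return elapsed_s / duration_s


def timed(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run fn() and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start
```

With the package installed, `result, elapsed = timed(lambda: api.measure("reference.wav", "degraded.wav"))` followed by `info = sf.info("reference.wav")` and `real_time_factor(elapsed, info.frames, info.samplerate)` gives the RTF for one pair.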
Project Structure
```
visqol-python/
├── visqol/                    # Main package
│   ├── __init__.py            # Package exports & version
│   ├── api.py                 # Public API (VisqolApi)
│   ├── visqol_manager.py      # Pipeline orchestrator
│   ├── visqol_core.py         # Core algorithm
│   ├── audio_utils.py         # Audio I/O & SPL normalization
│   ├── signal_utils.py        # Envelope, cross-correlation
│   ├── analysis_window.py     # Hann window
│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram
│   ├── patch_creator.py       # Patch creation (Image + VAD modes)
│   ├── patch_selector.py      # DP-based optimal patch matching
│   ├── alignment.py           # Global alignment via cross-correlation
│   ├── nsim.py                # NSIM similarity metric
│   ├── quality_mapper.py      # SVR & exponential quality mapping
│   ├── __main__.py            # CLI entry point
│   └── model/                 # Bundled SVR model
│       └── libsvm_nu_svr_model.txt
├── tests/                     # Tests (pytest)
│   ├── conftest.py            # Shared fixtures & CLI options
│   ├── test_quick.py          # Smoke tests (no external data needed)
│   └── test_conformance.py    # Full conformance tests (needs testdata)
├── .github/workflows/
│   ├── ci.yml                 # CI: test on Python 3.9–3.13
│   └── publish.yml            # Auto-publish to PyPI on tag push
├── pyproject.toml             # Package metadata & build config
├── CHANGELOG.md
├── LICENSE
└── README.md
```
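`alignment.py` performs global alignment via cross-correlation, with `signal_utils.py` providing the envelope and cross-correlation helpers. The sketch below illustrates the basic idea on raw waveforms: the peak of the full cross-correlation gives the delay of the degraded signal, which is then trimmed away. It is not the package's code; the package (like the C++ original) may correlate signal envelopes instead and handle edge cases differently. `estimate_lag` and `align` are hypothetical names.

```python
import numpy as np


def estimate_lag(ref, deg) -> int:
    """Samples by which `deg` trails `ref`, taken from the cross-correlation peak.

    A positive value means the degraded signal is delayed relative to the reference.
    """
    ref = np.asarray(ref, dtype=np.float64)
    deg = np.asarray(deg, dtype=np.float64)
    # Full cross-correlation: output index k corresponds to lag k - (len(ref) - 1).
    xcorr = np.correlate(deg, ref, mode="full")
    return int(np.argmax(xcorr)) - (len(ref) - 1)


def align(ref: np.ndarray, deg: np.ndarray):
    """Remove the estimated offset and trim both signals to a common length."""
    lag = estimate_lag(ref, deg)
    if lag > 0:
        deg = deg[lag:]  # degraded starts late: drop its leading samples
    elif lag < 0:
        ref = ref[-lag:]  # degraded starts early: drop the reference's lead-in
    n = min(len(ref), len(deg))
    return ref[:n], deg[:n], lag
```

`np.correlate` is O(N·M), which is slow for minutes of 48 kHz audio; `scipy.signal.correlate(..., method="fft")` computes the same correlation much faster for long signals.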
Conformance Test Results
Tested against the official C++ ViSQOL v3.3.3 expected values:
| Test Case | Mode | Expected MOS | Python MOS | Δ |
|---|---|---|---|---|
| strauss_lp35 | Audio | 1.3889 | 1.3889 | 0.000000 |
| steely_lp7 | Audio | 2.2502 | 2.2502 | 0.000000 |
| sopr_256aac | Audio | 4.6823 | 4.6823 | 0.000000 |
| ravel_128opus | Audio | 4.4651 | 4.4651 | 0.000000 |
| moonlight_128aac | Audio | 4.6843 | 4.6843 | 0.000000 |
| harpsichord_96mp3 | Audio | 4.2237 | 4.2237 | 0.000000 |
| guitar_64aac | Audio | 4.3497 | 4.3497 | 0.000000 |
| glock_48aac | Audio | 4.3325 | 4.3325 | 0.000000 |
| contrabassoon_24aac | Audio | 2.3469 | 2.3468 | 0.000117 |
| castanets_identity | Audio | 4.7321 | 4.7321 | 0.000000 |
| speech_CA01 | Speech | 3.3745 | 3.3678 | 0.006715 |
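A conformance check of this kind reduces to comparing each Python score against the C++ reference value within a tolerance. The sketch below uses the 4-decimal values from the table, so its differences are rounded (for example 0.0001 for `contrabassoon_24aac` instead of the exact 0.000117). The tolerance of 0.01 is a hypothetical threshold chosen for illustration, not the one used in `tests/test_conformance.py`.

```python
# Expected (C++) and Python MOS-LQO values from the table, rounded to 4 decimals.
CONFORMANCE = {
    "strauss_lp35": (1.3889, 1.3889),
    "steely_lp7": (2.2502, 2.2502),
    "sopr_256aac": (4.6823, 4.6823),
    "ravel_128opus": (4.4651, 4.4651),
    "moonlight_128aac": (4.6843, 4.6843),
    "harpsichord_96mp3": (4.2237, 4.2237),
    "guitar_64aac": (4.3497, 4.3497),
    "glock_48aac": (4.3325, 4.3325),
    "contrabassoon_24aac": (2.3469, 2.3468),
    "castanets_identity": (4.7321, 4.7321),
    "speech_CA01": (3.3745, 3.3678),
}

TOLERANCE = 0.01  # hypothetical acceptance threshold


def failing_cases(results, tol=TOLERANCE):
    """Names of cases whose absolute MOS difference exceeds `tol`."""
    return [
        name
        for name, (expected, actual) in results.items()
        if abs(expected - actual) > tol
    ]


def max_abs_delta(results) -> float:
    """Largest absolute difference between expected and actual MOS."""
    return max(abs(expected - actual) for expected, actual in results.values())
```

Tightening `tol` to 0.001 would flag only `speech_CA01`, which matches the single speech-mode discrepancy reported above.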
References
- Google ViSQOL (C++): the original implementation this project is ported from
- Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., & Harte, N. (2015). ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs. The Journal of the Acoustical Society of America.
- Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., & Hines, A. (2020). ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).
License
Apache License 2.0. See LICENSE for details.
This project is a Python port of Google's ViSQOL, which is also licensed under Apache 2.0.