ViSQOL - Virtual Speech Quality Objective Listener (Pure Python)

ViSQOL (Python)

A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) v3.3.3 for objective audio/speech quality assessment.

ViSQOL compares a reference audio signal with a degraded version of the same signal and outputs a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score on a scale from 1.0 to 5.0.

Features

  • Two modes: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)
  • High accuracy: 11/11 conformance tests pass against the official C++ implementation
    • Audio mode: 9/10 tests produce identical MOS scores (diff = 0.000000), 1 test diff = 0.000117
    • Speech mode: diff = 0.006715
  • Pure Python: no C/C++ compilation required
  • Minimal dependencies: only 4 pip packages (numpy, scipy, soundfile, libsvm-official)
  • Faster than real-time: Audio RTF ≈ 0.71x, Speech RTF ≈ 0.38x

Installation

pip install visqol-python

Or install from source:

git clone https://github.com/talker93/visqol-python.git
cd visqol-python
pip install -e .

Quick Start

Python API

from visqol import VisqolApi

# Audio mode (default) - for music and general audio
api = VisqolApi()
api.create(mode="audio")
result = api.measure("reference.wav", "degraded.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

# Speech mode - for speech signals
api = VisqolApi()
api.create(mode="speech")
result = api.measure("ref_speech.wav", "deg_speech.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

Using NumPy Arrays

import numpy as np
import soundfile as sf
from visqol import VisqolApi

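ref, sr = sf.read("reference.wav")
deg, deg_sr = sf.read("degraded.wav")

# A single sample_rate is passed below, so both signals must share it.
if deg_sr != sr:
    raise ValueError(f"Sample rates differ: {sr} Hz vs {deg_sr} Hz")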

api = VisqolApi()
api.create(mode="audio")
result = api.measure_from_arrays(ref, deg, sample_rate=sr)
print(f"MOS-LQO: {result.moslqo:.4f}")

Command Line

# Audio mode (default)
python -m visqol -r reference.wav -d degraded.wav

# Speech mode
python -m visqol -r reference.wav -d degraded.wav --speech_mode

# Verbose output (per-patch details)
python -m visqol -r reference.wav -d degraded.wav -v

CLI options:

| Flag | Description |
| --- | --- |
| -r, --reference | Path to reference WAV file (required) |
| -d, --degraded | Path to degraded WAV file (required) |
| --speech_mode | Use speech mode (16 kHz, polynomial mapping) |
| --model | Custom SVR model file path (audio mode only) |
| --search_window | Search window radius (default: 60) |
| -v, --verbose | Show detailed per-patch results |
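
When ViSQOL is driven from a script, the flags listed above can be assembled into an argument list. The following is a minimal sketch, not part of the package: `build_visqol_command` is a hypothetical helper, and it assumes that `--model` and `--search_window` each take their value as the next argument.

```python
import sys
from typing import List, Optional


def build_visqol_command(
    reference: str,
    degraded: str,
    speech_mode: bool = False,
    model: Optional[str] = None,
    search_window: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Build an argument list for `python -m visqol` (hypothetical helper)."""
    if speech_mode and model is not None:
        # The options table marks --model as audio mode only.
        raise ValueError("--model applies to audio mode only")
    cmd = [sys.executable, "-m", "visqol", "-r", reference, "-d", degraded]
    if speech_mode:
        cmd.append("--speech_mode")
    if model is not None:
        # Assumption: the flag is followed by the model file path.
        cmd += ["--model", model]
    if search_window is not None:
        # Assumption: the flag is followed by an integer radius.
        cmd += ["--search_window", str(search_window)]
    if verbose:
        cmd.append("-v")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.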

Output

The measure() method returns a SimilarityResult object with:

| Field | Description |
| --- | --- |
| moslqo | MOS-LQO score (1.0 – 5.0) |
| vnsim | Mean NSIM across all patches |
| fvnsim | Per-frequency-band mean NSIM |
| fstdnsim | Per-frequency-band std of NSIM |
| fvdegenergy | Per-frequency-band degraded energy |
| patch_sims | List of per-patch similarity details |
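
For logging or export, the fields above can be copied into plain Python types. This is a sketch under assumptions: `summarize_result` is a hypothetical helper, the per-band fields are assumed to be iterable sequences of numbers, and the structure of each `patch_sims` entry is not documented here, so only the patch count is kept. A stand-in object with made-up values replaces a real `SimilarityResult`.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_result(result: Any) -> Dict[str, Any]:
    """Copy the documented result fields into plain floats and lists."""
    return {
        "moslqo": float(result.moslqo),
        "vnsim": float(result.vnsim),
        # Assumption: per-band fields are iterable sequences of numbers.
        "fvnsim": [float(x) for x in result.fvnsim],
        "fstdnsim": [float(x) for x in result.fstdnsim],
        "fvdegenergy": [float(x) for x in result.fvdegenergy],
        # Only the count is kept; the per-patch layout is not assumed.
        "num_patches": len(result.patch_sims),
    }


# Stand-in object with made-up values, used only to exercise the helper.
fake_result = SimpleNamespace(
    moslqo=4.2,
    vnsim=0.9,
    fvnsim=[0.9, 0.8],
    fstdnsim=[0.1, 0.2],
    fvdegenergy=[1.0, 2.0],
    patch_sims=[object(), object()],
)
summary = summarize_result(fake_result)
```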

Modes

Audio Mode (default)

  • Target sample rate: 48 kHz
  • 32 Gammatone frequency bands (50 Hz – 15 000 Hz)
  • Quality mapping: SVR (Support Vector Regression) model
  • Best for: music, environmental audio, codecs

Speech Mode

  • Target sample rate: 16 kHz
  • 32 Gammatone frequency bands (50 Hz – 8 000 Hz)
  • Quality mapping: exponential polynomial fit
  • VAD (Voice Activity Detection) based patch selection
  • Best for: speech, VoIP, telephony
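
The two mode descriptions can be captured as data, for example to check an input file's sample rate before scoring. This is an illustrative sketch only: `ModeSpec`, `MODES`, and `matches_target_rate` are hypothetical names, not part of the package, and how the library handles input at a different sample rate is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSpec:
    sample_rate_hz: int   # target sample rate
    bands: int            # number of Gammatone bands
    min_freq_hz: int      # lowest band frequency
    max_freq_hz: int      # highest band frequency
    quality_mapping: str  # how similarity is mapped to MOS-LQO


# Values copied from the mode descriptions above.
MODES = {
    "audio": ModeSpec(48_000, 32, 50, 15_000, "SVR"),
    "speech": ModeSpec(16_000, 32, 50, 8_000, "exponential fit"),
}


def matches_target_rate(mode: str, sample_rate_hz: int) -> bool:
    """Return True if the input already has the mode's target sample rate."""
    return MODES[mode].sample_rate_hz == sample_rate_hz
```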

Performance

Measured on Apple M-series, Python 3.13:

| Mode | Avg RTF | Typical time |
| --- | --- | --- |
| Audio (48 kHz) | 0.71x | 7 – 12 s per file pair |
| Speech (16 kHz) | 0.38x | ~1 s per file pair |

RTF (Real-Time Factor) is the processing time divided by the duration of the audio; a value below 1.0 means faster than real time.
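
To reproduce such a measurement on other hardware, RTF can be computed from a wall-clock timing. The sketch below assumes the usual definition (processing time divided by audio duration); `real_time_factor` and `timed` are hypothetical helpers, not part of the package.

```python
import time
from typing import Callable, Tuple


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed(fn: Callable[[], object]) -> Tuple[object, float]:
    """Run fn() and return (its result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - start


# Illustrative numbers: a 10 s clip scored in 7.1 s gives an RTF of about 0.71.
rtf = real_time_factor(7.1, 10.0)
```

In practice, `timed` would wrap the scoring call, and the audio duration would be the number of samples divided by the sample rate.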

Project Structure

visqol-python/
├── visqol/                    # Main package
│   ├── __init__.py            # Package exports & version
│   ├── api.py                 # Public API (VisqolApi)
│   ├── visqol_manager.py      # Pipeline orchestrator
│   ├── visqol_core.py         # Core algorithm
│   ├── audio_utils.py         # Audio I/O & SPL normalization
│   ├── signal_utils.py        # Envelope, cross-correlation
│   ├── analysis_window.py     # Hann window
│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram
│   ├── patch_creator.py       # Patch creation (Image + VAD modes)
│   ├── patch_selector.py      # DP-based optimal patch matching
│   ├── alignment.py           # Global alignment via cross-correlation
│   ├── nsim.py                # NSIM similarity metric
│   ├── quality_mapper.py      # SVR & exponential quality mapping
│   ├── __main__.py            # CLI entry point
│   └── model/                 # Bundled SVR model
│       └── libsvm_nu_svr_model.txt
├── tests/                     # Tests (pytest)
│   ├── conftest.py            # Shared fixtures & CLI options
│   ├── test_quick.py          # Smoke tests (no external data needed)
│   └── test_conformance.py    # Full conformance tests (needs testdata)
├── .github/workflows/
│   ├── ci.yml                 # CI: test on Python 3.9–3.13
│   └── publish.yml            # Auto-publish to PyPI on tag push
├── pyproject.toml             # Package metadata & build config
├── CHANGELOG.md
├── LICENSE
└── README.md

Conformance Test Results

Tested against the official C++ ViSQOL v3.3.3 expected values:

| Test Case | Mode | Expected MOS | Python MOS | Δ |
| --- | --- | --- | --- | --- |
| strauss_lp35 | Audio | 1.3889 | 1.3889 | 0.000000 |
| steely_lp7 | Audio | 2.2502 | 2.2502 | 0.000000 |
| sopr_256aac | Audio | 4.6823 | 4.6823 | 0.000000 |
| ravel_128opus | Audio | 4.4651 | 4.4651 | 0.000000 |
| moonlight_128aac | Audio | 4.6843 | 4.6843 | 0.000000 |
| harpsichord_96mp3 | Audio | 4.2237 | 4.2237 | 0.000000 |
| guitar_64aac | Audio | 4.3497 | 4.3497 | 0.000000 |
| glock_48aac | Audio | 4.3325 | 4.3325 | 0.000000 |
| contrabassoon_24aac | Audio | 2.3469 | 2.3468 | 0.000117 |
| castanets_identity | Audio | 4.7321 | 4.7321 | 0.000000 |
| speech_CA01 | Speech | 3.3745 | 3.3678 | 0.006715 |
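
The headline figures in the Features list follow directly from the Δ column. The sketch below recomputes them from the reported values; the 0.01 tolerance is an illustrative assumption, not the threshold used by the project's conformance tests.

```python
# (test case, mode, reported absolute difference) copied from the table above.
RESULTS = [
    ("strauss_lp35", "Audio", 0.000000),
    ("steely_lp7", "Audio", 0.000000),
    ("sopr_256aac", "Audio", 0.000000),
    ("ravel_128opus", "Audio", 0.000000),
    ("moonlight_128aac", "Audio", 0.000000),
    ("harpsichord_96mp3", "Audio", 0.000000),
    ("guitar_64aac", "Audio", 0.000000),
    ("glock_48aac", "Audio", 0.000000),
    ("contrabassoon_24aac", "Audio", 0.000117),
    ("castanets_identity", "Audio", 0.000000),
    ("speech_CA01", "Speech", 0.006715),
]


def summarize(results, tolerance):
    """Count identical scores and check every delta against a tolerance."""
    deltas = [delta for _, _, delta in results]
    return {
        "total": len(deltas),
        "identical": sum(1 for d in deltas if d == 0.0),
        "max_delta": max(deltas),
        "all_within_tolerance": all(d <= tolerance for d in deltas),
    }


# Assumption: 0.01 is an illustrative tolerance chosen for this sketch.
summary = summarize(RESULTS, tolerance=0.01)
```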

References

  • Google ViSQOL (C++) — the original implementation this project is ported from
  • Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., & Harte, N. (2015). ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs. The Journal of the Acoustical Society of America.
  • Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., & Hines, A. (2020). ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

License

Apache License 2.0. See LICENSE for details.

This project is a Python port of Google's ViSQOL, which is also licensed under Apache 2.0.
