
ViSQOL - Virtual Speech Quality Objective Listener (Pure Python)


ViSQOL (Python)

A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) v3.3.3 for objective audio/speech quality assessment.

ViSQOL compares a reference audio signal with a degraded version of it and outputs a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score on a scale of 1.0 – 5.0, where higher means better perceived quality.

Features

  • Two modes: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)
  • High accuracy: all 11 conformance tests pass against the expected values of the official C++ ViSQOL v3.3.3
    • Audio mode: 9 of 10 tests match to six decimal places (Δ = 0.000000); the remaining test differs by 0.000117
    • Speech mode (1 test): Δ = 0.006715
  • Pure Python: no C/C++ compilation required
  • Minimal dependencies: only 4 pip packages (numpy, scipy, soundfile, libsvm-official)
  • Faster than real time: average RTF ≈ 0.71x in audio mode and ≈ 0.38x in speech mode (Apple M-series, Python 3.13)

Installation

pip install numpy scipy soundfile libsvm-official

Or install as a package:

git clone https://github.com/talker93/visqol-python.git
cd visqol-python
pip install -e .
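After installing, a quick way to confirm that the four dependencies listed above are importable is a check like the one below. It is a minimal sketch: missing_modules is a hypothetical helper, and it assumes that libsvm-official is imported under the name libsvm.

```python
import importlib.util

# Importable names of the four runtime dependencies.
# Assumption: libsvm-official is imported as "libsvm".
REQUIRED_MODULES = ("numpy", "scipy", "soundfile", "libsvm")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names in `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing: " + ", ".join(missing) if missing else "All dependencies found.")
```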

Quick Start

Python API

from visqol import VisqolApi

# Audio mode (default) - for music and general audio
api = VisqolApi()
api.create(mode="audio")
result = api.measure("reference.wav", "degraded.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

# Speech mode - for speech signals
api = VisqolApi()
api.create(mode="speech")
result = api.measure("ref_speech.wav", "deg_speech.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

Using NumPy Arrays

import numpy as np
import soundfile as sf
from visqol import VisqolApi

ref, sr = sf.read("reference.wav")
deg, _  = sf.read("degraded.wav")

api = VisqolApi()
api.create(mode="audio")
result = api.measure_from_arrays(ref, deg, sample_rate=sr)
print(f"MOS-LQO: {result.moslqo:.4f}")
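If you do not have test files at hand, you can exercise measure_from_arrays with a synthetic pair. The sketch below uses a hypothetical helper, make_test_pair (not part of the package), that builds a mono reference from white noise plus a 440 Hz tone and derives a degraded copy by low-pass filtering it at 3.5 kHz. SciPy's zero-phase sosfiltfilt is used so the two signals stay time-aligned; the cutoff is an arbitrary example value.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def make_test_pair(sample_rate=48_000, duration_s=3.0, cutoff_hz=3_500.0, seed=0):
    """Return (reference, degraded) mono float64 arrays of equal length.

    Hypothetical helper: the reference is white noise plus a 440 Hz tone;
    the degraded copy is the same signal low-pass filtered at cutoff_hz.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(sample_rate * duration_s)) / sample_rate
    reference = 0.1 * rng.standard_normal(t.size) + 0.5 * np.sin(2 * np.pi * 440.0 * t)
    # 8th-order Butterworth in second-order sections; sosfiltfilt runs it
    # forward and backward, so the output has zero phase shift (no delay).
    sos = butter(8, cutoff_hz, btype="lowpass", fs=sample_rate, output="sos")
    degraded = sosfiltfilt(sos, reference)
    return reference, degraded


ref, deg = make_test_pair()
# result = api.measure_from_arrays(ref, deg, sample_rate=48_000)
```

The pair can then be passed to api.measure_from_arrays exactly as in the example above.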

Command Line

# Audio mode (default)
python -m visqol -r reference.wav -d degraded.wav

# Speech mode
python -m visqol -r reference.wav -d degraded.wav --speech_mode

# Verbose output (per-patch details)
python -m visqol -r reference.wav -d degraded.wav -v

CLI options:

| Flag | Description |
|------|-------------|
| -r, --reference | Path to the reference WAV file (required) |
| -d, --degraded | Path to the degraded WAV file (required) |
| --speech_mode | Use speech mode (16 kHz, polynomial mapping) |
| --model | Path to a custom SVR model file (audio mode only) |
| --search_window | Search window radius (default: 60) |
| --verbose, -v | Show detailed per-patch results |
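For batch runs, the CLI can be driven from Python with subprocess. This sketch builds argument lists using only the flags in the table above; build_visqol_cmd and run_batch are hypothetical helpers. Because the CLI's output format is not documented here, run_batch returns the raw stdout of each run rather than trying to parse a score.

```python
import subprocess
import sys


def build_visqol_cmd(reference, degraded, speech_mode=False, model=None,
                     search_window=None, verbose=False):
    """Build an argv list for `python -m visqol` using the documented flags."""
    if speech_mode and model is not None:
        raise ValueError("--model applies to audio mode only")
    cmd = [sys.executable, "-m", "visqol", "-r", str(reference), "-d", str(degraded)]
    if speech_mode:
        cmd.append("--speech_mode")
    if model is not None:
        cmd += ["--model", str(model)]
    if search_window is not None:
        cmd += ["--search_window", str(search_window)]
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_batch(pairs, **options):
    """Run the CLI once per (reference, degraded) pair and return each stdout."""
    outputs = []
    for reference, degraded in pairs:
        completed = subprocess.run(
            build_visqol_cmd(reference, degraded, **options),
            capture_output=True, text=True, check=True,
        )
        outputs.append(completed.stdout)
    return outputs
```

For example, run_batch([("ref1.wav", "deg1.wav"), ("ref2.wav", "deg2.wav")], speech_mode=True), where the file names are placeholders.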

Output

The measure() method returns a SimilarityResult object with:

| Field | Description |
|-------|-------------|
| moslqo | MOS-LQO score (1.0 – 5.0) |
| vnsim | Mean NSIM across all patches |
| fvnsim | Per-frequency-band mean NSIM |
| fstdnsim | Per-frequency-band standard deviation of NSIM |
| fvdegenergy | Per-frequency-band energy of the degraded signal |
| patch_sims | List of per-patch similarity details |
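vnsim and fvnsim summarize NSIM (Neurogram Similarity Index Measure), an SSIM-style comparison of reference and degraded spectrogram patches. The sketch below computes a per-cell NSIM map as the product of an intensity term (2·μr·μd + C1) / (μr² + μd² + C1) and a structure term (σrd + C2) / (σr·σd + C2), using 3×3 Gaussian-weighted local statistics. Several details are assumptions for illustration: the window (σ = 0.5), the constants C1 = (0.01·L)² and C2 = (0.03·L)²/2, the choice of L as the reference patch's dynamic range, and the symmetric boundary handling. visqol/nsim.py may differ in any of these; nsim_map and nsim are hypothetical helpers.

```python
import numpy as np
from scipy.signal import convolve2d


def gaussian_window(size=3, sigma=0.5):
    """Normalized 2-D Gaussian weighting window."""
    ax = np.arange(size) - (size - 1) / 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()


def nsim_map(ref_patch, deg_patch, k1=0.01, k2=0.03):
    """Per-cell NSIM between two equally shaped (bands x frames) patches."""
    ref = np.asarray(ref_patch, dtype=float)
    deg = np.asarray(deg_patch, dtype=float)
    # Assumption: L is the dynamic range of the reference patch (1.0 if flat).
    intensity_range = (ref.max() - ref.min()) or 1.0
    c1 = (k1 * intensity_range) ** 2
    c2 = (k2 * intensity_range) ** 2 / 2
    w = gaussian_window()

    def local_mean(x):
        # Weighted local average; symmetric padding keeps the patch shape.
        return convolve2d(x, w, mode="same", boundary="symm")

    mu_r, mu_d = local_mean(ref), local_mean(deg)
    var_r = np.maximum(local_mean(ref * ref) - mu_r ** 2, 0.0)
    var_d = np.maximum(local_mean(deg * deg) - mu_d ** 2, 0.0)
    cov = local_mean(ref * deg) - mu_r * mu_d

    intensity = (2 * mu_r * mu_d + c1) / (mu_r ** 2 + mu_d ** 2 + c1)
    structure = (cov + c2) / (np.sqrt(var_r * var_d) + c2)
    return intensity * structure


def nsim(ref_patch, deg_patch):
    """Mean NSIM over the whole patch."""
    return float(np.mean(nsim_map(ref_patch, deg_patch)))
```

Averaging nsim_map(ref, deg) over its time axis (axis=1) gives a per-band profile similar in spirit to fvnsim.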

Modes

Audio Mode (default)

  • Target sample rate: 48 kHz
  • 32 Gammatone frequency bands (50 Hz – 15 000 Hz)
  • Quality mapping: SVR (Support Vector Regression) model
  • Best for: music, environmental audio, codecs

Speech Mode

  • Target sample rate: 16 kHz
  • 32 Gammatone frequency bands (50 Hz – 8 000 Hz)
  • Quality mapping: exponential polynomial fit
  • VAD (Voice Activity Detection) based patch selection
  • Best for: speech, VoIP, telephony
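Both modes place 32 Gammatone bands between a lower and an upper frequency. A common way to space such bands is uniformly on the ERB (equivalent rectangular bandwidth) scale, as in ERBSpace from Slaney's Auditory Toolbox, with Glasberg and Moore's constants (EarQ = 9.26449, minimum bandwidth 24.7 Hz). The sketch below assumes that spacing; it is not taken from visqol/gammatone.py, whose exact center frequencies may differ. erb_space and erb_bandwidth are hypothetical helpers.

```python
import numpy as np

EAR_Q = 9.26449   # Glasberg & Moore asymptotic filter quality
MIN_BW = 24.7     # minimum bandwidth in Hz


def erb_bandwidth(freq_hz):
    """Equivalent rectangular bandwidth (Hz) at a given center frequency."""
    return np.asarray(freq_hz, dtype=float) / EAR_Q + MIN_BW


def erb_space(low_hz, high_hz, num_bands):
    """Center frequencies spaced uniformly on the ERB scale, highest first.

    Mirrors Slaney's ERBSpace: the last value equals low_hz and the first
    lies just below high_hz.
    """
    c = EAR_Q * MIN_BW
    k = np.arange(1, num_bands + 1)
    step = (np.log(low_hz + c) - np.log(high_hz + c)) / num_bands
    return -c + np.exp(k * step) * (high_hz + c)


audio_cfs = erb_space(50.0, 15_000.0, 32)   # audio-mode frequency range
speech_cfs = erb_space(50.0, 8_000.0, 32)   # speech-mode frequency range
```

Reverse the arrays (e.g. audio_cfs[::-1]) if you want the bands in ascending order.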

Performance

Measured on Apple M-series, Python 3.13:

| Mode | Avg RTF | Typical time per file pair |
|------|---------|----------------------------|
| Audio (48 kHz) | 0.71x | 7 – 12 s |
| Speech (16 kHz) | 0.38x | ~1 s |

RTF (Real-Time Factor) is processing time divided by audio duration; a value below 1.0 means faster than real time.
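To measure RTF for your own files, time a call and divide by the duration of the audio processed. timed_rtf below is a hypothetical helper that wraps any callable; it uses only the standard library.

```python
import time


def timed_rtf(fn, audio_duration_s, *args, **kwargs):
    """Call fn(*args, **kwargs); return (result, real-time factor).

    RTF = wall-clock processing time / duration of the audio processed,
    so values below 1.0 mean faster than real time.
    """
    if audio_duration_s <= 0:
        raise ValueError("audio_duration_s must be positive")
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed / audio_duration_s
```

For example, with soundfile imported as sf: result, rtf = timed_rtf(api.measure, sf.info("reference.wav").duration, "reference.wav", "degraded.wav"). At an RTF of 0.71x, a 10 s file pair would take about 7.1 s.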

Project Structure

visqol-python/
├── visqol/                    # Main package
│   ├── __init__.py            # Package exports
│   ├── api.py                 # Public API
│   ├── visqol_manager.py      # Pipeline orchestrator
│   ├── visqol_core.py         # Core algorithm
│   ├── audio_utils.py         # Audio I/O & SPL normalization
│   ├── signal_utils.py        # Envelope, cross-correlation
│   ├── analysis_window.py     # Hann window
│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram
│   ├── patch_creator.py       # Patch creation (Image + VAD modes)
│   ├── patch_selector.py      # DP-based optimal patch matching
│   ├── alignment.py           # Global alignment via cross-correlation
│   ├── nsim.py                # NSIM similarity metric
│   ├── quality_mapper.py      # SVR & exponential quality mapping
│   └── __main__.py            # CLI entry point
├── model/                     # Bundled SVR model
│   └── libsvm_nu_svr_model.txt
├── tests/                     # Conformance tests
│   ├── test_conformance.py
│   └── test_quick.py
├── setup.py
├── requirements.txt
├── LICENSE
└── README.md
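alignment.py performs global alignment via cross-correlation. As an illustration of the idea only, the sketch below estimates the delay between two signals from the peak of their full cross-correlation (scipy.signal.correlate with correlation_lags) and then trims or pads the degraded signal accordingly. It works on raw samples; since signal_utils.py is listed as handling envelopes as well, the port's actual procedure likely differs. estimate_lag and align are hypothetical helpers.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags


def estimate_lag(reference, degraded):
    """Lag in samples at which `degraded` best matches `reference`.

    Positive: degraded is delayed relative to reference; negative: it is early.
    """
    ref = np.asarray(reference, dtype=float)
    deg = np.asarray(degraded, dtype=float)
    xcorr = correlate(deg, ref, mode="full", method="fft")
    lags = correlation_lags(deg.size, ref.size, mode="full")
    return int(lags[np.argmax(xcorr)])


def align(reference, degraded):
    """Shift `degraded` by the estimated lag and trim both to a common length."""
    reference = np.asarray(reference, dtype=float)
    degraded = np.asarray(degraded, dtype=float)
    lag = estimate_lag(reference, degraded)
    if lag > 0:
        degraded = degraded[lag:]                              # drop the leading delay
    elif lag < 0:
        degraded = np.concatenate([np.zeros(-lag), degraded])  # delay an early signal
    n = min(reference.size, degraded.size)
    return reference[:n], degraded[:n], lag
```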

Conformance Test Results

Tested against the official C++ ViSQOL v3.3.3 expected values:

| Test case | Mode | Expected MOS (C++) | Python MOS | Δ (absolute difference) |
|-----------|------|--------------------|------------|-------------------------|
| strauss_lp35 | Audio | 1.3889 | 1.3889 | 0.000000 |
| steely_lp7 | Audio | 2.2502 | 2.2502 | 0.000000 |
| sopr_256aac | Audio | 4.6823 | 4.6823 | 0.000000 |
| ravel_128opus | Audio | 4.4651 | 4.4651 | 0.000000 |
| moonlight_128aac | Audio | 4.6843 | 4.6843 | 0.000000 |
| harpsichord_96mp3 | Audio | 4.2237 | 4.2237 | 0.000000 |
| guitar_64aac | Audio | 4.3497 | 4.3497 | 0.000000 |
| glock_48aac | Audio | 4.3325 | 4.3325 | 0.000000 |
| contrabassoon_24aac | Audio | 2.3469 | 2.3468 | 0.000117 |
| castanets_identity | Audio | 4.7321 | 4.7321 | 0.000000 |
| speech_CA01 | Speech | 3.3745 | 3.3678 | 0.006715 |
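The table can also be checked programmatically. The sketch below encodes the rows as printed (rounded to four decimals, so the contrabassoon delta comes out as 0.0001 rather than 0.000117) and flags cases whose absolute difference exceeds a tolerance. TOLERANCE = 0.01 is a hypothetical value chosen for illustration; the tolerance actually used by tests/test_conformance.py is not stated here.

```python
# Rows as printed in the conformance table: (test case, mode, expected, python).
RESULTS = [
    ("strauss_lp35", "Audio", 1.3889, 1.3889),
    ("steely_lp7", "Audio", 2.2502, 2.2502),
    ("sopr_256aac", "Audio", 4.6823, 4.6823),
    ("ravel_128opus", "Audio", 4.4651, 4.4651),
    ("moonlight_128aac", "Audio", 4.6843, 4.6843),
    ("harpsichord_96mp3", "Audio", 4.2237, 4.2237),
    ("guitar_64aac", "Audio", 4.3497, 4.3497),
    ("glock_48aac", "Audio", 4.3325, 4.3325),
    ("contrabassoon_24aac", "Audio", 2.3469, 2.3468),
    ("castanets_identity", "Audio", 4.7321, 4.7321),
    ("speech_CA01", "Speech", 3.3745, 3.3678),
]

TOLERANCE = 0.01  # hypothetical; the real test suite's tolerance is not stated


def failures(results, tol=TOLERANCE):
    """Names of test cases whose absolute MOS difference exceeds `tol`."""
    return [name for name, _mode, expected, got in results if abs(expected - got) > tol]


def max_delta_by_mode(results):
    """Largest absolute MOS difference per mode."""
    worst = {}
    for _name, mode, expected, got in results:
        worst[mode] = max(worst.get(mode, 0.0), abs(expected - got))
    return worst
```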

References

  • Google ViSQOL (C++) — the original implementation this project is ported from
  • Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., & Harte, N. (2015). ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs. The Journal of the Acoustical Society of America.
  • Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., & Hines, A. (2020). ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

License

Apache License 2.0. See LICENSE for details.

This project is a Python port of Google's ViSQOL, which is also licensed under Apache 2.0.



Download files


Source Distribution

visqol_python-3.3.3.tar.gz (87.4 kB)

Uploaded Source

Built Distribution


visqol_python-3.3.3-py3-none-any.whl (86.0 kB)

Uploaded Python 3

File details

Details for the file visqol_python-3.3.3.tar.gz.

File metadata

  • Download URL: visqol_python-3.3.3.tar.gz
  • Upload date:
  • Size: 87.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for visqol_python-3.3.3.tar.gz
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | 8bae4a6aa461801da0e673cb6cf8d5fb362758be8b269c1b70ac44750fd5a541 |
| MD5 | 462792d1e8e0b9db8bd2ce6df3ae1183 |
| BLAKE2b-256 | 123de7efba4fade961c9157f592bc9093310a2725e85377ad80fabb6b63bbfb7 |


File details

Details for the file visqol_python-3.3.3-py3-none-any.whl.

File metadata

  • Download URL: visqol_python-3.3.3-py3-none-any.whl
  • Upload date:
  • Size: 86.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for visqol_python-3.3.3-py3-none-any.whl
| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | e38ab8017f83f5d1cb463bc59050fe87d03fffd0a478e639840cbf816f8bd104 |
| MD5 | b443fc359c2302106573c1eac2dcc401 |
| BLAKE2b-256 | 5dc61564be3a84e047d703e7534925e254e24ab942f317cbc769f63069837cb8 |

