ViSQOL - Virtual Speech Quality Objective Listener (Pure Python)

ViSQOL (Python)

A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) v3.3.3 for objective audio/speech quality assessment.

ViSQOL compares a reference audio signal with a degraded version and outputs a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score on a scale of 1.0 – 5.0.

Features

  • Two modes: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)
  • High accuracy: 11/11 conformance tests pass against the official C++ implementation
    • Audio mode: 9 of 10 tests reproduce the expected MOS to six decimal places (diff = 0.000000); the remaining test differs by 0.000117
    • Speech mode: the single test differs by 0.006715
  • Pure Python: no C/C++ compilation required
  • Minimal dependencies: only 4 pip packages (numpy, scipy, soundfile, libsvm-official)
  • Faster than real-time: Audio RTF ≈ 0.71x, Speech RTF ≈ 0.38x

Installation

pip install visqol-python

Or install from source:

git clone https://github.com/talker93/visqol-python.git
cd visqol-python
pip install -e .

Quick Start

Python API

from visqol import VisqolApi

# Audio mode (default) - for music and general audio
api = VisqolApi()
api.create(mode="audio")
result = api.measure("reference.wav", "degraded.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

# Speech mode - for speech signals
api = VisqolApi()
api.create(mode="speech")
result = api.measure("ref_speech.wav", "deg_speech.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

Using NumPy Arrays

import numpy as np
import soundfile as sf
from visqol import VisqolApi

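# sf.read returns (samples, sample_rate) for each file
ref, sr = sf.read("reference.wav")
deg, deg_sr = sf.read("degraded.wav")

# measure_from_arrays takes a single sample_rate, so both files must share it
if deg_sr != sr:
    raise ValueError(f"Sample rates differ: reference {sr} Hz, degraded {deg_sr} Hz")

api = VisqolApi()
api.create(mode="audio")
result = api.measure_from_arrays(ref, deg, sample_rate=sr)
print(f"MOS-LQO: {result.moslqo:.4f}")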

Command Line

# Audio mode (default)
python -m visqol -r reference.wav -d degraded.wav

# Speech mode
python -m visqol -r reference.wav -d degraded.wav --speech_mode

# Verbose output (per-patch details)
python -m visqol -r reference.wav -d degraded.wav -v

CLI options:

| Flag | Description |
|---|---|
| -r, --reference | Path to reference WAV file (required) |
| -d, --degraded | Path to degraded WAV file (required) |
| --speech_mode | Use speech mode (16 kHz, polynomial mapping) |
| --model | Custom SVR model file path (audio mode only) |
| --search_window | Search window radius (default: 60) |
| -v, --verbose | Show detailed per-patch results |
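
When scoring many file pairs from a script, it can help to assemble the command line in one place. The sketch below builds an argument list using only the flags from the options table; `build_visqol_command` is a hypothetical helper for illustration and is not part of the package.

```python
import sys
from typing import List, Optional


def build_visqol_command(
    reference: str,
    degraded: str,
    speech_mode: bool = False,
    model: Optional[str] = None,
    search_window: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argv list for `python -m visqol` (hypothetical helper)."""
    if speech_mode and model is not None:
        # The options table marks --model as audio mode only.
        raise ValueError("--model applies to audio mode only")

    cmd = [sys.executable, "-m", "visqol", "-r", reference, "-d", degraded]
    if speech_mode:
        cmd.append("--speech_mode")
    if model is not None:
        cmd += ["--model", model]
    if search_window is not None:
        cmd += ["--search_window", str(search_window)]
    if verbose:
        cmd.append("-v")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with file names that contain spaces.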

Output

The measure() method returns a SimilarityResult object with:

| Field | Description |
|---|---|
| moslqo | MOS-LQO score (1.0 – 5.0) |
| vnsim | Mean NSIM across all patches |
| fvnsim | Per-frequency-band mean NSIM |
| fstdnsim | Per-frequency-band std of NSIM |
| fvdegenergy | Per-frequency-band degraded energy |
| patch_sims | List of per-patch similarity details |
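
As an illustration of how these fields might be consumed, the sketch below condenses a result into a few headline numbers. It assumes that `fvnsim` is a non-empty sequence with one value per frequency band and that `patch_sims` supports `len()`; `summarize_result` is a hypothetical helper, and the stand-in object uses made-up values instead of a real measurement.

```python
from types import SimpleNamespace
from typing import Any, Dict


def summarize_result(result: Any) -> Dict[str, Any]:
    """Collect headline numbers from a SimilarityResult-like object."""
    # Assumption: fvnsim holds one mean NSIM value per frequency band.
    band_scores = [float(x) for x in result.fvnsim]
    # Index of the band with the lowest similarity (the most degraded band).
    worst_band = min(range(len(band_scores)), key=band_scores.__getitem__)
    return {
        "moslqo": round(float(result.moslqo), 4),
        "vnsim": round(float(result.vnsim), 4),
        "num_patches": len(result.patch_sims),
        "worst_band_index": worst_band,
        "worst_band_nsim": band_scores[worst_band],
    }


# Stand-in object with made-up values; a real one would come from api.measure().
fake_result = SimpleNamespace(
    moslqo=4.2, vnsim=0.91, fvnsim=[0.95, 0.80, 0.97], patch_sims=[{}, {}]
)
print(summarize_result(fake_result))
```

Looking at the weakest band can point to where in the spectrum a codec or channel does the most damage, which the single MOS-LQO number hides.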

Modes

Audio Mode (default)

  • Target sample rate: 48 kHz
  • 32 Gammatone frequency bands (50 Hz – 15 000 Hz)
  • Quality mapping: SVR (Support Vector Regression) model
  • Best for: music, environmental audio, codecs

Speech Mode

  • Target sample rate: 16 kHz
  • 32 Gammatone frequency bands (50 Hz – 8 000 Hz)
  • Quality mapping: exponential polynomial fit
  • VAD (Voice Activity Detection) based patch selection
  • Best for: speech, VoIP, telephony
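
The settings of the two modes can be kept in a small lookup, for example to check whether a file already has a mode's target sample rate before scoring. This is a sketch for illustration: `ModeSpec`, `MODE_SPECS`, and `matches_target_rate` are hypothetical names, and whether the library resamples mismatched input on its own is not stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeSpec:
    """Settings of one ViSQOL mode, as listed in the Modes section."""
    sample_rate_hz: int
    num_bands: int
    f_min_hz: int
    f_max_hz: int
    quality_mapping: str


MODE_SPECS = {
    "audio": ModeSpec(48_000, 32, 50, 15_000, "SVR model"),
    "speech": ModeSpec(16_000, 32, 50, 8_000, "exponential polynomial fit"),
}


def matches_target_rate(mode: str, file_rate_hz: int) -> bool:
    """Return True if a file's sample rate equals the mode's target rate."""
    return MODE_SPECS[mode].sample_rate_hz == file_rate_hz


print(matches_target_rate("audio", 44_100))   # a 44.1 kHz file is not at the audio target rate
print(matches_target_rate("speech", 16_000))  # a 16 kHz file matches the speech target rate
```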

Performance

Measured on Apple M-series, Python 3.13:

| Mode | Avg RTF | Typical Time |
|---|---|---|
| Audio (48 kHz) | 0.71x | 7 – 12 s per file pair |
| Speech (16 kHz) | 0.38x | ~1 s per file pair |

RTF (Real-Time Factor) is processing time divided by audio duration; a value below 1.0 means faster than real-time.
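
To reproduce this kind of figure on other hardware, RTF can be computed as processing time divided by audio duration. The sketch below uses that common definition; `real_time_factor` and `timed_rtf` are hypothetical helpers, and which duration to use for a file pair (here, whatever the caller passes as `audio_seconds`) is an assumption.

```python
import time
from typing import Callable


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real-time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(work: Callable[[], object], audio_seconds: float) -> float:
    """Run `work` once and return its RTF for audio of the given duration."""
    start = time.perf_counter()
    work()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# Example: 10 s of audio scored in 7.1 s gives an RTF of about 0.71.
print(f"{real_time_factor(7.1, 10.0):.2f}")
```

In practice `work` could be a lambda wrapping `api.measure(...)`, with `audio_seconds` taken from the reference file's length.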

Project Structure

visqol-python/
├── visqol/                    # Main package
│   ├── __init__.py            # Package exports & version
│   ├── api.py                 # Public API (VisqolApi)
│   ├── visqol_manager.py      # Pipeline orchestrator
│   ├── visqol_core.py         # Core algorithm
│   ├── audio_utils.py         # Audio I/O & SPL normalization
│   ├── signal_utils.py        # Envelope, cross-correlation
│   ├── analysis_window.py     # Hann window
│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram
│   ├── patch_creator.py       # Patch creation (Image + VAD modes)
│   ├── patch_selector.py      # DP-based optimal patch matching
│   ├── alignment.py           # Global alignment via cross-correlation
│   ├── nsim.py                # NSIM similarity metric
│   ├── quality_mapper.py      # SVR & exponential quality mapping
│   ├── __main__.py            # CLI entry point
│   └── model/                 # Bundled SVR model
│       └── libsvm_nu_svr_model.txt
├── tests/                     # Tests (pytest)
│   ├── conftest.py            # Shared fixtures & CLI options
│   ├── test_quick.py          # Smoke tests (no external data needed)
│   └── test_conformance.py    # Full conformance tests (needs testdata)
├── .github/workflows/
│   ├── ci.yml                 # CI: test on Python 3.9–3.13
│   └── publish.yml            # Auto-publish to PyPI on tag push
├── pyproject.toml             # Package metadata & build config
├── CHANGELOG.md
├── LICENSE
└── README.md

Conformance Test Results

Tested against the official C++ ViSQOL v3.3.3 expected values:

| Test Case | Mode | Expected MOS | Python MOS | Δ |
|---|---|---|---|---|
| strauss_lp35 | Audio | 1.3889 | 1.3889 | 0.000000 |
| steely_lp7 | Audio | 2.2502 | 2.2502 | 0.000000 |
| sopr_256aac | Audio | 4.6823 | 4.6823 | 0.000000 |
| ravel_128opus | Audio | 4.4651 | 4.4651 | 0.000000 |
| moonlight_128aac | Audio | 4.6843 | 4.6843 | 0.000000 |
| harpsichord_96mp3 | Audio | 4.2237 | 4.2237 | 0.000000 |
| guitar_64aac | Audio | 4.3497 | 4.3497 | 0.000000 |
| glock_48aac | Audio | 4.3325 | 4.3325 | 0.000000 |
| contrabassoon_24aac | Audio | 2.3469 | 2.3468 | 0.000117 |
| castanets_identity | Audio | 4.7321 | 4.7321 | 0.000000 |
| speech_CA01 | Speech | 3.3745 | 3.3678 | 0.006715 |
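
The Δ column can be summarized programmatically, for instance to check every case against a pass threshold. The sketch below copies the Δ values from the table; the 0.01 tolerance is an illustrative assumption, not the threshold used by the project's test suite, and `summarize_deltas` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# (test case, mode, delta) copied from the conformance table.
CONFORMANCE_DELTAS: List[Tuple[str, str, float]] = [
    ("strauss_lp35", "Audio", 0.000000),
    ("steely_lp7", "Audio", 0.000000),
    ("sopr_256aac", "Audio", 0.000000),
    ("ravel_128opus", "Audio", 0.000000),
    ("moonlight_128aac", "Audio", 0.000000),
    ("harpsichord_96mp3", "Audio", 0.000000),
    ("guitar_64aac", "Audio", 0.000000),
    ("glock_48aac", "Audio", 0.000000),
    ("contrabassoon_24aac", "Audio", 0.000117),
    ("castanets_identity", "Audio", 0.000000),
    ("speech_CA01", "Speech", 0.006715),
]


def summarize_deltas(rows: List[Tuple[str, str, float]], tolerance: float) -> Dict[str, object]:
    """Count exact matches, find the largest delta, and apply a tolerance."""
    worst_name, _, worst_delta = max(rows, key=lambda row: row[2])
    return {
        "exact_matches": sum(1 for _, _, delta in rows if delta == 0.0),
        "worst_case": worst_name,
        "worst_delta": worst_delta,
        "all_within_tolerance": all(delta <= tolerance for _, _, delta in rows),
    }


# 0.01 is an illustrative threshold chosen for this sketch.
summary = summarize_deltas(CONFORMANCE_DELTAS, tolerance=0.01)
print(summary)
```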

References

  • Google ViSQOL (C++) — the original implementation this project is ported from
  • Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., & Harte, N. (2015). ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs. The Journal of the Acoustical Society of America.
  • Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., & Hines, A. (2020). ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

License

Apache License 2.0. See LICENSE for details.

This project is a Python port of Google's ViSQOL, which is also licensed under Apache 2.0.
