ViSQOL - Virtual Speech Quality Objective Listener (Pure Python)

Maintainers

jacobsjiang

ViSQOL (Python)

A pure Python implementation of Google's ViSQOL (Virtual Speech Quality Objective Listener) for objective audio/speech quality assessment.

ViSQOL compares a reference audio signal with a degraded version and outputs a MOS-LQO (Mean Opinion Score - Listening Quality Objective) score on a scale of 1.0 – 5.0.

Features

- Two modes: Audio mode (music/general audio at 48 kHz) and Speech mode (speech at 16 kHz)
- High accuracy: 12/12 conformance tests pass against the official C++ implementation
  - Audio mode: 9/10 tests produce identical MOS scores (diff = 0.000000), 1 test diff = 0.000117
  - Speech mode (polynomial): diff = 0.001057
  - Speech mode (lattice TFLite): diff = 0.002341
- Two speech quality mappers matching C++ ViSQOL:
  - Lattice (default): deep-lattice TFLite network (--use_lattice_model=true in C++); requires the optional [lattice] extra
  - Polynomial (fallback): legacy exponential fit (--use_lattice_model=false in C++)
- Pure Python: no C/C++ compilation required (the optional [lattice] extra adds the Google ai-edge-litert TFLite runtime as a binary wheel)
- Minimal dependencies: 4 core pip packages (numpy, scipy, soundfile, libsvm-official)
- Optional Numba acceleration: pip install visqol-python[accel] for JIT-compiled Gammatone filterbank (parallel) and a fused NSIM + DP patch matching kernel
- Optional pyFFTW backend: pip install visqol-python[fftw] routes alignment / xcorr FFTs through FFTW3, giving a ~16× overall speedup at RTF 0.036 (vs C++ estimate 0.093)
- Batch & parallel evaluation: measure_batch(parallel=True) for multi-process execution across CPU cores
- Fully typed: PEP 561 py.typed, strict mypy, ruff-enforced code style

Installation

pip install visqol-python

For C++-default-equivalent speech mode (deep-lattice TFLite mapper):

pip install visqol-python[lattice]   # requires Python ≥ 3.10

For Numba-accelerated Gammatone filtering and the fused NSIM + DP kernel:

pip install visqol-python[accel]

For FFTW3-backed alignment FFTs via pyFFTW:

pip install visqol-python[fftw]

Install everything (lattice + numba + fftw):

pip install visqol-python[all]

Or install from source:

git clone https://github.com/talker93/visqol-python.git
cd visqol-python
pip install -e ".[dev]"

Note on speech mode parity: Without the [lattice] extra, speech mode falls back to the polynomial mapping (equivalent to running C++ ViSQOL with --use_lattice_model=false). The polynomial can over-predict MOS by 1–2 points on degraded speech vs the C++ default. Install [lattice] whenever you need numbers that line up with the C++ default behaviour (see issue #1).
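The fallback described in the note above happens silently, so a script that needs C++-default parity may want to pick the mapper explicitly. The following is a minimal sketch, not part of the package: it assumes the ai-edge-litert wheel is importable under the module name `ai_edge_litert` (an assumption, not confirmed by this README), and the helper names are hypothetical.

```python
import importlib.util


def lattice_runtime_available(module_name="ai_edge_litert"):
    """Return True if the given top-level module can be found.

    The default module name is an assumption about how the
    ai-edge-litert wheel is imported; adjust it if it differs.
    """
    return importlib.util.find_spec(module_name) is not None


def speech_create_kwargs():
    """Build keyword arguments for api.create() in speech mode.

    use_lattice_model is set explicitly so the chosen mapper is
    visible in the calling code instead of being an implicit fallback.
    """
    return {
        "mode": "speech",
        "use_lattice_model": lattice_runtime_available(),
    }


if __name__ == "__main__":
    kwargs = speech_create_kwargs()
    mapper = "lattice" if kwargs["use_lattice_model"] else "polynomial"
    print(f"Speech mode would use the {mapper} mapper")
```

With the package installed, the result could be passed on as `api.create(**speech_create_kwargs())`; logging the chosen mapper alongside each score makes it easier to spot results that are not comparable with the C++ default.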

Quick Start

Python API

from visqol import VisqolApi

# Audio mode (default) - for music and general audio
api = VisqolApi()
api.create(mode="audio")
result = api.measure("reference.wav", "degraded.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

# Speech mode - for speech signals
api = VisqolApi()
api.create(mode="speech")
result = api.measure("ref_speech.wav", "deg_speech.wav")
print(f"MOS-LQO: {result.moslqo:.4f}")

Using NumPy Arrays

import numpy as np
import soundfile as sf
from visqol import VisqolApi

ref, sr = sf.read("reference.wav")
deg, _  = sf.read("degraded.wav")

api = VisqolApi()
api.create(mode="audio")
result = api.measure_from_arrays(ref, deg, sample_rate=sr)
print(f"MOS-LQO: {result.moslqo:.4f}")
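When no degraded recording is at hand, a synthetic pair can be useful for trying out the array interface. The sketch below builds a 440 Hz tone and a noisy copy at a chosen signal-to-noise ratio using only NumPy; the helper `add_noise_at_snr` is hypothetical and not part of visqol-python.

```python
import numpy as np


def add_noise_at_snr(reference, snr_db, seed=0):
    """Return reference plus white Gaussian noise at the requested SNR (dB)."""
    rng = np.random.default_rng(seed)
    reference = np.asarray(reference, dtype=np.float64)
    signal_power = np.mean(reference ** 2)
    noise = rng.standard_normal(reference.shape)
    noise_power = np.mean(noise ** 2)
    # Scale the noise so that signal_power / scaled_noise_power == 10^(snr_db/10).
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return reference + scale * noise


sample_rate = 48_000                       # audio mode target rate
t = np.arange(sample_rate * 2) / sample_rate  # two seconds of samples
ref = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
deg = add_noise_at_snr(ref, snr_db=20.0)
```

The resulting `ref` and `deg` arrays can then be passed to `api.measure_from_arrays(ref, deg, sample_rate=sample_rate)` as in the example above.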

Batch Evaluation

from visqol import VisqolApi

api = VisqolApi()
api.create(mode="audio")

file_pairs = [
    ("ref1.wav", "deg1.wav"),
    ("ref2.wav", "deg2.wav"),
    ("ref3.wav", "deg3.wav"),
]

# Sequential with progress callback
results = api.measure_batch(
    file_pairs,
    progress_callback=lambda done, total: print(f"{done}/{total}"),
)

# Multi-process parallel (max_workers=4 caps the pool at four worker processes)
results = api.measure_batch(file_pairs, parallel=True, max_workers=4)

for pair, result in zip(file_pairs, results):
    if isinstance(result, Exception):
        print(f"{pair}: FAILED — {result}")
    else:
        print(f"{pair}: MOS-LQO = {result.moslqo:.4f}")
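Because failed pairs come back as exception objects in the same list, it can help to separate them from the scores before aggregating. The following sketch assumes only what the loop above shows (each entry is either an `Exception` or an object with a `moslqo` attribute); `FakeResult` and `summarize_batch` are hypothetical stand-ins so the example runs without audio files.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FakeResult:
    """Stand-in for SimilarityResult; only `moslqo` is modelled here."""
    moslqo: float


def summarize_batch(file_pairs, results):
    """Split batch output into scores and failures and average the scores."""
    scores, failures = {}, {}
    for pair, result in zip(file_pairs, results):
        if isinstance(result, Exception):
            failures[pair] = str(result)
        else:
            scores[pair] = result.moslqo
    mean_mos = mean(scores.values()) if scores else None
    return scores, failures, mean_mos


pairs = [("ref1.wav", "deg1.wav"), ("ref2.wav", "deg2.wav"), ("ref3.wav", "deg3.wav")]
fake_results = [FakeResult(4.0), FileNotFoundError("deg2.wav"), FakeResult(3.0)]
scores, failures, mean_mos = summarize_batch(pairs, fake_results)
print(f"{len(scores)} scored, {len(failures)} failed, mean MOS-LQO = {mean_mos}")
```

In real use, `fake_results` would be replaced by the list returned from `api.measure_batch(...)`.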

Command Line

# Audio mode (default)
python -m visqol -r reference.wav -d degraded.wav

# Speech mode
python -m visqol -r reference.wav -d degraded.wav --speech_mode

# Verbose output (per-patch details)
python -m visqol -r reference.wav -d degraded.wav -v

CLI options:

Flag	Description
`-r`, `--reference`	Path to reference WAV file (required)
`-d`, `--degraded`	Path to degraded WAV file (required)
`--speech_mode`	Use speech mode (16 kHz)
`--no_lattice_model`	Speech mode: disable lattice TFLite mapper, use polynomial fallback
`--lattice_model`	Custom path to lattice `.tflite` model (speech mode)
`--unscaled_speech`	Don't scale polynomial speech MOS to 5.0 (polynomial only)
`--model`	Custom SVR model file path (audio mode only)
`--search_window`	Search window radius (default: 60)
`--verbose`, `-v`	Show detailed per-patch results
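For scripted runs, the flags in the table can be assembled into an argument list for `subprocess`. This is a sketch under assumptions: it uses only flags listed above, assumes `--search_window` takes its value as a separate argument, and the helper `build_visqol_argv` is hypothetical. Rejecting `--no_lattice_model` outside speech mode is a choice made in this sketch, not documented CLI behaviour.

```python
import sys


def build_visqol_argv(reference, degraded, *, speech_mode=False,
                      no_lattice_model=False, search_window=None,
                      verbose=False):
    """Build an argv list for `python -m visqol` from the documented flags."""
    argv = [sys.executable, "-m", "visqol",
            "-r", str(reference), "-d", str(degraded)]
    if speech_mode:
        argv.append("--speech_mode")
    if no_lattice_model:
        # The table describes this flag as a speech-mode option.
        if not speech_mode:
            raise ValueError("--no_lattice_model only applies to speech mode")
        argv.append("--no_lattice_model")
    if search_window is not None:
        argv += ["--search_window", str(search_window)]
    if verbose:
        argv.append("--verbose")
    return argv


if __name__ == "__main__":
    print(build_visqol_argv("reference.wav", "degraded.wav", speech_mode=True))
```

The list can be handed to `subprocess.run(argv, check=True, capture_output=True, text=True)`. The format of the CLI's printed output is not described here, so the sketch does not attempt to parse it.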

Output

The measure() method returns a SimilarityResult object with:

Field	Description
`moslqo`	MOS-LQO score (1.0 – 5.0)
`vnsim`	Mean NSIM across all patches
`fvnsim`	Per-frequency-band mean NSIM
`fstdnsim`	Per-frequency-band std of NSIM
`fvdegenergy`	Per-frequency-band degraded energy
`patch_sims`	List of per-patch similarity details
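The per-band fields can point to where in the spectrum a degradation sits. The sketch below ranks bands by their mean NSIM; it assumes `fvnsim` behaves like a sequence with one value per Gammatone band (the exact container type is not stated here), and uses a `SimpleNamespace` stand-in so it runs without the package. The helper `weakest_bands` is hypothetical.

```python
from types import SimpleNamespace


def weakest_bands(fvnsim, count=3):
    """Return (band_index, nsim) pairs for the lowest-similarity bands."""
    ranked = sorted(enumerate(fvnsim), key=lambda item: item[1])
    return ranked[:count]


# Stand-in for a SimilarityResult with four bands of made-up values.
fake_result = SimpleNamespace(moslqo=3.2, fvnsim=[0.95, 0.60, 0.88, 0.72])

for band, nsim in weakest_bands(fake_result.fvnsim, count=2):
    print(f"band {band}: mean NSIM {nsim:.2f}")
```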

Modes

Audio Mode (default)

- Target sample rate: 48 kHz
- 32 Gammatone frequency bands (50 Hz – 15 000 Hz)
- Quality mapping: SVR (Support Vector Regression) model
- Best for: music, environmental audio, codecs

Speech Mode

- Target sample rate: 16 kHz
- 21 Gammatone frequency bands (50 Hz – 8 000 Hz)
- VAD (Voice Activity Detection) based patch selection
- Quality mapping (choose one):
  - Deep-lattice TFLite (default): same mapper as C++ ViSQOL's default --use_lattice_model=true; requires pip install visqol-python[lattice]
  - Exponential polynomial (fallback): same as C++ --use_lattice_model=false; used automatically when the lattice runtime is not installed
  - Toggle from Python: api.create(mode="speech", use_lattice_model=False)
  - Toggle from CLI: --no_lattice_model
- Best for: speech, VoIP, telephony
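Both modes describe their filterbank by a band count and a frequency range. To give a feel for how such bands are typically laid out, the sketch below spaces centre frequencies evenly on the Glasberg and Moore ERB-number scale. This illustrates the general technique only: whether visqol-python uses exactly this spacing, or includes both range endpoints, is an assumption, and the function names are hypothetical.

```python
import math


def hz_to_erb_number(freq_hz):
    """Glasberg & Moore ERB-number scale (frequency in Hz)."""
    return 21.4 * math.log10(4.37 * freq_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) * 1000.0 / 4.37


def erb_spaced_centers(num_bands, low_hz, high_hz):
    """Centre frequencies evenly spaced in ERB number, endpoints included."""
    lo, hi = hz_to_erb_number(low_hz), hz_to_erb_number(high_hz)
    step = (hi - lo) / (num_bands - 1)
    return [erb_number_to_hz(lo + i * step) for i in range(num_bands)]


audio_centers = erb_spaced_centers(32, 50.0, 15000.0)   # audio mode figures
speech_centers = erb_spaced_centers(21, 50.0, 8000.0)   # speech mode figures

print([round(f) for f in speech_centers[:5]])
```

Spacing on this scale puts bands close together at low frequencies and progressively further apart at high frequencies, which is the usual motivation for Gammatone filterbanks in perceptual metrics.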

Performance

Measured on Apple M-series, Python 3.13, audio mode on the guitar48_stereo 12.5 s conformance case (3-run average):

Configuration	RTF	Typical Time	Speedup vs pure Python
Pure Python + NumPy/SciPy	0.58	~7 s	1.0×
+ `[accel]` (Numba JIT)	0.067	~0.84 s	8.7×
+ `[accel] [fftw]` (Numba + FFTW3)	0.036	~0.45 s	16×

RTF (Real-Time Factor) is processing time divided by audio duration, so a value below 1.0 means faster than real time. With Numba + pyFFTW the Python implementation runs about 2.6× faster than the estimated C++ speed (C++ RTF ≈ 0.093).
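As a worked check of the figures above, the real-time factor is the processing time divided by the audio duration. The sketch below recomputes the fully accelerated row from the table and shows a small timing helper; `real_time_factor` and `timed_rtf` are hypothetical names, not part of the package.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration (below 1.0 beats real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def timed_rtf(func, audio_seconds, *args, **kwargs):
    """Call func, returning its result together with the measured RTF."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, real_time_factor(elapsed, audio_seconds)


# Numbers from the table: ~0.45 s for the 12.5 s conformance clip.
rtf_accel = real_time_factor(0.45, 12.5)
speedup_vs_cpp = 0.093 / rtf_accel  # C++ RTF estimate quoted in the text

print(f"RTF = {rtf_accel:.3f}, about {speedup_vs_cpp:.1f}x the C++ estimate")
```

With the package installed, a call such as `timed_rtf(api.measure, 12.5, "reference.wav", "degraded.wav")` would measure the RTF on the local machine, where the duration argument is the length of the reference clip in seconds.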

Stage-level breakdown of the v3.6.0 fully-accelerated path:

Stage	Time	%
Gammatone filterbank	0.179 s	40%
DP Patch matching (fused NSIM kernel)	0.131 s	29%
Global alignment (pyFFTW rfft/irfft)	0.091 s	20%
Fine alignment + NSIM	0.043 s	10%
Other (SPL, postproc, SVR, …)	0.003 s	< 1%

Project Structure

visqol-python/
├── visqol/                    # Main package
│   ├── __init__.py            # Package exports & version
│   ├── api.py                 # Public API (VisqolApi)
│   ├── visqol_manager.py      # Pipeline orchestrator
│   ├── visqol_core.py         # Core algorithm
│   ├── audio_utils.py         # Audio I/O & SPL normalization
│   ├── signal_utils.py        # Envelope, cross-correlation
│   ├── analysis_window.py     # Hann window
│   ├── gammatone.py           # ERB + Gammatone filterbank + spectrogram
│   ├── patch_creator.py       # Patch creation (Image + VAD modes)
│   ├── patch_selector.py      # DP-based optimal patch matching
│   ├── alignment.py           # Global alignment via cross-correlation
│   ├── nsim.py                # NSIM similarity metric
│   ├── quality_mapper.py      # SVR & exponential quality mapping
│   ├── numba_accel.py         # Optional Numba JIT kernels (DP, NSIM, Gammatone)
│   ├── __main__.py            # CLI entry point
│   ├── py.typed               # PEP 561 type marker
│   └── model/                 # Bundled SVR model
│       └── libsvm_nu_svr_model.txt
├── tests/                     # Tests & benchmarks (pytest)
│   ├── conftest.py            # Shared fixtures & CLI options
│   ├── test_quick.py          # Smoke tests (no external data needed)
│   ├── test_conformance.py    # Full conformance tests (needs testdata)
│   ├── test_parallel_correctness.py  # Numba parallel correctness tests
│   └── bench_*.py             # Performance benchmarks
├── .github/workflows/
│   ├── ci.yml                 # CI: lint + type-check + matrix test (Python × NumPy)
│   └── publish.yml            # Auto-publish to PyPI on tag push
├── pyproject.toml             # Package metadata & build config
├── CHANGELOG.md
├── CONTRIBUTING.md
├── LICENSE
└── README.md

Conformance Test Results

Tested against the official C++ ViSQOL v3.3.3 expected values:

Test Case	Mode	Expected MOS	Python MOS	Δ
strauss_lp35	Audio	1.3889	1.3889	0.000000
steely_lp7	Audio	2.2502	2.2502	0.000000
sopr_256aac	Audio	4.6823	4.6823	0.000000
ravel_128opus	Audio	4.4651	4.4651	0.000000
moonlight_128aac	Audio	4.6843	4.6843	0.000000
harpsichord_96mp3	Audio	4.2237	4.2237	0.000000
guitar_64aac	Audio	4.3497	4.3497	0.000000
glock_48aac	Audio	4.3325	4.3325	0.000000
contrabassoon_24aac	Audio	2.3469	2.3468	0.000117
castanets_identity	Audio	4.7321	4.7321	0.000000
speech_CA01 (polynomial)	Speech	3.3745	3.3756	0.001057
speech_CA01 (lattice)	Speech	3.3130	3.3153	0.002341

Both speech values come from running the C++ ViSQOL binary directly with the corresponding --use_lattice_model flag, so they represent ground-truth parity targets.

References

Google ViSQOL (C++) — the original implementation this project is ported from
Hines, A., Gillen, E., Kelly, D., Skoglund, J., Kokaram, A., & Harte, N. (2015). ViSQOLAudio: An Objective Audio Quality Metric for Low Bitrate Codecs. The Journal of the Acoustical Society of America.
Chinen, M., Lim, F. S., Skoglund, J., Gureev, N., O'Gorman, F., & Hines, A. (2020). ViSQOL v3: An Open Source Production Ready Objective Speech and Audio Metric. 2020 Twelfth International Conference on Quality of Multimedia Experience (QoMEX).

License

Apache License 2.0. See LICENSE for details.

This project is a Python port of Google's ViSQOL, which is also licensed under Apache 2.0.

Release history

Version	Release date
3.7.0 (this version)	May 30, 2026
3.6.0	May 27, 2026
3.5.0	May 26, 2026
3.4.0	Mar 23, 2026
3.3.6	Mar 23, 2026
3.3.5	Mar 23, 2026
3.3.4	Mar 23, 2026
3.3.3	Mar 23, 2026

Download files

Source Distribution

visqol_python-3.7.0.tar.gz (899.8 kB view details)

Uploaded May 30, 2026 Source

Built Distribution

visqol_python-3.7.0-py3-none-any.whl (893.3 kB view details)

Uploaded May 30, 2026 Python 3

File details

Details for the file visqol_python-3.7.0.tar.gz.

File metadata

Download URL: visqol_python-3.7.0.tar.gz
Upload date: May 30, 2026
Size: 899.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for visqol_python-3.7.0.tar.gz
Algorithm	Hash digest
SHA256	`c7f039537ab4eb1d9dbb2b5210b718c471c82b06bb8b60026a2050d2e4b112ed`
MD5	`179a4fe33193fdf44909e07ef6214b40`
BLAKE2b-256	`fb9c5045fe3f6458cb900a13dbb67e8e5a4b0eaad2e0259d087719c5b19b8540`

Provenance

The following attestation bundles were made for visqol_python-3.7.0.tar.gz:

Publisher: publish.yml on talker93/visqol-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: visqol_python-3.7.0.tar.gz
- Subject digest: c7f039537ab4eb1d9dbb2b5210b718c471c82b06bb8b60026a2050d2e4b112ed
- Sigstore transparency entry: 1675085792
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: talker93/visqol-python@1c3953ec4b6ed2e7fa2a2ce56eab3bad648d98d4
- Branch / Tag: refs/tags/v3.7.0
- Owner: https://github.com/talker93
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1c3953ec4b6ed2e7fa2a2ce56eab3bad648d98d4
- Trigger Event: push

File details

Details for the file visqol_python-3.7.0-py3-none-any.whl.

File metadata

Download URL: visqol_python-3.7.0-py3-none-any.whl
Upload date: May 30, 2026
Size: 893.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for visqol_python-3.7.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`105e63bd2957e53d18a5f515215e0312361e0d4bd83a7c5d624ef4f4243c3d58`
MD5	`7e041ceb82d9cfe360b3be8eac4c5490`
BLAKE2b-256	`936560a75639dbb01d473837fa48514d8f4c2df7ebb5522e5a80a0c416c31713`

Provenance

The following attestation bundles were made for visqol_python-3.7.0-py3-none-any.whl:

Publisher: publish.yml on talker93/visqol-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: visqol_python-3.7.0-py3-none-any.whl
- Subject digest: 105e63bd2957e53d18a5f515215e0312361e0d4bd83a7c5d624ef4f4243c3d58
- Sigstore transparency entry: 1675085814
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: talker93/visqol-python@1c3953ec4b6ed2e7fa2a2ce56eab3bad648d98d4
- Branch / Tag: refs/tags/v3.7.0
- Owner: https://github.com/talker93
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1c3953ec4b6ed2e7fa2a2ce56eab3bad648d98d4
- Trigger Event: push
