pyvoicebox-sap

Python port of the VOICEBOX Speech and Audio Processing toolbox

A complete Python port of the VOICEBOX Speech and Audio Processing toolbox by Mike Brookes, Imperial College London.

280+ functions · fully typed · validated against the original MATLAB source via GNU Octave with 500+ automated tests

What is VOICEBOX?

VOICEBOX is a comprehensive MATLAB toolkit for speech and audio signal processing maintained since the 1990s. It covers areas that most Python audio libraries don't touch:

Linear Predictive Coding — 60+ functions for LPC analysis and conversion between representations (AR coefficients, cepstra, reflection coefficients, line spectra, etc.)
Gaussian Mixture Models — full GMM suite: fitting (EM), scoring, merging, divergence, batch processing
Speech Enhancement — spectral subtraction, MMSE estimators, noise estimation (Gerkmann & Hendriks, Martin)
Pitch Detection — PEFAC (v_fxpefac), RAPT (v_fxrapt), DYPSA glottal closure detection
Psychoacoustics — PESQ/MOS mapping, Speech Intelligibility Index (SII), STOI, loudness (phon/sone)
Rotations & Quaternions — Euler angles, rotation matrices, quaternions, polygon/polyhedron geometry
Audio Codecs — mu-law, A-law, WAV, HTK, SPHERE/TIMIT, AIFF, AU, FLAC readers
Frequency Scales — Mel, Bark, ERB, Cent, MIDI conversions
Signal Processing — enframing, overlap-add, STFT, filterbanks, zero-crossing detection, Teager energy

How does it compare to librosa and openSMILE?

pyvoicebox, librosa, and openSMILE overlap in places but target different parts of audio processing:

pyvoicebox — speech engineering: LPC, enhancement, quality metrics, classical speech analysis.
librosa — music information retrieval: beat tracking, chroma, CQT, harmonic/percussive separation.
openSMILE — reproducible paralinguistic features for affective computing, with a C++ real-time core.

Feature	pyvoicebox	librosa	openSMILE
License	LGPL-3.0	ISC	Dual — free for research, commercial licence required from audEERING
LPC analysis (60+ functions)	Full suite	`lpc()` only	Internal, not exposed
Speech enhancement (MMSE, spectral subtraction, dereverb)	Full	None	None
Psychoacoustic quality metrics (PESQ, SII, STOI, phon/sone)	Full	None	None
Gaussian mixtures (fit, score, merge, divergence)	Full	None	None
Pitch detection	PEFAC, RAPT, DYPSA	pYIN	SHS, SWIPE', ACF
Standardised feature sets (ComParE, eGeMAPS)	None	None	Full
MIR features (chroma, CQT, beat tracking)	None	Full	Partial
Real-time / embedded deployment	No	No	Yes (C++)
MFCC / mel spectrogram	Yes	Yes	Yes

Use pyvoicebox when you need speech-specific processing (LPC, enhancement, quality metrics) or are porting MATLAB code that depends on VOICEBOX. Use librosa for music information retrieval and quick audio-ML prototyping. Use openSMILE when you need reproducible paralinguistic feature sets or real-time deployment — but check the commercial licence if you're not using it for academic research.

They complement each other. A common pipeline is pyvoicebox for cleanup and quality scoring, openSMILE or librosa for feature extraction, then scikit-learn / PyTorch for modelling.

Installation

pip install pyvoicebox-sap                # core (numpy, scipy, soundfile)
pip install "pyvoicebox-sap[plot]"        # with matplotlib for plotting functions

For development:

pip install -e ".[dev]"

Quick Start

# Frequency conversions
from pyvoicebox import v_frq2mel, v_mel2frq
mel = v_frq2mel(440)          # Hz to Mel
hz = v_mel2frq(mel)           # back to Hz

# MFCC extraction
from pyvoicebox import v_melcepst
import soundfile as sf
signal, fs = sf.read('speech.wav')
mfcc = v_melcepst(signal, fs, 'M0dD', 12)  # 12 MFCCs + c0, with delta and delta-delta coefficients

# LPC analysis
from pyvoicebox import v_lpcauto, v_lpcar2cc
ar, e, k = v_lpcauto(signal, 12)     # 12th-order LPC
cc = v_lpcar2cc(ar)                  # AR -> cepstral coefficients

# Quaternion operations
from pyvoicebox import v_roteu2qr, v_rotqr2ro
import numpy as np
q = v_roteu2qr('xyz', np.array([0.1, 0.2, 0.3]))  # Euler -> quaternion
R = v_rotqr2ro(q)                                 # quaternion -> rotation matrix

# Noise estimation & speech enhancement
from pyvoicebox import v_estnoiseg, v_specsub

All functions are available with both the v_ prefix (matching MATLAB) and without:

from pyvoicebox import frq2mel       # same as v_frq2mel
from pyvoicebox import melcepst      # same as v_melcepst
from pyvoicebox import lpcauto       # same as v_lpcauto

Notebooks

Interactive Jupyter notebooks with audio playback:

Notebook	Description
Visualize Speech	Waveform, spectrogram, MFCCs, and pitch tracking
Clean Up Noisy Speech	Add noise, run MMSE enhancement, measure SNR improvement
Inside the Vocal Tract	LPC spectral envelopes, coefficient conversions, bandwidth expansion
Who Said That?	Speaker identification with GMMs
Emotion Recognition	TEO vs MFCC features on EmoDB with Random Forest

Function Reference

Functions are grouped by topic, following the same categorisation as the original VOICEBOX documentation.

Audio File Input/Output

Read and write a variety of audio file formats: WAV, HTK, SPHERE/TIMIT, AIFF, AU, FLAC, and more.

Function	Description
`v_readwav`	Read a .WAV format sound file
`v_writewav`	Write a .WAV format sound file
`v_readhtk`	Read an HTK parameter file
`v_writehtk`	Write data in HTK format
`v_readsph`	Read a SPHERE/TIMIT format sound file
`v_readaif`	Read a .AIF (AIFF) format sound file
`v_readau`	Read a SUN .AU format sound file
`v_readflac`	Read a .FLAC format sound file
`v_readsfs`	Read a .SFS (Speech Filing System) format sound file
`v_readcnx`	Read a .CNX format sound file

Frequency Scale Conversion

Convert between Hz and perceptual (Mel, Bark, ERB) or musical (Cent, MIDI) frequency scales.

Function	Description
`v_frq2mel`	Convert Hertz to Mel frequency scale
`v_mel2frq`	Convert Mel frequency scale to Hertz
`v_frq2bark`	Convert Hertz to BARK frequency scale
`v_bark2frq`	Convert the BARK frequency scale to Hertz
`v_frq2erb`	Convert Hertz to ERB frequency scale
`v_erb2frq`	Convert ERB frequency scale to Hertz
`v_frq2cent`	Convert Hertz to Cents frequency scale
`v_cent2frq`	Convert Cents frequency scale to Hertz
`v_frq2midi`	Convert frequencies to musical note numbers
`v_midi2frq`	Convert musical note numbers to frequencies
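
To make the conversions concrete, here is a minimal NumPy sketch of the mel and MIDI mappings. It assumes the convention documented for upstream VOICEBOX (a mel scale normalised so that 1000 Hz maps to 1000 mel, and MIDI note 69 at 440 Hz); that the port's `v_frq2mel` and `v_frq2midi` produce exactly these values is an assumption, and the function names below are hypothetical.

```python
import numpy as np

# Assumed convention: scale factor chosen so that 1000 Hz maps to 1000 mel.
_MEL_K = 1000.0 / np.log(1.0 + 1000.0 / 700.0)

def hz_to_mel(f_hz):
    """Hz -> mel; odd-symmetric, so negative frequencies give negative mels."""
    f = np.asarray(f_hz, dtype=float)
    return np.sign(f) * np.log1p(np.abs(f) / 700.0) * _MEL_K

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    m = np.asarray(mel, dtype=float)
    return 700.0 * np.sign(m) * np.expm1(np.abs(m) / _MEL_K)

def hz_to_midi(f_hz):
    """Hz -> fractional MIDI note number, with A4 = 440 Hz = note 69."""
    return 69.0 + 12.0 * np.log2(np.asarray(f_hz, dtype=float) / 440.0)
```

A round trip through `hz_to_mel` and `mel_to_hz` should return the input frequencies, which is a quick sanity check to run against the packaged functions as well.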

Fourier, DCT and Hartley Transforms

Fast transforms on real data, plus zoom FFT and FFT-based convolution.

Function	Description
`v_rfft`	Calculate the DFT of real data, returning only the first half
`v_irfft`	Inverse FFT of a conjugate symmetric spectrum
`v_rsfft`	FFT of a real symmetric spectrum
`v_zoomfft`	DTFT evaluated over a linear frequency range
`v_rdct`	Discrete cosine transform of real data
`v_irdct`	Inverse discrete cosine transform of real data
`v_rhartley`	Calculate the Hartley transform of real data
`v_convfft`	1-D convolution or correlation using FFT
`v_frac2bin`	Convert a column vector to binary string representation

Random Numbers and Probability

RNGs, multivariate Gaussian mixtures (fit/score/merge/divergence), k-means, and probability densities.

Function	Description
`v_randvec`	Generate random vectors from a GMM distribution
`v_randiscr`	Generate discrete random values with specified probabilities
`v_stdspectrum`	Generate standard acoustic/speech spectra (simplified)
`v_randfilt`	Generate filtered Gaussian noise without initial transient
`v_rnsubset`	Choose k distinct random integers from 1:n
`v_usasi`	Generate USASI noise
`v_gaussmix`	Fit a Gaussian mixture model using EM algorithm
`v_gaussmixb`	Approximate Bhattacharyya divergence between two GMMs
`v_gaussmixd`	Marginal and conditional Gaussian mixture densities
`v_gaussmixg`	Global mean, variance and mode of a GMM (computation only)
`v_gaussmixk`	Approximate KL divergence between two GMMs
`v_gaussmixm`	Estimate mean and variance of the magnitude of a GMM
`v_gaussmixp`	Calculate log probability densities from a Gaussian mixture model
`v_gaussmixt`	Multiply two GMM PDFs
`v_gmmlpdf`	Obsolete wrapper for v_gaussmixp
`v_kmeans`	K-means clustering algorithm
`v_kmeanlbg`	K-means using Linde-Buzo-Gray algorithm
`v_kmeanhar`	K-harmonic means clustering algorithm
`v_lognmpdf`	Calculate PDF of a multivariate lognormal distribution
`v_normcdflog`	Log of normal CDF, accurate for large negative values
`v_chimv`	Approximate mean and variance of non-central chi distribution
`v_vonmisespdf`	Von Mises probability distribution
`v_maxgauss`	Gaussian approximation to the max of a Gaussian vector
`v_berk2prob`	Convert Berksons (log-odds base 2) to probability
`v_prob2berk`	Convert probability to Berksons (log-odds base 2)
`v_gausprod`	Calculate the product of Gaussians
`v_histndim`	Generate an n-dimensional histogram
`v_pdfmoments`	Convert between central moments, raw moments and cumulants
`v_besselratio`	Bessel function ratio I_{v+1}(x)/I_v(x)
`v_besselratioi`	Inverse Bessel function ratio
`v_besratinv0`	Inverse of Modified Bessel Ratio I1(k)/I0(k)
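
As a rough illustration of what GMM scoring involves, the sketch below evaluates the log density of a diagonal-covariance mixture in NumPy. It is not the implementation of `v_gaussmixp`; the function name, argument layout, and the restriction to diagonal covariances are assumptions made for the example.

```python
import numpy as np
from scipy.special import logsumexp

def gmm_logpdf_diag(x, means, variances, weights):
    """Log density of a diagonal-covariance Gaussian mixture.

    x: (n, d) points; means, variances: (k, d); weights: (k,) summing to 1.
    Returns an (n,) array of log p(x).
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = np.asarray(weights, dtype=float)

    diff = x[:, None, :] - means[None, :, :]  # shape (n, k, d)
    # Per-component Gaussian log density, summed over the d dimensions.
    log_comp = -0.5 * np.sum(
        diff**2 / variances + np.log(2.0 * np.pi * variances), axis=2
    )
    # Combine components in the log domain to avoid underflow.
    return logsumexp(log_comp + np.log(weights), axis=1)
```

Working in the log domain is the same idea that `v_logsum` serves elsewhere in the toolbox: sums of very small probabilities are computed without underflow.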

Vector Distance

Spectral distance measures between LPC filter pairs (Itakura, Itakura–Saito, COSH, Euclidean).

Function	Description
`v_disteusq`	Squared Euclidean distance matrix
`v_distitar`	Itakura distance between AR coefficients
`v_distitpf`	Itakura distance between power spectra
`v_distisar`	Itakura-Saito distance between AR coefficients
`v_distispf`	Itakura-Saito distance between power spectra
`v_distchar`	COSH spectral distance between AR coefficients
`v_distchpf`	COSH spectral distance between power spectra

Speech Analysis

Frame-based analysis, spectrograms, pitch trackers, voice activity detection, level measurement, and psychoacoustic metrics.

Function	Description
`v_enframe`	Split signal into (overlapping) frames: one per row
`v_overlapadd`	Join overlapping frames together
`v_fram2wav`	Convert frame values to a continuous waveform
`v_stftw`	Short-time Fourier Transform
`v_istftw`	Inverse Short-time Fourier Transform
`v_filtbankm`	General filterbank matrix (mel/bark/erb/linear)
`v_gammabank`	Gammatone filterbank (stub)
`v_correlogram`	Calculate correlogram
`v_spgrambw`	Spectrogram computation with configurable bandwidth
`v_modspect`	Calculate modulation spectrum of a signal
`v_ewgrpdel`	Energy-weighted group delay waveform
`v_fxpefac`	PEFAC pitch extraction algorithm
`v_fxrapt`	RAPT pitch extraction algorithm
`v_dypsa`	Derive glottal closure instances from speech using the DYPSA algorithm
`v_vadsohn`	Voice activity detector (Sohn et al.)
`v_activlev`	Measure active speech level as per ITU-T P.56
`v_activlevg`	Measure active speech level robustly
`v_earnoise`	Add noise to simulate hearing threshold
`v_ppmvu`	Calculate PPM and VU meter readings (stub)
`v_snrseg`	Measure segmental and global SNR
`v_addnoise`	Add noise at a chosen SNR
`v_sigalign`	Align a clean reference with a noisy signal
`v_txalign`	Find best alignment of two sets of time markers
`v_importsii`	Calculate the SII importance function
`v_phon2sone`	Convert PHON loudness values to SONEs
`v_sone2phon`	Convert SONE loudness values to PHONs
`v_pesq2mos`	Convert PESQ speech quality scores to MOS
`v_mos2pesq`	Convert MOS speech quality scores to PESQ
`v_stoi2prob`	Convert STOI to probability
`v_psycdigit`	Psychoacoustic digit recognition test (stub)
`v_psycest`	Psychoacoustic estimation (stub)
`v_psycestu`	Psychoacoustic estimation utilities (stub)
`v_psychofunc`	Calculate psychometric functions
`v_soundspeed`	Speed of sound, density and impedance of air
`v_sigma`	Estimate glottal opening and closing instants using SIGMA algorithm
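
Most of the functions above operate on frames. The sketch below shows the basic idea behind splitting a signal into overlapping frames, one per row, using plain NumPy indexing. It is an illustration only: the name `enframe_sketch` is hypothetical, and `v_enframe` itself supports windows and mode options that are not reproduced here.

```python
import numpy as np

def enframe_sketch(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames, one frame per row.

    Trailing samples that do not fill a whole frame are dropped.
    """
    x = np.asarray(x)
    n_frames = 1 + (len(x) - frame_len) // hop
    if n_frames < 1:
        return np.empty((0, frame_len), dtype=x.dtype)
    # Row i holds samples hop*i ... hop*i + frame_len - 1.
    idx = hop * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return x[idx]

frames = enframe_sketch(np.arange(10), frame_len=4, hop=2)
```

With ten samples, a frame length of 4 and a hop of 2, this yields four frames starting at samples 0, 2, 4 and 6.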

LPC Analysis

Autocorrelation and covariance LPC, bandwidth expansion, inverse filtering, stability enforcement, and 60+ conversion routines between every LPC representation (AR coefficients, reflection coefficients, line spectra, cepstra, impulse response, power spectrum, poles/zeros, etc.).

Function	Description
`v_lpcauto`	Perform autocorrelation LPC analysis
`v_lpccovar`	Perform covariance LPC analysis
`v_lpcconv`	Convert between LPC parameter sets (generates conversion string)
`v_lpcbwexp`	Expand formant bandwidths of LPC filter
`v_lpcstable`	Test AR coefficients for stability and stabilize if necessary
`v_lpcifilt`	Apply inverse filter to speech signal
`v_lpcrand`	Generate random stable polynomials
`v_rootstab`	Determine number of polynomial roots outside, inside and on the unit circle
`v_ccwarpf`	Warp cepstral coefficients
`v_lpcar2am`	Convert AR coefficients to AR coefficient matrix
`v_lpcar2cc`	Convert AR filter to complex cepstrum
`v_lpcar2db`	Convert AR coefficients to power spectrum in dB
`v_lpcar2ff`	Convert AR coefficients to complex spectrum
`v_lpcar2fm`	Convert autoregressive coefficients to formant freq+amp+bw
`v_lpcar2im`	Convert AR coefficients to impulse response
`v_lpcar2ls`	Convert AR polynomial to line spectrum pair frequencies
`v_lpcar2pf`	Convert AR coefficients to power spectrum
`v_lpcar2pp`	Convert AR filter to power spectrum polynomial in cos(w)
`v_lpcar2ra`	Convert AR filter to inverse filter autocorrelation coefficients
`v_lpcar2rf`	Convert autoregressive coefficients to reflection coefficients
`v_lpcar2rr`	Convert autoregressive coefficients to autocorrelation coefficients
`v_lpcar2zz`	Convert AR filter to z-plane poles
`v_lpcrf2aa`	Convert reflection coefficients to area function
`v_lpcrf2ao`	Convert reflection coefficients to area ratios
`v_lpcrf2ar`	Convert reflection coefficients to autoregressive coefficients
`v_lpcrf2is`	Convert reflection coefficients to inverse sines
`v_lpcrf2la`	Convert reflection coefficients to log areas
`v_lpcrf2lo`	Convert reflection coefficients to log area ratios
`v_lpcrf2rr`	Convert reflection coefficients to autocorrelation coefficients
`v_lpccc2ar`	Convert complex cepstrum to AR coefficients
`v_lpccc2cc`	Extrapolate complex cepstrum
`v_lpccc2db`	Convert complex cepstrum to dB power spectrum
`v_lpccc2ff`	Convert complex cepstrum to complex spectrum
`v_lpccc2pf`	Convert complex cepstrum to power spectrum
`v_lpcaa2ao`	Convert area function to area ratios
`v_lpcaa2dl`	Convert area coefficients to DCT of log area
`v_lpcaa2rf`	Convert vocal tract areas to reflection coefficients
`v_lpcao2rf`	Convert area ratios to reflection coefficients
`v_lpccw2zz`	Power spectrum roots to LPC poles
`v_lpcdb2pf`	Convert decibel power spectrum to power spectrum
`v_lpcdl2aa`	Convert DCT of log area to area coefficients
`v_lpcff2pf`	Convert complex spectrum to power spectrum
`v_lpcfq2zz`	Convert frequencies and Q factors to z-plane poles
`v_lpcim2ar`	Convert impulse response to AR coefficients
`v_lpcis2rf`	Convert inverse sines to reflection coefficients
`v_lpcla2rf`	Convert log areas to reflection coefficients
`v_lpclo2rf`	Convert log area ratios to reflection coefficients
`v_lpcls2ar`	Convert line spectrum pair frequencies to AR polynomial
`v_lpcpf2cc`	Convert power spectrum to complex cepstrum
`v_lpcpf2ff`	Convert power spectrum to complex spectrum
`v_lpcpf2rr`	Convert power spectrum to autocorrelation coefficients
`v_lpcpp2cw`	Convert power spectrum polynomial to power spectrum zeros
`v_lpcpp2pz`	Convert power spectrum polynomial to power spectrum zeros
`v_lpcpz2zz`	Power spectrum roots to LPC poles
`v_lpcra2ar`	Convert inverse filter autocorrelation coefficients to AR filter
`v_lpcra2pf`	Convert inverse filter autocorrelation to power spectrum
`v_lpcra2pp`	Convert inverse filter autocorrelation to power spectrum polynomial
`v_lpcrr2am`	Convert autocorrelation coefficients to AR coefficient matrix
`v_lpcrr2ar`	Convert autocorrelation coefficients to AR coefficients
`v_lpcss2zz`	Convert s-plane poles to z-plane poles
`v_lpczz2ar`	Convert z-plane poles to AR coefficients
`v_lpczz2cc`	Convert poles to complex cepstrum
`v_lpczz2ss`	Convert z-plane poles to s-plane poles
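
Autocorrelation LPC is conventionally solved with the Levinson-Durbin recursion, which yields the AR polynomial, the residual energy, and the reflection coefficients in one pass. The sketch below is a textbook version written for illustration; whether `v_lpcauto` uses this recursion internally is an assumption, and its sign convention for reflection coefficients may differ from the one used here.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    r: autocorrelation values r[0..order].
    Returns (a, e, k): AR polynomial with a[0] == 1, residual energy,
    and reflection coefficients (sign convention is an assumption).
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    e = r[0]
    for i in range(1, order + 1):
        # Prediction error correlation: r[i] + sum_j a[j] * r[i - j].
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / e
        k[i - 1] = ki
        prev = a.copy()
        # Order update of the predictor polynomial.
        a[1:i] = prev[1:i] + ki * prev[i - 1:0:-1]
        a[i] = ki
        e *= 1.0 - ki * ki
    return a, e, k

# Autocorrelation of a first-order process with pole at 0.5.
a, e, k = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the second-order term vanishes, so the recovered polynomial is effectively first order with its root at 0.5.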

Speech Synthesis

Text-to-speech and glottal flow models.

Function	Description
`v_sapisynth`	SAPI speech synthesis (stub)
`v_glotros`	Rosenberg glottal model
`v_glotlf`	Liljencrants-Fant glottal model

Speech Enhancement

Noise estimation and single-channel speech enhancement algorithms.

Function	Description
`v_estnoiseg`	Estimate MMSE noise spectrum (Gerkmann & Hendriks)
`v_estnoisem`	Estimate noise spectrum using minimum statistics (Martin)
`v_specsub`	Speech enhancement using spectral subtraction
`v_specsubm`	Spectral subtraction (Martin's method)
`v_spendred`	Speech enhancement using spectral subtraction with decision-directed approach (stub)
`v_ssubmmse`	Speech enhancement using MMSE spectral amplitude estimator
`v_ssubmmsev`	Speech enhancement using MMSE with VAD-based noise estimation

Speech Coding

Companding (A-law, mu-law) and telephone-band filtering.

Function	Description
`v_lin2pcma`	Convert linear PCM to A-law
`v_pcma2lin`	Convert A-law PCM to linear
`v_lin2pcmu`	Convert linear to Mu-law PCM
`v_pcmu2lin`	Convert Mu-law PCM to linear
`v_potsband`	Design 300-3400 Hz telephone bandwidth filter
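
For readers new to companding, the sketch below implements the continuous mu-law characteristic and its inverse. This is the idealised curve only; `v_lin2pcmu` and `v_pcmu2lin` work with quantised 8-bit PCM codes, so their outputs will not match these functions value for value. The function names are hypothetical.

```python
import numpy as np

MU = 255.0  # value used in North American and Japanese telephony

def mulaw_compress(x, mu=MU):
    """Continuous mu-law compression of samples in [-1, 1]."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)

def mulaw_expand(y, mu=MU):
    """Inverse of mulaw_compress."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.expm1(np.abs(y) * np.log1p(mu)) / mu
```

The curve gives small amplitudes a larger share of the output range, which is why companded 8-bit speech sounds better than linearly quantised 8-bit speech.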

Speech Recognition & Features

MFCC extraction, mel filterbanks, and Linear Discriminant Analysis.

Function	Description
`v_melcepst`	Calculate the mel cepstrum of a signal
`v_melbankm`	Determine matrix for a mel/erb/bark-spaced filterbank
`v_cep2pow`	Convert cepstral means and variances to the power domain
`v_pow2cep`	Convert power domain means and variances to the cepstral domain
`v_ldatrace`	LDA transform to maximize trace discriminant

Signal Processing

General-purpose DSP: filtering, windows, peak finding, dithering, Teager energy, zero-crossings, resampling.

Function	Description
`v_windows`	Generate a standard windowing function
`v_windinfo`	Window information and figures of merit
`v_filterbank`	Apply a bank of IIR filters to a signal
`v_maxfilt`	Find max of an exponentially weighted sliding window
`v_momfilt`	Calculate moments of a signal using a sliding window
`v_meansqtf`	Mean square transfer function of a filter
`v_resample`	Resample and remove end transients
`v_dlyapsq`	Solve discrete Lyapunov equation in square root form
`v_findpeaks`	Find peaks with optional quadratic interpolation
`v_zerocros`	Find zero crossings in a signal
`v_schmitt`	Pass input signal through a Schmitt trigger
`v_teager`	Calculate Teager energy waveform
`v_ditherq`	Add dither and quantize
`v_nearnonz`	Replace each zero element with nearest non-zero element
`v_rangelim`	Limit the range of matrix elements
`v_horizdiff`	Estimate horizontal difference between two functions
`v_interval`	Classify X values into contiguous intervals
`v_modsym`	Symmetric modulus function
`v_zerotrim`	Remove trailing zero rows and columns
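
As an example of one of these operators, the discrete Teager energy of a signal is commonly defined as x[n]^2 - x[n-1]*x[n+1]. The sketch below computes it for interior samples only; how `v_teager` treats the end points and which options it offers are not covered, and the function name is hypothetical.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager energy x[n]^2 - x[n-1]*x[n+1] for interior samples.

    The output is two samples shorter than the input.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# For A*cos(w*n + phi) the operator gives the constant A^2 * sin(w)^2.
n = np.arange(50)
psi = teager_energy(2.0 * np.cos(0.3 * n + 0.1))
```

The constant output for a pure sinusoid shows why the operator is used as an amplitude-and-frequency tracker: it responds to both A and w.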

Information Theory

Entropy and Huffman coding.

Function	Description
`v_huffman`	Calculate a D-ary Huffman code
`v_entropy`	Shannon entropy of discrete and sampled continuous distributions
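
For reference, the Shannon entropy of a discrete distribution can be computed as in the short sketch below. It covers only the discrete case with a hypothetical function name; `v_entropy` also handles sampled continuous distributions, which are not shown.

```python
import numpy as np

def shannon_entropy(p, base=2.0):
    """Entropy of a discrete distribution; zero-probability terms contribute 0."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()          # normalise in case the input does not sum to 1
    nz = p[p > 0]            # skip zeros to avoid log(0)
    return float(-np.sum(nz * np.log(nz)) / np.log(base))
```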

Rotations, Quaternions and Geometry

Conversions between Euler angles, rotation matrices, and quaternions (real and complex), quaternion arithmetic, and 2D/3D geometry primitives.

Function	Description
`v_roteu2qr`	Convert Euler angles to real unit quaternion
`v_rotqr2eu`	Convert real quaternion to Euler angles
`v_roteu2ro`	Convert Euler angles to rotation matrix
`v_rotro2eu`	Convert rotation matrix to Euler angles
`v_rotro2qr`	Convert 3x3 rotation matrix to real quaternion
`v_rotqr2ro`	Convert real quaternion to 3x3 rotation matrix
`v_rotmr2qr`	Convert real quaternion matrices to quaternion vectors
`v_rotqr2mr`	Convert real quaternion vectors to quaternion matrices
`v_rotmc2qc`	Convert complex quaternion matrices to complex quaternion vectors
`v_rotqc2mc`	Convert complex quaternion vectors to complex quaternion matrices
`v_rotqc2qr`	Convert complex quaternion to real quaternion
`v_rotqr2qc`	Convert real quaternion to complex quaternion
`v_rotax2qr`	Convert rotation axis and angle to quaternion
`v_rotqr2ax`	Convert quaternion to rotation axis and angle
`v_rotpl2ro`	Find rotation matrix from plane vectors
`v_rotro2pl`	Find plane and rotation angle of a rotation matrix
`v_rotlu2ro`	Convert look and up directions to rotation matrix
`v_rotro2lu`	Convert rotation matrix to look and up directions
`v_roteucode`	Decode Euler angle rotation code string
`v_rotation`	Encode and decode rotation matrices
`v_rotqrmean`	Calculate mean rotation of quaternion array
`v_rotqrvec`	Rotate vectors by quaternion
`v_qrmult`	Multiply two real quaternion matrices
`v_qrdivide`	Divide two real quaternions
`v_qrdotmult`	Element-wise quaternion multiplication
`v_qrdotdiv`	Element-wise quaternion division
`v_qrabs`	Absolute value and normalization of real quaternions
`v_qrpermute`	Transpose or permute a quaternion array
`v_polygonarea`	Calculate polygon area
`v_polygonwind`	Test if points are inside a polygon
`v_polygonxline`	Find where a line crosses a polygon
`v_minspane`	Minimum spanning tree using Euclidean distance
`v_imagehomog`	Apply homography transformation to an image
`v_rectifyhomog`	Apply rectifying homographies to an image set
`v_skew3d`	Convert between vector and skew-symmetric matrix
`v_upolyhedron`	Calculate uniform polyhedron characteristics
`v_sphrharm`	Forward and inverse spherical harmonic transform (stub)
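
The sketch below illustrates the kind of conversion these functions perform: axis-angle to a unit quaternion, then to a 3x3 rotation matrix. It assumes a scalar-first (w, x, y, z) quaternion layout and hypothetical function names; check the docstrings of `v_rotax2qr` and `v_rotqr2ro` for the conventions the port actually uses.

```python
import numpy as np

def axis_angle_to_quat(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    return np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))

def quat_to_rotmat(q):
    """3x3 rotation matrix for a scalar-first quaternion (normalised first)."""
    q = np.asarray(q, dtype=float)
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])

# A 90-degree rotation about z should carry the x axis onto the y axis.
R = quat_to_rotmat(axis_angle_to_quat([0.0, 0.0, 1.0], np.pi / 2))
```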

Printing and Display

Figure export, axis labelling with SI prefixes, colour maps, and figure layout.

Function	Description
`v_fig2pdf`	Save a figure to PDF/EPS/PS format
`v_fig2emf`	Save a figure in various image formats
`v_figbolden`	Embolden, resize and recolour the current figure
`v_axisenlarge`	Enlarge the axes of a figure
`v_tilefigs`	Tile current figure windows
`v_colormap`	Set and create custom color maps
`v_lambda2rgb`	Convert wavelength to XYZ or RGB colour space
`v_xticksi`	Label the x-axis of a plot using SI multipliers
`v_yticksi`	Label the y-axis of a plot using SI multipliers
`v_xyzticksi`	Label an axis of a plot using SI multipliers
`v_xtickint`	Remove non-integer ticks from x-axis
`v_ytickint`	Remove non-integer ticks from y-axis
`v_texthvc`	Write text on graph with specified alignment and colour
`v_cblabel`	Add a label to a colorbar
`v_sprintsi`	Print value with SI multiplier
`v_sprintcpx`	Format a complex number for printing
`v_bitsprec`	Round values to a specified fixed or floating precision

Utility Functions

VOICEBOX configuration, filesystem helpers, numeric helpers, and combinatorics.

Function	Description
`v_voicebox`	Global parameters for Voicebox functions
`v_voicebox_update`	Check for voicebox updates (stub)
`v_paramsetch`	Set parameters for speech processing algorithms (stub)
`v_hostipinfo`	Get host name and IP info using Python equivalents
`v_winenvar`	Read Windows environment variable (stub)
`v_unixwhich`	Search system path for an executable (Python equivalent)
`v_regexfiles`	Find files matching a regular expression pattern
`v_fopenmkd`	Open file, creating directories if needed
`v_finishat`	Print estimated finish time of a long computation
`v_m2htmlpwd`	MATLAB-specific HTML documentation utility (stub)
`v_atan2sc`	Sin and cosine of atan(y/x)
`v_logsum`	Log(sum(k.*exp(x),d)) computed avoiding overflow/underflow
`v_gammalns`	Log of Gamma(x) for positive or negative real x
`v_hypergeom1f1`	Confluent hypergeometric function 1F1 (Kummer's M)
`v_dualdiag`	Simultaneous diagonalization of two Hermitian matrices
`v_mintrace`	Find row permutation to minimize trace
`v_quadpeak`	Find quadratically-interpolated peak in an N-D array
`v_peak2dquad`	Find quadratically-interpolated peak in a 2D array
`v_choosenk`	All choices of K elements from 0:N-1
`v_choosrnk`	All choices of K elements from 0:N-1 with replacement
`v_permutes`	All N! permutations of 0:N-1 + signatures
`v_sort`	Sort with forward and inverse index

Development

pyvoicebox is a from-scratch Python reimplementation of each MATLAB function, verified to match the original's numerical output to within tight tolerances.

For every function, an Octave harness runs the original MATLAB code with representative inputs and saves the outputs as .mat files. The Python implementation is then compared against these reference values with np.testing.assert_allclose() at tight tolerances (typically rtol=1e-10 to 1e-12).
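
A comparison of this kind might look like the sketch below. The .mat variable names (`f`, `mel`), the helper `check_against_reference`, and the stand-in `hz_to_mel` are all hypothetical; in the real test suite the reference file is generated by Octave rather than by Python, so the demo at the bottom only exercises the mechanics.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

def hz_to_mel(f):
    """Stand-in for a ported function (hypothetical, for the demo only)."""
    f = np.asarray(f, dtype=float)
    return np.sign(f) * np.log1p(np.abs(f) / 700.0) * 1000.0 / np.log(1.0 + 1000.0 / 700.0)

def check_against_reference(func, mat_path, rtol=1e-10):
    """Load saved inputs and outputs, then compare func's result to the reference."""
    ref = loadmat(mat_path)
    got = func(ref["f"].ravel())            # loadmat returns 2-D arrays
    np.testing.assert_allclose(got, ref["mel"].ravel(), rtol=rtol)

# Demo: write a file in place of an Octave-generated reference, then check it.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ref_frq2mel.mat")
    f = np.array([100.0, 440.0, 1000.0, 8000.0])
    savemat(path, {"f": f, "mel": hz_to_mel(f)})
    check_against_reference(hz_to_mel, path)
```

`assert_allclose` raises `AssertionError` with a per-element mismatch report, which makes it easy to see which input triggered a divergence.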

The repo ships no pre-built reference data. On first test run, tests/conftest.py automatically clones the original sap-voicebox repository, runs the Octave harness scripts, and caches the generated reference .mat files at ~/.cache/pyvoicebox-test/. This means the tests always validate against the real MATLAB source, not stale snapshots.
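
The generate-once, reuse-afterwards behaviour can be sketched as follows. This is not the project's `tests/conftest.py`; the function name, the `.complete` marker file, and the `generate` callback are assumptions used to illustrate the caching pattern.

```python
from pathlib import Path

def ensure_reference_data(cache_dir, generate):
    """Populate cache_dir on first use and reuse it afterwards.

    `generate` is a callable that writes reference files into the directory
    (in the real suite this step would clone the upstream repository and run Octave).
    Returns (cache_dir, generated) where `generated` says whether work was done.
    """
    cache_dir = Path(cache_dir)
    marker = cache_dir / ".complete"   # hypothetical marker for a finished run
    if marker.exists():
        return cache_dir, False
    cache_dir.mkdir(parents=True, exist_ok=True)
    generate(cache_dir)
    marker.touch()                     # written last, so partial runs are retried
    return cache_dir, True
```

Writing the marker only after generation succeeds means an interrupted first run is repeated rather than leaving a half-filled cache in use. Deleting the cache directory forces regeneration.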

Every function keeps the same name, argument order, and mode-string conventions as the MATLAB original. MATLAB v_frq2mel(f) becomes Python v_frq2mel(f).

Running the tests

Requires Git, GNU Octave, and Python 3.9+.

pip install -e ".[dev]"
pytest tests/ -v
# First run: clones voicebox source, generates ref data via Octave, runs all tests
# Subsequent runs: uses cached ref data, runs tests only

A note on ergonomics

pyvoicebox is a faithful port — it preserves the original MATLAB function names, argument order, mode strings, and default behaviour. This is intentional: if you're porting MATLAB code or following a paper that references VOICEBOX, everything works the same way.

That said, some MATLAB conventions can feel surprising in Python. For example, v_addnoise(signal, fs, 5) normalises total power to 1 by default — you need the 'k' flag to preserve the original signal level. Single-character mode strings like 'M0dD' are compact but not self-documenting.

We're considering a more Pythonic API layer on top of the faithful port — keyword arguments, sensible defaults, better discoverability. If you have opinions on what that should look like, or if you run into a gotcha that tripped you up, please open an issue.
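
As one possible shape for such a layer, the sketch below builds a `v_melcepst` mode string from keyword arguments. The helper `melcepst_mode` is hypothetical and not part of the package, and the flag meanings are taken from the upstream VOICEBOX documentation for `v_melcepst` (R/N/M for the time-domain window, 0 for the zeroth cepstral coefficient, E for log energy, d and D for delta and delta-delta coefficients); confirm them against the port before relying on this.

```python
def melcepst_mode(window="hamming", include_c0=False, include_energy=False,
                  delta=False, delta_delta=False):
    """Build a v_melcepst-style mode string from readable keyword arguments."""
    windows = {"rectangular": "R", "hanning": "N", "hamming": "M"}
    if window not in windows:
        raise ValueError(f"unknown window {window!r}; expected one of {sorted(windows)}")
    mode = windows[window]
    if include_c0:
        mode += "0"   # zeroth-order cepstral coefficient
    if include_energy:
        mode += "E"   # log energy
    if delta:
        mode += "d"   # first-order time derivatives
    if delta_delta:
        mode += "D"   # second-order time derivatives
    return mode

mode = melcepst_mode(include_c0=True, delta=True, delta_delta=True)
```

The resulting string is the same `'M0dD'` used in the Quick Start, but the call site now says what each flag does.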

Acknowledgements

pyvoicebox exists only because of the decades of work by Prof. Mike Brookes and collaborators at the Speech and Audio Processing Lab, Centre for Signal Processing, Department of Electrical and Electronic Engineering, Imperial College London. Every function in this package is a direct port of their MATLAB source, and every algorithm, mode flag, and default parameter traces back to their design choices.

If you use pyvoicebox in academic work, please cite the original VOICEBOX toolbox:

Brookes, M., VOICEBOX: Speech Processing Toolbox for MATLAB, Department of Electrical and Electronic Engineering, Imperial College London, 1997–present. https://github.com/ImperialCollegeLondon/sap-voicebox

This work was supported by the Ministry of Higher Education (MOHE) Malaysia under the Prototype Development Research Grant Scheme (PRGS), Grant No. PRGS25-029-0073.

Upstream resources:

MATLAB source: https://github.com/ImperialCollegeLondon/sap-voicebox
Documentation: https://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

License

pyvoicebox is distributed under the GNU Lesser General Public License v3.0 or later (LGPL-3.0-or-later), matching the upstream MATLAB toolbox. See LICENSE and LICENSE.GPL.

Copyright for the original algorithms and MATLAB implementations remains with Mike Brookes and the contributors listed in each source file.

Version 0.1.0, released Apr 13, 2026
