Pytest plugin for STT/TTS integration testing with httpx, metrics, and embedded audio samples.
pytest-audioeval
Pytest plugin for STT/TTS integration testing. Built on the httpx ecosystem (httpx, httpx-ws, httpx-sse) with built-in metrics, embedded ground-truth audio samples, and chainable assertions.
Features
- STT via WebSocket: `audioeval.stt.ws()` streams audio and collects the transcription
- TTS via HTTP: `audioeval.tts.post()` (batch), `.stream()` (chunked), `.sse()` (Server-Sent Events)
- Text metrics: WER, CER, substitutions, insertions, deletions (via jiwer)
- Audio metrics: PESQ MOS on a 1–5 scale (via pesq)
- Embedded samples: ground-truth audio + reference text pairs, multi-language ready
- Chainable assertions: `result.compute_metrics(ref).assert_quality(max_wer=0.2)`
- CLI thresholds: `--audioeval-wer`, `--audioeval-cer`, `--audioeval-mos`
Install
```bash
uv add pytest-audioeval
```
Quick Start
STT — WebSocket
```python
import asyncio
import uuid

import orjson as json

from pytest_audioeval.client import AudioEval


async def test_user_stt_ws(audioeval: AudioEval) -> None:
    sample = audioeval.samples.en_hello_world
    async with audioeval.stt.ws(sample=sample) as session:
        # Session configuration, sent as the first text frame
        config = json.dumps(
            {"uid": str(uuid.uuid4()), "language": "en", "task": "transcribe",
             "model": "large-v3-turbo", "use_vad": True}
        ).decode()  # orjson.dumps returns bytes
        await session.send_text(config)
        # Wait for the server to acknowledge the configuration
        ready = await session.receive_text()
        assert "SERVER_READY" in ready
        # Stream the sample in 200 ms chunks
        await session.send_sample(sample, chunk_ms=200)
        await asyncio.sleep(2)  # give the server time to finish transcribing
        await session.send_text("END_OF_AUDIO")
        # Collect transcription segments...
```
TTS — Batch POST
```python
import io

import soundfile as sf

from pytest_audioeval.client import AudioEval


async def test_user_tts_batch(audioeval: AudioEval) -> None:
    # OpenAI-style speech request; "stream": False asks for the whole file at once
    response = await audioeval.tts.post(
        json={"input": "Hello world.", "model": "kokoro",
              "voice": "af_heart", "response_format": "wav", "stream": False},
    )
    # Decode the WAV body in memory
    data, rate = sf.read(io.BytesIO(response.content), dtype="float32")
    assert rate == 24_000  # Kokoro output sample rate
    assert len(data) > 0
```
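soundfile decodes float and integer WAVs alike. When only a header check is needed, the standard-library `wave` module can report sample rate, channel count, and duration, with one caveat: it reads integer PCM only, so this sketch assumes the service returns 16-bit PCM WAV. `wav_info` and `make_silent_wav` are hypothetical helpers, not plugin API.

```python
import io
import wave


def wav_info(payload: bytes) -> tuple[int, int, float]:
    """Return (sample_rate, channels, duration_seconds) for an integer-PCM WAV payload."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def make_silent_wav(rate: int, seconds: float) -> bytes:
    """Build a mono 16-bit PCM WAV of silence (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()
```

A duration check such as `wav_info(response.content)[2] > 0.3` is a cheap guard against a service that returns a valid header with almost no audio.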
TTS — Chunked Streaming
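```python
from pytest_audioeval.client import AudioEval


async def test_user_tts_streaming(audioeval: AudioEval) -> None:
    chunks = []
    # "..." stands for the remaining request fields, as in the batch example
    async with audioeval.tts.stream(json={"input": "Hello.", ...}) as response:
        async for chunk in response.aiter_bytes():
            chunks.append(chunk)
    assert len(chunks) > 0
```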
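A common reason to prefer `.stream()` over `.post()` is to check how quickly the first audio arrives. A minimal sketch of a helper that works with any async byte iterator, such as `response.aiter_bytes()`; `collect_with_ttfb` and `fake_stream` are hypothetical and not part of the plugin.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import AsyncIterator


async def collect_with_ttfb(chunks: AsyncIterator[bytes]) -> tuple[bytes, float | None]:
    """Join streamed chunks; also return seconds until the first non-empty chunk."""
    start = time.perf_counter()
    first_chunk_after: float | None = None
    parts: list[bytes] = []
    async for chunk in chunks:
        if chunk and first_chunk_after is None:
            first_chunk_after = time.perf_counter() - start
        parts.append(chunk)
    return b"".join(parts), first_chunk_after


async def fake_stream() -> AsyncIterator[bytes]:
    """Stand-in for response.aiter_bytes(): a few chunks with small delays."""
    for part in (b"RIFF", b"\x00" * 8, b"data"):
        await asyncio.sleep(0.01)
        yield part
```

Because the timer starts when iteration begins, and httpx has already received the response headers by the time the `async with` body runs, this measures headers-to-first-chunk. For end-to-end latency, record the start time before entering `async with`.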
TTS — Server-Sent Events
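```python
from pytest_audioeval.client import AudioEval


async def test_user_tts_sse(audioeval: AudioEval) -> None:
    async with audioeval.tts.sse(json={"input": "Hello.", ...}) as event_source:
        # Each item is one parsed Server-Sent Event
        async for sse in event_source.aiter_sse():
            print(sse.data)
```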
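`aiter_sse()` (from httpx-sse) handles the Server-Sent Events wire format. To show what it is parsing, here is a simplified stdlib sketch of the framing rules from the SSE specification: `field: value` lines, comment lines starting with `:`, consecutive `data:` lines joined with newlines, and an event dispatched on a blank line. It ignores `retry:` and some edge cases, and it says nothing about what the TTS service puts inside `data`; `parse_sse` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Event:
    event: str = "message"
    data: str = ""
    id: str | None = None


def parse_sse(text: str) -> list[Event]:
    """Parse a complete SSE stream body into events (simplified)."""
    events: list[Event] = []
    data_lines: list[str] = []
    event_type = ""
    event_id = None  # the last event ID persists across events
    for line in text.splitlines():
        if line == "":
            # A blank line dispatches the buffered event, if it has any data
            if data_lines:
                events.append(Event(event_type or "message", "\n".join(data_lines), event_id))
            data_lines, event_type = [], ""
            continue
        if line.startswith(":"):
            continue  # comment / keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is stripped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
        elif field == "id":
            event_id = value
    return events
```

An event that is not followed by a blank line is never dispatched, which matches how the specification treats an incomplete event at end of stream.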
Text Metrics
```python
from pytest_audioeval.metrics.text import TextMetrics


async def test_user_metrics_text() -> None:
    metrics = TextMetrics.compute(
        reference="the quick brown fox jumps over the lazy dog",
        hypothesis="the quick brown fox jumps over the lazy dock",
    )
    # One substituted word out of nine reference words: WER = 1/9 ≈ 0.11
    assert metrics.wer < 0.15
    assert metrics.substitutions == 1
```
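`TextMetrics` delegates to jiwer. For intuition about the numbers: WER = (S + D + I) / N, where S, D, and I are the substitutions, deletions, and insertions in a minimum-edit alignment and N is the number of reference words. The sketch below is an independent Levenshtein implementation with a backtrace, not jiwer's code; it splits on whitespace without any normalization (case, punctuation), so its results can differ from the plugin's if transforms are applied there. The empty-reference convention is chosen here for illustration.

```python
def word_alignment_counts(reference: str, hypothesis: str) -> dict[str, float]:
    """Count substitutions, deletions, insertions and WER via edit-distance DP."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,          # deletion
                             dist[i][j - 1] + 1,          # insertion
                             dist[i - 1][j - 1] + cost)   # match or substitution
    # Walk back from the bottom-right corner, attributing each edit
    subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            subs += ref[i - 1] != hyp[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    wer = (subs + dels + ins) / n if n else float(m > 0)  # empty-reference convention
    return {"wer": wer, "substitutions": subs, "deletions": dels, "insertions": ins}
```

Because N counts only reference words, insertions can push WER above 1.0: `word_alignment_counts("a", "a a a")` reports two insertions and a WER of 2.0.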
STT Result — Chainable Assertions
```python
from pytest_audioeval.stt import STTResult


async def test_user_stt_result() -> None:
    result = STTResult(hypothesis_text="Hello world.")
    # compute_metrics() returns the result, so these two calls can also be chained
    result.compute_metrics("Hello world.")
    result.assert_quality(max_wer=0.2, max_cer=0.15)
```
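The chainable style works because each method returns the object it was called on. A minimal sketch of that fluent pattern, assuming nothing about the real `STTResult` beyond the method names shown above; `SketchResult` is hypothetical, treats thresholds as inclusive, and computes WER/CER with a plain edit distance rather than jiwer.

```python
from __future__ import annotations

from dataclasses import dataclass, field


def _edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two token sequences (single-row DP)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


@dataclass
class SketchResult:
    """Hypothetical stand-in for STTResult illustrating the chainable API shape."""

    hypothesis_text: str
    metrics: dict[str, float] = field(default_factory=dict)

    def compute_metrics(self, reference: str) -> SketchResult:
        ref_words, hyp_words = reference.split(), self.hypothesis_text.split()
        self.metrics["wer"] = _edit_distance(ref_words, hyp_words) / max(len(ref_words), 1)
        self.metrics["cer"] = _edit_distance(list(reference), list(self.hypothesis_text)) / max(len(reference), 1)
        return self  # returning self is what makes chaining possible

    def assert_quality(self, max_wer: float = 0.2, max_cer: float = 0.15) -> SketchResult:
        if not self.metrics:
            raise RuntimeError("call compute_metrics() first")
        # Raise explicitly so the check survives python -O (which strips assert)
        if self.metrics["wer"] > max_wer:
            raise AssertionError(f"WER {self.metrics['wer']:.3f} > {max_wer}")
        if self.metrics["cer"] > max_cer:
            raise AssertionError(f"CER {self.metrics['cer']:.3f} > {max_cer}")
        return self
```

Usage: `SketchResult("Hello world.").compute_metrics("Hello world.").assert_quality()`.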
Sample Registry
```python
from pytest_audioeval.client import AudioEval
from pytest_audioeval.samples.registry import SampleLang


async def test_user_samples_browse(audioeval: AudioEval) -> None:
    # All samples
    assert len(audioeval.samples) >= 3
    # Filter by language
    en_samples = audioeval.samples.by_lang(SampleLang.EN)
    # Attribute access: {lang}_{name}
    sample = audioeval.samples.en_hello_world
    assert sample.reference_text == "Hello world."
    # Audio access
    audio_f32 = sample.audio_numpy()  # numpy float32 array
    audio_raw = sample.audio_bytes()  # raw bytes
    chunks = sample.chunks(chunk_ms=200)  # chunked for streaming
```
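For 16 kHz mono float32 audio (the format of the embedded samples), a 200 ms chunk is 16,000 × 0.2 = 3,200 samples, or 3,200 × 4 = 12,800 bytes. A minimal sketch of that arithmetic with a hypothetical `chunk_pcm` helper; how the plugin's own `chunks()` treats a short final chunk is not documented here, and this version simply emits it as-is.

```python
from array import array


def chunk_pcm(audio: bytes, sample_rate: int = 16_000, chunk_ms: int = 200,
              bytes_per_sample: int = 4) -> list[bytes]:
    """Split raw mono PCM bytes into chunk_ms-long pieces (hypothetical helper).

    bytes_per_sample=4 matches float32; the final chunk may be shorter.
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000  # 16_000 * 200 // 1000 = 3_200
    chunk_size = samples_per_chunk * bytes_per_sample   # 3_200 * 4 = 12_800 bytes
    return [audio[i:i + chunk_size] for i in range(0, len(audio), chunk_size)]


if __name__ == "__main__":
    one_second = array("f", [0.0] * 16_000).tobytes()  # 1 s of float32 silence
    print([len(c) for c in chunk_pcm(one_second)])
```

One second of audio therefore yields five full chunks; a 0.5 s clip yields two full chunks plus a 6,400-byte remainder.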
CLI Thresholds
```python
async def test_user_thresholds(audioeval_thresholds: dict[str, float]) -> None:
    # These are the defaults when no --audioeval-* options are passed
    assert audioeval_thresholds["max_wer"] == 0.2
    assert audioeval_thresholds["max_cer"] == 0.15
    assert audioeval_thresholds["min_mos"] == 3.0
```
CLI Options
```bash
# Point the plugin at running services
pytest --stt-url=ws://localhost:45120 --tts-url=http://localhost:45130/v1/audio/speech
# Override quality thresholds
pytest --audioeval-wer=0.15 --audioeval-cer=0.10 --audioeval-mos=3.5
```
| Option | Default | Description |
|---|---|---|
| `--stt-url` | None | STT service WebSocket URL |
| `--tts-url` | None | TTS service HTTP URL |
| `--audioeval-wer` | 0.2 | Maximum WER threshold |
| `--audioeval-cer` | 0.15 | Maximum CER threshold |
| `--audioeval-mos` | 3.0 | Minimum PESQ MOS threshold |
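The thresholds point in different directions: WER and CER are upper bounds (lower is better), while PESQ MOS is a lower bound (higher is better). A minimal sketch of a checker over the `max_wer` / `max_cer` / `min_mos` keys that the `audioeval_thresholds` fixture exposes; `threshold_violations` and the metric key names `wer`, `cer`, `mos` are assumptions, not plugin API.

```python
def threshold_violations(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return readable violations; a max_/min_ key prefix sets the bound direction."""
    problems: list[str] = []
    for key, limit in thresholds.items():
        bound, _, name = key.partition("_")  # "max_wer" -> ("max", "_", "wer")
        if name not in metrics:
            continue  # metric not computed for this test (e.g. no MOS for an STT test)
        value = metrics[name]
        if bound == "max" and value > limit:
            problems.append(f"{name}={value:.3f} exceeds max {limit}")
        elif bound == "min" and value < limit:
            problems.append(f"{name}={value:.3f} below min {limit}")
    return problems
```

Collecting every violation before failing gives one message that lists all regressions instead of stopping at the first.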
Fixtures
| Fixture | Scope | Type | Description |
|---|---|---|---|
| `audioeval` | session | `AudioEval` | Main facade: `audioeval.stt`, `audioeval.tts`, `audioeval.samples` |
| `audioeval_thresholds` | function | `dict[str, float]` | CLI-driven threshold dict |
Architecture
```
src/pytest_audioeval/
├── plugin.py            # pytest entry point (fixtures, CLI options)
├── client.py            # AudioEval facade
├── stt.py               # STTClient (httpx-ws), STTSession, STTResult
├── tts.py               # TTSClient (httpx + httpx-sse)
├── metrics/
│   ├── text.py          # TextMetrics: WER, CER via jiwer
│   └── audio.py         # AudioMetrics: PESQ MOS via pesq
└── samples/
    ├── registry.py      # SampleRegistry + AudioSample + SampleLang
    └── audio/en/        # Embedded ground-truth WAV + TXT pairs
```
Clients
| Client | Transport | Methods |
|---|---|---|
| `STTClient` | httpx-ws | `.ws()`: WebSocket context manager yielding `STTSession` |
| `TTSClient` | httpx + httpx-sse | `.post()` batch, `.stream()` chunked, `.sse()` SSE |
Metrics
| Metric | Class | Source | Range |
|---|---|---|---|
| Word Error Rate (WER) | `TextMetrics` | jiwer | 0.0 – 1.0+ |
| Character Error Rate (CER) | `TextMetrics` | jiwer | 0.0 – 1.0+ |
| Substitutions / Insertions / Deletions | `TextMetrics` | jiwer | 0 – N |
| PESQ MOS | `AudioMetrics` | pesq | 1.0 – 5.0 scale (PESQ MOS-LQO peaks at roughly 4.5 – 4.6) |
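WER and CER are divided by the reference length, so insertions can push either rate above 1.0, which is why their range is open-ended. A minimal CER sketch (character-level Levenshtein distance divided by reference length), written independently of jiwer and without its text transforms; `cer` here is a hypothetical helper.

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character edit distance / len(reference)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        cur = [i]
        for j, h in enumerate(hypothesis, start=1):
            cur.append(min(prev[j] + 1,              # delete r
                           cur[j - 1] + 1,           # insert h
                           prev[j - 1] + (r != h)))  # substitute (free if equal)
        prev = cur
    return prev[-1] / len(reference)
```

For example, "dog" vs "dock" needs one substitution and one insertion, so CER is 2/3; a short reference with a long hypothesis easily exceeds 1.0.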
Samples
Embedded ground-truth audio with reference transcriptions:
```
samples/audio/
└── en/                       # English (16 kHz, float32)
    ├── hello_world.wav       # "Hello world."
    ├── quick_brown_fox.wav
    └── counting.wav          # "One, two, three, four, five."
```
Access: audioeval.samples.en_hello_world, audioeval.samples.en_counting, etc.
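Attribute names follow the `{lang}_{name}` pattern. One way a registry can expose that is `__getattr__` over a key built from language and file stem. The sketch below is hypothetical (`MiniRegistry`, `Sample`, plain-string languages instead of `SampleLang`) and only illustrates the naming scheme, not how `SampleRegistry` is implemented; the reference texts are the ones listed in the tree above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    lang: str
    name: str
    reference_text: str


class MiniRegistry:
    """Hypothetical registry resolving attributes like `en_hello_world`."""

    def __init__(self, samples: list[Sample]) -> None:
        self._by_key = {f"{s.lang}_{s.name}": s for s in samples}

    def __getattr__(self, key: str) -> Sample:
        # Called only when normal lookup fails; guard private names to avoid recursion
        if key.startswith("_"):
            raise AttributeError(key)
        try:
            return self._by_key[key]
        except KeyError:
            raise AttributeError(key) from None

    def __len__(self) -> int:
        return len(self._by_key)

    def by_lang(self, lang: str) -> list[Sample]:
        return [s for s in self._by_key.values() if s.lang == lang]
```

Raising `AttributeError` (not `KeyError`) for unknown names keeps `hasattr()` and `getattr(obj, name, default)` working as expected.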
Infrastructure
Integration tests require GPU-accelerated TTS/STT services:
```bash
make infra-up       # Start TTS (Kokoro) + STT (WhisperLive)
make infra-status   # Check health
make infra-logs     # View logs
make infra-down     # Stop services
```
| Service | Image | Port | Protocol |
|---|---|---|---|
| TTS (Kokoro) | `ghcr.io/remsky/kokoro-fastapi-gpu` | 45130 | HTTP |
| STT (WhisperLive) | `ghcr.io/collabora/whisperlive-gpu` | 45120 | WebSocket |
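Integration tests fail confusingly if they start before the services accept connections. A minimal readiness check, assuming only that the ports in the table above are reachable from the test host; it tests TCP connectivity, not model readiness, so it complements `make infra-status` rather than replacing it. `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, a session-scoped fixture could call `wait_for_port("localhost", 45130)` and `wait_for_port("localhost", 45120)` and skip the integration tests when either returns `False`.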
Development
```bash
make install            # uv sync --dev
make lint               # ruff check + format
make test-unit          # unit tests (no services)
make test-integration   # integration tests (requires services)
make coverage           # coverage report (>90%)
```
Requirements
- Python >= 3.13
- NVIDIA GPU + Docker with nvidia-container-toolkit (for integration tests)
License
MIT
Download files
File details
Details for the file pytest_audioeval-0.1.3.tar.gz.
File metadata
- Download URL: pytest_audioeval-0.1.3.tar.gz
- Upload date:
- Size: 4.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e37654bc869b8fbc092f17bf826ece8421601f3f3de9f8fe9b23c6dfd2989011 |
| MD5 | b49d42344525324e9027408e287a530f |
| BLAKE2b-256 | 493594206d6312efb1049c9993f354e261e33540b55303d2c6bc4bc879845341 |
File details
Details for the file pytest_audioeval-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pytest_audioeval-0.1.3-py3-none-any.whl
- Upload date:
- Size: 4.0 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b97369347232d43180dd9860189c547ccc8dedb42b7c21d511ddc5a05ef7ecc |
| MD5 | ab147c7a06c1e19f66a23b845e967880 |
| BLAKE2b-256 | 6d3fbbe9ff3c5aa09abfebdda827cf7c98ec7c8ae030043ba3dc79a03415fea1 |