Pytest plugin for STT/TTS integration testing with httpx, metrics, and embedded audio samples.

pytest-audioeval

Pytest plugin for STT/TTS integration testing. Built on the httpx ecosystem (httpx, httpx-ws, httpx-sse) with built-in metrics, embedded ground-truth audio samples, and chainable assertions.

Features

  • STT via WebSocket: audioeval.stt.ws() streams audio and collects the transcription
  • TTS via HTTP: audioeval.tts.post() for batch, .stream() for chunked, .sse() for Server-Sent Events
  • Text metrics: WER, CER, substitutions, insertions, deletions (via jiwer)
  • Audio metrics: PESQ MOS on a 1–5 scale (via pesq)
  • Embedded samples: ground-truth audio + reference text pairs, multi-language ready
  • Chainable assertions: result.compute_metrics(ref).assert_quality(max_wer=0.2)
  • CLI thresholds: --audioeval-wer, --audioeval-cer, --audioeval-mos

Install

uv add pytest-audioeval

Quick Start

STT — WebSocket

import asyncio
import uuid
import orjson as json
from pytest_audioeval.client import AudioEval


async def test_user_stt_ws(audioeval: AudioEval) -> None:
    sample = audioeval.samples.en_hello_world

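    async with audioeval.stt.ws(sample=sample) as session:
        # Session config sent to the STT server (WhisperLive in this project's infra).
        config = json.dumps(
            {"uid": str(uuid.uuid4()), "language": "en", "task": "transcribe",
             "model": "large-v3-turbo", "use_vad": True}
        ).decode()  # orjson.dumps() returns bytes
        await session.send_text(config)

        # Wait for the server handshake before streaming any audio.
        ready = await session.receive_text()
        assert "SERVER_READY" in ready

        # Stream the sample in 200 ms chunks, give the server time to
        # finish decoding, then signal the end of the audio.
        await session.send_sample(sample, chunk_ms=200)
        await asyncio.sleep(2)
        await session.send_text("END_OF_AUDIO")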

        # Collect transcription segments...
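
The segment-collection step is elided above. As a hedged sketch: WhisperLive servers typically reply with JSON messages carrying a `segments` list whose items have `start`, `end`, and `text` fields, and they may re-send a segment as its text is refined. Assuming that message format (it is not documented by this plugin), the hypothetical helper `merge_segments` below keeps the newest text per segment start time and joins the result into one transcript, which could then be passed to `STTResult(hypothesis_text=...)`. The raw strings would come from repeated `await session.receive_text()` calls.

```python
import json


def merge_segments(messages: list[str]) -> str:
    """Merge raw STT server messages into a single transcript.

    Assumes (hypothetically) WhisperLive-style JSON such as
    {"uid": "...", "segments": [{"start": "0.000", "end": "1.200", "text": " Hello"}]}.
    Messages without a "segments" key (e.g. SERVER_READY) are ignored.
    """
    latest: dict[float, str] = {}
    for raw in messages:
        payload = json.loads(raw)
        for seg in payload.get("segments", []):
            # A later message may refine an earlier segment: keep the newest text.
            latest[float(seg["start"])] = seg["text"].strip()
    # Order segments by start time and join the non-empty ones with spaces.
    return " ".join(text for _, text in sorted(latest.items()) if text)


if __name__ == "__main__":
    msgs = [
        '{"uid": "abc", "message": "SERVER_READY"}',
        '{"uid": "abc", "segments": [{"start": "0.000", "end": "0.800", "text": " Hello"}]}',
        '{"uid": "abc", "segments": [{"start": "0.000", "end": "1.100", "text": " Hello world."}]}',
    ]
    print(merge_segments(msgs))  # Hello world.
```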

TTS — Batch POST

import io
import soundfile as sf
from pytest_audioeval.client import AudioEval


async def test_user_tts_batch(audioeval: AudioEval) -> None:
    response = await audioeval.tts.post(
        json={"input": "Hello world.", "model": "kokoro",
              "voice": "af_heart", "response_format": "wav", "stream": False},
    )
    data, rate = sf.read(io.BytesIO(response.content), dtype="float32")
    assert rate == 24_000
    assert len(data) > 0
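
For simple header checks without the `soundfile` dependency, Python's standard-library `wave` module can read the response body. This is a sketch, not part of the plugin: `wav_info` and `make_silence_wav` are hypothetical helpers, and `wave` only reads integer PCM WAV (it raises `wave.Error` on float WAV files), so `soundfile` remains the more general choice.

```python
import io
import wave


def wav_info(payload: bytes) -> tuple[int, int, float]:
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV payload."""
    with wave.open(io.BytesIO(payload), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def make_silence_wav(rate: int = 24_000, seconds: float = 0.5) -> bytes:
    """Build a mono 16-bit PCM WAV of silence, standing in for a TTS response body."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    rate, channels, duration = wav_info(make_silence_wav())
    print(rate, channels, duration)  # 24000 1 0.5
```

In a test, `wav_info(response.content)` would replace the `sf.read(...)` call when the server returns 16-bit PCM.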

TTS — Chunked Streaming

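from pytest_audioeval.client import AudioEval


async def test_user_tts_streaming(audioeval: AudioEval) -> None:
    chunks = []
    # "..." stands for the remaining request fields (e.g. model and voice, as in the batch example).
    async with audioeval.tts.stream(json={"input": "Hello.", ...}) as response: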
        async for chunk in response.aiter_bytes():
            chunks.append(chunk)
    assert len(chunks) > 0

TTS — Server-Sent Events

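from pytest_audioeval.client import AudioEval


async def test_user_tts_sse(audioeval: AudioEval) -> None:
    # "..." stands for the remaining request fields (e.g. model and voice, as in the batch example).
    async with audioeval.tts.sse(json={"input": "Hello.", ...}) as event_source: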
        async for sse in event_source.aiter_sse():
            print(sse.data)
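
`aiter_sse()` (from httpx-sse) handles the Server-Sent Events wire format for you. To show what it parses, here is a deliberately minimal, hypothetical parser covering the core SSE rules: `field: value` lines, several `data` lines joined with newlines, comment lines starting with `:`, and a blank line ending each event. It skips `id`/`retry` handling and is not a replacement for httpx-sse.

```python
from dataclasses import dataclass


@dataclass
class Event:
    event: str
    data: str


def parse_sse(text: str) -> list[Event]:
    """Parse a complete SSE body into events (simplified; no id/retry support)."""
    events: list[Event] = []
    event_type, data_lines = "", []
    for line in text.splitlines():
        if not line:
            # Blank line: dispatch the pending event only if it carried data.
            if data_lines:
                events.append(Event(event_type or "message", "\n".join(data_lines)))
            event_type, data_lines = "", []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, sep, value = line.partition(":")
        if sep and value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "data":
            data_lines.append(value)
        elif field == "event":
            event_type = value
    # An event not terminated by a blank line is discarded.
    return events
```

With httpx-sse, each `sse` object in the loop above already exposes the same `.event` and `.data` attributes.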

Text Metrics

from pytest_audioeval.metrics.text import TextMetrics


async def test_user_metrics_text() -> None:
    metrics = TextMetrics.compute(
        reference="the quick brown fox jumps over the lazy dog",
        hypothesis="the quick brown fox jumps over the lazy dock",
    )
    assert metrics.wer < 0.15
    assert metrics.substitutions == 1
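
`TextMetrics` delegates to jiwer. To show what the numbers mean, here is a hedged, standalone sketch of word error rate: align the hypothesis to the reference with a word-level Levenshtein dynamic program, count substitutions (S), deletions (D), and insertions (I), and compute WER = (S + D + I) / N, where N is the number of reference words. `word_errors` is a hypothetical helper; jiwer applies its own text transforms and may split equal-cost alignments differently, so S/D/I counts can differ on some inputs.

```python
def word_errors(reference: str, hypothesis: str) -> dict[str, float]:
    """Align words with Levenshtein DP; return S, D, I counts and WER = (S + D + I) / N."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    n, m = len(ref), len(hyp)
    # cost[i][j] = (edits, subs, dels, ins) for turning ref[:i] into hyp[:j]
    cost = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cost[i][j] = cost[i - 1][j - 1]  # words match: no extra edit
                continue
            e, s, d, ins = cost[i - 1][j - 1]
            substitute = (e + 1, s + 1, d, ins)
            e, s, d, ins = cost[i - 1][j]
            delete = (e + 1, s, d + 1, ins)
            e, s, d, ins = cost[i][j - 1]
            insert = (e + 1, s, d, ins + 1)
            # Fewest total edits wins; ties are broken by tuple order.
            cost[i][j] = min(substitute, delete, insert)
    edits, subs, dels, ins = cost[n][m]
    return {"substitutions": subs, "deletions": dels, "insertions": ins, "wer": edits / n}


if __name__ == "__main__":
    m = word_errors("the quick brown fox jumps over the lazy dog",
                    "the quick brown fox jumps over the lazy dock")
    print(m["substitutions"], round(m["wer"], 3))  # 1 0.111
```

One substitution over nine reference words gives WER = 1/9 ≈ 0.111, which is why the test above passes with `wer < 0.15`.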

STT Result — Chainable Assertions

from pytest_audioeval.stt import STTResult


async def test_user_stt_result() -> None:
    result = STTResult(hypothesis_text="Hello world.")
    result.compute_metrics("Hello world.").assert_quality(max_wer=0.2, max_cer=0.15)
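
Chaining of this kind works when `compute_metrics` returns the result object itself. The sketch below is a hypothetical re-creation of that fluent pattern (`ToyResult` and `levenshtein` are illustrative names, not the plugin's `STTResult`): WER and CER come from one sequence edit distance applied to words and to characters, and every threshold violation is reported in a single `AssertionError`.

```python
from __future__ import annotations

from collections.abc import Sequence
from dataclasses import dataclass, field


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance between two sequences: the characters of a str, or a list of words."""
    prev = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        cur = [i]
        for j, item_b in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,  # deletion
                cur[j - 1] + 1,  # insertion
                prev[j - 1] + (item_a != item_b),  # substitution or match
            ))
        prev = cur
    return prev[-1]


@dataclass
class ToyResult:
    """Hypothetical stand-in for STTResult, used only to illustrate chaining."""

    hypothesis_text: str
    metrics: dict[str, float] = field(default_factory=dict)

    def compute_metrics(self, reference: str) -> ToyResult:
        ref_words = reference.split()
        if not ref_words:
            raise ValueError("reference must contain at least one word")
        self.metrics = {
            "wer": levenshtein(ref_words, self.hypothesis_text.split()) / len(ref_words),
            "cer": levenshtein(reference, self.hypothesis_text) / len(reference),
        }
        return self  # returning self is what allows .assert_quality() to follow

    def assert_quality(self, max_wer: float = 1.0, max_cer: float = 1.0) -> ToyResult:
        if not self.metrics:
            raise RuntimeError("call compute_metrics() first")
        failures = [
            f"{name}={self.metrics[name]:.3f} > {limit}"
            for name, limit in (("wer", max_wer), ("cer", max_cer))
            if self.metrics[name] > limit
        ]
        if failures:
            # Raise explicitly so the check still runs under `python -O`.
            raise AssertionError("; ".join(failures))
        return self
```

Usage mirrors the test above: `ToyResult("Hello world.").compute_metrics("Hello world.").assert_quality(max_wer=0.2, max_cer=0.15)`.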

Sample Registry

from pytest_audioeval.client import AudioEval
from pytest_audioeval.samples.registry import SampleLang


async def test_user_samples_browse(audioeval: AudioEval) -> None:
    # All samples
    assert len(audioeval.samples) >= 3

    # Filter by language
    en_samples = audioeval.samples.by_lang(SampleLang.EN)

    # Attribute access: {lang}_{name}
    sample = audioeval.samples.en_hello_world
    assert sample.reference_text == "Hello world."

    # Audio access
    audio_f32 = sample.audio_numpy()        # numpy float32 array
    audio_raw = sample.audio_bytes()         # raw bytes
    chunks = sample.chunks(chunk_ms=200)     # chunked for streaming
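
How `chunks(chunk_ms=...)` slices the audio is not documented here. One plausible approach, sketched below as an assumption, converts the chunk duration to a sample count (sample_rate × chunk_ms / 1000) and slices raw PCM bytes on sample boundaries. For the 16 kHz mono float32 samples described under Samples, 200 ms is 3,200 samples, or 12,800 bytes. `split_pcm` is a hypothetical helper, not the plugin's implementation.

```python
def split_pcm(
    audio: bytes, sample_rate: int = 16_000, chunk_ms: int = 200, bytes_per_sample: int = 4
) -> list[bytes]:
    """Split mono PCM bytes into chunk_ms-long pieces; the last piece may be shorter.

    Defaults assume 16 kHz mono float32 audio (4 bytes per sample).
    """
    samples_per_chunk = sample_rate * chunk_ms // 1000
    step = samples_per_chunk * bytes_per_sample
    if step <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    return [audio[i : i + step] for i in range(0, len(audio), step)]
```

One second of 16 kHz float32 audio (64,000 bytes) yields five 12,800-byte chunks.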

CLI Thresholds

async def test_user_thresholds(audioeval_thresholds: dict[str, float]) -> None:
    # These are the defaults; the --audioeval-* CLI options override them.
    assert audioeval_thresholds["max_wer"] == 0.2
    assert audioeval_thresholds["max_cer"] == 0.15
    assert audioeval_thresholds["min_mos"] == 3.0
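
The thresholds mix upper bounds (`max_wer`, `max_cer`) with a lower bound (`min_mos`), so any comparison has to handle both directions. A hypothetical sketch, assuming the measured metrics are stored under the keys `wer`, `cer`, and `mos`:

```python
def threshold_failures(metrics: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Describe each metric that violates its threshold; an empty list means all pass.

    Keys prefixed "max_" are upper bounds and "min_" are lower bounds; the suffix
    names the metric (e.g. "max_wer" checks metrics["wer"]). Metrics missing from
    `metrics` are skipped.
    """
    failures = []
    for key, limit in thresholds.items():
        kind, _, name = key.partition("_")
        if name not in metrics:
            continue
        value = metrics[name]
        if kind == "max" and value > limit:
            failures.append(f"{name}={value:g} exceeds max {limit:g}")
        elif kind == "min" and value < limit:
            failures.append(f"{name}={value:g} below min {limit:g}")
    return failures


if __name__ == "__main__":
    defaults = {"max_wer": 0.2, "max_cer": 0.15, "min_mos": 3.0}
    print(threshold_failures({"wer": 0.1, "cer": 0.2, "mos": 2.5}, defaults))
    # ['cer=0.2 exceeds max 0.15', 'mos=2.5 below min 3']
```

Inside a test, `assert not threshold_failures(metrics, audioeval_thresholds)` would report every violation at once instead of stopping at the first.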

CLI Options

pytest --stt-url=ws://localhost:45120 --tts-url=http://localhost:45130/v1/audio/speech
pytest --audioeval-wer=0.15 --audioeval-cer=0.10 --audioeval-mos=3.5

Option            Default  Description
--stt-url         None     STT service WebSocket URL
--tts-url         None     TTS service HTTP URL
--audioeval-wer   0.2      Maximum WER threshold
--audioeval-cer   0.15     Maximum CER threshold
--audioeval-mos   3.0      Minimum PESQ MOS threshold

Fixtures

Fixture               Scope     Type              Description
audioeval             session   AudioEval         Main facade: audioeval.stt, audioeval.tts, audioeval.samples
audioeval_thresholds  function  dict[str, float]  CLI-driven threshold dict

Architecture

src/pytest_audioeval/
├── plugin.py              # pytest entry point (fixtures, CLI options)
├── client.py              # AudioEval facade
├── stt.py                 # STTClient (httpx-ws), STTSession, STTResult
├── tts.py                 # TTSClient (httpx + httpx-sse)
├── metrics/
│   ├── text.py            # TextMetrics — WER, CER via jiwer
│   └── audio.py           # AudioMetrics — PESQ MOS via pesq
└── samples/
    ├── registry.py        # SampleRegistry + AudioSample + SampleLang
    └── audio/en/          # Embedded ground-truth WAV + TXT pairs

Clients

Client     Transport          Methods
STTClient  httpx-ws           .ws(): WebSocket context manager yielding STTSession
TTSClient  httpx + httpx-sse  .post() batch, .stream() chunked, .sse() SSE

Metrics

Metric                                  Class         Source  Range
Word Error Rate (WER)                   TextMetrics   jiwer   0.0 – 1.0+
Character Error Rate (CER)              TextMetrics   jiwer   0.0 – 1.0+
Substitutions / Insertions / Deletions  TextMetrics   jiwer   0 – N
PESQ MOS                                AudioMetrics  pesq    1.0 – 5.0

Samples

Embedded ground-truth audio with reference transcriptions:

samples/audio/
└── en/                    # English (16kHz, float32)
    ├── hello_world.wav    # "Hello world."
    ├── quick_brown_fox.wav
    └── counting.wav       # "One, two, three, four, five."

Access: audioeval.samples.en_hello_world, audioeval.samples.en_counting, etc.
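
Attribute names follow the `{lang}_{name}` pattern. A registry can support this by implementing `__getattr__` and splitting on the first underscore, since language codes such as `en` contain none. The sketch below is a hypothetical minimal version (`MiniRegistry` and `Sample` are illustrative names, and `by_lang` takes a plain string rather than `SampleLang`), not the plugin's `SampleRegistry`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    lang: str
    name: str
    reference_text: str


class MiniRegistry:
    """Resolve attributes such as `en_hello_world` to (lang="en", name="hello_world")."""

    def __init__(self, samples: list[Sample]) -> None:
        self._by_key = {(s.lang, s.name): s for s in samples}

    def __getattr__(self, attr: str) -> Sample:
        # Python calls __getattr__ only after normal lookup fails, so real
        # attributes and methods are unaffected. Private names are rejected
        # early to avoid recursion if _by_key has not been set yet.
        if attr.startswith("_"):
            raise AttributeError(attr)
        lang, sep, name = attr.partition("_")  # split on the FIRST underscore
        sample = self._by_key.get((lang, name)) if sep else None
        if sample is None:
            raise AttributeError(f"no sample named {attr!r}")
        return sample

    def __len__(self) -> int:
        return len(self._by_key)

    def by_lang(self, lang: str) -> list[Sample]:
        return [s for (sample_lang, _), s in self._by_key.items() if sample_lang == lang]
```

Raising `AttributeError` (rather than `KeyError`) for unknown names keeps `hasattr()` and `getattr(..., default)` working as expected.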

Infrastructure

Integration tests require GPU-accelerated TTS/STT services:

make infra-up       # Start TTS (Kokoro) + STT (WhisperLive)
make infra-status   # Check health
make infra-logs     # View logs
make infra-down     # Stop services

Service            Image                              Port   Protocol
TTS (Kokoro)       ghcr.io/remsky/kokoro-fastapi-gpu  45130  HTTP
STT (WhisperLive)  ghcr.io/collabora/whisperlive-gpu  45120  WebSocket

Development

make install            # uv sync --dev
make lint               # ruff check + format
make test-unit          # unit tests (no services)
make test-integration   # integration tests (requires services)
make coverage           # coverage report (>90%)

Requirements

  • Python >= 3.13
  • NVIDIA GPU + Docker with nvidia-container-toolkit (for integration tests)

License

MIT

Download files

Source Distribution

pytest_audioeval-0.1.0.tar.gz (467.1 kB)

Uploaded Source

Built Distribution

pytest_audioeval-0.1.0-py3-none-any.whl (398.1 kB)

Uploaded Python 3

File details

Details for the file pytest_audioeval-0.1.0.tar.gz.

File metadata

  • Download URL: pytest_audioeval-0.1.0.tar.gz
  • Upload date:
  • Size: 467.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 (Ubuntu 24.04)

File hashes

Hashes for pytest_audioeval-0.1.0.tar.gz
Algorithm Hash digest
SHA256 92f69c2f720b3c970aca31e0a1fc22f0be9cb57425b5b2054f572ec7a72e1f71
MD5 eda30cb91b25bb4cdcee7a52cba5e443
BLAKE2b-256 ee98619211f7621cce6f2ded98a2a4eb1cec74504738c5ea1fcfa2717a6e12f8

File details

Details for the file pytest_audioeval-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pytest_audioeval-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 398.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.8 (Ubuntu 24.04)

File hashes

Hashes for pytest_audioeval-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f75535828ec22ff0c00e7867aa883624aa589cf9283155f597b358fc1733bb76
MD5 7ff6038a9d3b5f818eb6bd7ef8b87b9d
BLAKE2b-256 360781834fa51148e01bfdd55210e84a08cf931db1d8c5bc3fb74a46a8b34429
