ASR-LE: Latency + Alignment + Error Attribution Engine for Speech Models

These details have not been verified by PyPI

Project description

ASR-LE (Automatic Speech Recognition – Latency & Error Explorer)

ASR-LE is an advanced ASR evaluation toolkit that goes beyond WER by adding time-aware analysis:

Word/token timelines (timestamps + confidence)
Streaming latency simulation (chunking/overlap/right-context)
Word-level error attribution (sub/ins/del bursts by time window)
“Moments”: automatically surfaces the worst error windows so you can jump directly to problem segments
Backend contract tests so community backends must meet the same interface & quality gates
Streamlit dashboard for exploration, comparisons, and batch runs

Think of it as: “perf + quality observability for ASR pipelines”.

Key Features
Installation
Quickstart (Streamlit)
Quickstart (CLI)
How the Analysis Works
Dataset Runner (Batch)
- Manifest Format
- What You Get
Backend System
Docker
CI/CD
Troubleshooting
Contributing
License

Key Features

1) True ASR Observability (not just one scalar)

ASR-LE produces a run folder containing:

report.json (machine-readable)
report.md (human-readable)
artifacts/ exports for visualization (timeline bins, moments, tokens, etc.)

You can compare runs, batch runs, and analyze worst segments quickly.

2) Token-level introspection: confidence + timestamps

For backends that expose tokens (e.g., faster-whisper), you can inspect:

per-token word, start_s, end_s
confidence (when available)

Even when your alignment-based heatmap is missing, the Streamlit app can build a confidence heatmap from tokens as a fallback.

3) Streaming p95 first-word latency estimator

ASR-LE can simulate streaming by chunking the audio and measuring:

first decoded word time
p50/p95 across repeated simulated runs

This helps you answer real production questions like:

“Which knobs buy the biggest p95 improvements without retraining?”

4) Word-level error attribution with time windows

When reference text is provided, ASR-LE can compute:

WER (sub/ins/del/hits)
attribution bursts: where errors occur in time
timeline bins: e.g. each 1 second window gets counts of sub/ins/del

This enables targeted debugging (noise bursts, far-field reverberation zones, etc.).

5) Error “moments”

ASR-LE auto-detects the worst 1s windows (with padding) and stores them as moments so the dashboard can jump directly.

6) Backend contract tests

Community backends must satisfy baseline correctness and shape requirements via a minimal contract.

Installation

PyPI (Recommended)

pip install asrle

Optional extras (recommended) If your package defines extras, install like:

pip install "asrle[whisper,alignment]"

Typical extras:

whisper: faster-whisper / whisper backend deps
alignment: transformers + torchaudio forced alignment deps

Minimal

python -m venv .venv
# Windows PowerShell:
. .\.venv\Scripts\Activate.ps1

pip install -U pip
pip install -e .

With Whisper Backends

Depending on your repo extras, you may expose extras like [whisper]. If not, install typical deps manually:

pip install faster-whisper
pip install openai-whisper   # optional, HF whisper alternative
pip install ffmpeg-python    # if needed

Also ensure FFmpeg is available:

Windows: install FFmpeg and add to PATH
Linux: sudo apt-get install ffmpeg

With Alignment (CTC forced alignment)

Alignment can use HuggingFace + torchaudio forced alignment when available:

pip install transformers torchaudio

Quickstart (Streamlit)

Start the dashboard:

cd C:\path\to\asrle
streamlit run .\src\asrle\dashboard\streamlit_app.py

Open:

http://localhost:8501

Single Run Workflow

Upload audio (or server path)
Pick backend (e.g. faster-whisper)
Optionally paste/upload reference transcript
Enable:
- Word alignment (CTC) (for timestamped attribution)
- Word-level attribution (moments + bins)
- Streaming simulation (optional)
Run analysis

What to expect in UI

WER (if reference provided)
Latency p95 (estimated using repeats)
Streaming first-word latency p95 (streaming mode)
Tokens/confidence explorer (if backend provides tokens)
Error heatmap (if timeline artifacts exist)

If you don’t see the error heatmap: it means artifacts/timeline.json wasn’t created. This usually happens when CTC alignment fails or reference wasn’t provided.

Quickstart (CLI)

If your repo exposes a CLI entrypoint, you can add examples here. A minimal canonical pattern:

python -m asrle <command> ...

If you don’t have a CLI command yet, Streamlit is the fastest interface.

How the Analysis Works

WER

WER is computed word-level using:

jiwer.process_words() when available (preferred)
else a Levenshtein fallback

Outputs include:

wer
substitutions, insertions, deletions, hits
ref_words, hyp_words

Tokens & Confidence

Some backends produce per-word tokens with timestamps and confidence. ASR-LE exposes them in:

report.json -> transcript -> segments[*] -> tokens[*]
and Streamlit provides a token table + confidence summaries.

Streaming p95 First-Word Latency

With streaming enabled, ASR-LE:

chunks audio into overlapping blocks
runs backend decode loop multiple times (repeats)
estimates p50/p95 (first_word_latency_s) and stores percentiles

Word-Level Error Attribution & Timeline Heatmap

To build time-aware substitution windows, ASR-LE needs:

reference text
hypothesis timestamps (from backend tokens)
reference word timestamps (from CTC alignment)

If CTC alignment fails (e.g. produces <unk>/garbage), then timeline bins may be missing.

Artifacts:

artifacts/word_attribution.json
artifacts/timeline.json

Error Moments

Moments are the worst time windows by error density, saved in:

artifacts/error_moments.json

The UI uses these to jump straight to problematic spans.

Timestamp Drift Checks

ASR-LE can run a drift check against transcript timestamps to detect:

non-monotonic segments
overlaps, gaps, impossible ordering

Dataset Runner (Batch)

You can run a dataset by uploading a manifest CSV in the dashboard.

Manifest Format

Required columns:

audio_path

Optional:

ref_text (inline reference)
ref_path (path to a .txt file reference)
any metadata columns (SNR, device, far-field, noise_type, etc.)

Example:

audio_path,ref_text,snr,far_field
C:\data\a.wav,"hello world",20,false
C:\data\b.wav,"this is a test",5,true

What You Get

The dataset run creates:

runs/dataset_<id>/
  manifest.csv
  dataset_summary.json
  items/
    item_00000/
      report.json
      artifacts/...
    item_00001/
      ...

The UI can summarize:

WER distribution (mean/p50/p90)
latency p50/p95
first-word latency p50/p95 (streaming mode)

Backend System

Backends Included

Typical set:

dummy (testing)
hf-whisper (transformers pipeline)
faster-whisper (high-performance Whisper)

Backend Contract Tests

The Backend Validator page runs checks like:

does transcribe return required fields?
timestamps monotonic?
streaming capability validation (if claimed)

Streaming Interface (Optional)

Some backends can support true incremental decoding. If a backend supports it, ASR-LE can validate and use it.

Docker

Build and run:

docker build -t asrle:local .
docker run --rm -p 8501:8501 -v "$(pwd)/runs:/app/runs" asrle:local

Then open:

http://localhost:8501

CI/CD

GitHub Actions included:

.github/workflows/ci.yml – tests + lint/format checks (best-effort)
.github/workflows/docker.yml – builds and pushes to GHCR on main/tags
.github/workflows/release.yml – PyPI Trusted Publisher

Troubleshooting

1) “Heatmap is missing”

The error heatmap relies on artifacts/timeline.json. If it doesn’t exist:

Provide reference text
Enable word attribution
Enable word alignment (CTC)
Ensure alignment deps are installed (transformers, torchaudio)
Check CTC didn’t collapse into <unk> outputs (alignment failure)

2) WER = 0 even with noisy audio

WER uses the reference transcript. If your reference equals the hypothesis after normalization, WER can still be 0. Verify:

the reference text is correct and not accidentally identical
your normalization is not overly aggressive
backend isn’t outputting identical transcript due to VAD trimming or normalization

3) No tokens/confidence showing

Not all backends return token-level outputs. Use faster-whisper and ensure your backend exposes tokens into: transcript.segments[*].tokens.

4) FFmpeg errors

Install FFmpeg and ensure it is in PATH.

Contributing

Contributions are welcome:

add a backend (must pass contract tests)
improve alignment robustness
add dashboards / better visualizations
improve streaming accuracy and incremental decoding support

Recommended dev flow:

Create a feature branch
Add tests or a minimal reproducible case
Ensure CI passes
Open a PR

License

This project is licensed under MIT License.

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.2.0

Nov 28, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

asrle-0.2.0.tar.gz (51.8 kB view details)

Uploaded Nov 28, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

asrle-0.2.0-py3-none-any.whl (66.1 kB view details)

Uploaded Nov 28, 2025 Python 3

File details

Details for the file asrle-0.2.0.tar.gz.

File metadata

Download URL: asrle-0.2.0.tar.gz
Upload date: Nov 28, 2025
Size: 51.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for asrle-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`75e5ac6f9ad55c3f7be29414d1804370abb0dd1075463e85a1979e590c9b3a0e`
MD5	`7bf37ed5881a87d97c6586e35f3a200f`
BLAKE2b-256	`b0f166fd2f5016dfbae5392176d8869fdd0059253cee063cbea0268db376db20`

See more details on using hashes here.

Provenance

The following attestation bundles were made for asrle-0.2.0.tar.gz:

Publisher: release.yml on abdulvahapmutlu/asrle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asrle-0.2.0.tar.gz
- Subject digest: 75e5ac6f9ad55c3f7be29414d1804370abb0dd1075463e85a1979e590c9b3a0e
- Sigstore transparency entry: 731028610
- Sigstore integration time: Nov 28, 2025
Source repository:
- Permalink: abdulvahapmutlu/asrle@d40770947532ef1ff6d9254745bfb638f9fd69e5
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/abdulvahapmutlu
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d40770947532ef1ff6d9254745bfb638f9fd69e5
- Trigger Event: push

File details

Details for the file asrle-0.2.0-py3-none-any.whl.

File metadata

Download URL: asrle-0.2.0-py3-none-any.whl
Upload date: Nov 28, 2025
Size: 66.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for asrle-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c0e257972a87a65cea83dda9805dea08d102c9267c3a481a098123abfbbdda49`
MD5	`56627ad6d722393ed47e159a4ed2f7d1`
BLAKE2b-256	`f6fd1f59af168d3845eb7752c608c0c45858db595462ccd9373e57a01527b680`

See more details on using hashes here.

Provenance

The following attestation bundles were made for asrle-0.2.0-py3-none-any.whl:

Publisher: release.yml on abdulvahapmutlu/asrle

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asrle-0.2.0-py3-none-any.whl
- Subject digest: c0e257972a87a65cea83dda9805dea08d102c9267c3a481a098123abfbbdda49
- Sigstore transparency entry: 731028613
- Sigstore integration time: Nov 28, 2025
Source repository:
- Permalink: abdulvahapmutlu/asrle@d40770947532ef1ff6d9254745bfb638f9fd69e5
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/abdulvahapmutlu
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d40770947532ef1ff6d9254745bfb638f9fd69e5
- Trigger Event: push

asrle 0.2.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

ASR-LE (Automatic Speech Recognition – Latency & Error Explorer)

Table of Contents

Key Features

1) True ASR Observability (not just one scalar)

2) Token-level introspection: confidence + timestamps

3) Streaming p95 first-word latency estimator

4) Word-level error attribution with time windows

5) Error “moments”

6) Backend contract tests

Installation

PyPI (Recommended)

Minimal

With Whisper Backends

With Alignment (CTC forced alignment)

Quickstart (Streamlit)

Single Run Workflow

What to expect in UI

Quickstart (CLI)

How the Analysis Works

WER

Tokens & Confidence

Streaming p95 First-Word Latency

Word-Level Error Attribution & Timeline Heatmap

Error Moments

Timestamp Drift Checks

Dataset Runner (Batch)

Manifest Format

What You Get

Backend System

Backends Included

Backend Contract Tests

Streaming Interface (Optional)

Docker

CI/CD

Troubleshooting

1) “Heatmap is missing”

2) WER = 0 even with noisy audio

3) No tokens/confidence showing

4) FFmpeg errors

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance