ASR-LE: Latency + Alignment + Error Attribution Engine for Speech Models
Project description
ASR-LE (Automatic Speech Recognition – Latency & Error Explorer)
ASR-LE is an advanced ASR evaluation toolkit that goes beyond WER by adding time-aware analysis:
- Word/token timelines (timestamps + confidence)
- Streaming latency simulation (chunking/overlap/right-context)
- Word-level error attribution (sub/ins/del bursts by time window)
- “Moments”: automatically surfaces the worst error windows so you can jump directly to problem segments
- Backend contract tests so community backends must meet the same interface & quality gates
- Streamlit dashboard for exploration, comparisons, and batch runs
Think of it as: “perf + quality observability for ASR pipelines”.
Table of Contents
- Key Features
- Installation
- Quickstart (Streamlit)
- Quickstart (CLI)
- How the Analysis Works
- Dataset Runner (Batch)
- Backend System
- Docker
- CI/CD
- Troubleshooting
- Contributing
- License
Key Features
1) True ASR Observability (not just one scalar)
ASR-LE produces a run folder containing:
- `report.json` (machine-readable)
- `report.md` (human-readable)
- `artifacts/` exports for visualization (timeline bins, moments, tokens, etc.)
You can compare runs, batch runs, and analyze worst segments quickly.
2) Token-level introspection: confidence + timestamps
For backends that expose tokens (e.g., faster-whisper), you can inspect:
- per-token `word`, `start_s`, `end_s`
- per-token `confidence` (when available)
Even when your alignment-based heatmap is missing, the Streamlit app can build a confidence heatmap from tokens as a fallback.
3) Streaming p95 first-word latency estimator
ASR-LE can simulate streaming by chunking the audio and measuring:
- first decoded word time
- p50/p95 across repeated simulated runs
This helps you answer real production questions like:
“Which knobs buy the biggest p95 improvements without retraining?”
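The p50/p95 aggregation over repeated runs can be sketched roughly like this; `latency_percentiles` is an illustrative helper, not ASR-LE's actual API:

```python
import statistics


def latency_percentiles(latencies_s):
    """Summarize repeated first-word latency measurements (seconds)."""
    ordered = sorted(latencies_s)

    def pct(p):
        # nearest-rank percentile over the sorted samples
        idx = min(len(ordered) - 1, max(0, round(p / 100 * (len(ordered) - 1))))
        return ordered[idx]

    return {"p50": statistics.median(ordered), "p95": pct(95)}
```

With few repeats, p95 is essentially the worst observed run, so more repeats give a more stable tail estimate.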
4) Word-level error attribution with time windows
When reference text is provided, ASR-LE can compute:
- WER (sub/ins/del/hits)
- attribution bursts: where errors occur in time
- timeline bins: e.g. each 1 second window gets counts of sub/ins/del
This enables targeted debugging (noise bursts, far-field reverberation zones, etc.).
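The binning step can be sketched as follows; the `(time_s, kind)` event tuples and the `bin_errors` helper are assumptions for illustration, not ASR-LE's internal types:

```python
from collections import defaultdict


def bin_errors(events, bin_s=1.0):
    """Count sub/ins/del events per fixed-width time window.

    events: iterable of (time_s, kind) where kind is 'sub', 'ins', or 'del'.
    Returns {bin_index: {'sub': n, 'ins': n, 'del': n}}.
    """
    bins = defaultdict(lambda: {"sub": 0, "ins": 0, "del": 0})
    for t, kind in events:
        bins[int(t // bin_s)][kind] += 1
    return dict(bins)
```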
5) Error “moments”
ASR-LE auto-detects the worst 1s windows (with padding) and stores them as moments so the dashboard can jump directly.
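A minimal sketch of the idea, assuming a hypothetical `bin_counts` mapping of window index to per-kind error counts (the real artifact schema may differ):

```python
def worst_moments(bin_counts, bin_s=1.0, pad_s=0.5, top_k=3):
    """Rank fixed-width windows by error count and attach padded spans.

    bin_counts: {bin_index: {'sub': n, 'ins': n, 'del': n}}.
    Returns the top_k windows as padded (start_s, end_s) moments.
    """
    scored = sorted(
        bin_counts.items(),
        key=lambda kv: sum(kv[1].values()),
        reverse=True,
    )
    return [
        {
            "start_s": max(0.0, idx * bin_s - pad_s),
            "end_s": (idx + 1) * bin_s + pad_s,
            "errors": sum(counts.values()),
        }
        for idx, counts in scored[:top_k]
    ]
```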
6) Backend contract tests
Community backends must satisfy baseline correctness and shape requirements via a minimal contract.
Installation
PyPI (Recommended)
pip install asrle
Optional extras (recommended)

If your package defines extras, install them like:

pip install "asrle[whisper,alignment]"

Typical extras:

- `whisper`: faster-whisper / whisper backend deps
- `alignment`: transformers + torchaudio forced alignment deps
Minimal
python -m venv .venv
# Windows PowerShell:
. .\.venv\Scripts\Activate.ps1
pip install -U pip
pip install -e .
With Whisper Backends
Depending on your repo extras, you may expose extras like [whisper]. If not, install typical deps manually:
pip install faster-whisper
pip install openai-whisper # optional alternative Whisper backend
pip install ffmpeg-python # if needed
Also ensure FFmpeg is available:
- Windows: install FFmpeg and add to PATH
- Linux:
sudo apt-get install ffmpeg
With Alignment (CTC forced alignment)
Alignment can use HuggingFace + torchaudio forced alignment when available:
pip install transformers torchaudio
Quickstart (Streamlit)
Start the dashboard:
cd C:\path\to\asrle
streamlit run .\src\asrle\dashboard\streamlit_app.py
Open http://localhost:8501 in your browser.
Single Run Workflow

1. Upload audio (or provide a server path)
2. Pick a backend (e.g. `faster-whisper`)
3. Optionally paste/upload a reference transcript
4. Enable:
   - Word alignment (CTC) (for timestamped attribution)
   - Word-level attribution (moments + bins)
   - Streaming simulation (optional)
5. Run analysis
What to expect in UI
- WER (if reference provided)
- Latency p95 (estimated using repeats)
- Streaming first-word latency p95 (streaming mode)
- Tokens/confidence explorer (if backend provides tokens)
- Error heatmap (if timeline artifacts exist)
If you don’t see the error heatmap, it means `artifacts/timeline.json` wasn’t created. This usually happens when CTC alignment fails or no reference was provided.
Quickstart (CLI)
If your repo exposes a CLI entrypoint, you can add examples here. A minimal canonical pattern:
python -m asrle <command> ...
If you don’t have a CLI command yet, Streamlit is the fastest interface.
How the Analysis Works
WER
WER is computed at word level using:

- `jiwer.process_words()` when available (preferred)
- a Levenshtein fallback otherwise

Outputs include:

- `wer`
- `substitutions`, `insertions`, `deletions`, `hits`
- `ref_words`, `hyp_words`
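A Levenshtein-style fallback can be sketched like this; `wer_counts` is an illustrative stand-in, not ASR-LE's actual implementation:

```python
def wer_counts(ref_words, hyp_words):
    """Word-level edit-distance alignment returning WER plus sub/ins/del/hit counts."""
    R, H = len(ref_words), len(hyp_words)
    # standard edit-distance table
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = i
    for j in range(1, H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            sub = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    # backtrace to attribute each edit to a category
    i, j = R, H
    subs = ins = dels = hits = 0
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and d[i][j] == d[i - 1][j - 1] + (ref_words[i - 1] != hyp_words[j - 1])):
            if ref_words[i - 1] == hyp_words[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif j > 0 and d[i][j] == d[i][j - 1] + 1:
            ins += 1
            j -= 1
        else:
            dels += 1
            i -= 1
    wer = (subs + ins + dels) / max(R, 1)
    return {"wer": wer, "substitutions": subs, "insertions": ins,
            "deletions": dels, "hits": hits}
```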
Tokens & Confidence
Some backends produce per-word tokens with timestamps and confidence. ASR-LE exposes them in:

- `report.json -> transcript -> segments[*] -> tokens[*]`

Streamlit also provides a token table and confidence summaries.
Streaming p95 First-Word Latency
With streaming enabled, ASR-LE:

- chunks audio into overlapping blocks
- runs the backend decode loop multiple times (`repeats`)
- estimates p50/p95 of `first_word_latency_s` and stores the percentiles
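The chunking step can be sketched as computing overlapping block boundaries; `chunk_bounds` and its defaults are illustrative, not ASR-LE's real parameters:

```python
def chunk_bounds(duration_s, chunk_s=1.0, overlap_s=0.2):
    """Start/end times for overlapping streaming blocks over an audio clip."""
    step = chunk_s - overlap_s
    assert step > 0, "overlap must be smaller than the chunk size"
    bounds, start = [], 0.0
    while start < duration_s:
        bounds.append((start, min(start + chunk_s, duration_s)))
        start += step
    return bounds
```

Larger overlap (or more right-context) generally trades extra compute for fewer boundary errors, which is exactly the knob-vs-latency trade-off the simulator measures.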
Word-Level Error Attribution & Timeline Heatmap
To build time-aware substitution windows, ASR-LE needs:
- reference text
- hypothesis timestamps (from backend tokens)
- reference word timestamps (from CTC alignment)
If CTC alignment fails (e.g. produces <unk>/garbage), then timeline bins may be missing.
Artifacts:
- `artifacts/word_attribution.json`
- `artifacts/timeline.json`
Error Moments
Moments are the worst time windows by error density, saved in:
artifacts/error_moments.json
The UI uses these to jump straight to problematic spans.
Timestamp Drift Checks
ASR-LE can run a drift check against transcript timestamps to detect:
- non-monotonic segments
- overlaps, gaps, impossible ordering
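A minimal version of such a check might look like this; the segment dict shape and `check_drift` helper are assumptions for illustration:

```python
def check_drift(segments):
    """Flag negative durations, non-monotonic starts, and overlaps.

    segments: list of dicts with 'start_s' and 'end_s' keys.
    Returns a list of (segment_index, issue) tuples.
    """
    issues = []
    for i, seg in enumerate(segments):
        if seg["end_s"] < seg["start_s"]:
            issues.append((i, "negative-duration"))
        if i > 0:
            prev = segments[i - 1]
            if seg["start_s"] < prev["start_s"]:
                issues.append((i, "non-monotonic"))
            elif seg["start_s"] < prev["end_s"]:
                issues.append((i, "overlap"))
    return issues
```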
Dataset Runner (Batch)
You can run a dataset by uploading a manifest CSV in the dashboard.
Manifest Format
Required columns:
audio_path
Optional:
- `ref_text` (inline reference)
- `ref_path` (path to a .txt file reference)
- any metadata columns (snr, device, far_field, noise_type, etc.)
Example:
audio_path,ref_text,snr,far_field
C:\data\a.wav,"hello world",20,false
C:\data\b.wav,"this is a test",5,true
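Reading such a manifest is plain CSV parsing; `read_manifest` below is a sketch of the required-column check, not ASR-LE's actual loader:

```python
import csv
import io


def read_manifest(csv_text):
    """Parse a dataset manifest; audio_path is required, everything else optional."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        if not row.get("audio_path"):
            raise ValueError("manifest rows must include audio_path")
    return rows
```

Extra metadata columns pass through untouched, so the dashboard can later group WER or latency by SNR, device, and so on.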
What You Get
The dataset run creates:
runs/dataset_<id>/
manifest.csv
dataset_summary.json
items/
item_00000/
report.json
artifacts/...
item_00001/
...
The UI can summarize:
- WER distribution (mean/p50/p90)
- latency p50/p95
- first-word latency p50/p95 (streaming mode)
Backend System
Backends Included
Typical set:
- `dummy` (testing)
- `hf-whisper` (transformers pipeline)
- `faster-whisper` (high-performance Whisper)
Backend Contract Tests
The Backend Validator page runs checks like:
- does transcribe return required fields?
- timestamps monotonic?
- streaming capability validation (if claimed)
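In spirit, a contract check is a set of shape assertions over the transcribe result; `validate_backend_result` and the result schema below are illustrative assumptions, not ASR-LE's real validator:

```python
def validate_backend_result(result):
    """Minimal shape checks on a transcribe() result.

    Assumed shape: {'text': str, 'segments': [{'start_s': float, 'end_s': float}]}.
    Returns a list of human-readable violations (empty means the contract holds).
    """
    errors = []
    if not isinstance(result.get("text"), str):
        errors.append("missing text")
    prev_end = 0.0
    for seg in result.get("segments", []):
        if seg["start_s"] < prev_end:
            errors.append("timestamps not monotonic")
            break
        prev_end = seg["end_s"]
    return errors
```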
Streaming Interface (Optional)
Some backends can support true incremental decoding. If a backend supports it, ASR-LE can validate and use it.
Docker
Build and run:
docker build -t asrle:local .
docker run --rm -p 8501:8501 -v "$(pwd)/runs:/app/runs" asrle:local
Then open http://localhost:8501.
CI/CD
GitHub Actions included:
- `.github/workflows/ci.yml` – tests + lint/format checks (best-effort)
- `.github/workflows/docker.yml` – builds and pushes to GHCR on main/tags
- `.github/workflows/release.yml` – PyPI Trusted Publisher release
Troubleshooting
1) “Heatmap is missing”
The error heatmap relies on artifacts/timeline.json.
If it doesn’t exist:
- Provide reference text
- Enable word attribution
- Enable word alignment (CTC)
- Ensure alignment deps are installed (`transformers`, `torchaudio`)
- Check CTC didn’t collapse into `<unk>` outputs (alignment failure)
2) WER = 0 even with noisy audio
WER uses the reference transcript. If your reference equals the hypothesis after normalization, WER can still be 0. Verify:
- the reference text is correct and not accidentally identical
- your normalization is not overly aggressive
- backend isn’t outputting identical transcript due to VAD trimming or normalization
3) No tokens/confidence showing
Not all backends return token-level outputs.
Use faster-whisper and ensure your backend exposes tokens at `transcript.segments[*].tokens`.
4) FFmpeg errors
Install FFmpeg and ensure it is in PATH.
Contributing
Contributions are welcome:
- add a backend (must pass contract tests)
- improve alignment robustness
- add dashboards / better visualizations
- improve streaming accuracy and incremental decoding support
Recommended dev flow:
- Create a feature branch
- Add tests or a minimal reproducible case
- Ensure CI passes
- Open a PR
License
This project is licensed under the MIT License.