SemanticWER: Meaning-Aware ASR Evaluation Toolkit for speech-to-LLM systems

These details have not been verified by PyPI

Project links

Project description

🔥 SemanticWER

Evaluation framework for speech-to-LLM systems.

Classic Word Error Rate (WER) measures token accuracy. But modern pipelines look like this:

Speech → ASR → LLM → Task (QA, summarization, agents, RAG)

A 20% WER transcript can preserve meaning — or completely break downstream reasoning. WER cannot tell the difference.

SemanticWER fixes this with a four-component composite score:

SemanticWER = w₁·L + w₂·E + w₃·S + w₄·T

Component	What it measures
L — Lexical	Standard WER + CER (NIST-compatible)
E — Entity	Named entity preservation (PERSON, ORG, DATE, …)
S — Semantic	Embedding cosine similarity (SBERT)
T — Task	Downstream task success delta

Lower score = better transcript quality.

Installation

# Minimal (WER/CER + regex NER + Jaccard semantic fallback)
pip install semanticwer

# Recommended (full features)
pip install "semanticwer[full]"
python -m spacy download en_core_web_sm

Quick Start

from semanticwer import SemanticWER

metric = SemanticWER()  # defaults: weights=(0.3, 0.2, 0.3, 0.2)

result = metric(
    reference="The patient was prescribed 50mg of metformin twice daily",
    hypothesis="The patient was prescribed 15mg of metformin twice daily",
)

print(result.summary())
# ====================================================
#   SemanticWER Result
# ====================================================
#   Composite Score  : 0.3241  (lower = better)
# ----------------------------------------------------
#   [L] Lexical      : WER=0.1429  CER=0.0541  (w=0.30)
#   [E] Entity       : F1=0.8000  Recall=0.6667  (w=0.20)
#   [S] Semantic     : Sim=0.8923  (w=0.30)
#   [T] Task         : N/A  (w=0.20)
# ====================================================

print(result.wer)           # 0.1429
print(result.semantic_sim)  # 0.8923
print(result.entity_f1)     # 0.8000
print(result.score)         # 0.3241

torchmetrics-Style API

metric = SemanticWER(weights=(0.3, 0.2, 0.3, 0.2))

# Accumulate samples
for ref, hyp in dataset:
    metric.update(ref, hyp)

# Compute over full corpus
result = metric.aggregate()
print(f"Corpus SemanticWER: {result.score:.4f}")

HuggingFace evaluate-Style API

result = metric.compute(
    predictions=hypotheses,
    references=references,
)

Task Utility: The Game-Changer

Connect SemanticWER to your actual downstream task:

Built-in: ROUGE

from semanticwer import SemanticWER
from semanticwer.modules.task import TaskModule

metric = SemanticWER(
    weights=(0.25, 0.25, 0.25, 0.25),
    task_fn=TaskModule.rouge_adapter("rougeL"),
)
result = metric(ref, hyp)
print(result.task_score)  # 0.0–1.0

Built-in: Token F1 (SQuAD-style QA)

metric = SemanticWER(
    task_fn=TaskModule.f1_token_adapter(),
    weights=(0.25, 0.25, 0.25, 0.25),
)

Custom: Any callable

def my_qa_eval(reference: str, hypothesis: str) -> float:
    """Return 1.0 if hypothesis preserves the answer to our question."""
    ref_answer = qa_model(question="Who was mentioned?", context=reference)
    hyp_answer = qa_model(question="Who was mentioned?", context=hypothesis)
    return 1.0 if ref_answer == hyp_answer else 0.0

metric = SemanticWER(
    task_fn=my_qa_eval,
    weights=(0.2, 0.2, 0.3, 0.3),
)

Custom: LLM-as-judge

import anthropic

client = anthropic.Anthropic()

def llm_judge(reference: str, hypothesis: str) -> float:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=10,
        messages=[{
            "role": "user",
            "content": (
                f"Score semantic equivalence 0.0–1.0 (1.0 = identical meaning).\n"
                f"REF: {reference}\nHYP: {hypothesis}\n"
                f"Respond with only a float."
            ),
        }],
    )
    return float(response.content[0].text.strip())

metric = SemanticWER(
    task_fn=TaskModule.llm_judge_adapter(llm_judge),
    weights=(0.2, 0.2, 0.3, 0.3),
)

NER Backend Selection

# spaCy (default, best accuracy for English)
metric = SemanticWER(ner_backend="spacy")

# HuggingFace transformers pipeline
metric = SemanticWER(ner_backend="hf")

# Lightweight regex (no extra deps)
metric = SemanticWER(ner_backend="regex")

# Disable entity scoring
metric = SemanticWER(ner_backend="none")

CLI

# Single pair
semanticwer --ref "John Smith called at 3pm" --hyp "Tom Jones called at 9am"

# Files (one sentence per line)
semanticwer --ref ref.txt --hyp hyp.txt

# With ROUGE task scoring
semanticwer --ref ref.txt --hyp hyp.txt --task rouge

# JSON output (for pipelines)
semanticwer --ref ref.txt --hyp hyp.txt --output json

# Custom weights
semanticwer --ref ref.txt --hyp hyp.txt --weights 0.4 0.2 0.3 0.1

# CSV output
semanticwer --ref ref.txt --hyp hyp.txt --output csv

Result Object

result = metric(ref, hyp)

result.score            # Composite SemanticWER [0, 1]
result.wer              # Classic WER
result.cer              # Character Error Rate
result.entity_f1        # Entity F1 score
result.entity_recall    # Entity recall
result.semantic_sim     # Cosine similarity [0, 1]
result.task_score       # Task utility score (or None)

result.to_dict()        # Full breakdown as dict
result.to_json()        # Full breakdown as JSON string
result.summary()        # Human-readable table

Reproducibility / Custom Weights

Weights must sum to 1.0. Recommended presets:

Use case	Weights (L, E, S, T)
General ASR evaluation	`(0.3, 0.2, 0.3, 0.2)`
Medical / legal (entity-critical)	`(0.2, 0.4, 0.2, 0.2)`
LLM pipeline (task-first)	`(0.15, 0.15, 0.3, 0.4)`
Backward-compatible WER	`(1.0, 0.0, 0.0, 0.0)`

Citation

If you use SemanticWER in research, please cite:

@software{semanticwer2024,
  title     = {SemanticWER: Meaning-Aware ASR Evaluation Toolkit},
  year      = {2024},
  url       = {https://github.com/semanticwer/semanticwer},
}

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Feb 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

semanticwer-0.1.0.tar.gz (20.2 kB view details)

Uploaded Feb 23, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

semanticwer-0.1.0-py3-none-any.whl (17.5 kB view details)

Uploaded Feb 23, 2026 Python 3

File details

Details for the file semanticwer-0.1.0.tar.gz.

File metadata

Download URL: semanticwer-0.1.0.tar.gz
Upload date: Feb 23, 2026
Size: 20.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for semanticwer-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`73d51f35e4d6ad71f0e32d761b8d4bc1a4c45142484d16cf209e2a0dabe67bdc`
MD5	`f8cf1bfcef15a54ec9e17ed0feb47697`
BLAKE2b-256	`cfec7c2848809628b5003fcd9462ddd9c9a3590fd18da88cdb1bfee8b1e4cce4`

See more details on using hashes here.

File details

Details for the file semanticwer-0.1.0-py3-none-any.whl.

File metadata

Download URL: semanticwer-0.1.0-py3-none-any.whl
Upload date: Feb 23, 2026
Size: 17.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for semanticwer-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`34f1ac5d814d7af52065689aadf5013ed69074eba9568b732b2c13cb5f08a5d6`
MD5	`cea64dd5763bbb68b6f9f4923b755792`
BLAKE2b-256	`fbe4a2735c73ab862e9939baa136a9d004e28637af2cfb89675ec13fb7f07e1b`

See more details on using hashes here.

semanticwer 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

🔥 SemanticWER

Installation

Quick Start

torchmetrics-Style API

HuggingFace evaluate-Style API

Task Utility: The Game-Changer

Built-in: ROUGE

Built-in: Token F1 (SQuAD-style QA)

Custom: Any callable

Custom: LLM-as-judge

NER Backend Selection

CLI

Result Object

Reproducibility / Custom Weights

Citation

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes