Sclite-backed WER/CER evaluation engine for ASR outputs.

prama

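prama is an sclite-based speech recognition evaluation engine for computing the WER (Word Error Rate) and CER (Character Error Rate) of ASR output in Python. By default it returns lightweight summary statistics; on request it also returns alignment results suitable for further analysis and the sclite report text.

The package is currently positioned as a lightweight evaluation library: the high-level API evaluates lists of text, while the low-level ScliteClient remains available for calling sclite's alignment functionality directly.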

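Features

Compute WER and CER.
Evaluate multiple utterances in a batch.
Return only overall statistics by default, which reduces memory usage in large-batch evaluation.
Optionally return group statistics, per-token alignment results, and a PRA text report.
Wrap libsclite.so, looked up by default at src/prama/lib/libsclite.so inside the package.
Accept a custom libsclite.so path through SCLITE_LIB_PATH or an initialization parameter.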

Requirements

Python >=3.10, <3.13
Linux
Poetry

Install dependencies:

poetry install

Run the tests:

poetry run pytest

Quick start

Compute WER:

from prama.evaluator import get_wer

result = get_wer(
    references=["the quick brown fox jumps over the lazy dog"],
    hypotheses=["the quick brown fox jumped over lazy dog"],
    utterance_ids=["sample-a"],
)

print(result.wer)
print(result.summary.substitutions)

To get per-utterance token alignment results and the PRA report, enable them explicitly:

from prama.evaluator import get_wer

result = get_wer(
    references=["the quick brown fox jumps over the lazy dog"],
    hypotheses=["the quick brown fox jumped over lazy dog"],
    utterance_ids=["sample-a"],
    include_details=True,
    include_report=True,
)

print(result.utterances[0].tokens)
print(result.report)

Compute CER:

from prama.evaluator import get_cer

result = get_cer(
    references=["你好世界"],
    hypotheses=["你好世"],
)

print(result.cer)
print(result.summary.deletions)

Reuse an evaluator:

from prama.evaluator import Evaluator

with Evaluator() as evaluator:
    wer_result = evaluator.get_wer(["hello world"], ["hello word"])
    cer_result = evaluator.get_cer(["hello"], ["hallo"])

print(wer_result.wer)
print(cer_result.cer)

High-level API

`get_wer`

get_wer(
    references: list[str],
    hypotheses: list[str],
    utterance_ids: list[str] | None = None,
    *,
    include_details: bool = False,
    include_report: bool = False,
    batch_size: int | None = 1000,
) -> WerResult

`get_cer`

get_cer(
    references: list[str],
    hypotheses: list[str],
    utterance_ids: list[str] | None = None,
    *,
    include_details: bool = False,
    include_report: bool = False,
    batch_size: int | None = 1000,
) -> WerResult

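Parameters:

references: list of reference texts.
hypotheses: list of recognized (hypothesis) texts.
utterance_ids: optional list of utterance IDs. When omitted, IDs such as utt0001, utt0002 are generated automatically.
include_details: whether to return per-utterance group statistics and token alignment results. Defaults to False.
include_report: whether to generate the PRA text report. Defaults to False.
batch_size: automatic batch size in lightweight summary mode; defaults to 1000. Pass None to disable batching and call sclite once on the whole input.

When include_details=False and include_report=False, the high-level API evaluates in batches of batch_size and accumulates the summary counts, which avoids generating native alignment results for all samples at once. Enabling details or the report switches to whole-batch evaluation so that the returned content stays complete.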

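Input constraints:

references, hypotheses, and utterance_ids must be list[str].
references and hypotheses may differ in length; the missing side is evaluated as empty text.
utterance_ids, if provided, must have a length equal to the number of pairs actually evaluated.
utterance_ids must not contain empty strings, parentheses, or newline characters.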

Return value

get_wer and get_cer return a WerResult:

@dataclass(frozen=True, slots=True)
class WerResult:
    summary: ScliteCounts
    groups: list[ScliteGroup]
    utterances: list[WerUtterance]
    report: str
    metric: str

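Common fields:

result.wer: the WER value.
result.cer: the CER value. CER reuses sclite's error-rate field.
result.accuracy: the accuracy.
result.summary: overall statistics, including fields such as correct, substitutions, deletions, and insertions.
result.groups: sclite group statistics. Populated only when include_details=True; otherwise an empty list.
result.utterances: per-utterance token alignment results. Populated only when include_details=True; otherwise an empty list.
result.report: the PRA text report generated by sclite. Populated only when include_report=True; otherwise an empty string.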
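
To make the summary counts concrete, here is a minimal pure-Python sketch of how correct, substitution, deletion, and insertion counts relate to an error rate, using the standard definition (S + D + I) / (C + S + D). It does not call prama or sclite: `Counts` and `count_edits` are hypothetical names, the alignment uses uniform edit costs (an assumption; sclite's own alignment may weight edits differently and can yield different counts in ambiguous cases), and whether prama reports the rate as a fraction or a percentage is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    correct: int = 0
    substitutions: int = 0
    deletions: int = 0
    insertions: int = 0

    @property
    def error_rate(self) -> float:
        """(S + D + I) / N, where N = C + S + D is the reference length."""
        errors = self.substitutions + self.deletions + self.insertions
        length = self.correct + self.substitutions + self.deletions
        # The rate is undefined for an empty reference; this sketch returns 0.0.
        return errors / length if length else 0.0


def count_edits(ref: list[str], hyp: list[str]) -> Counts:
    """Minimum-edit alignment with uniform costs, tracking C/S/D/I counts."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # table[i][j] = (cost, C, S, D, I) for ref[:i] against hyp[:j]
    table = [[(0, 0, 0, 0, 0)] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = (i, 0, 0, i, 0)  # delete every reference token
    for j in range(1, cols):
        table[0][j] = (j, 0, 0, 0, j)  # insert every hypothesis token
    for i in range(1, rows):
        for j in range(1, cols):
            cost, c, s, d, n = table[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diagonal = (cost, c + 1, s, d, n)
            else:
                diagonal = (cost + 1, c, s + 1, d, n)
            cost, c, s, d, n = table[i - 1][j]
            deletion = (cost + 1, c, s, d + 1, n)
            cost, c, s, d, n = table[i][j - 1]
            insertion = (cost + 1, c, s, d, n + 1)
            table[i][j] = min(diagonal, deletion, insertion, key=lambda t: t[0])
    _, c, s, d, n = table[-1][-1]
    return Counts(c, s, d, n)


# Word-level: one substitution and one deletion over 9 reference words.
words = count_edits(
    "the quick brown fox jumps over the lazy dog".split(),
    "the quick brown fox jumped over lazy dog".split(),
)
print(words, words.error_rate)  # error rate is 2/9

# Character-level: one deletion over 4 reference characters.
chars = count_edits(list("你好世界"), list("你好世"))
print(chars, chars.error_rate)  # error rate is 1/4
```

The same routine serves both metrics; only the tokenization (words versus characters) changes.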

Low-level sclite wrapper

To evaluate files in formats such as TRN, STM, or CTM directly, use ScliteClient:

from prama.sclite import Format, IdType, ScliteClient, ScliteOptions

with ScliteClient() as client:
    with client.align_files(
        "ref.trn",
        "hyp.trn",
        ref_format=Format.TRN,
        hyp_format=Format.TRN,
        options=ScliteOptions(id_type=IdType.SP),
    ) as result:
        print(result.summary().wer)
        print(result.report_text())
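
For readers who need to produce the ref.trn and hyp.trn inputs, the following sketch writes TRN files, where each line holds a transcript followed by the utterance ID in parentheses. `write_trn` is a hypothetical helper, not part of prama, and the ID naming convention expected by a given id_type is not covered here.

```python
from pathlib import Path


def write_trn(path: Path, texts: list[str], utterance_ids: list[str]) -> None:
    """Write one 'transcript (utterance_id)' line per utterance."""
    if len(texts) != len(utterance_ids):
        raise ValueError("texts and utterance_ids must have the same length")
    lines = []
    for text, utt_id in zip(texts, utterance_ids):
        # Collapse whitespace so an embedded newline cannot split a record.
        normalized = " ".join(text.split())
        lines.append(f"{normalized} ({utt_id})")
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```

Since the ID is delimited by parentheses and each record occupies one line, an ID containing parentheses or a newline would make the file ambiguous.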

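Dynamic library lookup order:

The path passed explicitly as ScliteClient(lib_path=...).
The SCLITE_LIB_PATH environment variable.
prama/lib/libsclite.so inside the package.
The result of the system library search.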

Development

This project uses Poetry to manage dependencies and commands:

poetry install
poetry run pytest

Test cases live in tests/test_sclite and cover the high-level evaluation API and the low-level sclite wrapper.

Release history

0.1.1 (this version): May 3, 2026
0.1.0a3 (pre-release): May 3, 2026
0.1.0a2 (pre-release): May 3, 2026
0.1.0a1 (pre-release): May 2, 2026
