
Sclite-backed WER/CER evaluation engine for ASR outputs.


prama

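prama is a sclite-based speech recognition evaluation engine. It computes the WER (Word Error Rate) and CER (Character Error Rate) of ASR output from Python and returns the token alignments, summary statistics and sclite report text so they can be analyzed further.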

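The package is intended as a lightweight evaluation library: the high-level API evaluates lists of texts, while the lower-level ScliteClient remains available for calling sclite's alignment directly.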
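Both metrics are edit-distance rates, (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in an alignment of the hypothesis against the reference, and N is the number of reference tokens (words for WER, characters for CER). The following is a minimal pure-Python sketch for illustration only, not prama's implementation: it uses unit-cost Levenshtein alignment, whereas sclite applies its own alignment weights, so counts can differ in edge cases. Dropping whitespace for CER is also an assumption.

```python
def tokenize(text: str, unit: str) -> list[str]:
    """Split text into words (unit="word") or characters (unit="char")."""
    if unit == "word":
        return text.split()
    if unit == "char":
        # Assumption: whitespace is not counted as a character.
        return [ch for ch in text if not ch.isspace()]
    raise ValueError(f"unknown unit: {unit!r}")


def edit_counts(ref: list[str], hyp: list[str]) -> tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum-edit alignment."""
    n, m = len(ref), len(hyp)
    # best[i][j] = (total, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    best = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        best[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, k = best[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, k)  # correct
            else:
                diag = (t + 1, s + 1, d, k)  # substitution
            t, s, d, k = best[i - 1][j]
            up = (t + 1, s, d + 1, k)  # deletion of ref[i - 1]
            t, s, d, k = best[i][j - 1]
            left = (t + 1, s, d, k + 1)  # insertion of hyp[j - 1]
            best[i][j] = min(diag, up, left)  # fewest total edits wins
    _, s, d, k = best[n][m]
    return s, d, k


def error_rate(ref: str, hyp: str, unit: str = "word") -> float:
    """(S + D + I) / N, where N is the number of reference tokens."""
    r, h = tokenize(ref, unit), tokenize(hyp, unit)
    if not r:
        raise ValueError("reference is empty; the error rate is undefined")
    s, d, k = edit_counts(r, h)
    return (s + d + k) / len(r)
```

For example, error_rate("hello world", "hello word") gives 0.5: one substitution out of two reference words.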

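Features

  • Computes WER and CER.
  • Evaluates multiple utterances in a single batch.
  • Returns overall statistics, per-group statistics, per-token alignments and a PRA text report.
  • Wraps libsclite.so; by default the shared library is loaded from the copy bundled in the package (src/prama/lib/libsclite.so in the source tree).
  • A custom libsclite.so path can be set through SCLITE_LIB_PATH or a constructor argument.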

Requirements

  • Python >=3.10, <3.13
  • Linux
  • Poetry

Install dependencies:

poetry install

Run the tests:

poetry run pytest

Quick start

Compute WER:

from prama.evaluator import get_wer

result = get_wer(
    references=["the quick brown fox jumps over the lazy dog"],
    hypotheses=["the quick brown fox jumped over lazy dog"],
    utterance_ids=["sample-a"],
)

print(result.wer)
print(result.summary.substitutions)
print(result.utterances[0].tokens)
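As a hand check of this example under the standard definition (N counts reference words): the alignment has one substitution (jumps → jumped) and one deletion (the second "the") over nine reference words. Whether result.wer is returned as a fraction or as a percentage, as in sclite's own reports, is not stated here.

```latex
\mathrm{WER} = \frac{S + D + I}{N} = \frac{1 + 1 + 0}{9} \approx 0.222 \quad (22.2\%)
```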

Compute CER:

from prama.evaluator import get_cer

result = get_cer(
    references=["你好世界"],
    hypotheses=["你好世"],
)

print(result.cer)
print(result.summary.deletions)
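Likewise for this CER example: 你好世界 has four reference characters and the hypothesis drops 界, which is one deletion and no other errors.

```latex
\mathrm{CER} = \frac{S + D + I}{N} = \frac{0 + 1 + 0}{4} = 0.25 \quad (25\%)
```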

Reusing an evaluator:

from prama.evaluator import Evaluator

with Evaluator() as evaluator:
    wer_result = evaluator.get_wer(["hello world"], ["hello word"])
    cer_result = evaluator.get_cer(["hello"], ["hallo"])

print(wer_result.wer)
print(cer_result.cer)

High-level API

get_wer

get_wer(
    references: list[str],
    hypotheses: list[str],
    utterance_ids: list[str] | None = None,
) -> WerResult

get_cer

get_cer(
    references: list[str],
    hypotheses: list[str],
    utterance_ids: list[str] | None = None,
) -> WerResult

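Parameters:

  • references: list of reference texts.
  • hypotheses: list of recognized (hypothesis) texts.
  • utterance_ids: optional list of utterance IDs. If omitted, IDs are generated automatically as utt0001, utt0002, and so on.

Input constraints:

  • references, hypotheses and utterance_ids must be of type list[str].
  • references and hypotheses may differ in length; missing entries on the shorter side are evaluated as empty text.
  • If utterance_ids is given, its length must equal the number of pairs actually evaluated, i.e. the length of the longer of references and hypotheses.
  • utterance_ids must not contain empty strings, parentheses or newlines.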

Return value

get_wer and get_cer both return a WerResult:

@dataclass(frozen=True, slots=True)
class WerResult:
    summary: ScliteCounts
    groups: list[ScliteGroup]
    utterances: list[WerUtterance]
    report: str
    metric: str

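Commonly used attributes:

  • result.wer: the WER value.
  • result.cer: the CER value. CER is read from the same sclite error-rate field as WER.
  • result.accuracy: the accuracy.
  • result.summary: overall statistics, with fields such as correct, substitutions, deletions and insertions.
  • result.groups: sclite's per-group statistics.
  • result.utterances: the token alignment for each utterance.
  • result.report: the PRA text report generated by sclite.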
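Because result.summary exposes the raw counts, the error rate can be recomputed from them using sclite's definition, where the number of reference tokens is N = C + S + D. The Counts class below is a hypothetical stand-in that mirrors only the four field names listed above; that prama's fields are plain integers, and whether its rates are percentages, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical stand-in for the count fields documented on result.summary."""

    correct: int
    substitutions: int
    deletions: int
    insertions: int


def error_rate_percent(c: Counts) -> float:
    """sclite-style error rate: 100 * (S + D + I) / N, with N = C + S + D."""
    n_ref = c.correct + c.substitutions + c.deletions  # reference tokens
    if n_ref == 0:
        raise ValueError("no reference tokens; the error rate is undefined")
    errors = c.substitutions + c.deletions + c.insertions
    return 100.0 * errors / n_ref
```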

Low-level sclite wrapper

To evaluate files in formats such as TRN, STM or CTM directly, use ScliteClient:

from prama.sclite import Format, IdType, ScliteClient, ScliteOptions

with ScliteClient() as client:
    with client.align_files(
        "ref.trn",
        "hyp.trn",
        ref_format=Format.TRN,
        hyp_format=Format.TRN,
        options=ScliteOptions(id_type=IdType.SP),
    ) as result:
        print(result.summary().wer)
        print(result.report_text())

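Shared library lookup order:

  1. A path passed explicitly as ScliteClient(lib_path=...).
  2. The SCLITE_LIB_PATH environment variable.
  3. The prama/lib/libsclite.so bundled inside the package.
  4. The system's standard shared-library search.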

Development

This project uses Poetry to manage dependencies and run commands:

poetry install
poetry run pytest

The tests live in tests/test_sclite and cover both the high-level evaluation API and the low-level sclite wrapper.


Download files


Source Distribution

prama-0.1.0a2.tar.gz (186.7 kB)

Built Distribution

prama-0.1.0a2-py3-none-any.whl (187.5 kB), Python 3

File details

Details for the file prama-0.1.0a2.tar.gz.

File metadata

  • Download URL: prama-0.1.0a2.tar.gz
  • Upload date:
  • Size: 186.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.3 Linux/6.17.0-22-generic

File hashes

Hashes for prama-0.1.0a2.tar.gz
Algorithm Hash digest
SHA256 21ab0ba13b9da948b8aeeb6a83d01924674e2f0a14ed9fc37fd7ae33a439ab32
MD5 59340d1e2755af7e965912318b2b2c9a
BLAKE2b-256 be7afea5e3381d9c3fa685f830b55a05ecb10164e9f52234ce39308a20778c5c


File details

Details for the file prama-0.1.0a2-py3-none-any.whl.

File metadata

  • Download URL: prama-0.1.0a2-py3-none-any.whl
  • Upload date:
  • Size: 187.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.4 CPython/3.12.3 Linux/6.17.0-22-generic

File hashes

Hashes for prama-0.1.0a2-py3-none-any.whl
Algorithm Hash digest
SHA256 c08f222b9cc73d55ebefb38d61e3705677a59db1251247e7eac46b6e0160a74e
MD5 95de0f6fdbf9d8e32deceb648494e367
BLAKE2b-256 53d957be55162b39d01c842c76bf91d1fe2459bdfa40f2a3a7471690b45ef3a1
