A benchmarking and analysis framework for Russian ASR models

🌱 plantain2asr

Requires Python 3.9+. License: MIT.

Benchmarking and analysis framework for Russian ASR models.

A pipeline API for loading a dataset, applying models, normalizing text, computing metrics, and exploring results, all through a consistent >> interface.

from plantain2asr import GolosDataset, Models, SimpleNormalizer, Metrics, ReportServer

ds   = GolosDataset("data/golos")          # auto-downloads if missing
ds   >> Models.GigaAM_v3()                 # run inference
norm = ds >> SimpleNormalizer()            # normalize text
norm >> Metrics.composite()               # WER, CER, MER, WIL, WIP, Accuracy…
norm.to_pandas()                           # pandas DataFrame for further analysis
ReportServer(norm, audio_dir="data/golos").serve()  # interactive browser report
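
The >> chaining above relies on Python operator overloading. As a rough illustration only (this is not plantain2asr's actual implementation; ToyDataset, Upper, Count, and the apply method are hypothetical names), a class can define __rshift__ so that ds >> step hands the dataset to the step and returns whatever the step produces:

```python
class ToyDataset:
    """Hypothetical stand-in for a dataset; not the plantain2asr class."""

    def __init__(self, texts):
        self.texts = list(texts)

    def __rshift__(self, step):
        # `ds >> step` calls step.apply(ds) and returns its result,
        # so steps can be chained left to right.
        return step.apply(self)


class Upper:
    """Hypothetical step that returns a new dataset (original untouched)."""

    def apply(self, dataset):
        return ToyDataset(t.upper() for t in dataset.texts)


class Count:
    """Hypothetical terminal step that returns a plain value."""

    def apply(self, dataset):
        return len(dataset.texts)


ds = ToyDataset(["привет", "мир"])
shouted = ds >> Upper()          # new dataset; ds keeps its lowercase texts
n = ds >> Upper() >> Count()     # evaluated as (ds >> Upper()) >> Count()
```

Because >> is left-associative, each step receives the output of the previous one, which is what makes the one-line-per-stage style possible.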

Install

# Core: dataset loading + WER/CER metrics (no GPU required)
pip install plantain2asr

# + GigaAM v2/v3 models (quotes keep shells such as zsh from expanding the brackets)
pip install "plantain2asr[gigaam]"

# + Whisper (HuggingFace)
pip install "plantain2asr[whisper]"

# + deep analysis tools (pandas, bert-score, POS-analysis…)
pip install "plantain2asr[analysis]"

# Everything
pip install "plantain2asr[all]"

Quick Start

Load a dataset

from plantain2asr import GolosDataset, DagrusDataset, NeMoDataset

# GOLOS test set — auto-downloads on first run (~2.5 GB)
ds = GolosDataset("data/golos")

# DaGRuS (Dagestani Russian Speech corpus)
ds = DagrusDataset("data/dagrus")

# Any NeMo-format JSONL manifest
ds = NeMoDataset("data/my_dataset")

Apply a model

from plantain2asr import Models

ds >> Models.GigaAM_v3()                          # GigaAM v3 e2e-RNNT (default)
ds >> Models.GigaAM_v3(model_name="e2e_ctc")      # GigaAM v3 e2e-CTC
ds >> Models.GigaAM_v3(model_name="rnnt")         # GigaAM v3 RNNT
ds >> Models.GigaAM_v2(model_name="v2_rnnt")      # GigaAM v2
ds >> Models.Whisper()                             # Whisper large-v3 RU
ds >> Models.Tone()                                # T-one RussianTone
ds >> Models.Vosk(model_path="models/vosk-ru")    # Vosk (offline, CPU)
ds >> Models.SaluteSpeech()                        # SaluteSpeech API

Results accumulate in sample.asr_results — run multiple models on the same dataset to compare them.
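
To picture how results from several models can sit side by side on one sample, here is a minimal sketch. It assumes asr_results behaves like a dict keyed by model name; the ToySample class, the add_result helper, and the "hypothesis" field are hypothetical and only illustrate the idea:

```python
from dataclasses import dataclass, field


@dataclass
class ToySample:
    """Hypothetical sample record; field names are assumptions."""
    audio_path: str
    text: str  # reference transcript
    asr_results: dict = field(default_factory=dict)


def add_result(sample: ToySample, model_name: str, hypothesis: str) -> None:
    # Each model writes under its own key, so earlier results are kept.
    sample.asr_results[model_name] = {"hypothesis": hypothesis}


sample = ToySample("a.wav", "привет мир")
add_result(sample, "model-a", "привет мир")
add_result(sample, "model-b", "привет мер")
models_seen = sorted(sample.asr_results)
```

Keying by model name is what lets a later comparison step iterate over all models for the same reference text.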

Normalize text

from plantain2asr import SimpleNormalizer, DagrusNormalizer

# General Russian normalization: lowercase, strip punctuation, ё→е
norm = ds >> SimpleNormalizer()

# DaGRuS-specific: handles annotations [laugh], fillers (ага, угу), colloquialisms
norm = ds >> DagrusNormalizer(remove_fillers=False, strip_punctuation=True)

Normalization creates a new dataset view — the original ds is untouched.
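
For intuition, the three operations listed for general Russian normalization (lowercase, strip punctuation, ё→е) can be sketched in plain Python. This is an assumption-laden illustration, not the code behind SimpleNormalizer; the function name simple_normalize and the exact punctuation set are made up for the example, and the library's rules may differ:

```python
import string


def simple_normalize(text: str) -> str:
    """Lowercase, map ё to е, drop punctuation, collapse whitespace."""
    text = text.lower().replace("ё", "е")
    # ASCII punctuation plus a few typographic marks common in Russian text
    # (guillemets, em dash U+2014, ellipsis U+2026).
    drop = string.punctuation + "«»\u2014\u2026"
    text = text.translate(str.maketrans("", "", drop))
    return " ".join(text.split())


normalized = simple_normalize("Ёлка, «зелёная»!")
```

Applying the same function to both reference and hypothesis keeps casing and punctuation differences from being counted as recognition errors.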

Compute metrics

from plantain2asr import Metrics

norm >> Metrics.composite()   # WER, CER, MER, WIL, WIP, Accuracy, IDR, LengthRatio

Metrics are stored per-sample in sample.asr_results[model]["metrics"].
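
As a reminder of what the two headline metrics measure, here is a self-contained sketch of WER and CER based on Levenshtein distance. It is not the library's implementation; the function names are hypothetical, and the values are returned as fractions, whereas plantain2asr may scale or round them differently:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (or match, cost 0)
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / max(len(ref_words), 1)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / max(len(reference), 1)


reference = "привет мир"
hypothesis = "привет мер"
metrics = {"WER": wer(reference, hypothesis), "CER": cer(reference, hypothesis)}
```

Here one of two words is wrong (WER 0.5) while only one of ten characters differs (CER 0.1), which is why the two metrics are usually reported together.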

Explore results

# Pandas DataFrame — one row per (sample, model)
df = norm.to_pandas()
df.groupby("model")[["WER", "CER", "Accuracy"]].mean().sort_values("WER")

# Word-level error breakdown
from plantain2asr import WordErrorAnalyzer
norm >> WordErrorAnalyzer(model_name="GigaAM-v3-e2e-rnnt", top_n=20)

# Interactive browser report: metrics table + error frequency + diff view
from plantain2asr import ReportServer
ReportServer(norm, audio_dir="data/golos").serve()

Load pre-computed results

Run inference on a GPU machine, transfer the resulting JSONL files, and load them locally:

ds.load_model_results("GigaAM-v3-rnnt", "results/GigaAM-v3-rnnt_results.jsonl")

Format (one JSON object per line): {"audio_path": "/any/path/file.wav", "hypothesis": "text", "processing_time": 1.23}
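
If the inference script on the GPU machine is your own, a results file in this shape can be produced with the standard library alone. The sketch below is only an illustration: write_results, read_results, and the output file name are hypothetical, and only the three field names come from the format shown above:

```python
import json
import os
import tempfile


def write_results(path, records):
    """Write one JSON object per line (JSONL)."""
    with open(path, "w", encoding="utf-8") as f:
        for rec in records:
            # ensure_ascii=False keeps Cyrillic text readable in the file.
            f.write(json.dumps(rec, ensure_ascii=False) + "\n")


def read_results(path):
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


records = [
    {"audio_path": "/any/path/file.wav", "hypothesis": "привет мир", "processing_time": 1.23},
]

# Hypothetical output location; a temporary directory keeps the example self-contained.
out_path = os.path.join(tempfile.mkdtemp(), "my-model_results.jsonl")
write_results(out_path, records)
loaded = read_results(out_path)
```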


Filter and slice

# Standard pipeline methods
short = ds.filter(lambda s: s.duration < 5.0)
crowd = ds.filter(lambda s: s.meta["subset"] == "crowd")
top10 = ds.take(10)

Extending

plantain2asr is built around four abstract base classes. Subclass any of them to add your own components.

Custom normalizer

from plantain2asr import BaseNormalizer

class MyNormalizer(BaseNormalizer):
    def normalize_ref(self, text: str) -> str:
        return text.lower().replace("ё", "е")

    def normalize_hyp(self, text: str) -> str:
        return text.lower().replace("ё", "е")

norm = ds >> MyNormalizer()

Custom model

from plantain2asr.models.base import BaseASRModel

class MyModel(BaseASRModel):
    @property
    def name(self) -> str:
        return "MyModel"

    def transcribe(self, audio_path: str) -> str:
        # your inference logic
        return "transcribed text"

ds >> MyModel()

Custom metric

from plantain2asr.metrics.base import BaseMetric

class SyllableErrorRate(BaseMetric):
    @property
    def name(self) -> str:
        return "SER"

    def calculate(self, reference: str, hypothesis: str) -> float:
        # your metric logic
        ref_syls = sum(1 for c in reference if c in "аеёиоуыэюя")
        hyp_syls = sum(1 for c in hypothesis if c in "аеёиоуыэюя")
        return abs(ref_syls - hyp_syls) / max(ref_syls, 1) * 100

norm >> SyllableErrorRate()

Custom report section

from plantain2asr import BaseSection

class LengthSection(BaseSection):
    @property
    def name(self) -> str:
        return "length"

    @property
    def title(self) -> str:
        return "Length Stats"

    @property
    def icon(self) -> str:
        return "📏"

    def compute(self, dataset) -> dict:
        return {
            s.id: {"words": len(s.text.split())}
            for s in dataset
        }

    def js_function(self) -> str:
        return "function render_length() { /* your JS */ }"

from plantain2asr import ReportServer
ReportServer(norm, sections=[LengthSection()]).serve()

See the full extending guide for complete examples.


Supported models

Model                                      Extra     Device
GigaAM v3 (e2e-rnnt, e2e-ctc, rnnt, ctc)   gigaam    CUDA / MPS / CPU
GigaAM v2 (v2-rnnt, v2-ctc)                gigaam    CUDA / MPS / CPU
Whisper large-v3 RU (HuggingFace)          whisper   CUDA / MPS / CPU
T-one RussianTone                          gigaam    CUDA
Vosk                                       vosk      CPU
NVIDIA Canary                              canary    CUDA
SaluteSpeech API                           -         cloud

License

MIT
