A benchmark tool for Speech-to-Text models.

Project description

stt-bench

A CLI utility for benchmarking transcription models on Indic datasets. Currently supported models:

ai4bharat/indic-conformer-600m-multilingual, kalpalabs/Menka, gpt-4o-transcribe, deepgram-nova-3

Currently supported datasets:

IndicVoices, Lahaja, Svarah, Fleurs

Usage:

Environment variables: set the following environment variables as required by the model you want to run inference with:

HF_TOKEN, OPENAI_API_KEY, MENKA_BASE_URL, DEEPGRAM_API_KEY, SARVAM_API_KEY, GEMINI_API_KEY
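
Before starting a run, it can help to see which of these variables are present. The following is a minimal sketch, not part of stt-bench: the helper name missing_env_vars is hypothetical, and since the mapping from each model to the variable it needs is not documented here, the sketch only reports what is unset.

```python
import os

# Variable names copied from the list above.
STT_BENCH_ENV_VARS = [
    "HF_TOKEN",
    "OPENAI_API_KEY",
    "MENKA_BASE_URL",
    "DEEPGRAM_API_KEY",
    "SARVAM_API_KEY",
    "GEMINI_API_KEY",
]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment.

    Hypothetical helper for illustration; stt-bench does not provide it.
    """
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


if __name__ == "__main__":
    for name in missing_env_vars(STT_BENCH_ENV_VARS):
        print(f"not set: {name}")
```

Only the variables for the model you intend to run need to be set, so a "not set" line is informational rather than an error.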

Run inference with a model on multiple datasets:

stt-bench run --model gpt-4o-transcribe

This command writes the model's inference results to the inference/{model}/{dataset} directory for each dataset it runs on. Results are stored in a CSV file named *predictions.csv. By default, inference runs on all supported datasets. To run on selected datasets only, pass --eval-datasets:
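
To inspect what a run produced, the output layout described above can be walked with a few lines of Python. This is a sketch under the assumption that the directory structure is exactly inference/{model}/{dataset} with files matching *predictions.csv; the function name find_prediction_files is hypothetical and not part of stt-bench.

```python
from pathlib import Path


def find_prediction_files(model, root="inference"):
    """Map each dataset directory name to its *predictions.csv files.

    Assumes the layout root/{model}/{dataset}/*predictions.csv described
    in this README. Returns an empty dict if the model directory is missing.
    """
    model_dir = Path(root) / model
    found = {}
    if not model_dir.is_dir():
        return found
    for dataset_dir in sorted(p for p in model_dir.iterdir() if p.is_dir()):
        files = sorted(dataset_dir.glob("*predictions.csv"))
        if files:
            found[dataset_dir.name] = files
    return found


if __name__ == "__main__":
    for dataset, files in find_prediction_files("gpt-4o-transcribe").items():
        print(dataset, [f.name for f in files])
```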

stt-bench run --model gpt-4o-transcribe --eval-datasets Fleurs

Compute WER (word error rate) and CER (character error rate) metrics from the results directory:

stt-bench evaluate --dir inference/{model}

This creates an evaluation_metrics.csv file within metrics/{model}/{dataset} containing WER and CER metrics for each split of that dataset; the final row contains the metrics over the entire dataset.
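
For readers unfamiliar with the metrics, here is a sketch of the standard definitions: WER is the word-level edit distance divided by the number of reference words, and CER is the same at character level. This illustrates the definitions only. How stt-bench normalizes text or aggregates over utterances is not stated here, so its numbers may differ; the function names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the hypothesis
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one reference/hypothesis pair."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one reference/hypothesis pair."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

Both rates can exceed 1.0 when the hypothesis contains many insertions, since the denominator is the reference length only.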

Requirements

In addition to the dependencies listed in pyproject.toml, some datasets (such as Lahaja and Svarah) need an ffmpeg backend to process audio. Install ffmpeg >= 6.
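
The installed ffmpeg version can be checked before running on those datasets. The sketch below assumes that ffmpeg is on PATH and that the first line of `ffmpeg -version` has the usual "ffmpeg version X.Y..." form; development or nightly builds that report a git revision or date instead are returned as None. The function names are hypothetical and not part of stt-bench.

```python
import re
import shutil
import subprocess


def parse_ffmpeg_major(version_output):
    """Extract the major version from `ffmpeg -version` output, or None."""
    match = re.search(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def ffmpeg_major_version():
    """Return the installed ffmpeg major version, or None if unavailable."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_major(result.stdout)


if __name__ == "__main__":
    major = ffmpeg_major_version()
    if major is None:
        print("ffmpeg not found, or its version string was not recognized")
    elif major < 6:
        print(f"ffmpeg {major} found; version 6 or newer is required")
    else:
        print(f"ffmpeg {major} found")
```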

Project details

Release history

0.2.0 (this version): Oct 29, 2025
0.1.0: Oct 25, 2025

Download files

Source Distribution

stt_bench-0.2.0.tar.gz (10.4 kB)

Uploaded Oct 29, 2025 Source

Built Distribution

stt_bench-0.2.0-py3-none-any.whl (10.2 kB)

Uploaded Oct 29, 2025 Python 3

File details

Details for the file stt_bench-0.2.0.tar.gz.

File metadata

Download URL: stt_bench-0.2.0.tar.gz
Upload date: Oct 29, 2025
Size: 10.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.22

File hashes

Hashes for stt_bench-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`a6ca116f268aca04df2aef8fb78899850abbd00f13486881ca8e839b573d71e4`
MD5	`7a5fa4c4be41dadeb2ea831b34771689`
BLAKE2b-256	`edfbc4f590f7f18776657e825343f1605d1d64c0f5efbae51ac63ba671612dd4`

File details

Details for the file stt_bench-0.2.0-py3-none-any.whl.

File metadata

Download URL: stt_bench-0.2.0-py3-none-any.whl
Upload date: Oct 29, 2025
Size: 10.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.8.22

File hashes

Hashes for stt_bench-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9a751f9c3f3f2e8f30d1ffd0757653884985e6201e563bb8aaf4a91fb4614104`
MD5	`98c5148a186bad5926d0c01c06fb4a72`
BLAKE2b-256	`3b7fea560910a455266b8596e4e323d07108a40651ace39e40e40600f9b15e65`
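
A downloaded file can be checked against the SHA256 digests listed above using Python's standard hashlib module. This is a minimal sketch; the function names are hypothetical, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 for the source distribution, copied from the table above.
    sdist = Path("stt_bench-0.2.0.tar.gz")
    expected = "a6ca116f268aca04df2aef8fb78899850abbd00f13486881ca8e839b573d71e4"
    if sdist.exists():
        print("hash OK" if matches_sha256(sdist, expected) else "hash MISMATCH")
    else:
        print(f"{sdist} not found in the current directory")
```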
