A study to benchmark Whisper-based ASR models in Malayalam

malayalam_asr_benchmarking

Objective of the project

This project is a study to benchmark automatic speech recognition (ASR) models for Malayalam. So far, the benchmarks cover Malayalam ASR models based on Whisper.

Benchmarked Datasets

So far, the benchmarks have mainly been run on two datasets:

  1. Common Voice 11 Dataset

I have benchmarked on the Malayalam subset of Mozilla’s Common Voice 11. The benchmarking results can be found in the dataset below.

  2. Malayalam Speech Corpus

I have benchmarked on SMC’s Malayalam Speech Corpus dataset. The benchmarking results can be found in the dataset below.
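The usage example later in this README collects results in lists named werlist and cerlist, which suggests the benchmarks are reported as word error rate (WER) and character error rate (CER). As a rough illustration of what those two metrics measure, here is a minimal pure-Python sketch. It is not the implementation used by this package; the function names edit_distance, wer, and cer are hypothetical and chosen only for this example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row dynamic programming)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    print(wer("the cat sat", "the cat sit"))
    print(cer("abcd", "abed"))
```

Note that cer in this sketch counts Unicode code points. Malayalam text uses combining signs, so a code-point count can differ from a count of user-perceived characters; evaluation libraries may also normalize text before scoring.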

Install

pip install malayalam_asr_benchmarking

Or, from a local checkout of the source:

pip install -e .

Setting up your development environment

I am developing this project with nbdev. Please take some time to read up on nbdev (how it works, its directives, and so on) by going through the walkthroughs and tutorials on the nbdev website.

Step 1: Install Quarto:

nbdev_install_quarto

Other installation options are described in Quarto's getting-started guide.

Step 2: Install hooks

nbdev_install_hooks

Step 3: Install our library

pip install -e '.[dev]'

How to use

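from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice

# Lists passed to the evaluation call. Judging by their names, they collect
# the WER, CER, model size, and timing for each evaluated model.
werlist = []
cerlist = []
modelsizelist = []
timelist = []

# Evaluate a Whisper model from the Hugging Face Hub on Common Voice.
evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)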
Downloading (…)lve/main/config.json: 100%|██████████| 1.03k/1.03k [00:00<00:00, 6.09MB/s]
Downloading pytorch_model.bin: 100%|██████████| 151M/151M [00:24<00:00, 6.07MB/s]
Downloading (…)okenizer_config.json: 100%|██████████| 827/827 [00:00<00:00, 2.64MB/s]
Downloading (…)olve/main/vocab.json: 100%|██████████| 1.04M/1.04M [00:00<00:00, 1.14MB/s]
Downloading (…)olve/main/merges.txt: 100%|██████████| 494k/494k [00:00<00:00, 2.65MB/s]
Downloading (…)main/normalizer.json: 100%|██████████| 52.7k/52.7k [00:00<00:00, 252kB/s]
Downloading (…)in/added_tokens.json: 100%|██████████| 2.11k/2.11k [00:00<00:00, 8.53MB/s]
Downloading (…)cial_tokens_map.json: 100%|██████████| 2.06k/2.06k [00:00<00:00, 5.10MB/s]
Downloading (…)rocessor_config.json: 100%|██████████| 185k/185k [00:02<00:00, 76.2kB/s]

AssertionError: Torch not compiled with CUDA enabled
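PyTorch raises this AssertionError when code asks for a CUDA device but the installed PyTorch build is CPU-only. Before running a benchmark, it can help to check which device is actually available. The following is a minimal sketch of such a check; pick_device is a hypothetical helper name and is not part of malayalam_asr_benchmarking. Whether evaluate_whisper_model_common_voice accepts a device argument is not shown in this README, so the sketch only reports what it finds.

```python
import importlib.util


def pick_device():
    """Return "cuda" when a CUDA-enabled PyTorch is usable, otherwise "cpu"."""
    if importlib.util.find_spec("torch") is None:
        return "cpu"  # PyTorch is not installed at all
    import torch

    # torch.cuda.is_available() is False for CPU-only builds and for
    # machines without a usable NVIDIA GPU and driver.
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Detected device: {device}")
```

If this prints cpu on a machine that has an NVIDIA GPU, the likely cause is a CPU-only PyTorch build; installing a CUDA-enabled build as described in the PyTorch installation instructions should resolve the error.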


Built Distribution

File details

Details for the file malayalam_asr_benchmarking-0.0.3-py3-none-any.whl.

File hashes

Hashes for malayalam_asr_benchmarking-0.0.3-py3-none-any.whl

Algorithm    Hash digest
SHA256       fb01afaefc1706cf0d5d3f68ebd886ad43b1feed713b718268c0fad3fc710b56
MD5          1f8e6628b95d5c04f9f1bb9aa6f47fef
BLAKE2b-256  19a537fda0d843024a3021e81a3f687e148ca32298db57d0bf2cbb8f30e5ddcc
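A downloaded wheel can be checked against the SHA256 digest listed above. The sketch below uses only the standard library; the helper names sha256_of and matches_expected are hypothetical, and the wheel path assumes the file sits in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest copied from the hash listing above.
EXPECTED_SHA256 = "fb01afaefc1706cf0d5d3f68ebd886ad43b1feed713b718268c0fad3fc710b56"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected digest."""
    return sha256_of(path) == expected.lower()


# Assumed location of the downloaded wheel; adjust as needed.
wheel = Path("malayalam_asr_benchmarking-0.0.3-py3-none-any.whl")
if wheel.exists():
    print("hash OK" if matches_expected(wheel) else "hash MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```

pip can perform the same check at install time when a requirements file pins the package with a --hash option.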
