Rate That ASR (RTASR)

🏆 Run benchmarks against the most common ASR tools on the market.


Early Results

DER

[Figure: DER evaluation]

WER

Work in progress...

Installation

Last stable version

pip install rtasr

From source

git clone https://github.com/Wordcab/rtasr
cd rtasr

pip install .

Commands

The CLI is available through the rtasr command.

rtasr --help

List datasets, metrics and providers

# List everything
rtasr list
# List only datasets
rtasr list -t datasets
# List only metrics
rtasr list -t metrics
# List only providers
rtasr list -t providers

Datasets download

To see the available datasets, run rtasr list -t datasets. Then download one with:

rtasr download -d <dataset>

ASR Transcription

Providers

To see the implemented ASR providers, run rtasr list -t providers.

Run transcription

Run ASR transcription on a given dataset with a given provider.

rtasr transcription -d <dataset> -p <provider>

Multiple providers

You can specify as many providers as you want:

rtasr transcription -d <dataset> -p <provider1> <provider2> <provider3> ...

Choose dataset split

You can specify the dataset split to use:

rtasr transcription -d <dataset> -p <provider> -s <split>

If not specified, all the available splits will be used.

Caching

By default, the transcription results are cached in the ~/.cache/rtasr/transcription directory for each provider.

If you don't want to use the cache, use the --no-cache flag.

rtasr transcription -d <dataset> -p <provider> --no-cache

Note: the cache prevents the same file from being transcribed twice. Without it, transcription runs on the whole dataset again. We aren't responsible for any extra costs.
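Before disabling the cache, it can help to check what is already stored. The following is a minimal sketch, not part of rtasr: count_cached_files is a hypothetical helper, and it assumes each provider has its own subdirectory under the cache directory (the text above only says results are cached "for each provider").

```python
from pathlib import Path
from typing import Dict

# Default cache location stated above.
DEFAULT_CACHE = Path.home() / ".cache" / "rtasr" / "transcription"


def count_cached_files(cache_dir: Path = DEFAULT_CACHE) -> Dict[str, int]:
    """Count cached files per provider.

    Assumption: one subdirectory per provider under cache_dir.
    The real cache layout may differ.
    """
    if not cache_dir.is_dir():
        return {}
    counts: Dict[str, int] = {}
    for provider_dir in sorted(p for p in cache_dir.iterdir() if p.is_dir()):
        # Count regular files at any depth below the provider directory.
        counts[provider_dir.name] = sum(
            1 for f in provider_dir.rglob("*") if f.is_file()
        )
    return counts
```

Calling count_cached_files() with no argument inspects the default location and returns an empty dict when nothing has been cached yet.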

Debug mode

Use the --debug flag to run only one file per split for each provider.

rtasr transcription -d <dataset> -p <provider> --debug
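When launching transcription runs from a script, the options above can be assembled programmatically. This is a sketch only: build_transcription_cmd is a hypothetical helper, not part of rtasr, and it assumes the -s, --no-cache and --debug flags can be combined in a single call.

```python
from typing import List, Optional, Sequence


def build_transcription_cmd(
    dataset: str,
    providers: Sequence[str],
    split: Optional[str] = None,
    use_cache: bool = True,
    debug: bool = False,
) -> List[str]:
    """Assemble an `rtasr transcription` argument list from the documented flags."""
    if not providers:
        raise ValueError("at least one provider is required")
    # Providers are passed as space-separated values after a single -p.
    cmd = ["rtasr", "transcription", "-d", dataset, "-p", *providers]
    if split is not None:
        cmd += ["-s", split]
    if not use_cache:
        cmd.append("--no-cache")
    if debug:
        cmd.append("--debug")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True). Starting with debug=True keeps the first run to one file per split, which limits provider usage while checking the setup.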

Evaluation

The evaluation command computes a metric (DER or WER) on the transcription results.

If you don't specify the split, the evaluation will be run on the whole dataset.

Run DER evaluation

rtasr evaluation -m der -d <dataset> -s <split>

Run WER evaluation

rtasr evaluation -m wer -d <dataset> -s <split>
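For readers unfamiliar with the metric, word error rate is commonly defined as the number of word substitutions, deletions and insertions divided by the number of words in the reference. The sketch below illustrates that textbook definition with a word-level edit distance. It is not rtasr's implementation, which may normalize text or aggregate results differently, and word_error_rate is a hypothetical name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Textbook WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted against the reference length, the value can exceed 1.0 when the hypothesis is much longer than the reference.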

Plot results

To get the plots of the evaluation results, use the plot command.

If you don't specify the split, the plots will be generated for all the available splits.

Plot DER results

rtasr plot -m der -d <dataset> -s <split>

Plot WER results

rtasr plot -m wer -d <dataset> -s <split>
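Taken together, the commands in this document suggest a download, transcription, evaluation, plot sequence. The sketch below lists those steps in that order and prints them without running anything by default. benchmark_plan and run_plan are hypothetical helpers, and the ordering is an assumption drawn from how the commands are described here.

```python
import shlex
import subprocess
from typing import List


def benchmark_plan(dataset: str, provider: str, metric: str, split: str) -> List[List[str]]:
    """Return the documented commands in the order a full benchmark would need."""
    if metric not in ("der", "wer"):
        raise ValueError("metric must be 'der' or 'wer'")
    return [
        ["rtasr", "download", "-d", dataset],
        ["rtasr", "transcription", "-d", dataset, "-p", provider, "-s", split],
        ["rtasr", "evaluation", "-m", metric, "-d", dataset, "-s", split],
        ["rtasr", "plot", "-m", metric, "-d", dataset, "-s", split],
    ]


def run_plan(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in plan:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the sequence at the first failing step.
            subprocess.run(cmd, check=True)
```

Keeping dry_run=True as the default avoids triggering paid transcription calls by accident.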

Dataset length

To get the total length of a dataset, use the audio-length command. It reports the number of minutes of audio in each split of the dataset.

If you don't specify a split, the length is reported for all available splits.

rtasr audio-length -d <dataset> -s <split>
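To cross-check the reported figure, the duration of a folder of audio files can be summed by hand. This is a rough sketch that assumes the audio is stored locally as uncompressed WAV files, which may not hold for every dataset. total_minutes is a hypothetical helper, and rtasr may compute the length differently.

```python
import wave
from pathlib import Path


def total_minutes(audio_dir: Path) -> float:
    """Sum the duration of every .wav file under audio_dir, in minutes."""
    total_seconds = 0.0
    for wav_path in sorted(audio_dir.rglob("*.wav")):
        with wave.open(str(wav_path), "rb") as wav:
            # Duration in seconds = number of frames / frames per second.
            total_seconds += wav.getnframes() / wav.getframerate()
    return total_seconds / 60.0
```

Multiplying the result by a provider's per-minute rate gives a rough cost estimate before running a full transcription.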

Contributing

Be sure to have hatch installed.

Quality

  • Run quality checks: hatch run quality:check
  • Run quality formatting: hatch run quality:format

Testing

  • Run tests: hatch run tests:run

