A simplified Python package for interacting with the Reverb models

Reverb

Open-source inference and evaluation code for Rev's state-of-the-art speech recognition and diarization models. The speech recognition (ASR) code uses the WeNet framework, and the speaker diarization code uses the Pyannote framework. More detailed model descriptions can be found in our blog, and the models can be downloaded from Hugging Face.

ASR

Speech-to-text code based on the WeNet framework. See the ASR folder for more details and usage instructions.

Long-form speech recognition WER results:

| Model | Earnings21 | Earnings22 | Rev16 |
|---|---|---|---|
| Reverb ASR | 9.68 | 13.68 | 10.30 |
| Whisper Large-v3 | 14.26 | 19.05 | 10.86 |
| Canary-1B | 14.40 | 19.01 | 13.82 |
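
For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate is commonly computed: the word-level edit distance between reference and hypothesis, divided by the reference length. It is not the evaluation code used to produce the table, it applies no text normalization, and the function name `word_error_rate` is hypothetical. Multiply the result by 100 to compare against percentage-style figures.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words.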

Diarization

Speaker diarization code based on the Pyannote framework. See the diarization folder for more details and usage instructions.

Long-form WDER results, in combination with Rev's ASR:

| Model | Earnings21 | Rev16 |
|---|---|---|
| Pyannote3.0 | 0.051 | 0.090 |
| Reverb Diarization V1 | 0.047 | 0.077 |
| Reverb Diarization V2 | 0.046 | 0.078 |
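
This README does not define WDER, so the sketch below assumes the common word-level definition: among words that the ASR alignment pairs with a reference word, the fraction assigned to the wrong speaker. It also assumes hypothesis speaker labels have already been mapped onto reference labels. The function name and input shape are hypothetical, and this is not the scoring code behind the table.

```python
from typing import Iterable, Tuple


def word_diarization_error_rate(aligned_speakers: Iterable[Tuple[str, str]]) -> float:
    """Fraction of aligned words whose hypothesis speaker differs from the reference.

    Each item is (reference_speaker, hypothesis_speaker) for one aligned word.
    Inserted and deleted words have no counterpart and are assumed to be excluded.
    """
    pairs = list(aligned_speakers)
    if not pairs:
        raise ValueError("at least one aligned word is required")
    wrong = sum(1 for ref_spk, hyp_spk in pairs if ref_spk != hyp_spk)
    return wrong / len(pairs)
```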

Getting Started

[!IMPORTANT] These instructions require the following to be set up:

  • A Hugging Face access token, with the Hugging Face CLI logged in.
  • Git LFS
    • Run git lfs install from your terminal.
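
A quick way to confirm both prerequisites before installing is sketched below. It rests on two assumptions that are not stated in this README: that the token is either exported as the `HF_TOKEN` environment variable or was saved by the CLI login at the default location `~/.cache/huggingface/token`, and that Git LFS is on the `PATH` as `git-lfs`. The function name is hypothetical.

```python
import os
import shutil
from pathlib import Path


def check_prerequisites() -> dict:
    """Report whether Git LFS and a Hugging Face token appear to be available."""
    # `git lfs ...` dispatches to a separate `git-lfs` executable.
    has_git_lfs = shutil.which("git-lfs") is not None

    # Assumed token locations: the HF_TOKEN variable or the default token file.
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    has_token = bool(os.environ.get("HF_TOKEN")) or token_file.is_file()

    return {"git_lfs": has_git_lfs, "huggingface_token": has_token}
```

A `False` entry only means the check could not find the item in the assumed places; a custom cache directory, for instance, would not be detected.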

Check out the READMEs within each subdirectory for more information on the ASR or diarization models.

Python Setup

This codebase is compatible with Python 3.10+. To get started, simply run

pip install .

This installs the reverb package, a modified version of the wenet Python package, into your Python environment. To use reverb's code, make sure no other wenet installation is present in the same environment, since the two can conflict.

[!TIP] While we suggest using our CLI or Python package to download the reverb model, you can also download it manually by running:

git lfs install
git clone https://huggingface.co/Revai/reverb-asr

Command Line Usage

The following command can be used to transcribe audio files:

reverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results

You can also specify how "verbatim" the transcription should be:

reverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results --verbatimicity 0.2

You can also change the decoding mode:

reverb --model reverb_asr_v1 --audio_file audio.mp3 --result_dir results --modes ctc_prefix_beam_search

For a full list of arguments, run:

reverb --help

or check out our script.
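
When driving the CLI from another program, it can help to build the argument list in one place. The helper below is a hypothetical sketch that uses only the flags shown above (`--model`, `--audio_file`, `--result_dir`, `--verbatimicity`, `--modes`); it does not validate values or cover the other arguments listed by `reverb --help`.

```python
from typing import List, Optional


def build_reverb_command(
    audio_file: str,
    result_dir: str,
    model: str = "reverb_asr_v1",
    verbatimicity: Optional[float] = None,
    modes: Optional[str] = None,
) -> List[str]:
    """Assemble a `reverb` command line as a list suitable for subprocess.run."""
    cmd = ["reverb", "--model", model, "--audio_file", audio_file, "--result_dir", result_dir]
    # Optional flags are appended only when the caller sets them.
    if verbatimicity is not None:
        cmd += ["--verbatimicity", str(verbatimicity)]
    if modes is not None:
        cmd += ["--modes", modes]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with file names that contain spaces.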

Python Usage

Reverb can also be used from within Python:

import wenet
reverb = wenet.load_model("reverb_asr_v1")
output = reverb.transcribe("audio.mp3")
print(output)

The load_model function automatically downloads the reverb model from Hugging Face. If you instead have a local copy of the model, either downloaded from our Hugging Face page or fine-tuned yourself, pass load_model the path to the directory containing the .pt checkpoint, config.yaml, and extra files.

import wenet
reverb = wenet.load_model("/local/reverb-asr")
output = reverb.transcribe("audio.mp3")
print(output)
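
Before pointing load_model at a local directory, a small sanity check can catch a missing file early. The sketch below looks only for the two items named above, a `.pt` checkpoint and `config.yaml`. The "extra files" are not enumerated in this README, so they are not checked, and the helper name is hypothetical.

```python
from pathlib import Path


def missing_model_files(model_dir: str) -> list:
    """Return descriptions of expected model files that are absent from model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        return [f"directory {model_dir}"]

    missing = []
    # Any file ending in .pt is accepted as the checkpoint.
    if not any(root.glob("*.pt")):
        missing.append("*.pt checkpoint")
    if not (root / "config.yaml").is_file():
        missing.append("config.yaml")
    return missing
```

An empty list means both files were found; it does not guarantee that the checkpoint and config are compatible with each other.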

If you'd prefer CTM output instead of plain text, specify the format in the transcribe call.

import wenet
reverb = wenet.load_model("reverb_asr_v1")
# Specifying the "format" will change the output
output = reverb.transcribe("audio.mp3", format="ctm")
print(output)
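
CTM output is usually consumed programmatically. The parser below is a sketch that assumes the returned value is a string of lines in the conventional CTM layout (recording ID, channel, start time, duration, word, optional confidence). This README does not confirm the exact layout reverb emits, so check a real output first. `CtmWord` and `parse_ctm` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CtmWord:
    recording: str
    channel: str
    start: float      # seconds from the start of the recording
    duration: float   # seconds
    word: str
    confidence: Optional[float] = None


def parse_ctm(text: str) -> List[CtmWord]:
    """Parse whitespace-separated CTM lines, skipping blanks and ';;' comments."""
    words = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed CTM line: {line!r}")
        confidence = float(fields[5]) if len(fields) > 5 else None
        words.append(
            CtmWord(fields[0], fields[1], float(fields[2]), float(fields[3]), fields[4], confidence)
        )
    return words
```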

All arguments available to the reverb command line are also parameters that can be included in the transcribe command.

import wenet
reverb = wenet.load_model("reverb_asr_v1")
# Command-line arguments can be passed as keyword arguments
output = reverb.transcribe("audio.mp3", verbatimicity=0.5, beam_size=2, ctc_weight=0.6)
print(output)
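
Because these options are plain keyword arguments, comparing settings takes only a few lines. The sketch below transcribes the same file at several verbatimicity values. It accepts any object with a `transcribe(path, **kwargs)` method, so it makes no assumption about the model beyond the calls shown above. The function name is hypothetical, and the default levels are simply the two values used in this README.

```python
from typing import Any, Dict, Iterable


def sweep_verbatimicity(
    model: Any,
    audio_path: str,
    levels: Iterable[float] = (0.2, 0.5),
    **transcribe_kwargs: Any,
) -> Dict[float, Any]:
    """Transcribe audio_path once per verbatimicity level and collect the outputs."""
    return {
        level: model.transcribe(audio_path, verbatimicity=level, **transcribe_kwargs)
        for level in levels
    }
```

With a loaded model, `sweep_verbatimicity(reverb, "audio.mp3")` returns a dictionary keyed by level, which makes it easy to compare the transcripts side by side.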

Docker Image

Alternatively, you can use Docker to run ASR and/or diarization without installing the dependencies (including the model files) directly on your system. First, make sure Docker is installed. Running on an NVIDIA GPU may require additional setup steps. Then run the following command to build the Docker image:

docker build -t reverb . --build-arg HUGGINGFACE_ACCESS_TOKEN=${YOUR_HUGGINGFACE_ACCESS_TOKEN}

Then start a container from the image:

sudo docker run --entrypoint "/bin/bash" --gpus all --rm -it reverb

Hosting the Model

If your use case requires deploying these models at a larger scale while maintaining strict security requirements, consider our other release: https://github.com/revdotcom/reverb-self-hosted. This setup gives you full control over deploying our models on your own infrastructure, without the need for internet connectivity or cloud dependencies.

License

The license in this repository applies only to the code, not the models. See LICENSE for details. For model licenses, check out their pages on Hugging Face.

Citations

If you make use of this model, please cite this paper:

@article{bhandari2024reverb,
  title={Reverb: Open-Source ASR and Diarization from Rev},
  author={Bhandari, Nishchal and Chen, Danny and del Río Fernández, Miguel Ángel and Delworth, Natalie and Fox, Jennifer Drexler and Jetté, Miguel and McNamara, Quinten and Miller, Corey and Novotný, Ondřej and Profant, Ján and Qin, Nan and Ratajczak, Martin and Robichaud, Jean-Philippe},
  journal={arXiv preprint arXiv:2410.03930},
  year={2024}
}

Contributors

Nishchal Bhandari, Danny Chen, Miguel Del Rio, Natalie Delworth, Jennifer Drexler Fox, Miguel Jette, Quinn McNamara, Corey Miller, Ondrej Novotny, Jan Profant, Nan Qin, Martin Ratajczak, and Jean-Philippe Robichaud.
