
Speech Recognition for Live Transcription and Voice Commands


Moonshine

[Blog] [Paper] [Model Card] [Podcast]

Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. It is well-suited to real-time, on-device applications like live transcription and voice command recognition. Moonshine obtains word-error rates (WER) better than similarly-sized Whisper models from OpenAI on the datasets used in the OpenASR leaderboard maintained by HuggingFace:

Tiny

WER          Moonshine   Whisper
Average      12.66       12.81
AMI          22.77       24.24
Earnings22   21.25       19.12
Gigaspeech   14.41       14.08
LS Clean     4.52        5.66
LS Other     11.71       15.45
SPGISpeech   7.70        5.93
Tedlium      5.64        5.97
Voxpopuli    13.27       12.00

Base

WER          Moonshine   Whisper
Average      10.07       10.32
AMI          17.79       21.13
Earnings22   17.65       15.09
Gigaspeech   12.19       12.83
LS Clean     3.23        4.25
LS Other     8.18        10.35
SPGISpeech   5.46        4.26
Tedlium      5.22        4.87
Voxpopuli    10.81       9.76

Moonshine's compute requirements scale with the length of the input audio, so shorter clips are processed faster. This differs from existing Whisper models, which pad every input out to a 30-second chunk before processing. To give you an idea of the benefit: Moonshine processes 10-second audio segments 5x faster than Whisper while maintaining the same (or better!) WER.
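
Once the package is installed (see below), a rough way to see this on your own hardware is to time a transcription call yourself. The following is a minimal sketch rather than part of the repo; it assumes the default Torch backend and uses only the transcribe helper and bundled example clip shown later in this README:

import os
import time

os.environ["KERAS_BACKEND"] = "torch"  # must be set before Keras/moonshine is imported
import moonshine

wav = moonshine.ASSETS_DIR / "beckett.wav"  # bundled example clip
start = time.perf_counter()
text = moonshine.transcribe(wav, "moonshine/tiny")
print(f"{time.perf_counter() - start:.2f} s: {text}")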

This repo hosts the inference code for Moonshine.

Installation

We like uv for managing Python environments, so we use it here. If you prefer not to use it, simply skip the first step and leave uv off the shell commands below.

1. Create a virtual environment

First, install uv for Python environment management.

Then create and activate a virtual environment:

uv venv env_moonshine
source env_moonshine/bin/activate

2. Install the Moonshine package

The Moonshine inference code is written in Keras and can run with any of the backends that Keras supports: PyTorch, TensorFlow, and JAX. The backend you choose determines which flavor of the moonshine package to install. If you're just getting started, we suggest installing the (default) PyTorch backend:

uv pip install useful-moonshine@git+https://github.com/usefulsensors/moonshine.git

To run the provided inference code, you have to instruct Keras to use the PyTorch backend by setting an environment variable:

export KERAS_BACKEND=torch
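
If you'd rather not rely on a shell export, you can also set the backend from Python before Keras is imported; this small sketch is equivalent to the export above:

import os

# Keras reads KERAS_BACKEND at import time, so set it before importing moonshine.
os.environ["KERAS_BACKEND"] = "torch"

import moonshine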

To run with the TensorFlow backend, run the following to install Moonshine and set the environment variable:

uv pip install useful-moonshine[tensorflow]@git+https://github.com/usefulsensors/moonshine.git
export KERAS_BACKEND=tensorflow

To run with the JAX backend, run the following:

uv pip install useful-moonshine[jax]@git+https://github.com/usefulsensors/moonshine.git
export KERAS_BACKEND=jax
# Use useful-moonshine[jax-cuda] for jax on GPU
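
Whichever backend you install, you can sanity-check which one Keras picked up before running Moonshine (an optional check, assuming KERAS_BACKEND is already set in your environment):

import keras

print(keras.backend.backend())  # expect "torch", "tensorflow", or "jax"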

3. Try it out

You can test Moonshine by transcribing the provided example audio file with the .transcribe function:

python
>>> import moonshine
>>> moonshine.transcribe(moonshine.ASSETS_DIR / 'beckett.wav', 'moonshine/tiny')
['Ever tried ever failed, no matter try again, fail again, fail better.']

The first argument is a path to an audio file and the second is the name of a Moonshine model. moonshine/tiny and moonshine/base are the currently available models.
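
The same call works on your own recordings. Below is a tiny hypothetical wrapper script (not part of the repo) that transcribes an audio file passed on the command line with the base model; it assumes KERAS_BACKEND is already set in your environment:

import sys

import moonshine

if __name__ == "__main__":
    # Usage: python transcribe_file.py path/to/audio.wav
    result = moonshine.transcribe(sys.argv[1], "moonshine/base")
    print(result[0])  # transcribe returns a list of strings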

TODO

  • Live transcription demo

  • ONNX model

  • CTranslate2 support

  • MLX support

Citation

If you benefit from our work, please cite us:

@misc{jeffries2024moonshinespeechrecognitionlive,
      title={Moonshine: Speech Recognition for Live Transcription and Voice Commands}, 
      author={Nat Jeffries and Evan King and Manjunath Kudlur and Guy Nicholson and James Wang and Pete Warden},
      year={2024},
      eprint={2410.15608},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2410.15608}, 
}

