
Moonshine Voice Python Package

A fast, accurate, on-device AI library for building interactive voice applications. Join our Discord to get help and support.

Installation

pip install moonshine-voice

Quick Start

# Listens to the microphone and logs speech updates to the console.
python -m moonshine_voice.mic_transcriber

Example

"""Transcribes live audio from the default microphone"""
import time
from moonshine_voice import (
    MicTranscriber,
    TranscriptEventListener,
    get_model_for_language,
)

# This will download the model files and cache them.
model_path, model_arch = get_model_for_language("en")

# MicTranscriber handles connecting to the microphone, capturing
# the audio data, detecting voice activity, breaking the speech
# up into segments, transcribing the speech, and sending events
# as the results are updated over time.
mic_transcriber = MicTranscriber(
    model_path=model_path, model_arch=model_arch)

# We use an event-driven interface to respond in real time
# as speech is detected.
class TestListener(TranscriptEventListener):
    def on_line_started(self, event):
        print(f"Line started: {event.line.text}")

    def on_line_text_changed(self, event):
        print(f"Line text changed: {event.line.text}")

    def on_line_completed(self, event):
        print(f"Line completed: {event.line.text}")

listener = TestListener()
mic_transcriber.add_listener(listener)
mic_transcriber.start()
print("Listening to the microphone, press Ctrl+C to stop...")

while True:
    time.sleep(0.1)

Other Sources

If you're capturing audio from a different source, you can feed it directly to a transcriber.

"""Transcribes live audio from an arbitrary audio source."""
from moonshine_voice import (
    Transcriber,
    TranscriptEventListener,
    get_model_for_language,
    load_wav_file,
    get_assets_path,
)
import os
from typing import Iterator, Tuple


def audio_chunk_generator(
    wav_file_path: str, chunk_duration: float = 0.1
) -> Iterator[Tuple[list, int]]:
    """
    Example function that loads a WAV file and yields audio chunks.

    This demonstrates how you can integrate your own proprietary
    audio data capture sources. Replace this function with your own
    implementation that yields (audio_chunk, sample_rate) tuples.

    Args:
        wav_file_path: Path to the WAV file to load
        chunk_duration: Duration of each chunk in seconds

    Yields:
        Tuple of (audio_chunk, sample_rate) where:
        - audio_chunk: List of float audio samples
        - sample_rate: Sample rate in Hz
    """
    audio_data, sample_rate = load_wav_file(wav_file_path)
    chunk_size = int(chunk_duration * sample_rate)

    for i in range(0, len(audio_data), chunk_size):
        chunk = audio_data[i: i + chunk_size]
        yield (chunk, sample_rate)


model_path, model_arch = get_model_for_language("en")

transcriber = Transcriber(
    model_path=model_path, model_arch=model_arch)

stream = transcriber.create_stream(update_interval=0.5)
stream.start()


class TestListener(TranscriptEventListener):
    def on_line_started(self, event):
        print(f"{event.line.start_time:.2f}s: Line started: {event.line.text}")

    def on_line_text_changed(self, event):
        print(
            f"{event.line.start_time:.2f}s: Line text changed: {event.line.text}")

    def on_line_completed(self, event):
        print(f"{event.line.start_time:.2f}s: Line completed: {event.line.text}")


listener = TestListener()
stream.add_listener(listener)

# Feed audio chunks from the generator into the stream.
wav_file_path = os.path.join(get_assets_path(), "two_cities.wav")
for chunk, sample_rate in audio_chunk_generator(wav_file_path):
    stream.add_audio(chunk, sample_rate)

stream.stop()
stream.close()

Voice Commands

We also provide voice command recognition through the IntentRecognizer module. It listens to transcribed text from a MicTranscriber and invokes the callback registered for whichever intent the utterance matches. Intent matching relies on an embedding model, which you can download with a helper function:

import sys
import time

from moonshine_voice import (
    MicTranscriber,
    IntentRecognizer,
    get_embedding_model,
    get_model_for_language,
)

# Download and load the embedding model for intent recognition
embedding_model_path, embedding_model_arch = get_embedding_model()

Next, create a recognizer and register your intent callbacks:

intent_recognizer = IntentRecognizer(
    model_path=embedding_model_path,
    model_arch=embedding_model_arch
)

def on_lights_on(trigger: str, utterance: str, similarity: float):
    """Handler for turning lights on."""
    print(f"\n💡 LIGHTS ON! (matched '{trigger}' with {similarity:.0%} confidence)")

def on_lights_off(trigger: str, utterance: str, similarity: float):
    """Handler for turning lights off."""
    print(f"\n🌑 LIGHTS OFF! (matched '{trigger}' with {similarity:.0%} confidence)")

intent_recognizer.register_intent("turn on the lights", on_lights_on)
intent_recognizer.register_intent("turn off the lights", on_lights_off)

Finally, create a MicTranscriber, connect it to your IntentRecognizer, and start the audio stream:

# Get the transcription model and initialize a MicTranscriber
model_path, model_arch = get_model_for_language("en")
mic_transcriber = MicTranscriber(model_path=model_path, model_arch=model_arch)

# The intent recognizer will process completed transcript lines and invoke trigger handlers
mic_transcriber.add_listener(intent_recognizer)

mic_transcriber.start()
try:
    while True:
        time.sleep(0.1)
except KeyboardInterrupt:
    print("\n\nStopping...", file=sys.stderr)
finally:
    intent_recognizer.close()
    mic_transcriber.stop()
    mic_transcriber.close()

Multiple Languages

The framework currently supports English, Spanish, Mandarin, Japanese, Korean, Vietnamese, Arabic, and Ukrainian. We are working on wider language support; you can see which languages your installed version supports by calling supported_languages(). To use a language, request it with get_model_for_language(), passing the two-letter language code. For example, get_model_for_language("es") will download the Spanish models and return the information you need to create Transcriber objects that use them.
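If your application starts from a platform locale string (such as "es_ES.UTF-8") rather than a bare language code, a small helper can normalize it before calling get_model_for_language(). This helper is not part of the library; the set of codes below mirrors the languages listed above, but supported_languages() is the authoritative list for your installed version.

```python
# Hypothetical helper: reduce a locale string to a supported
# two-letter language code, falling back to English.
SUPPORTED = {"en", "es", "zh", "ja", "ko", "vi", "ar", "uk"}

def language_for_locale(locale: str, default: str = "en") -> str:
    """Extract the language code from a locale like 'es_ES.UTF-8'."""
    code = locale.split(".")[0].split("_")[0].lower()
    return code if code in SUPPORTED else default
```

The result can then be passed straight to get_model_for_language(language_for_locale("ja_JP")).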

Documentation

For more information, see the main Moonshine Voice documentation.

License

The code and English-language models are released under the MIT License - see the main project repository for details. The models used for other languages are released under the Moonshine Community License.
