Python tools for text to speech (TTS), speech to text (STT), and speech to speech (STS) powered by MLX

MLX Audio Plus

Motivation

This fork removes a large amount of cruft (incompatibly licensed code and data that should not be included in the repo) from Blaizzy/mlx-audio. In addition to the models from that repo, it includes improvements and new models ported to MLX in Python. The Usage section below shows examples for CosyVoice 3, Chatterbox, and Fun-ASR.

Improvements to the upstream repo will continue to be merged here.

This repo also serves as the basis for Swift ports of models in mlx-swift-audio.

Installation

pip install mlx-audio-plus
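
To confirm the install from Python, the standard library's importlib.metadata can look up the distribution by name. This is a minimal sketch: the helper name installed_version is hypothetical, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "mlx-audio-plus") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "mlx-audio-plus is not installed")
```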

Usage

CLI

# CosyVoice 3: cross-lingual mode (reference audio only)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav

# CosyVoice 3: zero-shot mode (with transcription for better quality)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav \
    --ref_text "This is what I said in the reference audio."

# CosyVoice 3: instruct mode with style control
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "I have exciting news!" \
    --ref_audio reference.wav \
    --instruct_text "Speak with excitement and enthusiasm"

# CosyVoice 3: voice conversion
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --ref_audio target_speaker.wav \
    --source_audio source_speech.wav

# Play audio directly instead of saving
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello world" \
    --ref_audio reference.wav \
    --play

# Chatterbox: generate speech from reference audio
mlx_audio.tts.generate --model mlx-community/Chatterbox-TTS-4bit \
    --text "The quick brown fox jumped over the lazy dog." \
    --ref_audio reference.wav
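
When the CLI is driven from a script, it can help to assemble the argument list in Python instead of concatenating shell strings. The sketch below is one way to do that. The helper build_tts_command is hypothetical (it is not part of the package) and it only emits the flags that appear in the examples above.

```python
import shlex
from typing import List, Optional


def build_tts_command(
    model: str,
    text: Optional[str] = None,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
    play: bool = False,
) -> List[str]:
    """Assemble an argument list for mlx_audio.tts.generate (hypothetical helper)."""
    cmd = ["mlx_audio.tts.generate", "--model", model]
    # Only emit the options that were actually supplied.
    optional_flags = [
        ("--text", text),
        ("--ref_audio", ref_audio),
        ("--ref_text", ref_text),
        ("--instruct_text", instruct_text),
        ("--source_audio", source_audio),
    ]
    for flag, value in optional_flags:
        if value is not None:
            cmd += [flag, value]
    if play:
        cmd.append("--play")
    return cmd


if __name__ == "__main__":
    cmd = build_tts_command(
        model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
        text="Hello world",
        ref_audio="reference.wav",
        play=True,
    )
    # Print a shell-safe rendering of the command.
    # To execute it instead, pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Passing the list form to subprocess.run avoids shell quoting problems when the text contains quotes or spaces.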

Python

from mlx_audio.tts.generate import generate_audio

# CosyVoice 3: cross-lingual mode (reference audio only)
generate_audio(
    text="Hello, this is a test of text to speech.",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    file_prefix="output",  # Optional
    audio_format="wav",  # Optional
)

# CosyVoice 3: zero-shot mode (with transcription for better quality)
generate_audio(
    text="Bonjour, comment allez-vous aujourd'hui?",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    ref_text="This is what I said in the reference audio.",
)

# CosyVoice 3: instruct mode with style control
generate_audio(
    text="I have some exciting news to share with you!",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    instruct_text="Speak with excitement and enthusiasm",
)

# CosyVoice 3: voice conversion (convert source audio to target speaker)
generate_audio(
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="target_speaker.wav",  # Target voice
    source_audio="source_speech.wav",
)

# Chatterbox: generate speech from reference audio
generate_audio(
    text="The quick brown fox jumped over the lazy dog.",
    model="mlx-community/Chatterbox-TTS-4bit",
    ref_audio="reference.wav",
)
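
To synthesize several lines with the same reference voice, the calls above can be wrapped in a small loop. The following is a sketch under stated assumptions: generate_batch is a hypothetical helper, it takes the generation function as an argument so it can be exercised without loading a model, and it uses only the keyword arguments shown in the examples above (text, model, ref_audio, file_prefix). It makes no attempt to reuse a loaded model between calls.

```python
from typing import Any, Callable, Iterable, List


def generate_batch(
    generate_fn: Callable[..., Any],
    texts: Iterable[str],
    model: str,
    ref_audio: str,
    prefix: str = "output",
) -> List[str]:
    """Call generate_fn once per text, giving each call a numbered file_prefix."""
    prefixes = []
    for index, text in enumerate(texts):
        file_prefix = f"{prefix}_{index:03d}"
        generate_fn(
            text=text,
            model=model,
            ref_audio=ref_audio,
            file_prefix=file_prefix,
        )
        prefixes.append(file_prefix)
    return prefixes


# Intended use with the real function (requires mlx-audio-plus to be installed):
#
#   from mlx_audio.tts.generate import generate_audio
#   generate_batch(
#       generate_audio,
#       ["First sentence.", "Second sentence."],
#       model="mlx-community/Chatterbox-TTS-4bit",
#       ref_audio="reference.wav",
#   )
```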

Speech to text

# Fun-ASR
from mlx_audio.stt.models.funasr import Model

# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Basic transcription
result = model.generate("audio.wav")
print(result.text)

# Translation (speech to English text)
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en"
)

# Custom prompting for domain-specific content
result = model.generate(
    "medical_dictation.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms."
)

# Streaming output
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
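
For transcribing several recordings, the basic transcription call can be wrapped in a loop. This is a minimal sketch, assuming only what the examples above show: model.generate(path) returns an object with a text attribute. The helper transcribe_files is hypothetical and accepts any object with that interface.

```python
from typing import Any, Dict, Iterable


def transcribe_files(model: Any, paths: Iterable[str]) -> Dict[str, str]:
    """Map each audio path to the text returned by model.generate(path).text."""
    transcripts = {}
    for path in paths:
        result = model.generate(str(path))
        transcripts[str(path)] = result.text
    return transcripts


# Intended use with the model loaded as shown above:
#
#   transcripts = transcribe_files(model, ["audio.wav", "chinese_speech.wav"])
#   for path, text in transcripts.items():
#       print(path, text)
```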
