Python tools for text to speech (TTS), speech to text (STT), and speech to speech (STS) powered by MLX

MLX Audio Plus

Motivation

This fork of Blaizzy/mlx-audio removes a large amount of cruft: incompatibly licensed code and data that should not have been included in the repo. Besides the upstream models, it adds improvements and the following new models, ported to MLX in Python:

TTS
STT
- Fun-ASR

Improvements to the upstream repo will continue to be merged here.

This repo also serves as the basis for Swift ports of models in mlx-swift-audio.

Installation

pip install mlx-audio-plus

Usage

CLI

# CosyVoice 3: cross-lingual mode (reference audio only)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav

# CosyVoice 3: zero-shot mode (reference audio plus its transcript, for better quality)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav \
    --ref_text "This is what I said in the reference audio."

# CosyVoice 3: instruct mode with style control
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "I have exciting news!" \
    --ref_audio reference.wav \
    --instruct_text "Speak with excitement and enthusiasm"

# CosyVoice 3: voice conversion
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --ref_audio target_speaker.wav \
    --source_audio source_speech.wav

# Play audio directly instead of saving
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello world" \
    --ref_audio reference.wav \
    --play

# Chatterbox: generate speech from reference audio
mlx_audio.tts.generate --model mlx-community/Chatterbox-TTS-4bit \
    --text "The quick brown fox jumped over the lazy dog." \
    --ref_audio reference.wav
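The CosyVoice 3 modes above differ only in which flags are passed. If you want to choose the mode from a script, here is a minimal sketch. `build_generate_cmd` is a hypothetical helper, not part of mlx-audio-plus. It only assembles the flags shown in the CLI examples and does not check flag combinations the README doesn't cover.

```python
import shlex
from typing import List, Optional

# Model ID taken from the CLI examples above.
COSYVOICE3 = "mlx-community/Fun-CosyVoice3-0.5B-2512-4bit"


def build_generate_cmd(
    model: str,
    ref_audio: str,
    text: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
    play: bool = False,
) -> List[str]:
    """Hypothetical helper: build an mlx_audio.tts.generate argv using
    only the flags that appear in the README's CLI examples."""
    if source_audio is None and text is None:
        # Every example except voice conversion passes --text.
        raise ValueError("pass text, or source_audio for voice conversion")
    cmd = ["mlx_audio.tts.generate", "--model", model]
    if text is not None:
        cmd += ["--text", text]
    cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:
        cmd += ["--ref_text", ref_text]  # zero-shot mode
    if instruct_text is not None:
        cmd += ["--instruct_text", instruct_text]  # instruct mode
    if source_audio is not None:
        cmd += ["--source_audio", source_audio]  # voice conversion
    if play:
        cmd.append("--play")  # play instead of saving
    return cmd


if __name__ == "__main__":
    cmd = build_generate_cmd(COSYVOICE3, "reference.wav", text="Hello world", play=True)
    # Show the equivalent shell command line.
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with text that contains quotes or apostrophes.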

Python

from mlx_audio.tts.generate import generate_audio

# CosyVoice 3: cross-lingual mode (reference audio only)
generate_audio(
    text="Hello, this is a test of text to speech.",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    file_prefix="output",  # Optional
    audio_format="wav",  # Optional
)

# CosyVoice 3: zero-shot mode (reference audio plus its transcript, for better quality)
generate_audio(
    text="Bonjour, comment allez-vous aujourd'hui?",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    ref_text="This is what I said in the reference audio.",
)

# CosyVoice 3: instruct mode with style control
generate_audio(
    text="I have some exciting news to share with you!",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    instruct_text="Speak with excitement and enthusiasm",
)

# CosyVoice 3: voice conversion (convert source audio to target speaker)
generate_audio(
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="target_speaker.wav",  # Target voice
    source_audio="source_speech.wav",
)

# Chatterbox: generate speech from reference audio
generate_audio(
    text="The quick brown fox jumped over the lazy dog.",
    model="mlx-community/Chatterbox-TTS-4bit",
    ref_audio="reference.wav",
)
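To synthesize several lines with the same reference voice, you can call `generate_audio` in a loop. Below is a minimal sketch. It assumes that `file_prefix` names the output file, as the parameter name suggests. `generate_batch` and the `line_000`-style prefixes are hypothetical, and the `generate` parameter exists only so the helper can be exercised with a stand-in function.

```python
from typing import Callable, List, Optional, Sequence

COSYVOICE3 = "mlx-community/Fun-CosyVoice3-0.5B-2512-4bit"


def generate_batch(
    texts: Sequence[str],
    ref_audio: str,
    model: str = COSYVOICE3,
    prefix: str = "line",
    generate: Optional[Callable[..., object]] = None,
) -> List[str]:
    """Hypothetical helper: call generate_audio once per text, giving each
    call its own file_prefix. Returns the prefixes that were used."""
    if generate is None:
        # Imported lazily so the helper can be tried without MLX installed.
        from mlx_audio.tts.generate import generate_audio as generate
    prefixes = []
    for i, text in enumerate(texts):
        file_prefix = f"{prefix}_{i:03d}"  # line_000, line_001, ...
        generate(
            text=text,
            model=model,
            ref_audio=ref_audio,
            file_prefix=file_prefix,
            audio_format="wav",
        )
        prefixes.append(file_prefix)
    return prefixes
```

Each call passes the model ID again. The README doesn't say whether the loaded model is cached between calls, so long batches may pay a reload cost.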

Speech to text

from mlx_audio.stt.models.funasr import Model

# Fun-ASR

# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Basic transcription
result = model.generate("audio.wav")
print(result.text)

# Translation (speech to English text)
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en"
)

# Custom prompting for domain-specific content
result = model.generate(
    "medical_dictation.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms."
)

# Streaming output
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
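The following sketch transcribes several files and captures streamed output while still echoing it. It assumes only what the example shows: `generate(path, ...)` returns an object with a `.text` attribute, and `generate(path, stream=True)` yields chunks that print as text. `transcribe_files` and `stream_text` are hypothetical helpers. If the chunks turn out to be result objects rather than strings, read their text attribute instead of calling `str()`.

```python
from typing import Dict, Iterable, Protocol


class Transcriber(Protocol):
    # Structural type matching how `Model` is used above: generate(path, **kwargs).
    def generate(self, audio, **kwargs): ...


def transcribe_files(model: Transcriber, paths: Iterable[str], **kwargs) -> Dict[str, str]:
    """Hypothetical helper: transcribe several files, keyed by path.
    Extra kwargs (e.g. task="translate", target_language="en") are forwarded."""
    return {path: model.generate(path, **kwargs).text for path in paths}


def stream_text(model: Transcriber, path: str) -> str:
    """Echo streamed chunks as they arrive and return the joined text.
    Assumes each chunk converts to text with str()."""
    parts = []
    for chunk in model.generate(path, stream=True):
        piece = str(chunk)
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # finish the echoed line
    return "".join(parts)
```

With the model loaded above, `transcribe_files(model, ["a.wav", "b.wav"], task="translate", target_language="en")` would return a dict of translations keyed by file path.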

Project details

Release history

0.1.8 (this version): Jan 8, 2026
0.1.7: Dec 22, 2025
0.1.6: Dec 17, 2025
0.1.5: Dec 16, 2025
0.1.4: Dec 14, 2025
0.1.3: Dec 11, 2025
0.1.2: Dec 9, 2025
0.1.1: Dec 5, 2025
0.1.0: Dec 4, 2025

Download files

Source Distribution

mlx_audio_plus-0.1.8.tar.gz (639.3 kB)

Uploaded Jan 8, 2026 Source

Built Distribution

mlx_audio_plus-0.1.8-py3-none-any.whl (768.3 kB)

Uploaded Jan 8, 2026 Python 3

File details

Details for the file mlx_audio_plus-0.1.8.tar.gz.

File metadata

Download URL: mlx_audio_plus-0.1.8.tar.gz
Upload date: Jan 8, 2026
Size: 639.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for mlx_audio_plus-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`713db12bc4ecc563dd23dd1a58f922581e943fc3f865908641c850c18ae86fae`
MD5	`cd65c7fcd61f403e7998837e91e848f8`
BLAKE2b-256	`ea034cbfcb0532fff46c96b06c399bdc4c2aab3260a352bb83a0b9de3fac93b2`

File details

Details for the file mlx_audio_plus-0.1.8-py3-none-any.whl.

File metadata

Download URL: mlx_audio_plus-0.1.8-py3-none-any.whl
Upload date: Jan 8, 2026
Size: 768.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for mlx_audio_plus-0.1.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e44ad5a65d46391db59b694ad4b9e9b1a739ea79c1e6013ad8f7db5cea9472b`
MD5	`c89eecf5ecaf5fbacb93ad532918d63c`
BLAKE2b-256	`3fe5456345bec74afbaa86a7f8a7ede657238c0b1e2340aa04a5b0020a4da623`
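If you download a file manually, you can check it against the published digests. Below is a minimal sketch that uses only the standard library (`hashlib`, `hmac`). The two SHA256 values are copied from the tables above. `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
import hmac

# SHA256 digests published above for release 0.1.8.
EXPECTED_SHA256 = {
    "mlx_audio_plus-0.1.8.tar.gz": "713db12bc4ecc563dd23dd1a58f922581e943fc3f865908641c850c18ae86fae",
    "mlx_audio_plus-0.1.8-py3-none-any.whl": "2e44ad5a65d46391db59b694ad4b9e9b1a739ea79c1e6013ad8f7db5cea9472b",
}


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def verify(path: str, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value;
    # compare_digest compares without exiting early on the first mismatch.
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

For example, `verify("mlx_audio_plus-0.1.8.tar.gz", EXPECTED_SHA256["mlx_audio_plus-0.1.8.tar.gz"])` should return `True` for an intact download. pip can enforce the same check at install time in hash-checking mode: list `mlx-audio-plus==0.1.8 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`. In that mode, every dependency must also be pinned and hashed.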
