Python tools for text-to-speech (TTS), speech-to-text (STT), and speech-to-speech (STS), powered by MLX
MLX Audio Plus
Motivation
This fork of Blaizzy/mlx-audio removes a large amount of cruft: incompatibly licensed code and data that should not be included in the repo. In addition to the upstream models, it includes improvements and the following new models ported to MLX in Python:
- TTS
- STT
Improvements to the upstream repo will continue to be merged here.
This repo also serves as the basis for Swift ports of models in mlx-swift-audio.
Installation
pip install mlx-audio-plus
Usage
CLI
# CosyVoice 3: cross-lingual mode (reference audio only)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav

# CosyVoice 3: zero-shot mode (with transcription for better quality)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav \
    --ref_text "This is what I said in the reference audio."

# CosyVoice 3: instruct mode with style control
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "I have exciting news!" \
    --ref_audio reference.wav \
    --instruct_text "Speak with excitement and enthusiasm"

# CosyVoice 3: voice conversion
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --ref_audio target_speaker.wav \
    --source_audio source_speech.wav

# Play audio directly instead of saving
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello world" \
    --ref_audio reference.wav \
    --play

# Chatterbox: generate speech from reference audio
mlx_audio.tts.generate --model mlx-community/Chatterbox-TTS-4bit \
    --text "The quick brown fox jumped over the lazy dog." \
    --ref_audio reference.wav
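When the CLI is driven from a script, the flags shown above can be assembled programmatically. The following is a minimal sketch that only builds the argument list and does not run anything. `build_tts_command` and `DEFAULT_MODEL` are hypothetical names, not part of mlx-audio-plus, and the helper uses only the flags that appear in the examples above.

```python
import shlex
from typing import List, Optional

# Model ID taken from the CLI examples; used here only as a default.
DEFAULT_MODEL = "mlx-community/Fun-CosyVoice3-0.5B-2512-4bit"


def build_tts_command(
    model: str = DEFAULT_MODEL,
    text: Optional[str] = None,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
    play: bool = False,
) -> List[str]:
    """Hypothetical helper: build an argument list for mlx_audio.tts.generate."""
    cmd = ["mlx_audio.tts.generate", "--model", model]
    # Only flags that appear in the documented examples are emitted.
    options = {
        "--text": text,
        "--ref_audio": ref_audio,
        "--ref_text": ref_text,
        "--instruct_text": instruct_text,
        "--source_audio": source_audio,
    }
    for flag, value in options.items():
        if value is not None:
            cmd += [flag, value]
    if play:
        cmd.append("--play")
    return cmd


cmd = build_tts_command(text="Hello world", ref_audio="reference.wav", play=True)
# shlex.join quotes arguments so the command can be pasted into a shell.
print(shlex.join(cmd))
```

On a machine where the package is installed, the resulting list could be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a single string avoids shell-quoting problems when the text contains quotes or spaces.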
Python
from mlx_audio.tts.generate import generate_audio

# CosyVoice 3: cross-lingual mode (reference audio only)
generate_audio(
    text="Hello, this is a test of text to speech.",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    file_prefix="output",  # Optional
    audio_format="wav",  # Optional
)

# CosyVoice 3: zero-shot mode (with transcription for better quality)
generate_audio(
    text="Bonjour, comment allez-vous aujourd'hui?",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    ref_text="This is what I said in the reference audio.",
)

# CosyVoice 3: instruct mode with style control
generate_audio(
    text="I have some exciting news to share with you!",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    instruct_text="Speak with excitement and enthusiasm",
)

# CosyVoice 3: voice conversion (convert source audio to target speaker)
generate_audio(
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="target_speaker.wav",  # Target voice
    source_audio="source_speech.wav",
)

# Chatterbox: generate speech from reference audio
generate_audio(
    text="The quick brown fox jumped over the lazy dog.",
    model="mlx-community/Chatterbox-TTS-4bit",
    ref_audio="reference.wav",
)
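In the CosyVoice 3 examples above, the mode is never named explicitly; it follows from which optional arguments accompany `ref_audio`. The sketch below summarizes that mapping as read from the examples. `cosyvoice_mode` is a hypothetical helper, and the precedence applied when several arguments are supplied at once is an assumption, since the examples do not show how the library resolves such combinations.

```python
from typing import Optional


def cosyvoice_mode(
    ref_text: Optional[str] = None,
    instruct_text: Optional[str] = None,
    source_audio: Optional[str] = None,
) -> str:
    """Hypothetical helper: name the mode implied by the optional arguments."""
    # Assumed precedence: the examples never combine these arguments.
    if source_audio is not None:
        return "voice conversion"
    if instruct_text is not None:
        return "instruct"
    if ref_text is not None:
        return "zero-shot"
    # Reference audio only.
    return "cross-lingual"


print(cosyvoice_mode())
print(cosyvoice_mode(ref_text="This is what I said in the reference audio."))
print(cosyvoice_mode(instruct_text="Speak with excitement and enthusiasm"))
print(cosyvoice_mode(source_audio="source_speech.wav"))
```

A helper like this can be useful for logging or for validating arguments before a long generation run starts.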
Speech to text
from mlx_audio.stt.models.funasr import Model

# Fun-ASR
# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Basic transcription
result = model.generate("audio.wav")
print(result.text)

# Translation (speech to English text)
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en",
)

# Custom prompting for domain-specific content
result = model.generate(
    "medical_dictation.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms.",
)

# Streaming output
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
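The streaming loop above prints chunks as they arrive but does not keep them. A minimal sketch of collecting the full transcript while still printing incrementally follows. `collect_stream` and `fake_stream` are hypothetical names; the stand-in generator replaces `model.generate("audio.wav", stream=True)` so the example runs without a model, and it assumes each chunk can be converted to text with `str()`.

```python
from typing import Iterable, Iterator


def collect_stream(chunks: Iterable[object]) -> str:
    """Hypothetical helper: echo each chunk and return the joined transcript."""
    parts = []
    for chunk in chunks:
        piece = str(chunk)  # Assumption: chunks are strings or stringify cleanly.
        print(piece, end="", flush=True)
        parts.append(piece)
    print()  # Finish the line once the stream ends.
    return "".join(parts)


def fake_stream() -> Iterator[str]:
    """Stand-in for model.generate("audio.wav", stream=True)."""
    yield from ["Hello", ", ", "world"]


transcript = collect_stream(fake_stream())
```

With a loaded model, the call would presumably become `collect_stream(model.generate("audio.wav", stream=True))`.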
File details
Details for the file mlx_audio_plus-0.1.8.tar.gz.
File metadata
- Download URL: mlx_audio_plus-0.1.8.tar.gz
- Upload date:
- Size: 639.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 713db12bc4ecc563dd23dd1a58f922581e943fc3f865908641c850c18ae86fae |
| MD5 | cd65c7fcd61f403e7998837e91e848f8 |
| BLAKE2b-256 | ea034cbfcb0532fff46c96b06c399bdc4c2aab3260a352bb83a0b9de3fac93b2 |
File details
Details for the file mlx_audio_plus-0.1.8-py3-none-any.whl.
File metadata
- Download URL: mlx_audio_plus-0.1.8-py3-none-any.whl
- Upload date:
- Size: 768.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2e44ad5a65d46391db59b694ad4b9e9b1a739ea79c1e6013ad8f7db5cea9472b |
| MD5 | c89eecf5ecaf5fbacb93ad532918d63c |
| BLAKE2b-256 | 3fe5456345bec74afbaa86a7f8a7ede657238c0b1e2340aa04a5b0020a4da623 |
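The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library. `sha256_of` and `matches_published_digest` are hypothetical helper names, and the demo hashes a small temporary file rather than a real download.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a throwaway file; a real check would point at the downloaded archive.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"abc")
    expected = hashlib.sha256(b"abc").hexdigest()
    demo_ok = matches_published_digest(sample, expected)
    demo_bad = matches_published_digest(sample, "0" * 64)

print(demo_ok, demo_bad)
```

For the source distribution, the path would be the downloaded `mlx_audio_plus-0.1.8.tar.gz` and the expected value the SHA256 digest listed for that file.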