MLX-Audio is a package for running text-to-speech (TTS) and speech-to-speech (STS) models locally on your Mac using MLX.

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

MLX-Audio

A text-to-speech (TTS) and speech-to-speech (STS) library built on Apple's MLX framework, providing efficient speech synthesis on Apple Silicon.

Features

Fast inference on Apple Silicon (M-series chips)
Multiple language support
Voice customization options
Adjustable speech speed control (0.5x to 2.0x)
Interactive web interface with 3D audio visualization
REST API for TTS generation
Quantization support for optimized performance
Direct access to output files via Finder/Explorer integration

Installation

# Install the package
pip install mlx-audio

# For web interface and API dependencies
pip install -r requirements.txt

Quick Start

To generate audio from the command line, use:

# Basic usage
mlx_audio.tts.generate --text "Hello, world"

# Specify prefix for output file
mlx_audio.tts.generate --text "Hello, world" --file_prefix hello

# Adjust speaking speed (0.5-2.0)
mlx_audio.tts.generate --text "Hello, world" --speed 1.4
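
The commands above can also be assembled from a script. The following is a minimal sketch, not part of MLX-Audio: `build_tts_command` and `run_tts` are hypothetical helper names, and only the `--text`, `--speed`, and `--file_prefix` flags shown above are used. It rejects speeds outside the documented 0.5 to 2.0 range before the CLI is invoked.

```python
import shlex
import subprocess

MIN_SPEED, MAX_SPEED = 0.5, 2.0  # speed range stated in this README


def build_tts_command(text, speed=1.0, file_prefix=None):
    """Return the argument list for one mlx_audio.tts.generate call."""
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(
            f"speed must be between {MIN_SPEED} and {MAX_SPEED}, got {speed}"
        )
    command = ["mlx_audio.tts.generate", "--text", text, "--speed", str(speed)]
    if file_prefix is not None:
        command += ["--file_prefix", file_prefix]
    return command


def run_tts(text, **options):
    """Run the CLI; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_tts_command(text, **options), check=True)


# Build (but do not run) a command and print it in copy-pasteable form.
command = build_tts_command("Hello, world", speed=1.4, file_prefix="hello")
print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.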

How to call from Python

To generate audio from a Python script, call generate_audio:

from mlx_audio.tts.generate import generate_audio

# Example: Generate an audiobook chapter as WAV audio
generate_audio(
    text=("In the beginning, the universe was created...\n"
        "...or the simulation was booted up."),
    model_path="prince-canuma/Kokoro-82M",
    voice="af_heart",
    speed=1.2,
    lang_code="a",  # Kokoro language code, e.g. "a" for (a)f_heart; comment out for auto
    file_prefix="audiobook_chapter1",
    audio_format="wav",
    sample_rate=24000,
    join_audio=True,
    verbose=True  # Set to False to disable print messages
)

print("Audiobook chapter successfully generated!")
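
For a multi-chapter book, the same call can be repeated with a different file prefix per chapter. The sketch below is illustrative only: `chapter_jobs` and `generate_book` are hypothetical helpers, and they use only keyword arguments that appear in the example above. The import of `generate_audio` is deferred so the job list can be inspected on a machine without MLX installed.

```python
def chapter_jobs(chapters, voice="af_heart", speed=1.2):
    """Build one generate_audio keyword dict per chapter text."""
    jobs = []
    for number, text in enumerate(chapters, start=1):
        jobs.append({
            "text": text,
            "model_path": "prince-canuma/Kokoro-82M",
            "voice": voice,
            "speed": speed,
            "file_prefix": f"audiobook_chapter{number}",
            "audio_format": "wav",
            "join_audio": True,
        })
    return jobs


def generate_book(chapters):
    """Generate one audio file per chapter (requires mlx-audio)."""
    # Imported here so chapter_jobs() stays usable without MLX installed.
    from mlx_audio.tts.generate import generate_audio

    for job in chapter_jobs(chapters):
        generate_audio(**job)


jobs = chapter_jobs(["Chapter one text.", "Chapter two text."])
print([job["file_prefix"] for job in jobs])
```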

Web Interface & API Server

MLX-Audio includes a web interface with a 3D visualization that reacts to audio frequencies. The interface allows you to:

Generate TTS with different voices and speed settings
Upload and play your own audio files
Visualize audio with an interactive 3D orb
Save generated audio files automatically to the output directory (~/.mlx_audio/outputs by default)
Open the output folder directly from the interface (when running locally)

Features

Multiple Voice Options: Choose from different voice styles (AF Heart, AF Nova, AF Bella, BF Emma)
Adjustable Speech Speed: Control the speed of speech generation with an interactive slider (0.5x to 2.0x)
Real-time 3D Visualization: A responsive 3D orb that reacts to audio frequencies
Audio Upload: Play and visualize your own audio files
Auto-play Option: Automatically play generated audio
Output Folder Access: Convenient button to open the output folder in your system's file explorer

To start the web interface and API server:

# Using the command-line interface
mlx_audio.server

# With custom host and port
mlx_audio.server --host 0.0.0.0 --port 9000

# With verbose logging
mlx_audio.server --verbose

Available command line arguments:

--host: Host address to bind the server to (default: 127.0.0.1)
--port: Port to bind the server to (default: 8000)
--verbose: Enable verbose logging

Then open your browser and navigate to:

http://127.0.0.1:8000
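
When the server is started from a script, the page can fail to load if the browser opens before the port is accepting connections. A minimal sketch of a readiness check follows, using only the Python standard library; `wait_for_server` and `open_interface` are hypothetical helpers, and the defaults mirror the host and port listed above.

```python
import socket
import time
import webbrowser


def wait_for_server(host="127.0.0.1", port=8000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet; wait briefly and retry.
            time.sleep(interval)
    return False


def open_interface(host="127.0.0.1", port=8000):
    """Open the web interface in the default browser once the server is up."""
    if wait_for_server(host, port):
        webbrowser.open(f"http://{host}:{port}")
```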

API Endpoints

The server provides the following REST API endpoints:

POST /tts: Generate TTS audio
- Parameters (form data):
  - text: The text to convert to speech (required)
  - voice: Voice to use (default: "af_heart")
  - speed: Speech speed from 0.5 to 2.0 (default: 1.0)
- Returns: JSON with filename of generated audio
GET /audio/{filename}: Retrieve generated audio file
POST /play: Play audio directly from the server
- Parameters (form data):
  - filename: The filename of the audio to play (required)
- Returns: JSON with status and filename
POST /stop: Stop any currently playing audio
- Returns: JSON with status
POST /open_output_folder: Open the output folder in the system's file explorer
- Returns: JSON with status and path
- Note: This feature only works when running the server locally
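
The endpoints above can be called from any HTTP client. The following is a hedged sketch using only the Python standard library. It assumes the server is running at the default address, that form data may be sent URL-encoded, and that the JSON returned by /tts has a key named "filename"; the key name and the helper names `build_tts_request` and `synthesize` are assumptions, not documented API.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # default host and port from this README


def build_tts_request(text, voice="af_heart", speed=1.0, base_url=BASE_URL):
    """Build a POST /tts request carrying the documented form fields."""
    form = urllib.parse.urlencode({"text": text, "voice": voice, "speed": speed})
    return urllib.request.Request(
        f"{base_url}/tts", data=form.encode("utf-8"), method="POST"
    )


def synthesize(text, out_path, base_url=BASE_URL, **options):
    """Generate speech on the server and download the result to out_path."""
    request = build_tts_request(text, base_url=base_url, **options)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    filename = payload["filename"]  # ASSUMPTION: key name is not documented
    audio_url = f"{base_url}/audio/{urllib.parse.quote(filename)}"
    with urllib.request.urlopen(audio_url) as audio, open(out_path, "wb") as out:
        out.write(audio.read())
    return out_path
```

With the server running, `synthesize("Hello, world", "hello.wav", speed=1.4)` would request the audio and save a local copy.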

Note: Generated audio files are stored in ~/.mlx_audio/outputs by default, or in a fallback directory if that location is not writable.

Models

Kokoro

Kokoro is a multilingual TTS model that supports various languages and voice styles.

Example Usage

from IPython.display import Audio, display
import soundfile as sf

from mlx_audio.tts.models.kokoro import KokoroPipeline
from mlx_audio.tts.utils import load_model

# Initialize the model
model_id = 'prince-canuma/Kokoro-82M'
model = load_model(model_id)

# Create a pipeline with American English
pipeline = KokoroPipeline(lang_code='a', model=model, repo_id=model_id)

# Generate audio; the pipeline yields one result per text segment
text = "The MLX King lives. Let him cook!"
segments = pipeline(text, voice='af_heart', speed=1, split_pattern=r'\n+')
for i, (_, _, audio) in enumerate(segments):
    # Display audio in notebook (if applicable)
    display(Audio(data=audio, rate=24000, autoplay=False))

    # Save each segment to its own file so later segments do not overwrite earlier ones
    sf.write(f'audio_{i}.wav', audio[0], 24000)
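
Because the pipeline yields audio segment by segment, a long text produces several chunks. One way to end up with a single file is to concatenate the chunks while writing. The sketch below uses only the standard library `wave` module and assumes each segment has already been converted to a flat sequence of floats in the range -1 to 1 at 24000 Hz; `write_wav` is a hypothetical helper, and the sine-wave segments stand in for real model output.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 24000  # sample rate used in the example above


def write_wav(path, segments, sample_rate=SAMPLE_RATE):
    """Concatenate float segments (values in [-1, 1]) into one 16-bit mono WAV."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        for segment in segments:
            # Clip to the valid range, then scale to signed 16-bit integers.
            clipped = [max(-1.0, min(1.0, float(s))) for s in segment]
            samples = [int(s * 32767) for s in clipped]
            wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


# Stand-in data: two 0.1-second segments of a 440 Hz tone.
tone = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(2400)]
out_path = os.path.join(tempfile.mkdtemp(), "joined.wav")
write_wav(out_path, [tone, tone])

with wave.open(out_path, "rb") as wav:
    frame_count = wav.getnframes()
    frame_rate = wav.getframerate()
print(frame_count / frame_rate, "seconds written")
```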

Language Options

🇺🇸 'a' - American English
🇬🇧 'b' - British English
🇯🇵 'j' - Japanese (requires pip install misaki[ja])
🇨🇳 'z' - Mandarin Chinese (requires pip install misaki[zh])
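
The voice names used in this README (af_heart, bf_emma) begin with the language code of the voice. Assuming that naming convention holds for every voice, which this page does not state explicitly, the language code can be derived from the voice name. `lang_code_for_voice` is a hypothetical helper, not part of MLX-Audio.

```python
# Language codes listed above.
LANGUAGES = {
    "a": "American English",
    "b": "British English",
    "j": "Japanese",
    "z": "Mandarin Chinese",
}


def lang_code_for_voice(voice):
    """Return the language code implied by a voice name such as 'af_heart'."""
    # ASSUMPTION: the first letter of the voice name is its language code.
    code = voice[:1].lower()
    if code not in LANGUAGES:
        raise ValueError(f"cannot infer a language code from voice {voice!r}")
    return code


for voice in ("af_heart", "bf_emma"):
    code = lang_code_for_voice(voice)
    print(voice, "->", code, f"({LANGUAGES[code]})")
```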

CSM (Conversational Speech Model)

CSM is a model from Sesame that supports text-to-speech and lets you customize voices using reference audio samples.

Example Usage

# Generate speech using CSM-1B model with reference audio
python -m mlx_audio.tts.generate --model mlx-community/csm-1b --text "Hello from Sesame." --play --ref_audio ./conversational_a.wav

You can pass any audio file to clone the voice from, or download a sample audio file from here.

Advanced Features

Quantization

You can quantize models for improved performance:

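import json
from pathlib import Path

import mlx.core as mx
from mlx_audio.tts.utils import load_model, quantize_model

# Load the full-precision model (same call form as the Kokoro example)
model = load_model('prince-canuma/Kokoro-82M')
config = model.config

# Quantize to 8-bit with a group size of 64
group_size = 64
bits = 8
weights, config = quantize_model(model, config, group_size, bits)

# Create the output directory before writing to it
output_dir = Path('./8bit')
output_dir.mkdir(parents=True, exist_ok=True)

# Save the quantized config and weights
with open(output_dir / 'config.json', 'w') as f:
    json.dump(config, f)

mx.save_safetensors(str(output_dir / 'kokoro-v1_0.safetensors'), weights, metadata={"format": "mlx"})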
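
To make the group_size and bits parameters concrete, here is a pure-Python illustration of group-wise affine quantization. It is a sketch of the general technique, not MLX's implementation. It assumes each group of weights stores one scale and one bias, and that those two values take 16 bits each; under that assumption, 8-bit quantization with a group size of 64 costs 8 + 2 * 16 / 64 = 8.5 bits per weight.

```python
def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1  # highest code, e.g. 255 for 8 bits
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo  # lo acts as the bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float values."""
    return [c * scale + bias for c in codes]


def quantize(values, group_size=64, bits=8):
    """Split values into groups and quantize each group independently."""
    return [
        quantize_group(values[start:start + group_size], bits)
        for start in range(0, len(values), group_size)
    ]


def effective_bits_per_weight(group_size=64, bits=8, param_bits=16):
    """Storage cost per weight, counting one scale and one bias per group."""
    return bits + 2 * param_bits / group_size


weights = [i / 100 - 1 for i in range(128)]  # stand-in for real model weights
groups = quantize(weights, group_size=64, bits=8)
print(len(groups), "groups,", effective_bits_per_weight(), "bits per weight")
```

Smaller groups track the local weight range more closely but store more scale and bias values, which is the trade-off the group size controls.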

Requirements

MLX
Python 3.8+
Apple Silicon Mac (for optimal performance)
For the web interface and API:
- FastAPI
- Uvicorn

License

MIT License

Acknowledgements

Thanks to the Apple MLX team for providing a great framework for building TTS and STS models.
This project uses the Kokoro model architecture for text-to-speech synthesis.
The 3D visualization uses Three.js for rendering.

Release history

0.4.2

Mar 30, 2026

0.4.1

Mar 14, 2026

0.4.0

Mar 7, 2026

0.3.1

Jan 29, 2026

0.3.0

Jan 25, 2026

0.3.0rc1 pre-release

Jan 22, 2026

0.2.10

Jan 6, 2026

0.2.9

Dec 20, 2025

0.2.8

Dec 17, 2025

0.2.7

Dec 16, 2025

0.2.6

Nov 7, 2025

0.2.5

Aug 26, 2025

0.2.4

Aug 18, 2025

0.2.3

May 24, 2025

0.2.2

May 19, 2025

This version

0.2.1

May 11, 2025

0.2.0

May 10, 2025

0.1.0

Apr 26, 2025

0.0.4

Apr 11, 2025

0.0.3

Mar 21, 2025

0.0.2

Mar 7, 2025

0.0.1

Feb 28, 2025

Download files

Download the file for your platform.

Source Distribution

mlx_audio-0.2.1.tar.gz (962.0 kB view details)

Uploaded May 11, 2025 Source

Built Distribution

mlx_audio-0.2.1-py3-none-any.whl (1.0 MB view details)

Uploaded May 11, 2025 Python 3

File details

Details for the file mlx_audio-0.2.1.tar.gz.

File metadata

Download URL: mlx_audio-0.2.1.tar.gz
Upload date: May 11, 2025
Size: 962.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for mlx_audio-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`80f46d40f4cf1c1fbc24e0f04ad73650077c6435b901e6a6706471fbaf6f7058`
MD5	`80df3b42ae3cb91978cfd9678ab31deb`
BLAKE2b-256	`3f26e66968eaaaef2cc0a07d0609390512da335b1e31fb2bf78db6ff698e982b`

File details

Details for the file mlx_audio-0.2.1-py3-none-any.whl.

File metadata

Download URL: mlx_audio-0.2.1-py3-none-any.whl
Upload date: May 11, 2025
Size: 1.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.22

File hashes

Hashes for mlx_audio-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8aa8c9cdb4ecba2a93a9e69a84ac5ee09282d3a4d3eac0cd761ef81805085b60`
MD5	`867d44669cb7d00b4fc9bd589cb84d5f`
BLAKE2b-256	`847361a14c0af6eeea775ef47e07652bbf5321e14ff97db636c38bbb3345deec`
