pocket-tts-mlx

MLX backend for pocket-tts optimized for Apple Silicon.

Runtime is torch-free. Torch is only required for optional parity tests.

Installation

PyPI install:

pip install pocket-tts-mlx

Local development:

pip install -e .

Model weights are downloaded from Hugging Face on first run. To use the voice cloning weights, accept the model terms for kyutai/pocket-tts on Hugging Face, then authenticate:

hf auth login

Quickstart

from pocket_tts_mlx import TTSModel

model = TTSModel.load_model()
state = model.get_state_for_audio_prompt("marius")
audio = model.generate_audio(
    state,
    "Hello from MLX!",
    max_tokens=200,
    warmup_frames=1,
    trim_start_ms=40,
    fade_in_ms=15,
)
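
The quickstart stops at the generated audio. The sketch below shows one way to write it to disk using only NumPy and the standard-library wave module. It assumes that np.array(audio) yields a mono float waveform in the range [-1, 1] and that the sample rate is 24 kHz; neither is stated in this README, so check both against your installed version. write_wav is a hypothetical helper, not part of pocket-tts-mlx.

```python
import wave

import numpy as np


def write_wav(path, samples, sample_rate=24_000):
    """Write a mono float waveform to a 16-bit PCM WAV file.

    Hypothetical helper, not part of pocket-tts-mlx. The 24 kHz default
    is an assumption; pass the rate your model actually uses.
    """
    samples = np.asarray(samples, dtype=np.float32).reshape(-1)
    # Clip to [-1, 1] so the int16 conversion cannot wrap around.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm.tobytes())
```

With the quickstart variables in scope, a call would look like write_wav("output.wav", np.array(audio)).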

CLI

Basic usage:

pocket-tts-mlx "Hello, world!" --voice marius --output output.wav

Cleaner onset (recommended if startup artifacts are audible):

pocket-tts-mlx "Hello, world!" --voice marius --output output.wav --warmup-frames 1 --trim-start-ms 40 --fade-in-ms 15

Onset Cleanup Options

  • --warmup-frames: number of initial Mimi frames to decode and discard, which reduces decoder startup transients.
  • --trim-start-ms: number of milliseconds to trim from the start of the output.
  • --fade-in-ms: length in milliseconds of a linear fade-in applied at the start of the output.

Equivalent Python args are warmup_frames, trim_start_ms, and fade_in_ms.
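
To make the trim and fade-in options concrete, here is a NumPy sketch of what those two steps do to a waveform. It is illustrative only and is not the library's implementation: clean_onset is a hypothetical name, the order of operations (trim first, then fade) is an assumption, and the sample rate must be supplied by the caller. Warmup frames are not modeled because they are handled inside the decoder.

```python
import numpy as np


def clean_onset(samples, sample_rate, trim_start_ms=0, fade_in_ms=0):
    """Trim the start of a mono waveform, then apply a linear fade-in.

    Illustrative only: this is not pocket-tts-mlx's implementation.
    """
    # Work on a copy so the caller's array is left untouched.
    out = np.asarray(samples, dtype=np.float32).reshape(-1).copy()

    # Convert milliseconds to a sample count and drop that many samples.
    trim = min(int(sample_rate * trim_start_ms / 1000), out.size)
    out = out[trim:]

    # Ramp the gain linearly from 0 toward 1 across the fade window.
    fade = min(int(sample_rate * fade_in_ms / 1000), out.size)
    if fade > 0:
        ramp = np.linspace(0.0, 1.0, fade, endpoint=False, dtype=np.float32)
        out[:fade] *= ramp
    return out
```

At an assumed 24 kHz sample rate, trim_start_ms=40 removes 960 samples and fade_in_ms=15 spans 360 samples.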

Performance Note

generate_audio() now materializes generated chunks before returning, so np.array(audio) overhead should be near zero in normal usage.

Voices

Predefined voices:

  • alba
  • marius
  • javert
  • jean
  • fantine
  • cosette
  • eponine
  • azelma

Requirements

  • Python 3.10+
  • Apple Silicon Mac (M1/M2/M3/M4)
  • MLX
  • Internet access for initial model downloads

Notes

  • Voice cloning requires Hugging Face access to kyutai/pocket-tts.
  • Non-voice-cloning weights are used automatically when voice cloning is unavailable.
