Moshi is moshi, but running on macOS

Moshi - MLX

See the top-level README.md for more information on Moshi.

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi operates at a frame rate of 12.5 Hz and compresses 24 kHz audio down to 1.1 kbps in a fully streaming manner (with a latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs.
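
The figures quoted above are related by simple arithmetic. The following is a minimal sketch, not part of the Moshi code base, that derives the frame duration, samples per frame, and bits per frame from the stated frame rate, sample rate, and bitrate. The function name `mimi_frame_stats` is hypothetical, and the comparison against 16-bit mono PCM is an assumption added only for illustration.

```python
# Figures taken from the description above.
SAMPLE_RATE_HZ = 24_000   # input audio sample rate
FRAME_RATE_HZ = 12.5      # Mimi frames per second
BITRATE_BPS = 1_100       # 1.1 kbps

# Assumption for illustration only: uncompressed audio is 16-bit mono PCM.
PCM_BITS_PER_SAMPLE = 16


def mimi_frame_stats():
    """Derive per-frame quantities from the published rates (hypothetical helper)."""
    return {
        "frame_ms": 1000 / FRAME_RATE_HZ,                    # duration of one frame
        "samples_per_frame": SAMPLE_RATE_HZ / FRAME_RATE_HZ,  # audio samples per frame
        "bits_per_frame": BITRATE_BPS / FRAME_RATE_HZ,        # compressed bits per frame
        "ratio_vs_pcm16": SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE / BITRATE_BPS,
    }


stats = mimi_frame_stats()
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under these numbers, each 80 ms frame covers 1920 samples and is encoded in 88 bits, which is roughly 349 times smaller than the assumed 16-bit PCM stream.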

This is the MLX implementation of Moshi. For Mimi, it uses our Rust-based implementation through the Python binding provided in rustymimi, available in the rust/ folder of our main repository.

Requirements

You will need at least Python 3.10; we recommend Python 3.12.

pip install moshi_mlx  # moshi MLX, from PyPI, best with Python 3.12.
# Or the bleeding edge versions for Moshi and Moshi-MLX.
pip install -e "git+https://git@github.com/kyutai-labs/moshi#egg=moshi_mlx&subdirectory=moshi_mlx"

We have tested the MLX version on a MacBook Pro M3.

If you are not using Python 3.12, you might get an error when installing moshi_mlx or rustymimi (which moshi_mlx depends on). In that case, you will need to install the Rust toolchain or switch to Python 3.12.
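
Before installing, it can help to check which case the current interpreter falls into. The following is a minimal sketch based only on the version numbers stated above; the function name `check_python` and the three labels it returns are hypothetical and not part of moshi_mlx.

```python
import sys

MINIMUM = (3, 10)      # lowest version listed in the requirements
RECOMMENDED = (3, 12)  # version the requirements recommend


def check_python(version=None):
    """Classify a (major, minor) version; defaults to the running interpreter."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MINIMUM:
        return "unsupported"
    if major_minor == RECOMMENDED:
        return "recommended"
    # Other versions may work, but installation might require the Rust toolchain.
    return "supported"


print(check_python())
```

A result of "supported" rather than "recommended" means the install might fail unless the Rust toolchain is available.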

Usage

Once you have installed moshi_mlx, you can run:

python -m moshi_mlx.local -q 4   # weights quantized to 4 bits
python -m moshi_mlx.local -q 8   # weights quantized to 8 bits
# And using a different pretrained model:
python -m moshi_mlx.local -q 4 --hf-repo kyutai/moshika-mlx-q4
python -m moshi_mlx.local -q 8 --hf-repo kyutai/moshika-mlx-q8
# Be careful to always match the `-q` and `--hf-repo` flags.

This uses a bare-bones command-line interface. It does not perform any echo cancellation, nor does it try to compensate for growing lag by skipping frames.

You can use --hf-repo to select a different pretrained model, by setting the proper Hugging Face repository. See the model list for a reference of the available models.
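
Because a mismatch between `-q` and `--hf-repo` is easy to introduce, a small wrapper can build the command and reject inconsistent pairs. This is a sketch under two assumptions that are not confirmed by the project: that only the 4-bit and 8-bit settings shown above are used, and that repository names end in `-q4` or `-q8` as the two examples above do. The helper `build_moshi_command` is hypothetical.

```python
import sys


def build_moshi_command(bits, hf_repo=None):
    """Build the argv list for `python -m moshi_mlx.local` (hypothetical helper)."""
    # Assumption: only the two quantization levels shown in the usage examples.
    if bits not in (4, 8):
        raise ValueError(f"unsupported quantization: {bits}")
    cmd = [sys.executable, "-m", "moshi_mlx.local", "-q", str(bits)]
    if hf_repo is not None:
        # Assumption: the repository name carries a "-q<bits>" suffix.
        if not hf_repo.endswith(f"-q{bits}"):
            raise ValueError(f"{hf_repo!r} does not look like a {bits}-bit repository")
        cmd += ["--hf-repo", hf_repo]
    return cmd


print(build_moshi_command(4, "kyutai/moshika-mlx-q4"))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to launch the model. A repository that does not follow the assumed naming pattern would be rejected by this check even if it is valid.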

Alternatively, you can run python -m moshi_mlx.local_web to use the web UI. The connection is over HTTP at localhost:8998.
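
To confirm that the web UI is up before opening a browser, one option is a plain TCP connection test against the port mentioned above. This is a minimal sketch using only the Python standard library; the function name `web_ui_is_listening` is hypothetical, and the check only shows that something accepts connections on the port, not that it is Moshi.

```python
import socket


def web_ui_is_listening(host="localhost", port=8998, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if web_ui_is_listening():
    print("Web UI reachable at http://localhost:8998")
else:
    print("Nothing is listening on localhost:8998 yet")
```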

License

The present code is provided under the MIT license.

Citation

If you use either Mimi or Moshi, please cite the following paper:

@techreport{kyutai2024moshi,
    author = {Alexandre D\'efossez and Laurent Mazar\'e and Manu Orsini and Am\'elie Royer and
              Patrick P\'erez and Herv\'e J\'egou and Edouard Grave and Neil Zeghidour},
    title = {Moshi: a speech-text foundation model for real-time dialogue},
    institution = {Kyutai},
    year={2024},
    month={September},
    url={http://kyutai.org/Moshi.pdf},
}

