
Moshi is Moshi, but running on macOS.


Moshi - MLX

See the top-level README.md for more information on Moshi.

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi operates at a frame rate of 12.5 Hz and compresses 24 kHz audio down to 1.1 kbps in a fully streaming manner (with a latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs.
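The codec figures above are related by simple arithmetic. The sketch below derives the frame duration, the number of audio samples per frame, and the bit budget per frame from the stated 24 kHz sample rate, 12.5 Hz frame rate and 1.1 kbps bitrate. The codebook configuration (8 codebooks of 2048 entries, i.e. 11 bits each) and the mono 16-bit PCM baseline are assumptions for illustration, not facts stated in this README.

```python
import math

# Values stated in this README.
SAMPLE_RATE_HZ = 24_000  # input audio sample rate
FRAME_RATE_HZ = 12.5     # Mimi frames per second
BITRATE_BPS = 1_100      # 1.1 kbps

# Derived quantities.
frame_ms = 1000 / FRAME_RATE_HZ                          # duration of one frame
samples_per_frame = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)  # audio samples per frame
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ             # bit budget per frame

# Assumption (not stated in this README): 8 codebooks with 2048 entries each.
NUM_CODEBOOKS = 8
CODEBOOK_SIZE = 2048
bits_from_codebooks = NUM_CODEBOOKS * math.log2(CODEBOOK_SIZE)

# Assumption: compare against raw 16-bit mono PCM at the same sample rate.
pcm_bps = SAMPLE_RATE_HZ * 16
compression_ratio = pcm_bps / BITRATE_BPS

if __name__ == "__main__":
    print(f"frame duration: {frame_ms} ms")
    print(f"samples per frame: {samples_per_frame}")
    print(f"bits per frame: {bits_per_frame} (from codebooks: {bits_from_codebooks})")
    print(f"compression vs 16-bit mono PCM: {compression_ratio:.0f}x")
```

Under these assumptions, each 80 ms frame covers 1920 samples and carries 88 bits, which is exactly 8 codes of 11 bits each.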

This is the MLX implementation of Moshi. For Mimi, it uses our Rust-based implementation through the Python bindings provided by rustymimi, available in the rust/ folder of our main repository.

Requirements

You will need at least Python 3.10; we recommend Python 3.12.

pip install moshi_mlx  # moshi MLX, from PyPI, best with Python 3.12.
# Or the bleeding edge versions for Moshi and Moshi-MLX.
pip install -e "git+https://git@github.com/kyutai-labs/moshi#egg=moshi_mlx&subdirectory=moshi_mlx"

We have tested the MLX version on a MacBook Pro M3.

If you are not using Python 3.12, you might get an error when installing moshi_mlx or rustymimi (which moshi_mlx depends on). In that case, you will need to install the Rust toolchain or switch to Python 3.12.
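The requirements above can be turned into a quick pre-flight check before running pip. This is a minimal sketch that is not part of moshi_mlx: the function name `preflight` is hypothetical, treating `cargo` on the PATH as a proxy for "the Rust toolchain is installed" is an assumption, and the Apple silicon check reflects that MLX targets Apple silicon Macs.

```python
import platform
import shutil
import sys

MIN_PYTHON = (3, 10)          # minimum version stated in this README
RECOMMENDED_PYTHON = (3, 12)  # recommended version stated in this README


def preflight(version=None, has_cargo=None, machine=None, system=None):
    """Return a list of warnings about the install environment (empty if fine).

    Arguments default to the current interpreter and machine; they can be
    passed explicitly to check a hypothetical environment.
    """
    version = tuple(version or sys.version_info[:2])
    # Assumption: `cargo` on the PATH means a Rust toolchain is available.
    has_cargo = shutil.which("cargo") is not None if has_cargo is None else has_cargo
    machine = machine or platform.machine()
    system = system or platform.system()

    warnings = []
    if version < MIN_PYTHON:
        warnings.append(f"Python {version[0]}.{version[1]} is too old; need >= 3.10.")
    elif version != RECOMMENDED_PYTHON and not has_cargo:
        warnings.append("Not on Python 3.12 and no Rust toolchain found; "
                        "installing rustymimi may fail.")
    if system != "Darwin" or machine != "arm64":
        warnings.append("MLX targets Apple silicon Macs; this does not look like one.")
    return warnings


if __name__ == "__main__":
    for message in preflight() or ["Environment looks fine."]:
        print(message)
```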

Usage

Once you have installed moshi_mlx, you can run

python -m moshi_mlx.local -q 4   # weights quantized to 4 bits
python -m moshi_mlx.local -q 8   # weights quantized to 8 bits
# And using a different pretrained model:
python -m moshi_mlx.local -q 4 --hf-repo kyutai/moshika-mlx-q4
python -m moshi_mlx.local -q 8 --hf-repo kyutai/moshika-mlx-q8
# Always make sure the `-q` value matches the quantization of the `--hf-repo` model.
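Because a mismatched `-q` value and `--hf-repo` is an easy mistake, a small wrapper can check the pairing before launching. This is a hypothetical helper (`moshi_local_command` is not part of moshi_mlx), and it assumes repositories follow the `...-mlx-q4` / `...-mlx-q8` naming used in the examples above; repository names without such a suffix are passed through unchecked.

```python
import re
import shlex
import sys


def moshi_local_command(quant_bits, hf_repo=None):
    """Build an argv list for `python -m moshi_mlx.local`, checking that the
    -q value agrees with a `-q<bits>` suffix on the repository name."""
    if quant_bits not in (4, 8):
        raise ValueError("only -q 4 and -q 8 appear in the README examples")
    argv = [sys.executable, "-m", "moshi_mlx.local", "-q", str(quant_bits)]
    if hf_repo is not None:
        # Assumption: quantized repos end in "-q4" or "-q8", as in the README.
        match = re.search(r"-q(\d+)$", hf_repo)
        if match and int(match.group(1)) != quant_bits:
            raise ValueError(f"-q {quant_bits} does not match repo {hf_repo!r}")
        argv += ["--hf-repo", hf_repo]
    return argv


if __name__ == "__main__":
    # Print the command; pass it to subprocess.run(..., check=True) to launch it.
    print(shlex.join(moshi_local_command(4, "kyutai/moshika-mlx-q4")))
```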

This uses a bare-bones command-line interface. It does not perform any echo cancellation, nor does it try to compensate for a growing lag by skipping frames.

You can use --hf-repo to select a different pretrained model by setting the appropriate Hugging Face repository. See the model list for the available models.

Alternatively, you can run python -m moshi_mlx.local_web to use the web UI, which is served over HTTP at localhost:8998.
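When scripting around the web UI, it can help to wait until the server is listening before opening a browser. The sketch below polls the port stated above using only the standard library; `wait_for_port` is a hypothetical helper, and it only checks that something accepts TCP connections on localhost:8998, not that it is the Moshi UI.

```python
import socket
import time


def wait_for_port(host="localhost", port=8998, timeout_s=60.0, interval_s=0.5):
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses.

    Returns True if the port accepted a connection, False on timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)


if __name__ == "__main__":
    # Start `python -m moshi_mlx.local_web` in another terminal first.
    if wait_for_port():
        print("Web UI should be reachable at http://localhost:8998")
    else:
        print("Nothing is listening on localhost:8998 yet.")
```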

License

The present code is provided under the MIT license.

Citation

If you use either Mimi or Moshi, please cite the following paper:

@techreport{kyutai2024moshi,
    author = {Alexandre D\'efossez and Laurent Mazar\'e and Manu Orsini and Am\'elie Royer and
			  Patrick P\'erez and Herv\'e J\'egou and Edouard Grave and Neil Zeghidour},
    title = {Moshi: a speech-text foundation model for real-time dialogue},
    institution = {Kyutai},
    year={2024},
    month={September},
    url={http://kyutai.org/Moshi.pdf},
}


Download files


Source Distribution

moshi_mlx-0.3.0.tar.gz (46.4 kB)

Uploaded Source

Built Distribution


moshi_mlx-0.3.0-py3-none-any.whl (55.8 kB)

Uploaded Python 3

File details

Details for the file moshi_mlx-0.3.0.tar.gz.

File metadata

  • Download URL: moshi_mlx-0.3.0.tar.gz
  • Upload date:
  • Size: 46.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for moshi_mlx-0.3.0.tar.gz
Algorithm    Hash digest
SHA256       108f0812dfa248cd543de9a46ec11bb2d603387ec3fc92d171157875c783971e
MD5          37da43dd163534a0aa48f459f2e9e8c0
BLAKE2b-256  3404173cf1fc12f73cfc49321e64b45ea1ea15143b49ef570b7171906a45b5a7


File details

Details for the file moshi_mlx-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: moshi_mlx-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 55.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.5

File hashes

Hashes for moshi_mlx-0.3.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2ec210340df8bf34025df6c272ea3ed618f5c22e176c3513f5dba4e529420621
MD5          cf4ff33990e63636a1f742ae7bf996e5
BLAKE2b-256  41753c00e7392a7ea80887440da615c300679636f9e6f89da10e1082f4970a95
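The published SHA256 digests can be used to check a manually downloaded 0.3.0 file before installing it. The sketch below streams the file through hashlib and compares the result against the digests copied from the tables above; `sha256_of` and `verify` are hypothetical helper names. For pip-based installs, pip's own hash-checking mode (`--require-hashes`) is usually the more convenient route.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables for release 0.3.0.
EXPECTED_SHA256 = {
    "moshi_mlx-0.3.0.tar.gz":
        "108f0812dfa248cd543de9a46ec11bb2d603387ec3fc92d171157875c783971e",
    "moshi_mlx-0.3.0-py3-none-any.whl":
        "2ec210340df8bf34025df6c272ea3ed618f5c22e176c3513f5dba4e529420621",
}


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the published digest for its name."""
    path = Path(path)
    want = expected.get(path.name)
    if want is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == want


if __name__ == "__main__":
    print(verify("moshi_mlx-0.3.0-py3-none-any.whl"))
```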

