
Moshi is moshi, but running on macOS


Moshi - MLX

See the top-level README.md for more information on Moshi.

Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec. Mimi operates at a frame rate of 12.5 Hz and compresses 24 kHz audio down to 1.1 kbps in a fully streaming manner (with a latency of 80 ms, the frame size), yet performs better than existing non-streaming codecs.
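The figures quoted above are tied together by simple arithmetic, which the short sketch below works through. The 16-bit mono PCM baseline used for the compression ratio is an assumption made for illustration only; it is not stated in this README.

```python
# Figures quoted in the paragraph above.
FRAME_RATE_HZ = 12.5      # Mimi frames per second
SAMPLE_RATE_HZ = 24_000   # input audio sample rate
BITRATE_BPS = 1_100       # 1.1 kbps

# Assumption (not from this README): the raw audio is 16-bit mono PCM.
PCM_BITS_PER_SAMPLE = 16

frame_ms = 1000 / FRAME_RATE_HZ                  # duration of one frame
samples_per_frame = SAMPLE_RATE_HZ / FRAME_RATE_HZ
bits_per_frame = BITRATE_BPS / FRAME_RATE_HZ     # bit budget for each frame
raw_bps = SAMPLE_RATE_HZ * PCM_BITS_PER_SAMPLE   # bitrate of the assumed raw audio
compression_ratio = raw_bps / BITRATE_BPS

print(f"frame duration:    {frame_ms:.0f} ms")
print(f"samples per frame: {samples_per_frame:.0f}")
print(f"bits per frame:    {bits_per_frame:.0f}")
print(f"compression ratio: about {compression_ratio:.0f}x (vs. assumed 16-bit PCM)")
```

Each frame therefore covers 80 ms of audio, which is why the frame size and the latency quoted above are the same number.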

This is the MLX implementation of Moshi. For Mimi, it uses our Rust-based implementation through the Python bindings provided in rustymimi, available in the rust/ folder of our main repository.

Requirements

You will need at least Python 3.10; we recommend Python 3.12.

pip install moshi_mlx  # moshi MLX, from PyPI, best with Python 3.12.
# Or the bleeding edge versions for Moshi and Moshi-MLX.
pip install -e "git+https://git@github.com/kyutai-labs/moshi#egg=moshi_mlx&subdirectory=moshi_mlx"

We have tested the MLX version on a MacBook Pro M3.

If you are not using Python 3.12, you might get an error when installing moshi_mlx or rustymimi (which moshi_mlx depends on). In that case, you will need to install the Rust toolchain or switch to Python 3.12.
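Before installing, it can help to check which of these cases the current interpreter falls into. The following is a minimal sketch; the helper name `check_python` and its three return labels are hypothetical and are not part of moshi_mlx.

```python
import sys

MINIMUM = (3, 10)      # minimum version stated in this README
RECOMMENDED = (3, 12)  # version recommended in this README


def check_python(version=None):
    """Classify a (major, minor) version; defaults to the running interpreter.

    Hypothetical helper for illustration, not part of moshi_mlx.
    """
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    if major_minor < MINIMUM:
        return "unsupported"
    if major_minor == RECOMMENDED:
        return "recommended"
    # Supported, but installation may require the Rust toolchain.
    return "supported"


print(f"Python {sys.version_info.major}.{sys.version_info.minor}: {check_python()}")
```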

Usage

Once you have installed moshi_mlx, you can run:

python -m moshi_mlx.local -q 4   # weights quantized to 4 bits
python -m moshi_mlx.local -q 8   # weights quantized to 8 bits
# And using a different pretrained model:
python -m moshi_mlx.local -q 4 --hf-repo kyutai/moshika-mlx-q4
python -m moshi_mlx.local -q 8 --hf-repo kyutai/moshika-mlx-q8
# Be careful to always match the `-q` and `--hf-repo` flags.

This uses a bare-bones command line interface. It does not perform any echo cancellation, nor does it try to compensate for growing lag by skipping frames.

You can use --hf-repo to select a different pretrained model, by setting the proper Hugging Face repository. See the model list for a reference of the available models.
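One way to keep `-q` and `--hf-repo` consistent is to derive both from a single value. The sketch below does this for the two Moshika repositories named in the commands above; the helper `moshika_command` is hypothetical and only builds the argument list, it does not launch anything on its own.

```python
import sys

# Repositories named in the commands above, keyed by quantization bit width.
MOSHIKA_REPOS = {
    4: "kyutai/moshika-mlx-q4",
    8: "kyutai/moshika-mlx-q8",
}


def moshika_command(bits):
    """Build the argument list for moshi_mlx.local with matching -q and --hf-repo.

    Hypothetical helper for illustration, not part of moshi_mlx.
    """
    if bits not in MOSHIKA_REPOS:
        raise ValueError(f"unsupported quantization: {bits} (expected 4 or 8)")
    return [
        sys.executable, "-m", "moshi_mlx.local",
        "-q", str(bits),
        "--hf-repo", MOSHIKA_REPOS[bits],
    ]


print(" ".join(moshika_command(4)))
```

The resulting list can be passed to `subprocess.run(moshika_command(4), check=True)` to start the command line interface.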

Alternatively, you can run python -m moshi_mlx.local_web to use the web UI. The connection is over HTTP, at localhost:8998.
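If the page does not load, a quick check is whether anything is accepting connections on that port. This is a minimal sketch using only the standard library; the helper names `web_ui_url` and `is_listening` are hypothetical, and the check only tells you that some process is listening, not that it is the Moshi web UI.

```python
import socket

HOST = "localhost"
PORT = 8998  # port stated in this README


def web_ui_url(host=HOST, port=PORT):
    """Return the plain-HTTP address of the web UI."""
    return f"http://{host}:{port}"


def is_listening(host=HOST, port=PORT, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(f"{web_ui_url()} reachable: {is_listening()}")
```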

License

The present code is provided under the MIT license.

Citation

If you use either Mimi or Moshi, please cite the following paper:

@techreport{kyutai2024moshi,
    author = {Alexandre D\'efossez and Laurent Mazar\'e and Manu Orsini and Am\'elie Royer and
              Patrick P\'erez and Herv\'e J\'egou and Edouard Grave and Neil Zeghidour},
    title = {Moshi: a speech-text foundation model for real-time dialogue},
    institution = {Kyutai},
    year={2024},
    month={September},
    url={http://kyutai.org/Moshi.pdf},
}

