
Inference for Speech Models in MLX

This repo implements speech models in MLX for better performance on Apple Silicon Macs.

Currently, it supports the following models:

  • Qwen2.5-Omni (both the original and MLX-quantized versions); only the speech modality is currently supported
  • Ultravox-0.5

Performance

Tested on a MacBook Pro with an M4 Pro chip and 48 GB RAM (TPS = tokens per second):

Model                                                                            Prompt TPS   Generation TPS
Qwen/Qwen2.5-Omni-7B                                                                  259.5             17.8
Qwen/Qwen2.5-Omni-3B                                                                  468.4             38.8
giangndm/qwen2.5-omni-7b-mlx-4bit                                                     259.2             57.6
giangndm/qwen2.5-omni-3b-mlx-8bit                                                     456.2             67.0
fixie-ai/ultravox-v0_5-llama-3_1-8b + mlx-community/Llama-3.1-8B-Instruct-4bit        188.5             40.4
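
The generation figures can be roughly reproduced by timing a generate call and dividing by the number of output tokens. The snippet below is a sketch only: it reuses the model, tokenizer, and prompt set up as in the usage examples further down, and it assumes tokenizer.encode is available for counting tokens.

import time

# Rough throughput estimate: time one generate() call
start = time.perf_counter()
text = generate(model, tokenizer, prompt=prompt, verbose=True)
elapsed = time.perf_counter() - start

# Approximate the number of generated tokens by re-tokenizing the output text
n_tokens = len(tokenizer.encode(text))
print(f"~{n_tokens / elapsed:.1f} tokens/sec (rough; includes prompt processing time)")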

How to use

uv add git+https://github.com/giangndm/mlx-lm-omni.git
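
The project is also published on PyPI as mlx_lm_omni (this page, version 0.1.0), so installing the released package should work as well:

uv add mlx-lm-omni
# or, with pip
pip install mlx-lm-omni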

For Qwen2.5-Omni

from mlx_lm_omni import load, generate
import librosa
from io import BytesIO
from urllib.request import urlopen

# Load the 4-bit quantized Qwen2.5-Omni 7B model and its tokenizer
model, tokenizer = load("giangndm/qwen2.5-omni-7b-mlx-4bit")

# Download a sample audio clip and decode it to a 16 kHz mono waveform
audio_path = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac"
audio = librosa.load(BytesIO(urlopen(audio_path).read()), sr=16000)[0]

# Build a chat with the audio attached to the user turn
messages = [
    {"role": "system", "content": [{"type": "text", "text": "You are a speech recognition model."}]},
    {"role": "user", "content": [
        {"type": "audio", "audio": audio},
        {"type": "text", "text": "Transcribe the English audio into text without any punctuation marks."},
    ]},
]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
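
If the audio lives on disk rather than behind a URL, librosa can load it directly; speech.wav below is a hypothetical path, and the rest of the example stays the same:

# Hypothetical local file instead of the remote sample above
audio, _ = librosa.load("speech.wav", sr=16000)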

For Ultravox 0.5

from mlx_lm_omni import load, generate
import librosa
from io import BytesIO
from urllib.request import urlopen

# Load Ultravox v0.5 with a 4-bit quantized Llama 3.1 8B as the text model
model, tokenizer = load("fixie-ai/ultravox-v0_5-llama-3_1-8b", model_config={"text_model_id": "mlx-community/Llama-3.1-8B-Instruct-4bit"})

# Download a sample audio clip and decode it to a 16 kHz mono waveform
audio_path = "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen2-Audio/audio/1272-128104-0000.flac"
audio = librosa.load(BytesIO(urlopen(audio_path).read()), sr=16000)[0]

# Ultravox uses plain-string content, with the audio attached as a separate key
messages = [
    {"role": "system", "content": "You are a speech recognition model."},
    {"role": "user", "content": "Transcribe the English audio into text without any punctuation marks.", "audio": audio},
]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)

text = generate(model, tokenizer, prompt=prompt, verbose=True)
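
The same pattern should work for other audio-in, text-out tasks by changing the prompt text; a sketch reusing the message format above:

# Same Ultravox message format, different instruction
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize what the speaker says in one sentence.", "audio": audio},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
text = generate(model, tokenizer, prompt=prompt, verbose=True)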

