mlx-meralion

MLX-native inference for MERaLiON AudioLLM on Apple Silicon.

MERaLiON is A*STAR's multimodal audio-language model for speech transcription, translation, spoken question answering, and more.

Installation

pip install mlx-meralion

Requires macOS on Apple Silicon (M1/M2/M3/M4) and Python 3.10+.
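
The platform requirement can be checked before installing. This is a minimal sketch using only the standard library; is_supported and current_environment_supported are hypothetical helper names, not part of mlx-meralion.

```python
import platform
import sys


def is_supported(system: str, machine: str, version: tuple) -> bool:
    """Return True for macOS on Apple Silicon with Python 3.10 or newer."""
    return (
        system == "Darwin"                    # macOS reports "Darwin"
        and machine == "arm64"                # Apple Silicon reports "arm64"
        and tuple(version[:2]) >= (3, 10)     # compare (major, minor) only
    )


def current_environment_supported() -> bool:
    """Apply the check to the interpreter running this script."""
    return is_supported(platform.system(), platform.machine(), sys.version_info)


if __name__ == "__main__":
    print("supported" if current_environment_supported() else "not supported")
```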

Quick Start

Python API

from mlx_meralion import load_model, transcribe

# Load model (auto-downloads from HuggingFace on first use)
model = load_model("MERaLiON/MERaLiON-2-10B-MLX")  # 10B 8-bit, recommended
# model = load_model("MERaLiON/MERaLiON-2-3B-MLX")   # 3B fp16, smaller

# Transcribe speech
text = transcribe(model, "audio.wav")
print(text)

# Translate to Chinese
text = transcribe(model, "audio.wav", task="translate_zh")

# Spoken question answering
text = transcribe(model, "audio.wav", task="sqa", question="What is the speaker talking about?")

Batch Inference

Process multiple audio files with GPU-batched decoding for higher throughput:

from mlx_meralion import load_model, batch_transcribe

model = load_model("MERaLiON/MERaLiON-2-10B-MLX")

results = batch_transcribe(model, ["a.wav", "b.wav", "c.wav", "d.wav"])
for text in results:
    print(text)
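
For long file lists, one option is to feed batch_transcribe fixed-size groups and keep each result paired with its path. The sketch below is not part of mlx-meralion: chunked and transcribe_in_batches are hypothetical names, the default group size of 4 is arbitrary, and it assumes batch_transcribe returns one string per input path, in input order. The batch function is passed in as a callable so the grouping logic can be tried without loading a model.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def transcribe_in_batches(
    paths: Sequence[str],
    batch_fn: Callable[[List[str]], List[str]],
    batch_size: int = 4,
) -> List[Tuple[str, str]]:
    """Run `batch_fn` on each group and return (path, text) pairs in order."""
    pairs: List[Tuple[str, str]] = []
    for group in chunked(paths, batch_size):
        texts = batch_fn(group)
        # Assumption: one output per input, in the same order.
        if len(texts) != len(group):
            raise RuntimeError("batch function returned a mismatched result count")
        pairs.extend(zip(group, texts))
    return pairs
```

With a loaded model, the call would look like transcribe_in_batches(paths, lambda group: batch_transcribe(model, group), batch_size=4).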

CLI

mlx-meralion --model MERaLiON/MERaLiON-2-10B-MLX --audio audio.wav --task asr
mlx-meralion --model MERaLiON/MERaLiON-2-10B-MLX --audio audio.wav --task translate_zh

Supported Tasks

Task             Description
asr              Speech-to-text transcription
translate_zh     Translate to Chinese
translate_id     Translate to Indonesian
translate_ms     Translate to Malay
translate_ta     Translate to Tamil
sqa              Spoken question answering (requires question=)
summarize        Dialogue summarization
paralinguistics  Speaker characteristic analysis
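
Because sqa is the only task that needs a question, it can help to validate the task name and question before calling transcribe. The helper below is a hypothetical sketch, not part of the library: build_task_kwargs is an invented name, the default of "asr" is an assumption, and rejecting a question for non-sqa tasks is this sketch's choice, not documented behaviour.

```python
from typing import Dict, Optional

# Task names copied from the Supported Tasks table.
SUPPORTED_TASKS = {
    "asr", "translate_zh", "translate_id", "translate_ms",
    "translate_ta", "sqa", "summarize", "paralinguistics",
}


def build_task_kwargs(task: str = "asr", question: Optional[str] = None) -> Dict[str, str]:
    """Return keyword arguments for transcribe(), validating task and question."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(SUPPORTED_TASKS)}")
    if task == "sqa":
        if question is None or not question.strip():
            raise ValueError("task 'sqa' requires a non-empty question")
        return {"task": task, "question": question}
    if question is not None:
        # Sketch-level choice: fail loudly instead of silently dropping the question.
        raise ValueError(f"task {task!r} does not take a question")
    return {"task": task}
```

A call would then read transcribe(model, "audio.wav", **build_task_kwargs("sqa", "What is the speaker talking about?")).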

Available Models

Model               Size    RAM     Quality  HuggingFace
MERaLiON-2-10B-MLX  ~10 GB  16+ GB  Best     MERaLiON/MERaLiON-2-10B-MLX
MERaLiON-2-3B-MLX   ~6 GB   8+ GB   Good     MERaLiON/MERaLiON-2-3B-MLX
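
The RAM column can be turned into a simple selection rule. This is a sketch under the assumption that the listed RAM figures are hard minimums; pick_model is a hypothetical helper, and the caller supplies the machine's memory in GB.

```python
from typing import Optional

# (minimum RAM in GB, HuggingFace repo id), largest model first,
# taken from the Available Models table.
MODELS = [
    (16, "MERaLiON/MERaLiON-2-10B-MLX"),
    (8, "MERaLiON/MERaLiON-2-3B-MLX"),
]


def pick_model(ram_gb: float) -> Optional[str]:
    """Return the largest listed model whose RAM minimum is met, else None."""
    for min_ram, repo_id in MODELS:
        if ram_gb >= min_ram:
            return repo_id
    return None
```

For example, pick_model(24) selects the 10B model, while a machine with 8 GB falls back to the 3B model.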

Batch Inference Benchmark

Measured on Apple M4 Pro (24 GB) with MERaLiON-2-10B-MLX, 8 audio samples (~25s each), max 256 tokens. Correctness validated by comparing batch outputs against sequential outputs token-for-token.

Method                 Batch size  Time    Throughput  Speedup  Correct
Sequential (for-loop)  1           38.96s  0.21 aud/s  1.00x    ---
batch_transcribe       4           ~27.8s  0.29 aud/s  1.40x    PASS
batch_transcribe       8           23.25s  0.34 aud/s  1.68x    PASS
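
The Throughput and Speedup columns follow from the Time column: throughput is the number of audio samples divided by wall-clock time, and speedup is the sequential time divided by the batched time. A short sketch reproducing the sequential row and the 8-way batch_transcribe row (the function names are illustrative):

```python
def throughput(num_audios: int, seconds: float) -> float:
    """Audio samples processed per second."""
    return num_audios / seconds


def speedup(baseline_seconds: float, seconds: float) -> float:
    """How many times faster a run is than the baseline."""
    return baseline_seconds / seconds


NUM_AUDIOS = 8          # samples in the benchmark
SEQUENTIAL_S = 38.96    # sequential for-loop time
BATCH8_S = 23.25        # batch_transcribe time with all 8 samples in one batch

print(f"{throughput(NUM_AUDIOS, SEQUENTIAL_S):.2f} aud/s")  # 0.21 aud/s
print(f"{throughput(NUM_AUDIOS, BATCH8_S):.2f} aud/s")      # 0.34 aud/s
print(f"{speedup(SEQUENTIAL_S, BATCH8_S):.2f}x")            # 1.68x
```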

Features

  • Apple Silicon native: Runs entirely on MLX with GPU acceleration
  • Batch inference: GPU-batched decoding via BatchKVCache for multi-audio throughput
  • N-gram blocking: Prevents repetitive output (matching HuggingFace quality)
  • Smart chunking: Long audio split at 30s boundaries; short tails merged to prevent hallucination
  • Auto-download: Models downloaded and cached from HuggingFace automatically
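
Two of the features above, smart chunking and n-gram blocking, can be sketched in plain Python. Neither function is the library's implementation. chunk_spans assumes a short tail is folded into the preceding chunk and uses a hypothetical 5-second threshold (min_tail_s); banned_next_tokens shows the usual no-repeat n-gram rule, with n=3 chosen only for illustration. During decoding, the logits of the banned tokens would be masked out before sampling.

```python
from typing import List, Sequence, Set, Tuple


def chunk_spans(
    duration_s: float, chunk_s: float = 30.0, min_tail_s: float = 5.0
) -> List[Tuple[float, float]]:
    """Split [0, duration_s) into chunk_s-second spans, merging a short tail."""
    spans: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        spans.append((start, end))
        start = end
    # Assumption: a final span shorter than min_tail_s joins the previous span.
    if len(spans) >= 2 and spans[-1][1] - spans[-1][0] < min_tail_s:
        _, tail_end = spans.pop()
        prev_start, _ = spans.pop()
        spans.append((prev_start, tail_end))
    return spans


def banned_next_tokens(tokens: Sequence[int], n: int = 3) -> Set[int]:
    """Return tokens that would complete an n-gram already present in `tokens`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(tokens) < n:
        return set()  # not enough history to contain a full n-gram
    # The last n-1 generated tokens form the prefix of the next n-gram.
    prefix = tuple(tokens[len(tokens) - (n - 1):])
    banned: Set[int] = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


print(chunk_spans(62.0))                     # [(0.0, 30.0), (30.0, 62.0)]
print(banned_next_tokens([1, 2, 3, 1, 2]))   # {3}
```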

Architecture

Audio (WAV/MP3/FLAC)
  -> Whisper Encoder (1280-d)
    -> LayerNorm + MLP Adaptor
      -> Speech embeddings merged into text sequence
        -> Gemma2 Decoder -> text output
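
The "merged into text sequence" step can be pictured as swapping placeholder positions in the text embedding sequence for adapted speech embeddings. The sketch below illustrates that idea with plain lists. It assumes the prompt holds one placeholder token per speech frame, which is a common convention but is not confirmed here for MERaLiON; merge_speech_embeddings and placeholder_id are hypothetical names.

```python
from typing import List, Sequence

Vector = List[float]


def merge_speech_embeddings(
    token_ids: Sequence[int],
    text_embeds: Sequence[Vector],
    speech_embeds: Sequence[Vector],
    placeholder_id: int,
) -> List[Vector]:
    """Replace the embedding at each placeholder position with a speech embedding."""
    if len(token_ids) != len(text_embeds):
        raise ValueError("token_ids and text_embeds must have the same length")
    positions = [i for i, tok in enumerate(token_ids) if tok == placeholder_id]
    # Assumption: exactly one placeholder per speech frame.
    if len(positions) != len(speech_embeds):
        raise ValueError("placeholder count does not match number of speech frames")
    merged = list(text_embeds)
    for pos, emb in zip(positions, speech_embeds):
        merged[pos] = emb
    return merged


# Toy example with 1-d embeddings; 99 stands in for the placeholder token id.
ids = [5, 99, 99, 7]
text = [[0.0], [0.1], [0.2], [0.3]]
speech = [[9.0], [8.0]]
print(merge_speech_embeddings(ids, text, speech, placeholder_id=99))
# [[0.0], [9.0], [8.0], [0.3]]
```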

Download files

Source Distribution

mlx_meralion-0.2.0.tar.gz (28.5 kB)


Built Distribution

mlx_meralion-0.2.0-py3-none-any.whl (22.7 kB)


File details

Details for the file mlx_meralion-0.2.0.tar.gz.

File metadata

  • Download URL: mlx_meralion-0.2.0.tar.gz
  • Size: 28.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_meralion-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       a2fd6660bfb26d4b41324c2c55b733cfdc6e6e70a382fd83995d06b73e63ab33
MD5          324fab1512a6559f80eb3fccfb89654c
BLAKE2b-256  d049be127e33b845aa88bc7ed45d2f72351244fc3fc84dc3ef8d55ed1d4e3f62

Provenance

The following attestation bundles were made for mlx_meralion-0.2.0.tar.gz:

Publisher: publish.yml on YingxuH/mlx-audiollm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlx_meralion-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: mlx_meralion-0.2.0-py3-none-any.whl
  • Size: 22.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_meralion-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       0c7d9632c362066d35f6d23c332a048baaea69e3160f49a3aaf1b4720d2fb42f
MD5          904c6a063f5bb564a8771c02cc32e1f4
BLAKE2b-256  2244c9810b1522d9d0fb2fe2b2766ccea43a1c22c2f651947ce2e2519b179059

Provenance

The following attestation bundles were made for mlx_meralion-0.2.0-py3-none-any.whl:

Publisher: publish.yml on YingxuH/mlx-audiollm
