mlx-meralion

MLX-native inference for MERaLiON AudioLLM on Apple Silicon.

MERaLiON is A*STAR's multimodal audio-language model for speech transcription, translation, spoken question answering, and more.

Installation

pip install mlx-meralion

Requires macOS on Apple Silicon (M1/M2/M3/M4) and Python 3.10+.
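The requirements above can be checked up front with the standard library. This is a minimal sketch: `is_supported` is a hypothetical helper, not part of mlx-meralion, and it assumes a native Apple Silicon Python reports `arm64` from `platform.machine()` (an interpreter running under Rosetta reports `x86_64` and would be rejected here).

```python
import platform
import sys


def is_supported(system: str, machine: str, version: tuple) -> bool:
    """Return True when the host matches the stated requirements.

    Hypothetical helper for illustration; not part of mlx-meralion.
    """
    # macOS identifies itself as "Darwin"; Apple Silicon as "arm64".
    on_apple_silicon = system == "Darwin" and machine == "arm64"
    # Compare only (major, minor) against the Python 3.10 minimum.
    return on_apple_silicon and tuple(version[:2]) >= (3, 10)


if __name__ == "__main__":
    ok = is_supported(platform.system(), platform.machine(), sys.version_info)
    print("supported" if ok else "unsupported")
```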

Quick Start

Python API

from mlx_meralion import load_model, transcribe

# Load model (auto-downloads from HuggingFace on first use)
model = load_model("MERaLiON/MERaLiON-2-10B-MLX")  # 10B 8-bit, recommended
# model = load_model("MERaLiON/MERaLiON-2-3B-MLX")   # 3B fp16, smaller

# Transcribe speech
text = transcribe(model, "audio.wav")
print(text)

# Translate to Chinese
text = transcribe(model, "audio.wav", task="translate_zh")

# Spoken question answering
text = transcribe(model, "audio.wav", task="sqa", question="What is the speaker talking about?")

# Summarize dialogue
text = transcribe(model, "audio.wav", task="summarize")

CLI

# ASR (default task)
mlx-meralion --model MERaLiON/MERaLiON-2-10B-MLX --audio audio.wav --task asr

# Translation
mlx-meralion --model MERaLiON/MERaLiON-2-10B-MLX --audio audio.wav --task translate_zh

# Custom instruction
mlx-meralion --model MERaLiON/MERaLiON-2-10B-MLX --audio audio.wav --instruction "Summarize this in one sentence."
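To drive the CLI from a script, the commands above can be assembled as an argument list. This is a sketch under assumptions: `build_command` and `run_cli` are hypothetical names, only the flags shown above (`--model`, `--audio`, `--task`, `--instruction`) are used, and `run_cli` assumes the tool prints its result to stdout, which the examples above do not confirm.

```python
import subprocess


def build_command(model, audio, task=None, instruction=None):
    """Build the argument list for the mlx-meralion CLI (hypothetical helper)."""
    cmd = ["mlx-meralion", "--model", model, "--audio", audio]
    if task is not None:
        cmd += ["--task", task]
    if instruction is not None:
        cmd += ["--instruction", instruction]
    return cmd


def run_cli(model, audio, task=None, instruction=None):
    """Run the CLI and return its stdout.

    Assumption: the result is written to stdout. Raises
    subprocess.CalledProcessError if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(model, audio, task, instruction),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

Passing the arguments as a list avoids shell quoting problems with instructions that contain spaces or quotes. Whether `--task` and `--instruction` can be combined is not stated above, so it is safest to pass one or the other.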

Supported Tasks

Task             Description
asr              Speech-to-text transcription
translate_zh     Translate to Chinese
translate_id     Translate to Indonesian
translate_ms     Translate to Malay
translate_ta     Translate to Tamil
sqa              Spoken question answering (requires question=)
summarize        Dialogue summarization
paralinguistics  Speaker characteristic analysis
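Since `sqa` is the only task that needs a `question`, it can help to validate the task name before loading audio. The sketch below is illustrative only: `build_transcribe_kwargs` is a hypothetical helper, the task names are copied from the table above, and it assumes the Python API accepts `task="asr"` explicitly, as the CLI does.

```python
# Task names copied from the Supported Tasks table.
SUPPORTED_TASKS = {
    "asr",
    "translate_zh",
    "translate_id",
    "translate_ms",
    "translate_ta",
    "sqa",
    "summarize",
    "paralinguistics",
}


def build_transcribe_kwargs(task="asr", question=None):
    """Return keyword arguments for transcribe() (hypothetical helper)."""
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if task == "sqa":
        if not question:
            raise ValueError("task 'sqa' requires a question")
        return {"task": task, "question": question}
    # Other tasks take no question; any question passed in is dropped.
    return {"task": task}
```

Usage would look like `transcribe(model, "audio.wav", **build_transcribe_kwargs("sqa", question="Who is speaking?"))`.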

Available Models

Model               Size    RAM     Quality  HuggingFace
MERaLiON-2-10B-MLX  ~10 GB  16+ GB  Best     MERaLiON/MERaLiON-2-10B-MLX
MERaLiON-2-3B-MLX   ~6 GB   8+ GB   Good     MERaLiON/MERaLiON-2-3B-MLX
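The RAM column can be turned into a simple selection rule. This is a sketch, not part of the package: `pick_model` is a hypothetical helper, and it treats the "16+ GB" and "8+ GB" figures as hard minimums on total system memory, which is an assumption.

```python
def pick_model(ram_gb: float) -> str:
    """Pick the largest listed model that fits the given RAM (hypothetical helper)."""
    if ram_gb >= 16:
        return "MERaLiON/MERaLiON-2-10B-MLX"  # 10B, 8-bit
    if ram_gb >= 8:
        return "MERaLiON/MERaLiON-2-3B-MLX"  # 3B, fp16
    raise ValueError("the listed models need at least 8 GB of RAM")
```

The returned string is the HuggingFace identifier, so it can be passed straight to `load_model`.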

Features

  • Apple Silicon native: Runs entirely on MLX with GPU acceleration
  • N-gram blocking: Automatically prevents repetitive output (matching HuggingFace quality)
  • Smart chunking: Long audio is split at 30-second boundaries; short trailing segments are merged to prevent hallucination
  • Auto-download: HuggingFace models are downloaded and cached automatically
  • Multiple tasks: ASR, translation, QA, summarization, and more
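Two of the features above, smart chunking and n-gram blocking, can be illustrated in a few lines of plain Python. This is a sketch of the general techniques, not the package's implementation. The function names, the `min_tail_s` threshold, the choice to merge a short tail into the preceding chunk, and the n-gram size `n` are all assumptions made for illustration.

```python
def plan_chunks(duration_s, chunk_s=30.0, min_tail_s=5.0):
    """Return (start, end) pairs in seconds covering the whole clip.

    min_tail_s is a hypothetical threshold: a final chunk shorter than
    this is merged into the chunk before it.
    """
    if duration_s <= 0:
        return []
    bounds = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        bounds.append((start, end))
        start = end
    # Merge a short tail into the previous chunk instead of decoding it alone.
    if len(bounds) > 1 and bounds[-1][1] - bounds[-1][0] < min_tail_s:
        _, tail_end = bounds.pop()
        prev_start, _ = bounds.pop()
        bounds.append((prev_start, tail_end))
    return bounds


def banned_next_tokens(tokens, n=3):
    """Return the tokens that would complete an n-gram already generated.

    A decoder applying n-gram blocking would exclude these tokens
    when choosing the next token.
    """
    if n < 1 or len(tokens) < n:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(tokens[len(tokens) - n + 1:])
    banned = set()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned
```

For a 62-second clip, `plan_chunks` yields two chunks, with the 2-second tail folded into the second one, so the decoder never sees a very short segment on its own.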

Architecture

Audio (WAV/MP3/FLAC)
  -> Whisper Encoder (1280-d)
    -> LayerNorm + MLP Adaptor
      -> Speech embeddings merged into text sequence
        -> Gemma2 Decoder -> text output
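The "merged into text sequence" step can be pictured as splicing speech embeddings into the prompt at reserved positions. The sketch below assumes a common audio-LLM design where placeholder tokens mark those positions, one per speech embedding. Whether mlx-meralion does exactly this is not stated above, and `merge_speech_embeddings` and `placeholder_id` are hypothetical names. Plain lists stand in for arrays to keep the example dependency-free.

```python
def merge_speech_embeddings(token_ids, text_embeds, speech_embeds, placeholder_id):
    """Replace each placeholder position with the next speech embedding.

    token_ids:     prompt token ids, with placeholder_id marking speech slots
    text_embeds:   one embedding (list of floats) per token id
    speech_embeds: adaptor outputs, one per placeholder (assumed layout)
    """
    n_slots = sum(1 for t in token_ids if t == placeholder_id)
    if n_slots != len(speech_embeds):
        raise ValueError("placeholder count must match number of speech embeddings")
    speech_iter = iter(speech_embeds)
    merged = []
    for tok, emb in zip(token_ids, text_embeds):
        # Speech slots take adaptor output; all other positions keep text embeddings.
        merged.append(next(speech_iter) if tok == placeholder_id else emb)
    return merged
```

The merged sequence has the same length as the prompt, so the decoder can consume it like any other embedded input.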

License

MIT

Project details


Publisher: publish.yml on YingxuH/mlx-audiollm
