MusicGen + AudioGen on Apple Silicon via MLX — full AudioCraft for M-series Macs, no CUDA needed

Project description

mlx-audiocraft

MusicGen + AudioGen on Apple Silicon via MLX — the first full AudioCraft port for M-series Macs. No CUDA, no server, no Docker. Just pip install and generate.

# Generate sound effects (NEW — not in any other MLX package)
audiogen-mlx "keyboard typing, office ambience" -d 5 -o sfx.wav

# Generate music
musicgen-mlx "upbeat cinematic tech promo, piano, 120 BPM, no vocals" -d 30 -o music.wav

What's unique about this package

Package                 MusicGen   AudioGen (SFX)   MLX (Apple GPU)
audiocraft (Meta)       yes        yes              no
musicgen-mlx            yes        no               yes
mlx-audiocraft (this)   yes        yes              yes

AudioGen on MLX is new. This is the first port that brings text-to-sound-effects to Apple Silicon with GPU acceleration.


How it works (for learners)

Here's the mental model you need:

Your text prompt
      ↓
  T5 Encoder          ← reads your text, runs on CPU (PyTorch)
      ↓
  Transformer LM      ← generates audio "tokens", runs on MLX (Apple GPU)
      ↓
  EnCodec Decoder     ← turns tokens into a waveform, runs on MLX
      ↓
  WAV file

MusicGen and AudioGen use the same pipeline; they're just trained on different data (music vs. sound effects). That is why porting AudioGen mostly amounted to adding audiogen.py, a small module that inherits the whole pipeline.
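
A toy sketch of that inheritance pattern, to make it concrete. Every class and method here is hypothetical and far simpler than the real BaseGenModel in mlx_audiocraft/models/genmodel.py: the three stages are stubs that only record which step ran. The point is that each subclass supplies its own weights and sample rate, while generate() comes entirely from the base class.

```python
class ToyBaseGenModel:
    """Hypothetical, heavily simplified stand-in for BaseGenModel (not the real API)."""

    checkpoint = None   # each subclass points at its own weights
    sample_rate = None  # and declares its own output sample rate

    def __init__(self):
        self.duration = 10
        self.trace = []  # records which pipeline stages ran, in order

    def set_generation_params(self, duration):
        self.duration = duration

    def generate(self, prompts):
        # Same three stages as the diagram; subclasses never override this.
        conditions = self._encode_text(prompts)
        tokens = self._generate_tokens(conditions)
        return self._decode(tokens)

    def _encode_text(self, prompts):
        self.trace.append("T5 encoder")
        return [p.strip() for p in prompts]

    def _generate_tokens(self, conditions):
        self.trace.append(f"LM ({self.checkpoint}, {self.duration}s)")
        return [f"tokens for {c!r}" for c in conditions]

    def _decode(self, tokens):
        # Stub: a real codec decoder would turn tokens into a waveform here.
        self.trace.append(f"EnCodec decoder @ {self.sample_rate} Hz")
        return tokens


class ToyAudioGen(ToyBaseGenModel):
    # The whole "port": different weights and sample rate, same pipeline.
    checkpoint = "facebook/audiogen-medium"
    sample_rate = 16_000


class ToyMusicGen(ToyBaseGenModel):
    checkpoint = "facebook/musicgen-small"
    sample_rate = 32_000
```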

MLX is Apple's own array framework for machine learning, built around the unified memory of M-series chips: the CPU and GPU share the same RAM, so MLX arrays can be used on either device without copying data between them.


Install

pip install mlx-audiocraft

Requirements: macOS 13+, Apple Silicon (M1/M2/M3/M4), Python 3.10+


Quick start

Sound effects (AudioGen)

from mlx_audiocraft import AudioGen

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)

wav = model.generate(["dog barking in a park"])
# wav shape: [batch, channels, samples]

Music (MusicGen)

from mlx_audiocraft import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)

wav = model.generate(["calm lo-fi beat, soft piano, vinyl crackle"])
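
The duration passed to set_generation_params fixes how much work the language model does and how long the returned waveform is. The arithmetic below is a sketch that assumes the 50 Hz token frame rate and 4 codebooks documented for the public MusicGen and AudioGen EnCodec tokenizers, plus the sample rates in the Models tables; generation_budget is a hypothetical helper, and exact lengths produced by this package may differ slightly.

```python
FRAME_RATE_HZ = 50  # assumed EnCodec token frames per second for these checkpoints
N_CODEBOOKS = 4     # assumed number of parallel codebooks per frame

# Output sample rates, taken from the Models tables.
SAMPLE_RATES = {
    "facebook/audiogen-medium": 16_000,
    "facebook/musicgen-small": 32_000,
    "facebook/musicgen-medium": 32_000,
    "facebook/musicgen-large": 32_000,
}


def generation_budget(model_name, duration_s):
    """Hypothetical helper: rough sizes implied by a requested duration."""
    sample_rate = SAMPLE_RATES[model_name]
    frames = round(duration_s * FRAME_RATE_HZ)           # ~one LM decoding step per frame
    samples_per_frame = sample_rate // FRAME_RATE_HZ     # audio samples per token frame
    return {
        "frames": frames,
        "tokens": frames * N_CODEBOOKS,                  # tokens across all codebooks
        "samples": frames * samples_per_frame,           # length of wav's last axis
    }
```

For the 30-second MusicGen call above, that works out to 1,500 frames (6,000 tokens across the four codebooks) and about 960,000 samples per channel, which is why generation time grows with the requested duration.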

CLI

# Sound effects
audiogen-mlx "crowd applause, conference room" -d 5
audiogen-mlx "rain on a window, thunder in the distance" -d 8 -o rain.wav

# Music
musicgen-mlx "epic orchestral soundtrack" -m facebook/musicgen-large -d 20
musicgen-mlx "funky disco groove" "ambient pad wide reverb" -d 10  # batch
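
To render a list of sound effects from a script, you can drive the CLI with subprocess. This minimal sketch uses only the -d and -o flags shown above (check audiogen-mlx --help for anything else); sfx_commands and run_all are hypothetical helper names. Passing an argument list instead of a shell string means commas and quotes inside prompts need no escaping.

```python
import os
import shlex
import subprocess


def sfx_commands(prompts, duration=5, out_dir="."):
    """Build one audiogen-mlx invocation per prompt (hypothetical helper).

    Uses only the -d (duration) and -o (output file) flags from the examples.
    """
    commands = []
    for i, prompt in enumerate(prompts):
        out_path = os.path.join(out_dir, f"sfx_{i:02d}.wav")
        commands.append(["audiogen-mlx", prompt, "-d", str(duration), "-o", out_path])
    return commands


def run_all(commands):
    """Run each command in turn; stop with CalledProcessError if one fails."""
    for cmd in commands:
        print("running:", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Each invocation is a separate process, so the model weights are loaded again every time; for many clips, the Python API with a list of prompts avoids that repeated load.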

Models

AudioGen

Model                      Params   Download   Sample rate
facebook/audiogen-medium   1.5B     ~3.6 GB    16 kHz

MusicGen

Model                             Params   Download   Sample rate
facebook/musicgen-small           300M     ~1.2 GB    32 kHz
facebook/musicgen-medium          1.5B     ~3.2 GB    32 kHz
facebook/musicgen-large           3.3B     ~6.5 GB    32 kHz
facebook/musicgen-stereo-small    300M     ~1.2 GB    32 kHz stereo
facebook/musicgen-stereo-medium   1.5B     ~3.2 GB    32 kHz stereo

Models download automatically from HuggingFace on first use and are cached in ~/.cache/huggingface/.
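
To check which models are already downloaded, or to find them when reclaiming disk space, you can compute each model's cache folder. This sketch assumes the package downloads through huggingface_hub with its standard cache layout, where a repo such as facebook/musicgen-small lives in hub/models--facebook--musicgen-small under the cache root, and that the HF_HOME / HF_HUB_CACHE environment variables are honoured as usual; hf_cache_dir is a hypothetical helper.

```python
import os
from pathlib import Path


def hf_cache_dir(repo_id):
    """Folder where huggingface_hub's default cache layout stores a model repo."""
    hub = os.environ.get("HF_HUB_CACHE")
    if hub is None:
        # Default root is ~/.cache/huggingface unless HF_HOME overrides it.
        hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
        hub = str(Path(hf_home) / "hub")
    # "facebook/musicgen-small" -> "models--facebook--musicgen-small"
    return Path(hub) / ("models--" + repo_id.replace("/", "--"))
```

For example, hf_cache_dir("facebook/musicgen-small").exists() tells you whether that model has been fetched yet.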


Benchmark (M4 Max, 64 GB)

Run python benchmarks/run_benchmarks.py to generate results for your machine.

Model             Audio length   Generation time   Realtime factor
audiogen-medium   5 s            ~8 s              0.6x
musicgen-small    10 s           ~8 s              1.3x
musicgen-medium   10 s           ~17 s             0.6x
musicgen-large    10 s           ~35 s             0.3x

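Realtime factor = audio duration ÷ generation time. Values above 1.0x mean generation finishes faster than the audio plays back; values below 1.0x mean it takes longer than the clip itself.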
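
The realtime figures in the table are audio duration divided by wall-clock generation time (10 s of audio in ~8 s gives 1.25, shown as 1.3x). A minimal sketch for measuring the same figure yourself; realtime_factor and timed_generation are hypothetical helper names, and the timer wraps any zero-argument callable.

```python
import time


def realtime_factor(audio_seconds, generation_seconds):
    """Seconds of audio produced per second of wall-clock time (>1.0 = faster than realtime)."""
    return audio_seconds / generation_seconds


def timed_generation(generate, audio_seconds):
    """Call `generate()` once and return (result, elapsed seconds, realtime factor)."""
    start = time.perf_counter()
    result = generate()
    elapsed = time.perf_counter() - start
    return result, elapsed, realtime_factor(audio_seconds, elapsed)
```

With the Python API you might call timed_generation(lambda: model.generate([prompt]), audio_seconds=10). MLX evaluates lazily, so if the returned array could still be unevaluated, call mx.eval on it inside the timed function; the first call may also include one-time warm-up, so timing a second call gives a fairer number.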


Prompt guide

Sound effects (AudioGen)

Be literal and specific:

"keyboard typing, subtle office background noise"
"notification chime, clean and bright"
"crowd applause, conference room, 3 seconds"
"rain falling on a metal roof, distant thunder"
"coffee machine brewing, kitchen ambience"

Music (MusicGen)

List the style, instrumentation, and BPM, then always end with "no vocals":

"upbeat cinematic tech promo, clean piano with electronic pads, 120 BPM, no vocals"
"calm educational background, soft piano and ambient pads, 75 BPM, no vocals"
"energetic SaaS launch, modern synths, punchy drums, 120 BPM, no vocals"
"Hindi classical influence, sitar and tabla, meditative, 60 BPM, no vocals"

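If you generate many tracks, a tiny helper can keep prompts in the shape recommended above. music_prompt is a hypothetical name; it simply joins the parts with commas in the same order as the examples.

```python
def music_prompt(style, instruments, bpm):
    """Hypothetical helper: style, instruments, tempo, then "no vocals"."""
    parts = [style, *instruments, f"{bpm} BPM", "no vocals"]
    return ", ".join(parts)
```

For instance, [music_prompt("calm educational background", ["soft piano and ambient pads"], bpm) for bpm in (60, 75, 90)] yields three tempo variations that can be passed to model.generate in a single call.
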
Save output

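import numpy as np
import soundfile as sf
from mlx_audiocraft import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=10)

wav = model.generate(["your prompt"])  # [batch, channels, samples]
audio = np.array(wav[0]).T             # first item: [channels, samples] → [samples, channels]
if audio.shape[1] == 1:
    audio = audio[:, 0]                # mono: drop the channel axis → [samples]
sf.write("output.wav", audio, model.sample_rate)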
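
If you'd rather not depend on soundfile, Python's built-in wave module can write 16-bit PCM WAV files. This is a minimal sketch assuming float samples in [-1, 1] (anything outside is clipped) laid out as [samples] or [samples, channels], like the audio array above; write_wav_pcm16 is a hypothetical helper, not part of mlx_audiocraft.

```python
import wave

import numpy as np


def write_wav_pcm16(path, audio, sample_rate):
    """Write float audio in [-1, 1] as 16-bit PCM using numpy and the stdlib.

    `audio` is [samples] (mono) or [samples, channels].
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim == 1:
        audio = audio[:, None]                   # treat mono as a single channel
    clipped = np.clip(audio, -1.0, 1.0)          # avoid int16 wrap-around
    pcm = (clipped * 32767.0).astype("<i2")      # little-endian int16, as WAV expects
    with wave.open(path, "wb") as wf:
        wf.setnchannels(pcm.shape[1])
        wf.setsampwidth(2)                       # 2 bytes = 16 bits per sample
        wf.setframerate(int(sample_rate))
        # tobytes() emits row-major order, i.e. channels interleaved per frame.
        wf.writeframes(pcm.tobytes())
```

Usage: write_wav_pcm16("output.wav", audio, model.sample_rate).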

Architecture deep-dive (for learners)

If you want to understand how this works under the hood, here's a reading order:

  1. mlx_audiocraft/models/genmodel.py defines BaseGenModel, the base class all models inherit from. Focus on generate(), _prepare_tokens_and_attributes(), and _generate_tokens().

  2. mlx_audiocraft/models/audiogen.py — our AudioGen port. It's tiny (~90 lines) because it just inherits BaseGenModel and points at AudioGen's weights. Good first file to read.

  3. mlx_audiocraft/models/musicgen.py — MusicGen adds melody conditioning on top of BaseGenModel. Compare with audiogen.py to see the diff.

  4. mlx_audiocraft/models/loaders.py — how model weights are downloaded from HuggingFace and converted from PyTorch format to MLX.

  5. mlx_audiocraft/modules/transformer.py — the MLX transformer implementation. This is the core of the language model.

  6. mlx_audiocraft/models/encodec.py — the audio codec (compress waveform → tokens, decode tokens → waveform).
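
One detail that typically comes up when converting PyTorch weights to MLX (step 4 above) is tensor layout: PyTorch stores nn.Conv1d weights as [out_channels, in_channels, kernel_size], while MLX's nn.Conv1d expects [out_channels, kernel_size, in_channels]. The sketch below shows that transposition on a plain dict of NumPy arrays. It is not taken from loaders.py; torch_conv1d_to_mlx is a hypothetical name, and treating every 3-D "*.weight" tensor as a regular convolution is a simplifying assumption. Real checkpoints need more careful key handling (EnCodec uses weight normalisation, and transposed convolutions have yet another layout).

```python
import numpy as np


def torch_conv1d_to_mlx(state_dict):
    """Re-layout Conv1d weights from PyTorch [out, in, k] to MLX [out, k, in].

    `state_dict` maps parameter names to NumPy arrays (e.g. from tensor.numpy()).
    Assumption: every 3-D "*.weight" tensor is a plain Conv1d weight.
    """
    converted = {}
    for name, value in state_dict.items():
        if name.endswith(".weight") and value.ndim == 3:
            value = value.transpose(0, 2, 1)  # [out, in, k] -> [out, k, in]
        # Linear weights are [out, in] in both frameworks and biases are 1-D,
        # so they pass through unchanged.
        converted[name] = np.ascontiguousarray(value)
    return converted
```

The resulting arrays could then be wrapped with mx.array(...) before being loaded into an MLX module.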


Attribution

The MusicGen MLX engine is based on musicgen-mlx by Andrade Olivier. The original AudioCraft library is by Meta AI Research.

AudioGen MLX port is original work in this repository.


License

MIT — see LICENSE.

The pre-trained model weights (facebook/audiogen-medium, facebook/musicgen-*) are released under the CC-BY-NC 4.0 licence by Meta.

Download files

Source Distribution

mlx_audiocraft-0.1.0.tar.gz (53.8 kB)

Uploaded Source

Built Distribution

mlx_audiocraft-0.1.0-py3-none-any.whl (65.4 kB)

Uploaded Python 3

File details

Details for the file mlx_audiocraft-0.1.0.tar.gz.

File metadata

  • Download URL: mlx_audiocraft-0.1.0.tar.gz
  • Size: 53.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlx_audiocraft-0.1.0.tar.gz
Algorithm Hash digest
SHA256 dc3e7a90b542661a66dc54f38832dc1f6dd07ce6dea42279ae6986cc76144451
MD5 7285c68439e00f6cffe325d45a2abb53
BLAKE2b-256 5303c80492e75a98a9c8dcc247b8bd5ca32b6ae7db2d0f5849f8b1287c885740

Provenance

The following attestation bundles were made for mlx_audiocraft-0.1.0.tar.gz:

Publisher: publish.yml on theashishmaurya/mlx-audiocraft

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file mlx_audiocraft-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mlx_audiocraft-0.1.0-py3-none-any.whl
  • Size: 65.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mlx_audiocraft-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e0851dcc63b707ba681a498359d599c8d855ee02e0ee7ebd2e2e2864c204b801
MD5 f740fce5505adbf4d583cc58f0e98bfc
BLAKE2b-256 0692d11b5f9d06d2209d50665ff3005810a29fb6cdd4d0e5767664c3ab3edf0c

Provenance

The following attestation bundles were made for mlx_audiocraft-0.1.0-py3-none-any.whl:

Publisher: publish.yml on theashishmaurya/mlx-audiocraft

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
