MusicGen + AudioGen on Apple Silicon via MLX — full AudioCraft for M-series Macs, no CUDA needed
mlx-audiocraft
MusicGen + AudioGen on Apple Silicon via MLX — the first full AudioCraft port for M-series Macs. No CUDA, no server, no Docker. Just pip install and generate.
```bash
# Generate sound effects (NEW: not in any other MLX package)
audiogen-mlx "keyboard typing, office ambience" -d 5 -o sfx.wav
# Generate music
musicgen-mlx "upbeat cinematic tech promo, piano, 120 BPM, no vocals" -d 30 -o music.wav
```
What's unique about this package
| Package | MusicGen | AudioGen (SFX) | MLX (Apple GPU) |
|---|---|---|---|
| audiocraft (Meta) | ✅ | ✅ | ❌ |
| musicgen-mlx | ✅ | ❌ | ✅ |
| mlx-audiocraft (this) | ✅ | ✅ | ✅ |
AudioGen on MLX is new. This is the first port that brings text-to-sound-effects to Apple Silicon with hardware acceleration.
How it works (for learners)
Here's the mental model you need:
```
Your text prompt
      ↓
T5 Encoder        ← reads your text, runs on CPU (PyTorch)
      ↓
Transformer LM    ← generates audio "tokens", runs on MLX (Apple GPU)
      ↓
EnCodec Decoder   ← turns tokens into a waveform, runs on MLX
      ↓
WAV file
```
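MusicGen and AudioGen use the same pipeline. They differ mainly in their training data (music vs. sound effects) and in the EnCodec sample rate they decode to (32 kHz vs. 16 kHz); MusicGen also supports optional melody conditioning. This is why porting AudioGen mostly meant adding `audiogen.py`, which inherits the whole pipeline.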
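To make the "tokens" step in the pipeline concrete, here is a back-of-the-envelope sketch of how much the transformer has to generate for a given duration. It assumes the tokenizer settings Meta documents for the original AudioCraft checkpoints (50 token frames per second, 4 codebooks per frame) and the sample rates from the Models tables; `token_budget` and its constants are illustrative only, not part of mlx-audiocraft's API.

```python
# Assumed tokenizer settings from Meta's documentation of the original
# AudioCraft checkpoints (NOT read from mlx-audiocraft): both model families
# emit 50 token frames per second, with 4 codebooks per frame.
FRAME_RATE_HZ = 50
NUM_CODEBOOKS = 4

# Output sample rates, as listed in this README's Models tables.
SAMPLE_RATE_HZ = {"audiogen": 16_000, "musicgen": 32_000}


def token_budget(family: str, duration_s: float) -> dict:
    """Rough size of one generation: token frames, total tokens, decoded samples."""
    sample_rate = SAMPLE_RATE_HZ[family]
    frames = int(round(duration_s * FRAME_RATE_HZ))
    hop = sample_rate // FRAME_RATE_HZ  # waveform samples EnCodec decodes per frame
    return {
        "frames": frames,                  # roughly the number of LM decoding steps
        "tokens": frames * NUM_CODEBOOKS,  # discrete codes the LM must produce
        "hop": hop,
        "samples": frames * hop,           # length of the final waveform
    }
```

Under these assumptions a 5-second AudioGen clip needs 250 frames (1,000 tokens) that decode to 80,000 samples, while 30 seconds of MusicGen needs 1,500 frames that decode to 960,000 samples. The frame count, not the sample rate, is what drives transformer time, which is why duration matters more than output quality settings.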
MLX is Apple's array framework for machine learning on Apple Silicon. It is built around the unified memory of M-series chips: the CPU and GPU share the same RAM, so MLX arrays can be used on either device without copying data between separate memory pools.
Install
```bash
pip install mlx-audiocraft
```
Requirements: macOS 13+, Apple Silicon (M1/M2/M3/M4), Python 3.10+
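A quick pre-install check for these requirements can be sketched with Python's standard `platform` and `sys` modules. `unmet_requirements` is a hypothetical helper, not something the package ships, and it checks only what the line above lists.

```python
import platform
import sys
from typing import List, Optional, Tuple


def unmet_requirements(
    system: Optional[str] = None,
    machine: Optional[str] = None,
    macos_version: Optional[str] = None,
    python_version: Optional[Tuple[int, int]] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means every check passed.

    Arguments default to the current machine; pass them explicitly to test.
    """
    system = system if system is not None else platform.system()
    machine = machine if machine is not None else platform.machine()
    macos_version = macos_version if macos_version is not None else platform.mac_ver()[0]
    python_version = python_version if python_version is not None else tuple(sys.version_info[:2])

    problems: List[str] = []
    if system != "Darwin":
        problems.append(f"macOS is required (found {system})")
    else:
        # platform.mac_ver() returns a string such as "14.5", or "" if unknown.
        major = int(macos_version.split(".")[0]) if macos_version else 0
        if major < 13:
            problems.append(f"macOS 13+ is required (found {macos_version or 'unknown'})")
    # A Python running under Rosetta reports x86_64 even on an M-series Mac.
    if machine != "arm64":
        problems.append(f"Apple Silicon (arm64) is required (found {machine})")
    if python_version < (3, 10):
        problems.append("Python 3.10+ is required (found %d.%d)" % python_version)
    return problems


if __name__ == "__main__":
    issues = unmet_requirements()
    print("OK" if not issues else "\n".join(issues))
```

The architecture check matters in practice: an x86_64 Python installed through Rosetta will report the wrong machine type even on Apple Silicon, and a native arm64 Python is needed to use the Apple GPU.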
Quick start
Sound effects (AudioGen)
```python
from mlx_audiocraft import AudioGen

model = AudioGen.get_pretrained("facebook/audiogen-medium")
model.set_generation_params(duration=5)  # seconds of audio
wav = model.generate(["dog barking in a park"])
# wav shape: [batch, channels, samples]
```
Music (MusicGen)
```python
from mlx_audiocraft import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=30)  # seconds of audio
wav = model.generate(["calm lo-fi beat, soft piano, vinyl crackle"])
```
CLI
```bash
# Sound effects
audiogen-mlx "crowd applause, conference room" -d 5
audiogen-mlx "rain on a window, thunder in the distance" -d 8 -o rain.wav
# Music
musicgen-mlx "epic orchestral soundtrack" -m facebook/musicgen-large -d 20
musicgen-mlx "funky disco groove" "ambient pad wide reverb" -d 10  # batch
```
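To render a list of sound effects into separately named files, you can loop over the CLI from Python. This is a sketch that assumes only the `-d` and `-o` flags shown above; `render_sfx` and `slugify` are hypothetical helpers, and `dry_run=True` returns the commands without running anything.

```python
import re
import subprocess
from typing import List


def slugify(prompt: str) -> str:
    """Turn a prompt into a safe file stem, e.g. "Rain, thunder" -> "rain-thunder"."""
    return re.sub(r"[^a-z0-9]+", "-", prompt.lower()).strip("-") or "sfx"


def render_sfx(prompts: List[str], duration: int = 5, dry_run: bool = False) -> List[List[str]]:
    """Run audiogen-mlx once per prompt, writing <slug>.wav for each."""
    commands = []
    for prompt in prompts:
        cmd = ["audiogen-mlx", prompt, "-d", str(duration), "-o", f"{slugify(prompt)}.wav"]
        commands.append(cmd)
        if not dry_run:
            # Passing a list (no shell) keeps commas and spaces inside one argument.
            subprocess.run(cmd, check=True)
    return commands
```

Each call starts a new process and loads the model again, so for long lists the Python API, which keeps one model in memory, is likely to be faster.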
Models
AudioGen
| Model | Size | Download | Sample Rate |
|---|---|---|---|
| facebook/audiogen-medium | 1.5B | ~3.6 GB | 16 kHz |
MusicGen
| Model | Size | Download | Sample Rate |
|---|---|---|---|
| facebook/musicgen-small | 300M | ~1.2 GB | 32 kHz |
| facebook/musicgen-medium | 1.5B | ~3.2 GB | 32 kHz |
| facebook/musicgen-large | 3.3B | ~6.5 GB | 32 kHz |
| facebook/musicgen-stereo-small | 300M | ~1.2 GB | 32 kHz stereo |
| facebook/musicgen-stereo-medium | 1.5B | ~3.2 GB | 32 kHz stereo |
Models download automatically from HuggingFace on first use and are cached in ~/.cache/huggingface/.
Benchmark (M4 Max, 64 GB)
Run `python benchmarks/run_benchmarks.py` to generate results for your machine.
| Model | Duration | Time | Realtime |
|---|---|---|---|
| audiogen-medium | 5s | ~8s | 0.6x |
| musicgen-small | 10s | ~8s | 1.3x |
| musicgen-medium | 10s | ~17s | 0.6x |
| musicgen-large | 10s | ~35s | 0.3x |
Realtime is audio duration divided by generation time: values above 1.0x mean generation finishes faster than the audio takes to play.
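The Realtime column can be reproduced with a small timing helper. This is a sketch; `realtime_factor` and `time_generation` are hypothetical names, not functions from the package's benchmark script.

```python
import time
from typing import Callable


def realtime_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per wall-clock second (>1.0 = faster than realtime)."""
    if generation_seconds <= 0:
        raise ValueError("generation time must be positive")
    return audio_seconds / generation_seconds


def time_generation(generate: Callable[[], object], audio_seconds: float) -> float:
    """Time one call to `generate` and return its realtime factor."""
    start = time.perf_counter()
    generate()
    elapsed = time.perf_counter() - start
    return realtime_factor(audio_seconds, elapsed)
```

For example, `time_generation(lambda: np.array(model.generate(["dog barking in a park"])), audio_seconds=5)` times one AudioGen call. MLX evaluates lazily, so the timed callable should force the result (here by converting it to a NumPy array); otherwise part of the work may happen after the timer stops. The first call also includes model warm-up, so time a second run for steady-state numbers.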
Prompt guide
Sound effects (AudioGen)
Be literal and specific:
"keyboard typing, subtle office background noise"
"notification chime, clean and bright"
"crowd applause, conference room, 3 seconds"
"rain falling on a metal roof, distant thunder"
"coffee machine brewing, kitchen ambience"
Music (MusicGen)
Include style, instrumentation, and BPM, and always end with `, no vocals`:
"upbeat cinematic tech promo, clean piano with electronic pads, 120 BPM, no vocals"
"calm educational background, soft piano and ambient pads, 75 BPM, no vocals"
"energetic SaaS launch, modern synths, punchy drums, 120 BPM, no vocals"
"Hindi classical influence, sitar and tabla, meditative, 60 BPM, no vocals"
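If you generate many tracks from a script, a tiny helper keeps the recommended prompt order consistent. `build_music_prompt` is a hypothetical convenience function, not part of mlx-audiocraft; it only formats a string following the guide above.

```python
from typing import Iterable


def build_music_prompt(style: str, instruments: Iterable[str], bpm: int) -> str:
    """Assemble a MusicGen prompt in the guide's order:
    style, instrumentation, tempo, then the closing "no vocals"."""
    parts = [style.strip()]
    parts += [name.strip() for name in instruments if name.strip()]
    parts.append(f"{bpm} BPM")
    parts.append("no vocals")  # the guide recommends always ending with this
    return ", ".join(parts)
```

For example, `build_music_prompt("energetic SaaS launch", ["modern synths", "punchy drums"], 120)` produces the third example prompt above.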
Save output
```python
import numpy as np
import soundfile as sf

# `model` is an AudioGen or MusicGen instance from the Quick start
wav = model.generate(["your prompt"])  # [batch, channels, samples]
audio = np.array(wav[0]).T             # [channels, samples] → [samples, channels]
if audio.ndim == 2 and audio.shape[1] == 1:
    audio = audio[:, 0]                # single channel: drop the channel axis
sf.write("output.wav", audio, model.sample_rate)
```
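If you would rather not install `soundfile`, the standard-library `wave` module can write 16-bit PCM. This is a minimal sketch assuming float samples in [-1.0, 1.0]; `write_wav_pcm16` is a hypothetical helper, not part of the package.

```python
import struct
import wave
from typing import Sequence, Union

# One frame is either a single float (mono) or one float per channel.
Frame = Union[float, Sequence[float]]


def write_wav_pcm16(path: str, frames: Sequence[Frame], sample_rate: int) -> None:
    """Write float audio in [-1.0, 1.0] to a 16-bit PCM WAV file (stdlib only).

    `frames` is a flat list (mono) or a list of per-sample channel lists,
    i.e. shape [samples, channels], such as `audio.tolist()`.
    """
    if len(frames) == 0:
        raise ValueError("no audio frames to write")
    # Treat mono input as [samples, 1] so mono and multi-channel share one path.
    rows = [[f] if isinstance(f, (int, float)) else list(f) for f in frames]
    n_channels = len(rows[0])
    pcm = bytearray()
    for row in rows:
        if len(row) != n_channels:
            raise ValueError("every frame must have the same number of channels")
        for x in row:
            x = max(-1.0, min(1.0, float(x)))  # clip so the int16 cast cannot overflow
            pcm += struct.pack("<h", int(round(x * 32767)))  # little-endian int16, interleaved
    with wave.open(path, "wb") as wf:
        wf.setnchannels(n_channels)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
```

With `audio` from the snippet above, `write_wav_pcm16("output.wav", audio.tolist(), model.sample_rate)` produces a standard WAV file. The pure-Python loop is slow for long clips; `soundfile` is the better choice when it is available.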
Architecture deep-dive (for learners)
If you want to understand how this works under the hood, here's a reading order:
- `mlx_audiocraft/models/genmodel.py`: `BaseGenModel`, the base class all models inherit. Understand `generate()`, `_prepare_tokens_and_attributes()`, and `_generate_tokens()`.
- `mlx_audiocraft/models/audiogen.py`: our AudioGen port. It's tiny (~90 lines) because it just inherits `BaseGenModel` and points at AudioGen's weights. Good first file to read.
- `mlx_audiocraft/models/musicgen.py`: MusicGen adds melody conditioning on top of `BaseGenModel`. Compare with `audiogen.py` to see the diff.
- `mlx_audiocraft/models/loaders.py`: how model weights are downloaded from HuggingFace and converted from PyTorch format to MLX.
- `mlx_audiocraft/modules/transformer.py`: the MLX transformer implementation. This is the core of the language model.
- `mlx_audiocraft/models/encodec.py`: the audio codec (compress waveform → tokens, decode tokens → waveform).
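The reason `audiogen.py` can stay so small is the template-method pattern: the base class owns `generate()` and its steps, and subclasses only change configuration or add inputs. The toy classes below illustrate that idea only; `ToyBaseGenModel`, `ToyAudioGen`, `ToyMusicGen`, `generate_with_melody`, and the 50 frames/second constant are hypothetical and are not the package's real code.

```python
from typing import Dict, List


class ToyBaseGenModel:
    """Hypothetical stand-in for BaseGenModel: generate() is a template method."""

    frame_rate = 50  # assumed token frames per second

    def __init__(self, sample_rate: int) -> None:
        self.sample_rate = sample_rate
        self.duration = 10.0

    def set_generation_params(self, duration: float) -> None:
        self.duration = duration

    def _prepare_tokens_and_attributes(self, descriptions: List[str]) -> List[Dict]:
        # Real code: turn prompts into conditioning inputs for the T5 encoder.
        return [{"description": d} for d in descriptions]

    def _generate_tokens(self, attributes: List[Dict]) -> List[Dict]:
        # Real code: sample audio tokens with the transformer LM.
        # Toy: just record how many token frames the duration needs.
        n_frames = int(round(self.duration * self.frame_rate))
        return [{**attrs, "frames": n_frames} for attrs in attributes]

    def generate(self, descriptions: List[str]) -> List[Dict]:
        # Fixed pipeline shared by every subclass.
        attributes = self._prepare_tokens_and_attributes(descriptions)
        return self._generate_tokens(attributes)


class ToyAudioGen(ToyBaseGenModel):
    # The idea behind audiogen.py: no new pipeline code, only configuration.
    def __init__(self) -> None:
        super().__init__(sample_rate=16_000)


class ToyMusicGen(ToyBaseGenModel):
    # The idea behind musicgen.py: same pipeline plus an extra conditioning input.
    def __init__(self) -> None:
        super().__init__(sample_rate=32_000)

    def generate_with_melody(self, descriptions: List[str], melody: str) -> List[Dict]:
        attributes = [
            {**attrs, "melody": melody}
            for attrs in self._prepare_tokens_and_attributes(descriptions)
        ]
        return self._generate_tokens(attributes)
```

Reading the real files with this shape in mind makes the diff between `audiogen.py` and `musicgen.py` easier to spot: look for what each subclass overrides versus what it inherits.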
Attribution
The MusicGen MLX engine is based on musicgen-mlx by Andrade Olivier. The original AudioCraft library is by Meta AI Research.
The AudioGen MLX port is original work in this repository.
License
MIT — see LICENSE.
The pre-trained model weights (facebook/audiogen-medium, facebook/musicgen-*) are released under the CC-BY-NC 4.0 licence by Meta.
Download files
File details
Details for the file mlx_audiocraft-0.1.0.tar.gz.
File metadata
- Download URL: mlx_audiocraft-0.1.0.tar.gz
- Size: 53.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc3e7a90b542661a66dc54f38832dc1f6dd07ce6dea42279ae6986cc76144451 |
| MD5 | 7285c68439e00f6cffe325d45a2abb53 |
| BLAKE2b-256 | 5303c80492e75a98a9c8dcc247b8bd5ca32b6ae7db2d0f5849f8b1287c885740 |

Provenance
The following attestation bundles were made for mlx_audiocraft-0.1.0.tar.gz:
Publisher: publish.yml on theashishmaurya/mlx-audiocraft
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_audiocraft-0.1.0.tar.gz
- Subject digest: dc3e7a90b542661a66dc54f38832dc1f6dd07ce6dea42279ae6986cc76144451
- Sigstore transparency entry: 1371028638
- Permalink: theashishmaurya/mlx-audiocraft@70205ef0416706a4f27d0f1c2b10b4f7ea6520d8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/theashishmaurya
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@70205ef0416706a4f27d0f1c2b10b4f7ea6520d8
- Trigger Event: release
File details
Details for the file mlx_audiocraft-0.1.0-py3-none-any.whl.
File metadata
- Download URL: mlx_audiocraft-0.1.0-py3-none-any.whl
- Size: 65.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0851dcc63b707ba681a498359d599c8d855ee02e0ee7ebd2e2e2864c204b801 |
| MD5 | f740fce5505adbf4d583cc58f0e98bfc |
| BLAKE2b-256 | 0692d11b5f9d06d2209d50665ff3005810a29fb6cdd4d0e5767664c3ab3edf0c |

Provenance
The following attestation bundles were made for mlx_audiocraft-0.1.0-py3-none-any.whl:
Publisher: publish.yml on theashishmaurya/mlx-audiocraft
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_audiocraft-0.1.0-py3-none-any.whl
- Subject digest: e0851dcc63b707ba681a498359d599c8d855ee02e0ee7ebd2e2e2864c204b801
- Sigstore transparency entry: 1371028722
- Permalink: theashishmaurya/mlx-audiocraft@70205ef0416706a4f27d0f1c2b10b4f7ea6520d8
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/theashishmaurya
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@70205ef0416706a4f27d0f1c2b10b4f7ea6520d8
- Trigger Event: release
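To check a manually downloaded file against the SHA256 digests listed in the File hashes tables, the standard-library `hashlib` module is enough. This is a minimal sketch; `sha256_of` and `matches_published_hash` are hypothetical helper names.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_sha256: str) -> bool:
    """Compare against a digest copied from the File hashes table."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For installs, pip can enforce the same check itself: put `mlx-audiocraft==0.1.0 --hash=sha256:<digest>` in a requirements file (one `--hash` option per file you accept) and run `pip install --require-hashes -r requirements.txt`. In that mode pip requires pinned hashes for every dependency too.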