Native audio I/O for MLX on macOS and Linux
mlx-audio-io
mlx-audio-io is the audio data layer for MLX: fast file decode/encode directly to and from mlx.core.array, with one API across macOS and Linux.
Why This Project Exists
MLX has strong tensor and model primitives, but it does not ship a first-class, cross-platform audio file I/O layer comparable to what torchaudio provides in the PyTorch ecosystem.
In practice, MLX users often end up with one of these compromises:
- bridge through NumPy/SoundFile/librosa with extra copies and inconsistent format behavior
- shell out to ffmpeg/ffprobe for non-WAV workflows
- pull in parts of the PyTorch audio stack just to handle common audio containers/codecs
mlx-audio-io closes that gap with a native backend designed for MLX workloads:
- direct decode/encode into mlx.core.array
- one Python API (load, save, info, stream, batch_load) on both macOS and Linux
- consistent validation and error messages across platforms
- support for training/inference data access patterns (partial reads, chunked streaming, optional resampling)
Platform Backends
- macOS backend optimized for Apple Silicon via AudioToolbox
- Linux backend with native WAV/MP3 fast paths plus libav-backed codec support (FLAC/M4A/AIFF/CAF)
The public Python API is the same on both platforms: load, save, info, stream, batch_load.
Backend Feature Matrix
| Capability | macOS backend | Linux backend |
|---|---|---|
| info(path) | AudioToolbox-supported formats (WAV, MP3, M4A/AAC, FLAC, AIFF, CAF, etc.) | WAV, MP3, FLAC, M4A/AAC, AIFF, CAF |
| load(path) | AudioToolbox-supported formats + native-rate MP3 fast path | WAV, MP3, FLAC, M4A/AAC, AIFF, CAF |
| load(..., sr=...) | Supported, with AudioToolbox resampling | Supported (WAV/MP3 native linear path, other supported formats via libav decode/resample) |
| save(path, ...) | WAV, MP3, M4A/AAC, FLAC, AIFF, CAF | WAV, MP3, M4A/AAC, FLAC, AIFF, CAF |
| encoding | float32, pcm16, alac (for .m4a) | float32, pcm16, alac (for .m4a) |
| stream(path, ...) | AudioToolbox-supported formats + native-rate MP3 path | WAV, MP3, FLAC, M4A/AAC, AIFF, CAF |
| stream(..., sr=...) | Supported | Supported (WAV/MP3 native linear path, other supported formats via libav-backed chunked decode path) |
Unsupported format/encoding combinations fail with explicit ValueError messages.
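A caller-side pre-check can mirror the matrix before any audio is written. The sketch below is illustrative only: check_save_target is a hypothetical helper, not part of mlx-audio-io, and its rules are an assumption drawn from the table above (the listed save containers, the three encoding names, and alac only for .m4a). The library's own validation may be stricter or worded differently.

```python
from pathlib import Path

# Assumed from the feature matrix above; not read from the library itself.
SAVE_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".aiff", ".aif", ".caf"}
ENCODINGS = {"float32", "pcm16", "alac"}


def check_save_target(path: str, encoding: str = "float32") -> str:
    """Hypothetical pre-check: return the lowercased extension or raise ValueError."""
    ext = Path(path).suffix.lower()
    if ext not in SAVE_EXTENSIONS:
        raise ValueError(f"unsupported output format: {ext or '(no extension)'}")
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    if encoding == "alac" and ext != ".m4a":
        raise ValueError("encoding 'alac' is only valid for .m4a output")
    return ext
```

Failing early like this keeps a long preprocessing job from discovering a bad output path only after decoding has finished.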
Installation
End users (PyPI)
For normal use:
pip install mlx-audio-io
Contributors (source checkout)
For local development and tests:
git clone https://github.com/ssmall256/mlx-audio-io.git
cd mlx-audio-io
uv sync --extra dev
Hard Rule: Do Not Copy .venv Between Machines
Do not copy project virtual environments across machines. Native extensions can fail integrity/code-sign checks or crash when moved between hosts.
If you already copied one, recreate it:
rm -rf .venv && uv venv --python 3.11 && uv sync
Linux source build behavior
Linux source builds require libav and use direct libav-backed paths:
- Linux info() for non-WAV formats uses direct libav metadata.
- Linux load() for non-WAV formats uses direct libav decode for all offset/duration combinations.
- Linux stream() for non-WAV formats uses direct libav packet/frame decode.
- Linux save() for encoded formats (.mp3, .flac, .m4a, .aiff/.aif, .caf) uses direct libav encode/mux.
Requirements
- Python 3.10+
- Runtime:
  - macOS: Apple Silicon + mlx
  - Linux: mlx[cpu] (current default)
- Source builds:
  - CMake 3.24+, C++17 toolchain, pkg-config
  - Linux default build: libavformat-dev, libavcodec-dev, libavutil-dev, libswresample-dev
Linux Troubleshooting
- ModuleNotFoundError: mlx_audio_io
  - Install in the project environment (uv sync) and run via uv run ...
- ImportError for mlx on Linux
  - Ensure the Linux dependency is installed as mlx[cpu].
- Build failures on source installs
  - Verify build-essential, cmake, ninja-build, and pkg-config are installed.
- Extended Linux format support errors (.mp3, .m4a, .flac, .aiff, .caf)
  - For default Linux builds, ensure runtime libav libraries are present (libavformat, libavcodec, libavutil, libswresample).
- MP3 test fixture generation failures
  - Tests that generate MP3 fixtures require ffmpeg or lame available on PATH.
- Native import failures or unexpected crashes
  - Run diagnostics: python -m mlx_audio_io.doctor
  - Check MLX runtime compatibility: python -c "import mlx_audio_io as aio; print(aio.show_build_info())"
  - If build_mlx_version and runtime_mlx_version differ, reinstall with matching deps: pip install -U "mlx==<build_mlx_version>" "mlx-audio-io"
  - Avoid pip install --no-deps for mlx-audio-io unless you manually pin a matching mlx version.
  - Recreate the environment (do not copy .venv between machines): rm -rf .venv && uv venv --python 3.11 && uv sync
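When chasing a version mismatch, it can also help to see which distributions are actually installed in the active environment. The following is a minimal sketch using only the standard library; installed_version is a hypothetical helper, and it reports installed package versions only, not the build_mlx_version recorded by the extension.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names taken from the pip commands above.
    for name in ("mlx", "mlx-audio-io"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the same interpreter that fails to import the extension, so that both checks look at the same environment.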
Quickstart
from mlx_audio_io import load, save, info, stream, batch_load
# Load
x, sr = load("speech.wav")
# Resample + mono
x16, sr16 = load("speech.wav", sr=16000, mono=True)
# Metadata without decoding
meta = info("speech.wav")
# Stream in chunks
for chunk, chunk_sr in stream("long.wav", chunk_duration=2.0):
    pass
# Save WAV
save("out.wav", x, sr)
save("out_pcm16.wav", x, sr, encoding="pcm16")
# Batch load
items = batch_load(["a.wav", "b.wav"], sr=16000, mono=True)
Additional save examples:
save("out.flac", x, sr)
save("out.mp3", x, sr, bitrate="192k")
save("out.m4a", x, sr, bitrate="256k")
save("out.m4a", x, sr, encoding="alac")
API Reference
load
load(path, sr=None, offset=0.0, duration=None, mono=False,
layout="channels_last", dtype="float32", resample_quality="default")
Decode audio into an mlx.core.array. Returns (audio, sample_rate).
| Parameter | Default | Description |
|---|---|---|
| path | (required) | Path to audio file |
| sr | None | Target sample rate; None keeps native rate |
| offset | 0.0 | Start position in seconds |
| duration | None | Duration in seconds; None reads to end |
| mono | False | Mix down to mono |
| layout | "channels_last" | "channels_last" [frames, ch] or "channels_first" [ch, frames] |
| dtype | "float32" | "float32" or "float16" |
| resample_quality | "default" | "default", "fastest", "low", "medium", "high", "best" |
On Linux WAV/MP3 fast paths, resample quality levels currently map to the same linear behavior.
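To make "linear behavior" concrete, here is a minimal pure-Python sketch of linear-interpolation resampling for a mono signal. It illustrates the general technique only; resample_linear is a hypothetical name, and the native fast path's exact length rounding and edge handling are not documented here, so treat those details as assumptions.

```python
def resample_linear(samples: list[float], src_rate: int, dst_rate: int) -> list[float]:
    """Resample a mono signal by interpolating linearly between neighbouring samples."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples or src_rate == dst_rate:
        return list(samples)
    # Assumed convention: output length is the rounded scaled input length.
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)  # hold the final sample at the tail
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out
```

Linear interpolation applies no anti-aliasing filter, which is why it is fast and also why quality settings cannot change its output when every level maps to the same path.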
batch_load
batch_load(paths, sr=None, mono=False, dtype="float32", num_workers=4)
Threaded multi-file load(). Returns list[(audio, sample_rate)].
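The pattern behind a threaded multi-file load can be sketched with the standard library. This is not the library's implementation: threaded_map is a hypothetical helper that applies any single-file loader on a thread pool and returns results in input order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def threaded_map(loader: Callable[[str], T], paths: Sequence[str],
                 num_workers: int = 4) -> list[T]:
    """Apply loader to every path on a thread pool; results keep the input order."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # regardless of which worker finishes first.
        return list(pool.map(loader, paths))
```

With the package installed, passing something like functools.partial(load, sr=16000, mono=True) as the loader would give a rough equivalent of batch_load, under the assumption that load is safe to call from multiple threads.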
save
save(path, audio, sr, layout="channels_last", encoding="float32",
bitrate="auto", clip=True)
Write audio from mx.array (or numpy.ndarray) to disk.
| Parameter | Default | Description |
|---|---|---|
| path | (required) | Output file path (format inferred from extension) |
| audio | (required) | Audio data; 1-D input is treated as mono |
| sr | (required) | Sample rate |
| layout | "channels_last" | Layout of the input array |
| encoding | "float32" | "float32", "pcm16", or "alac" (for .m4a) |
| bitrate | "auto" | Bitrate for lossy formats (.m4a AAC, .mp3 on Linux) |
| clip | True | Clamp samples to [-1, 1] before encoding |
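The interaction between clip and pcm16 is easiest to see in a small worked example. The sketch below clamps floats to [-1, 1] and quantizes them to signed 16-bit integers. float_to_pcm16 is a hypothetical helper, and the 32767 scale factor is an assumed convention; the library's exact scaling and rounding may differ.

```python
import struct


def float_to_pcm16(samples: list[float], clip: bool = True) -> bytes:
    """Convert floats in the nominal range [-1, 1] to little-endian 16-bit PCM."""
    ints = []
    for s in samples:
        if clip:
            s = max(-1.0, min(1.0, s))  # clamp out-of-range peaks
        value = int(round(s * 32767.0))  # assumed scale factor
        if not -32768 <= value <= 32767:
            raise ValueError(f"sample {s!r} does not fit in 16 bits without clipping")
        ints.append(value)
    return struct.pack(f"<{len(ints)}h", *ints)
```

Without clamping, a sample of 2.0 has no valid 16-bit representation, so the choice is between clipping the peak and rejecting the input. This sketch rejects it when clip is False.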
stream
stream(path, chunk_frames=None, chunk_duration=None, sr=None,
mono=False, dtype="float32")
Return an iterator yielding (audio_chunk, sample_rate). Exactly one of chunk_frames or chunk_duration is required.
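The "exactly one" rule and the resulting chunk boundaries can be sketched as follows. Both functions are hypothetical illustrations rather than library code, and rounding chunk_duration times the sample rate to the nearest frame is an assumption.

```python
from typing import Iterator, Optional


def resolve_chunk_frames(sample_rate: int, chunk_frames: Optional[int] = None,
                         chunk_duration: Optional[float] = None) -> int:
    """Turn the two mutually exclusive chunk arguments into a frame count."""
    if (chunk_frames is None) == (chunk_duration is None):
        raise ValueError("pass exactly one of chunk_frames or chunk_duration")
    if chunk_frames is None:
        chunk_frames = int(round(chunk_duration * sample_rate))
    if chunk_frames <= 0:
        raise ValueError("chunk size must be at least one frame")
    return chunk_frames


def chunk_ranges(total_frames: int, chunk_frames: int) -> Iterator[tuple[int, int]]:
    """Yield (start_frame, frame_count) pairs; the last chunk may be shorter."""
    for start in range(0, total_frames, chunk_frames):
        yield start, min(chunk_frames, total_frames - start)
```

For a 44.1 kHz file, chunk_duration=2.0 resolves to 88200 frames per chunk under this rounding.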
| Parameter | Default | Description |
|---|---|---|
| path | (required) | Path to audio file |
| chunk_frames | None | Chunk size in frames |
| chunk_duration | None | Chunk size in seconds |
| sr | None | Target sample rate; None keeps native rate |
| mono | False | Mix down to mono |
| dtype | "float32" | "float32" or "float16" |
info
info(path)
Return AudioInfo metadata without decoding sample buffers.
| Field | Description |
|---|---|
| frames | Total number of sample frames |
| sample_rate | Sample rate in Hz |
| channels | Number of channels |
| duration | Duration in seconds |
| subtype | Sample encoding (e.g. pcm16, float32) |
| container | File format (e.g. wav, mp3, m4a) |
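To show how these fields relate to each other, here is a standard-library sketch that reads the same quantities from a PCM WAV header. WavInfo and wav_info are hypothetical stand-ins, not the library's AudioInfo, and Python's wave module handles only integer PCM WAV files, so float32 files are out of scope for this example.

```python
import io
import wave
from dataclasses import dataclass


@dataclass(frozen=True)
class WavInfo:
    """Illustrative stand-in mirroring the field names listed above."""
    frames: int
    sample_rate: int
    channels: int
    duration: float
    subtype: str
    container: str


def wav_info(source) -> WavInfo:
    """Read header fields from a WAV path or binary file object; no samples are decoded."""
    with wave.open(source, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return WavInfo(
            frames=frames,
            sample_rate=rate,
            channels=wf.getnchannels(),
            duration=frames / rate,  # seconds = frames / frames-per-second
            subtype=f"pcm{8 * wf.getsampwidth()}",
            container="wav",
        )


def make_silent_wav(frames: int, rate: int = 16000, channels: int = 1) -> io.BytesIO:
    """Build a small in-memory 16-bit WAV to exercise wav_info."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(b"\x00\x00" * frames * channels)
    buf.seek(0)
    return buf
```

Note that frames counts sample frames, not individual samples: a stereo file with 8000 frames holds 16000 samples.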
Testing
Run all tests:
uv sync --extra dev
uv run python -m pytest -q
Run the Linux-supported subset:
uv run python -m pytest -q -m "not apple_only"
Run Apple-only subset:
uv run python -m pytest -q -m "apple_only"
Linux Docker run from a macOS host:
docker run --rm -it --platform linux/arm64 \
-v "$PWD":/work -w /work \
python:3.14-bookworm bash -lc '
apt-get update && apt-get install -y --no-install-recommends \
build-essential cmake ninja-build pkg-config ffmpeg \
libavformat-dev libavcodec-dev libavutil-dev libswresample-dev &&
python -m pip install -U pip uv &&
uv sync --extra dev &&
uv run python -m pytest -q -m "not apple_only"
'
Performance
Benchmark methodology, commands, and full result tables live in docs/benchmarking.md.
Headline numbers (194.8 s stereo PCM16 WAV at 44.1 kHz; p50, i.e. median, latency):
| Task | macOS M4 Max | Linux arm64 |
|---|---|---|
| Full WAV load | 3.59 ms — 6.9x faster than librosa | 8.41 ms — 5.9x faster than librosa |
| WAV partial read (1 s) | 0.04 ms — 3.4x faster than librosa | 0.05 ms — 2.6x faster than librosa |
| WAV save (float32) | 6.98 ms — 2.8x faster than soundfile | 31.70 ms — 1.8x faster than soundfile |
| MP3 load (native SR) | 63.70 ms — 1.3x faster than librosa | 80.93 ms — on par with librosa |
| M4A/AAC load | 56.31 ms — 2.2x faster than librosa | 89.63 ms — 1.6x faster than librosa |
| Load + resample 16 kHz | 13.12 ms — 4.4x faster than librosa | 10.93 ms — 7.9x faster than librosa |
Full tables with torchaudio comparisons, M1 Max, and Linux x86_64 results are in the benchmarking doc.
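For readers who want to sanity-check a number locally, a p50 figure is simply the median of repeated timings. The sketch below is a generic measurement loop, not the project's benchmark harness; p50_latency_ms, the repeat count, and the warmup count are all assumptions, and the methodology in docs/benchmarking.md takes precedence.

```python
import statistics
import time
from typing import Callable


def p50_latency_ms(fn: Callable[[], object], repeats: int = 20, warmup: int = 3) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    for _ in range(warmup):
        fn()  # warm caches before measuring
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

With the package installed, a call such as p50_latency_ms(lambda: load("speech.wav")) would time a full load. The median is less sensitive than the mean to an occasional slow run caused by disk or scheduler noise.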
License
MIT