Native audio I/O for MLX on macOS and Linux

mlx-audio-io

mlx-audio-io is the audio data layer for MLX: fast file decode/encode directly to and from mlx.core.array, with one API across macOS and Linux.

Why This Project Exists

MLX has strong tensor and model primitives, but it does not ship a first-class, cross-platform audio file I/O layer comparable to what torchaudio provides in the PyTorch ecosystem.

In practice, MLX users often end up with one of these compromises:

  • bridge through NumPy/SoundFile/librosa with extra copies and inconsistent format behavior
  • shell out to ffmpeg/ffprobe for non-WAV workflows
  • pull in parts of the PyTorch audio stack just to handle common audio containers/codecs

mlx-audio-io closes that gap with a native backend designed for MLX workloads:

  • direct decode/encode into mlx.core.array
  • one Python API (load, save, info, stream, batch_load) on both macOS and Linux
  • consistent validation and error messages across platforms
  • support for training/inference data access patterns (partial reads, chunked streaming, optional resampling)

Platform Backends

  • macOS backend optimized for Apple Silicon via AudioToolbox
  • Linux backend with native WAV/MP3 fast paths plus libav-backed codec support (FLAC/M4A/AIFF/CAF)

The public Python API is the same on both platforms: load, save, info, stream, batch_load.

Backend Feature Matrix

Capability | macOS backend | Linux backend
info(path) | AudioToolbox-supported formats (WAV, MP3, M4A/AAC, FLAC, AIFF, CAF, etc.) | WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
load(path) | AudioToolbox-supported formats + native-rate MP3 fast path | WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
load(..., sr=...) | Supported, with AudioToolbox resampling | Supported (WAV/MP3 native linear path, other supported formats via libav decode/resample)
save(path, ...) | WAV, MP3, M4A/AAC, FLAC, AIFF, CAF | WAV, MP3, M4A/AAC, FLAC, AIFF, CAF
encoding | float32, pcm16, alac (for .m4a) | float32, pcm16, alac (for .m4a)
stream(path, ...) | AudioToolbox-supported formats + native-rate MP3 path | WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
stream(..., sr=...) | Supported | Supported (WAV/MP3 native linear path, other supported formats via libav-backed chunked decode path)

Unsupported format/encoding combinations fail with explicit ValueError messages.
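
Because unsupported combinations surface as ValueError, callers can handle them explicitly instead of letting a batch job crash. The following is a minimal sketch, not part of the library: try_save is a hypothetical helper, and it takes the save function as an argument so it can be exercised with a stand-in when mlx-audio-io is not installed.

```python
from typing import Any, Callable, Optional


def try_save(save_fn: Callable[..., Any], path: str, audio: Any, sr: int,
             **kwargs: Any) -> Optional[str]:
    """Call save_fn(path, audio, sr, **kwargs).

    Returns None on success, or the ValueError message if the
    format/encoding combination was rejected. Other exceptions propagate.
    """
    try:
        save_fn(path, audio, sr, **kwargs)
    except ValueError as exc:
        return str(exc)
    return None
```

In real use, pass mlx_audio_io.save as save_fn, for example try_save(save, "out.wav", x, sr, encoding="pcm16"), and log or skip any path for which a message comes back.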

Installation

End users (PyPI)

For normal use:

pip install mlx-audio-io

Contributors (source checkout)

For local development and tests:

git clone https://github.com/ssmall256/mlx-audio-io.git
cd mlx-audio-io
uv sync --extra dev

Linux source build behavior

Linux source builds require libav and use direct libav-backed paths:

  • Linux info() for non-WAV formats uses direct libav metadata.
  • Linux load() for non-WAV formats uses direct libav decode for all offset/duration combinations.
  • Linux stream() for non-WAV formats uses direct libav packet/frame decode.
  • Linux save() for encoded formats (.mp3, .flac, .m4a, .aiff/.aif, .caf) uses direct libav encode/mux.

Requirements

  • Python 3.10+
  • Runtime:
    • macOS: Apple Silicon + mlx
    • Linux: mlx[cpu] (current default)
  • Source builds:
    • CMake 3.24+, C++17 toolchain, pkg-config
    • Linux default build: libavformat-dev, libavcodec-dev, libavutil-dev, libswresample-dev

Linux Troubleshooting

  • ModuleNotFoundError: mlx_audio_io
    • Install in the project environment (uv sync) and run via uv run ....
  • ImportError for mlx on Linux
    • Ensure the Linux dependency is installed as mlx[cpu].
  • Build failures on source installs
    • Verify build-essential, cmake, ninja-build, and pkg-config are installed.
  • Extended Linux format support errors (.mp3, .m4a, .flac, .aiff, .caf)
    • For default Linux builds, ensure runtime libav libraries are present (libavformat, libavcodec, libavutil, libswresample).
  • MP3 test fixture generation failures
    • Tests that generate MP3 fixtures require ffmpeg or lame available on PATH.
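
For the last item, a quick way to confirm that one of the MP3 tools is visible to the test process is to query PATH from Python. This is a small sketch using only the standard library; find_mp3_encoders is a hypothetical helper name, not something shipped with the project.

```python
import shutil


def find_mp3_encoders(candidates=("ffmpeg", "lame")):
    """Return {tool_name: resolved_path} for each candidate found on PATH."""
    found = {}
    for name in candidates:
        path = shutil.which(name)  # None when the executable is not on PATH
        if path is not None:
            found[name] = path
    return found


encoders = find_mp3_encoders()
if encoders:
    print("MP3 fixture tools available:", ", ".join(sorted(encoders)))
else:
    print("Neither ffmpeg nor lame is on PATH; MP3 fixture generation may fail.")
```

Running it through uv run python checks the same environment the tests will see.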

Quickstart

from mlx_audio_io import load, save, info, stream, batch_load

# Load
x, sr = load("speech.wav")

# Resample + mono
x16, sr16 = load("speech.wav", sr=16000, mono=True)

# Metadata without decoding
meta = info("speech.wav")

# Stream in chunks
for chunk, chunk_sr in stream("long.wav", chunk_duration=2.0):
    pass

# Save WAV
save("out.wav", x, sr)
save("out_pcm16.wav", x, sr, encoding="pcm16")

# Batch load
items = batch_load(["a.wav", "b.wav"], sr=16000, mono=True)

Additional save examples:

save("out.flac", x, sr)
save("out.mp3", x, sr, bitrate="192k")
save("out.m4a", x, sr, bitrate="256k")
save("out.m4a", x, sr, encoding="alac")

API Reference

load

load(path, sr=None, offset=0.0, duration=None, mono=False,
     layout="channels_last", dtype="float32", resample_quality="default")

Decode audio into an mlx.core.array. Returns (audio, sample_rate).

Parameter | Default | Description
path | (required) | Path to audio file
sr | None | Target sample rate; None keeps native rate
offset | 0.0 | Start position in seconds
duration | None | Duration in seconds; None reads to end
mono | False | Mix down to mono
layout | "channels_last" | "channels_last" [frames, ch] or "channels_first" [ch, frames]
dtype | "float32" | "float32" or "float16"
resample_quality | "default" | "default", "fastest", "low", "medium", "high", "best"

On the Linux WAV/MP3 fast paths, all resample_quality levels currently map to the same linear resampling behavior.
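
To make the interaction of offset, duration, sr, mono, and layout concrete, the sketch below estimates the shape a load() call would return from file metadata alone. It is illustrative only: expected_shape is a hypothetical helper, and it assumes frame counts are rounded to the nearest integer, clamped to the file length, and that mono output keeps a channel axis of size 1. The library's exact behavior may differ, for example by one frame after resampling.

```python
def expected_shape(total_frames, channels, native_sr, *, sr=None, offset=0.0,
                   duration=None, mono=False, layout="channels_last"):
    """Estimate the array shape for a partial, optionally resampled read.

    Assumptions (not confirmed by the library): nearest-integer rounding,
    clamping to the end of the file, and a retained channel axis for mono.
    """
    if layout not in ("channels_last", "channels_first"):
        raise ValueError(f"unknown layout: {layout!r}")

    # Convert the requested window from seconds to native-rate frames.
    start = min(round(offset * native_sr), total_frames)
    remaining = total_frames - start
    if duration is None:
        frames = remaining
    else:
        frames = min(round(duration * native_sr), remaining)

    # Resampling scales the frame count by the rate ratio.
    if sr is not None and sr != native_sr:
        frames = round(frames * sr / native_sr)

    ch = 1 if mono else channels
    return (frames, ch) if layout == "channels_last" else (ch, frames)
```

For a 10 s stereo file at 44.1 kHz, expected_shape(441000, 2, 44100, offset=1.0, duration=2.0) gives a [frames, ch] estimate for a two-second window starting one second in.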

batch_load

batch_load(paths, sr=None, mono=False, dtype="float32", num_workers=4)

Threaded multi-file load(). Returns list[(audio, sample_rate)].
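
Conceptually, a threaded multi-file load resembles mapping load() over the paths on a thread pool. The sketch below illustrates that pattern with the standard library; it is not the library's implementation, threaded_load is a hypothetical name, and the load function is passed in so the example runs with any callable.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def threaded_load(load_fn, paths, num_workers=4, **load_kwargs):
    """Apply load_fn to every path on a thread pool.

    Results come back in the same order as `paths` because
    ThreadPoolExecutor.map preserves input order.
    """
    worker = partial(load_fn, **load_kwargs)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(worker, paths))
```

Whether batch_load itself guarantees input order is not stated here; the ordering property above belongs to this sketch.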

save

save(path, audio, sr, layout="channels_last", encoding="float32",
     bitrate="auto", clip=True)

Write audio from mx.array (or numpy.ndarray) to disk.

Parameter | Default | Description
path | (required) | Output file path (format inferred from extension)
audio | (required) | Audio data; 1-D input is treated as mono
sr | (required) | Sample rate
layout | "channels_last" | Layout of the input array
encoding | "float32" | "float32", "pcm16", or "alac" (for .m4a)
bitrate | "auto" | Bitrate for lossy formats (.m4a AAC, .mp3 on Linux)
clip | True | Clamp samples to [-1, 1] before encoding
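
The effect of clip=True combined with encoding="pcm16" can be pictured as a clamp followed by integer quantization. The pure-Python sketch below shows the idea on a list of floats. It assumes symmetric scaling by 32767, which is a common convention but not confirmed to be what the library does; both function names are hypothetical.

```python
def clip_samples(samples, lo=-1.0, hi=1.0):
    """Clamp each sample to the closed interval [lo, hi]."""
    return [min(max(s, lo), hi) for s in samples]


def float_to_pcm16(samples, clip=True):
    """Quantize float samples to 16-bit integer values.

    Assumption: full scale maps to +/-32767. Without clipping, values
    outside [-1, 1] would fall outside the int16 range.
    """
    if clip:
        samples = clip_samples(samples)
    return [int(round(s * 32767)) for s in samples]
```

The sketch makes it visible why clipping defaults to on: an out-of-range sample such as 1.7 is pinned to full scale rather than overflowing the integer format.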

stream

stream(path, chunk_frames=None, chunk_duration=None, sr=None,
       mono=False, dtype="float32")

Return an iterator yielding (audio_chunk, sample_rate). Exactly one of chunk_frames or chunk_duration is required.

Parameter | Default | Description
path | (required) | Path to audio file
chunk_frames | None | Chunk size in frames
chunk_duration | None | Chunk size in seconds
sr | None | Target sample rate; None keeps native rate
mono | False | Mix down to mono
dtype | "float32" | "float32" or "float16"
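
The "exactly one of chunk_frames or chunk_duration" rule, and how a duration translates into a frame count, can be sketched as follows. This is an illustration under assumptions, not the library's code: the helper names are hypothetical, and nearest-integer rounding of chunk_duration * sample_rate is assumed.

```python
def resolve_chunk_frames(sample_rate, chunk_frames=None, chunk_duration=None):
    """Return the chunk size in frames, enforcing that exactly one
    of chunk_frames or chunk_duration was supplied."""
    if (chunk_frames is None) == (chunk_duration is None):
        raise ValueError("pass exactly one of chunk_frames or chunk_duration")
    if chunk_frames is None:
        # Assumed rounding; the library may truncate instead.
        chunk_frames = round(chunk_duration * sample_rate)
    if chunk_frames <= 0:
        raise ValueError("chunk size must be positive")
    return chunk_frames


def chunk_count(total_frames, chunk_frames):
    """Number of chunks needed to cover total_frames (last one may be short)."""
    return -(-total_frames // chunk_frames)  # integer ceiling division
```

Combined with the frames field from info(), chunk_count gives a progress-bar total before streaming starts, assuming the final chunk is yielded short rather than padded.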

info

info(path)

Return AudioInfo metadata without decoding sample buffers.

Field | Description
frames | Total number of sample frames
sample_rate | Sample rate in Hz
channels | Number of channels
duration | Duration in seconds
subtype | Sample encoding (e.g. pcm16, float32)
container | File format (e.g. wav, mp3, m4a)
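
The numeric fields are related by duration = frames / sample_rate, and together with channels they give a quick estimate of decoded size, which can help decide between load() and stream(). The dataclass below is a hypothetical stand-in that mirrors three of the fields above; it is not the library's AudioInfo type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioInfoSketch:
    """Stand-in mirroring the frames, sample_rate, and channels fields."""
    frames: int
    sample_rate: int
    channels: int

    @property
    def duration(self) -> float:
        """Duration in seconds, derived from frames and sample rate."""
        return self.frames / self.sample_rate

    def decoded_bytes(self, bytes_per_sample: int = 4) -> int:
        """Approximate size of the decoded buffer (4 bytes for float32)."""
        return self.frames * self.channels * bytes_per_sample


meta = AudioInfoSketch(frames=8_590_680, sample_rate=44_100, channels=2)
print(f"{meta.duration:.1f} s, ~{meta.decoded_bytes() / 1e6:.1f} MB as float32")
```

With a real file, the same arithmetic applies to the values returned by info(path).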

Testing

Run all tests:

uv sync --extra dev
uv run python -m pytest -q

Run the Linux-supported subset:

uv run python -m pytest -q -m "not apple_only"

Run the Apple-only subset:

uv run python -m pytest -q -m "apple_only"

Linux Docker run from a macOS host:

docker run --rm -it --platform linux/arm64 \
  -v "$PWD":/work -w /work \
  python:3.14-bookworm bash -lc '
    apt-get update && apt-get install -y --no-install-recommends \
      build-essential cmake ninja-build pkg-config ffmpeg \
      libavformat-dev libavcodec-dev libavutil-dev libswresample-dev &&
    python -m pip install -U pip uv &&
    uv sync --extra dev &&
    uv run python -m pytest -q -m "not apple_only"
  '

Performance

Benchmark methodology, commands, and full result tables live in docs/benchmarking.md.

Headline numbers (194.8 s stereo PCM16 WAV @ 44.1 kHz, p50 latency):

Task | macOS M4 Max | Linux arm64
Full WAV load | 3.59 ms (6.9x faster than librosa) | 8.41 ms (5.9x faster than librosa)
WAV partial read (1 s) | 0.04 ms (3.4x faster than librosa) | 0.05 ms (2.6x faster than librosa)
WAV save (float32) | 6.98 ms (2.8x faster than soundfile) | 31.70 ms (1.8x faster than soundfile)
MP3 load (native SR) | 63.70 ms (1.3x faster than librosa) | 80.93 ms (on par with librosa)
M4A/AAC load | 56.31 ms (2.2x faster than librosa) | 89.63 ms (1.6x faster than librosa)
Load + resample 16 kHz | 13.12 ms (4.4x faster than librosa) | 10.93 ms (7.9x faster than librosa)

Full tables with torchaudio comparisons, M1 Max, and Linux x86_64 results are in the benchmarking doc.
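
For readers who want a rough p50 figure on their own files, a median-of-repeats timer is enough for a first look. The sketch below is not the project's benchmark harness (that is described in docs/benchmarking.md); p50_latency_ms and its warmup/repeats defaults are illustrative choices.

```python
import statistics
import time


def p50_latency_ms(fn, repeats=20, warmup=3):
    """Return the median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # warm caches before measuring
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

A typical call would be p50_latency_ms(lambda: load("speech.wav")). Because MLX evaluates arrays lazily, it may be necessary to force evaluation inside the timed callable (for example with mx.eval on the returned array) so that the measurement covers the actual decode work.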

License

MIT

Download files

Source Distribution

mlx_audio_io-1.2.0.tar.gz (289.8 kB)

Uploaded Source

File details

Details for the file mlx_audio_io-1.2.0.tar.gz.

File metadata

  • Download URL: mlx_audio_io-1.2.0.tar.gz
  • Upload date:
  • Size: 289.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_audio_io-1.2.0.tar.gz
Algorithm | Hash digest
SHA256 | 969a0c6e198bc688c6d9497c6ffbb4785174076c44718b45e9528492cc21140f
MD5 | 81e8762af7b37f9f4ddb0a1538b43fd3
BLAKE2b-256 | bc5f110b505dd5b3475a908055ab68673a2c87e69ba56b98e70eff441b6f0f9b

Provenance

The following attestation bundles were made for mlx_audio_io-1.2.0.tar.gz:

Publisher: release-pypi.yml on ssmall256/mlx-audio-io

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
