Native audio I/O for MLX on macOS and Linux

mlx-audio-io

mlx-audio-io is the audio data layer for MLX: fast file decode/encode directly to and from mlx.core.array, with one API across macOS and Linux.

Why This Project Exists

MLX has strong tensor and model primitives, but it does not ship a first-class, cross-platform audio file I/O layer comparable to what torchaudio provides in the PyTorch ecosystem.

In practice, MLX users often end up with one of these compromises:

- bridge through NumPy/SoundFile/librosa with extra copies and inconsistent format behavior
- shell out to ffmpeg/ffprobe for non-WAV workflows
- pull in parts of the PyTorch audio stack just to handle common audio containers/codecs

mlx-audio-io closes that gap with a native backend designed for MLX workloads:

- direct decode/encode into mlx.core.array
- one Python API (load, save, info, stream, batch_load) on both macOS and Linux
- consistent validation and error messages across platforms
- support for training/inference data access patterns (partial reads, chunked streaming, optional resampling)

Platform Backends

- macOS backend optimized for Apple Silicon via AudioToolbox
- Linux backend with native WAV/MP3 fast paths plus libav-backed codec support (FLAC/M4A/AIFF/CAF)

The public Python API is the same on both platforms: load, save, info, stream, batch_load.

Backend Feature Matrix

Capability	macOS backend	Linux backend
`info(path)`	AudioToolbox-supported formats (WAV, MP3, M4A/AAC, FLAC, AIFF, CAF, etc.)	WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
`load(path)`	AudioToolbox-supported formats + native-rate MP3 fast path	WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
`load(..., sr=...)`	Supported, with AudioToolbox resampling	Supported (`WAV/MP3` native linear path, other supported formats via libav decode/resample)
`save(path, ...)`	WAV, MP3, M4A/AAC, FLAC, AIFF, CAF	WAV, MP3, M4A/AAC, FLAC, AIFF, CAF
`encoding`	`float32`, `pcm16`, `alac` (for `.m4a`)	`float32`, `pcm16`, `alac` (for `.m4a`)
`stream(path, ...)`	AudioToolbox-supported formats + native-rate MP3 path	WAV, MP3, FLAC, M4A/AAC, AIFF, CAF
`stream(..., sr=...)`	Supported	Supported (`WAV/MP3` native linear path, other supported formats via libav-backed chunked decode path)

Unsupported format/encoding combinations fail with explicit ValueError messages.
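
To make the matrix concrete, here is a minimal sketch of a pre-flight check that a caller could run before save(). The helper name check_save_request and its error messages are hypothetical and not part of mlx-audio-io; only the extension list and the rule that alac applies to .m4a are taken from the matrix and build notes in this README. Whether every other extension/encoding pair is accepted is not stated here, so the sketch does not try to decide that.

```python
from pathlib import Path

# Output extensions and encodings transcribed from this README.
SAVE_EXTENSIONS = {".wav", ".mp3", ".m4a", ".flac", ".aiff", ".aif", ".caf"}
ENCODINGS = {"float32", "pcm16", "alac"}


def check_save_request(path, encoding="float32"):
    """Hypothetical helper: raise ValueError early for clearly unsupported requests."""
    ext = Path(path).suffix.lower()
    if ext not in SAVE_EXTENSIONS:
        raise ValueError(f"unsupported output format: {ext or '<none>'}")
    if encoding not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    if encoding == "alac" and ext != ".m4a":
        raise ValueError("encoding 'alac' is only listed for .m4a output")
    return ext
```

The library performs its own validation, so a check like this is only useful for failing fast, for example before a long preprocessing job starts writing files.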

Installation

End users (PyPI)

For normal use:

pip install mlx-audio-io

Contributors (source checkout)

For local development and tests:

git clone https://github.com/ssmall256/mlx-audio-io.git
cd mlx-audio-io
uv sync --extra dev

Linux source build behavior

Linux source builds require libav and use direct libav-backed paths:

- Linux info() for non-WAV formats uses direct libav metadata.
- Linux load() for non-WAV formats uses direct libav decode for all offset/duration combinations.
- Linux stream() for non-WAV formats uses direct libav packet/frame decode.
- Linux save() for encoded formats (.mp3, .flac, .m4a, .aiff/.aif, .caf) uses direct libav encode/mux.

Requirements

- Python 3.10+
- Runtime:
  - macOS: Apple Silicon + mlx
  - Linux: mlx[cpu] (current default)
- Source builds:
  - CMake 3.24+, C++17 toolchain, pkg-config
  - Linux default build: libavformat-dev, libavcodec-dev, libavutil-dev, libswresample-dev

Linux Troubleshooting

ModuleNotFoundError: mlx_audio_io
- Install into the project environment (uv sync) and run commands through uv run.
ImportError for mlx on Linux
- Ensure the Linux dependency is installed as mlx[cpu].
Build failures on source installs
- Verify build-essential, cmake, ninja-build, and pkg-config are installed.
Extended Linux format support errors (.mp3, .m4a, .flac, .aiff, .caf)
- For default Linux builds, ensure runtime libav libraries are present (libavformat, libavcodec, libavutil, libswresample).
MP3 test fixture generation failures
- Tests that generate MP3 fixtures require ffmpeg or lame available on PATH.
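
The last two checks can be scripted. The sketch below uses only the Python standard library; the function names are hypothetical, and ctypes.util.find_library is a heuristic lookup, so a missing result is a hint rather than proof that the library cannot be loaded.

```python
import shutil
from ctypes.util import find_library

# Runtime libraries named in the troubleshooting entry above.
LIBAV_LIBS = ("avformat", "avcodec", "avutil", "swresample")


def missing_libav(find=find_library):
    """Return the libav libraries that the lookup function could not locate."""
    return [name for name in LIBAV_LIBS if find(name) is None]


def mp3_fixture_tool(which=shutil.which):
    """Return the first MP3-capable tool found on PATH, or None."""
    for tool in ("ffmpeg", "lame"):
        if which(tool):
            return tool
    return None


if __name__ == "__main__":
    print("missing libav libraries:", missing_libav() or "none")
    print("MP3 fixture tool:", mp3_fixture_tool() or "not found")
```

The lookup functions are passed in as parameters only so the helpers can be exercised without touching the real system.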

Quickstart

from mlx_audio_io import load, save, info, stream, batch_load

# Load
x, sr = load("speech.wav")

# Resample + mono
x16, sr16 = load("speech.wav", sr=16000, mono=True)

# Metadata without decoding
meta = info("speech.wav")

# Stream in chunks
for chunk, chunk_sr in stream("long.wav", chunk_duration=2.0):
    pass

# Save WAV
save("out.wav", x, sr)
save("out_pcm16.wav", x, sr, encoding="pcm16")

# Batch load
items = batch_load(["a.wav", "b.wav"], sr=16000, mono=True)

Additional save examples:

save("out.flac", x, sr)
save("out.mp3", x, sr, bitrate="192k")
save("out.m4a", x, sr, bitrate="256k")
save("out.m4a", x, sr, encoding="alac")
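
The streaming loop in the Quickstart leaves its body empty. As a rough sketch of what a consumer might do, the helper below walks any iterable of (audio_chunk, sample_rate) pairs and totals the frames. The name summarize_stream is hypothetical, and the code assumes frames sit on axis 0 of each chunk (the channels_last convention documented for load); this README does not state the chunk layout for stream.

```python
def summarize_stream(chunks):
    """Consume (audio_chunk, sample_rate) pairs; return (num_chunks, total_frames, sample_rate)."""
    num_chunks = 0
    total_frames = 0
    rate = None
    for chunk, chunk_sr in chunks:
        if rate is None:
            rate = chunk_sr
        elif chunk_sr != rate:
            raise ValueError("sample rate changed mid-stream")
        num_chunks += 1
        # Assumption: axis 0 is the frame axis.
        total_frames += chunk.shape[0]
    return num_chunks, total_frames, rate
```

With the library installed, a call might look like summarize_stream(stream("long.wav", chunk_duration=2.0)). Because only .shape is read, the helper never forces a full decode into memory.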

API Reference

`load`

load(path, sr=None, offset=0.0, duration=None, mono=False,
     layout="channels_last", dtype="float32", resample_quality="default")

Decode audio into an mlx.core.array. Returns (audio, sample_rate).

Parameter	Default	Description
`path`	—	Path to audio file
`sr`	`None`	Target sample rate; `None` keeps native rate
`offset`	`0.0`	Start position in seconds
`duration`	`None`	Duration in seconds; `None` reads to end
`mono`	`False`	Mix down to mono
`layout`	`"channels_last"`	`"channels_last"` `[frames, ch]` or `"channels_first"` `[ch, frames]`
`dtype`	`"float32"`	`"float32"` or `"float16"`
`resample_quality`	`"default"`	`"default"`, `"fastest"`, `"low"`, `"medium"`, `"high"`, `"best"`

On the Linux WAV/MP3 fast paths, every resample_quality level currently maps to the same linear behavior.
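
The offset and duration parameters are what make partial reads possible. One common use is cutting a long file into fixed-length training windows. The sketch below computes the start offsets only; window_offsets is a hypothetical helper, not a library function.

```python
def window_offsets(total_duration, window, hop):
    """Start offsets (seconds) of every full window that fits in total_duration."""
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    offsets = []
    i = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while i * hop + window <= total_duration:
        offsets.append(i * hop)
        i += 1
    return offsets
```

Assuming the library is installed, each offset could then be passed to load(path, offset=o, duration=window), with total_duration taken from info(path).duration.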

`batch_load`

batch_load(paths, sr=None, mono=False, dtype="float32", num_workers=4)

Threaded multi-file version of load(). Returns a list of (audio, sample_rate) tuples.
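
For intuition, a threaded multi-file load can be sketched with the standard library as below. This is not the library's implementation, which lives in the native backend; threaded_load and its load_fn parameter are hypothetical and exist only to show the shape of the pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def threaded_load(paths, load_fn, num_workers=4, **load_kwargs):
    """Apply load_fn to every path on a thread pool; results follow input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # Executor.map yields results in the order of the inputs.
        return list(pool.map(lambda p: load_fn(p, **load_kwargs), paths))
```

A call such as threaded_load(["a.wav", "b.wav"], load, sr=16000, mono=True) would mirror the batch_load example in the Quickstart, under the assumption that load is safe to call from several threads.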

`save`

save(path, audio, sr, layout="channels_last", encoding="float32",
     bitrate="auto", clip=True)

Write audio from an mx.array (or numpy.ndarray) to disk.

Parameter	Default	Description
`path`	—	Output file path (format inferred from extension)
`audio`	—	Audio data; 1-D input is treated as mono
`sr`	—	Sample rate
`layout`	`"channels_last"`	Layout of the input array
`encoding`	`"float32"`	`"float32"`, `"pcm16"`, or `"alac"` (for `.m4a`)
`bitrate`	`"auto"`	Bitrate for lossy formats (`.m4a` AAC, `.mp3` on Linux)
`clip`	`True`	Clamp samples to `[-1, 1]` before encoding
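
To illustrate what clip and the pcm16 encoding imply for sample values, here is a pure-Python sketch of clamping followed by 16-bit quantization. The scale factor 32767 and the rounding mode are assumptions chosen for the example; this README does not say which convention the backends use.

```python
def clip_sample(x, lo=-1.0, hi=1.0):
    """Clamp one sample into [lo, hi]."""
    return max(lo, min(hi, x))


def to_pcm16(samples):
    """Clamp float samples to [-1, 1] and quantize to signed 16-bit integers."""
    # Assumed convention: symmetric scaling by 32767.
    return [int(round(clip_sample(x) * 32767)) for x in samples]
```

Without clamping, a sample such as 1.5 would fall outside the 16-bit range, which is why clipping before integer encoding is the default shown in the table.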

`stream`

stream(path, chunk_frames=None, chunk_duration=None, sr=None,
       mono=False, dtype="float32")

Return an iterator yielding (audio_chunk, sample_rate). Exactly one of chunk_frames or chunk_duration is required.

Parameter	Default	Description
`path`	—	Path to audio file
`chunk_frames`	`None`	Chunk size in frames
`chunk_duration`	`None`	Chunk size in seconds
`sr`	`None`	Target sample rate; `None` keeps native rate
`mono`	`False`	Mix down to mono
`dtype`	`"float32"`	`"float32"` or `"float16"`
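
The "exactly one of" rule can be sketched as a small resolver that turns either argument into a frame count. The helper name resolve_chunk_frames is hypothetical, and rounding to the nearest frame is an assumption; the library may round differently.

```python
def resolve_chunk_frames(sample_rate, chunk_frames=None, chunk_duration=None):
    """Return the chunk size in frames, requiring exactly one of the two arguments."""
    if (chunk_frames is None) == (chunk_duration is None):
        raise ValueError("pass exactly one of chunk_frames or chunk_duration")
    if chunk_frames is not None:
        frames = int(chunk_frames)
    else:
        # Assumed rounding: nearest whole frame.
        frames = round(chunk_duration * sample_rate)
    if frames <= 0:
        raise ValueError("chunk size must be at least one frame")
    return frames
```

For example, a 2.0 second chunk at 16 kHz resolves to 32000 frames.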

`info`

info(path)

Return an AudioInfo object with file metadata, without decoding sample buffers.

Field	Description
`frames`	Total number of sample frames
`sample_rate`	Sample rate in Hz
`channels`	Number of channels
`duration`	Duration in seconds
`subtype`	Sample encoding (e.g. `pcm16`, `float32`)
`container`	File format (e.g. `wav`, `mp3`, `m4a`)
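
The fields are related in the usual way: duration equals frames divided by sample_rate. The dataclass below is a stand-in written for this example, not the library's AudioInfo, and the frame count is computed from the benchmark file described under Performance (194.8 s at 44.1 kHz) rather than read from a real file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioInfoSketch:
    """Hypothetical mirror of the documented fields."""
    frames: int
    sample_rate: int
    channels: int
    subtype: str
    container: str

    @property
    def duration(self):
        # Duration in seconds follows from the frame count and the rate.
        return self.frames / self.sample_rate


example = AudioInfoSketch(
    frames=8_590_680,  # 194.8 s * 44_100 Hz, computed for illustration
    sample_rate=44_100,
    channels=2,
    subtype="pcm16",
    container="wav",
)
print(example.duration)
```

Knowing frames and channels up front is also enough to estimate decoded size before calling load, for instance frames * channels * 4 bytes for float32 output.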

Testing

Run all tests:

uv sync --extra dev
uv run python -m pytest -q

Run the Linux-supported subset:

uv run python -m pytest -q -m "not apple_only"

Run the Apple-only subset:

uv run python -m pytest -q -m "apple_only"

Run the Linux subset in Docker from a macOS host:

docker run --rm -it --platform linux/arm64 \
  -v "$PWD":/work -w /work \
  python:3.14-bookworm bash -lc '
    apt-get update && apt-get install -y --no-install-recommends \
      build-essential cmake ninja-build pkg-config ffmpeg \
      libavformat-dev libavcodec-dev libavutil-dev libswresample-dev &&
    python -m pip install -U pip uv &&
    uv sync --extra dev &&
    uv run python -m pytest -q -m "not apple_only"
  '

Performance

Benchmark methodology, commands, and full result tables live in docs/benchmarking.md.

Headline numbers (194.8 s stereo PCM16 WAV @ 44.1 kHz, p50 (median) latency):

Task	macOS M4 Max	Linux arm64
Full WAV load	3.59 ms — 6.9x faster than librosa	8.41 ms — 5.9x faster than librosa
WAV partial read (1 s)	0.04 ms — 3.4x faster than librosa	0.05 ms — 2.6x faster than librosa
WAV save (float32)	6.98 ms — 2.8x faster than soundfile	31.70 ms — 1.8x faster than soundfile
MP3 load (native SR)	63.70 ms — 1.3x faster than librosa	80.93 ms — on par with librosa
M4A/AAC load	56.31 ms — 2.2x faster than librosa	89.63 ms — 1.6x faster than librosa
Load + resample 16 kHz	13.12 ms — 4.4x faster than librosa	10.93 ms — 7.9x faster than librosa

Full tables with torchaudio comparisons, M1 Max, and Linux x86_64 results are in the benchmarking doc.
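
The speedup figures can be turned back into approximate baseline latencies, assuming each "Nx faster" value is the ratio of the baseline p50 to the mlx-audio-io p50. That reading is an assumption, and the published ratios are rounded, so the results are estimates only. The helper name is hypothetical.

```python
def implied_baseline_ms(latency_ms, speedup):
    """Estimate the baseline latency from a measured latency and a speedup ratio."""
    return latency_ms * speedup


# (latency in ms, speedup) pairs copied from the table above.
rows = {
    "Full WAV load, macOS M4 Max": (3.59, 6.9),
    "Full WAV load, Linux arm64": (8.41, 5.9),
    "Load + resample 16 kHz, Linux arm64": (10.93, 7.9),
}

for name, (latency, speedup) in rows.items():
    print(f"{name}: ~{implied_baseline_ms(latency, speedup):.1f} ms baseline")
```

For exact baseline measurements, the tables in docs/benchmarking.md remain the reference.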

License

MIT

Maintainers

ssmall256

Release history

Version	Release date
1.3.10	Mar 13, 2026
1.3.9	Mar 6, 2026
1.3.8	Mar 6, 2026
1.3.7	Mar 1, 2026
1.3.6	Mar 1, 2026
1.3.5	Mar 1, 2026
1.3.4	Mar 1, 2026
1.3.3	Mar 1, 2026
1.3.2	Mar 1, 2026
1.3.1	Feb 26, 2026
1.3.0	Feb 22, 2026
1.2.1	Mar 1, 2026
1.2.0 (this version)	Feb 22, 2026
1.1.0	Feb 21, 2026
1.0.1	Feb 17, 2026
1.0.0	Feb 16, 2026

Download files

Source Distribution

mlx_audio_io-1.2.0.tar.gz (289.8 kB)

Uploaded Feb 22, 2026 Source

File details

Details for the file mlx_audio_io-1.2.0.tar.gz.

File metadata

Download URL: mlx_audio_io-1.2.0.tar.gz
Upload date: Feb 22, 2026
Size: 289.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mlx_audio_io-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`969a0c6e198bc688c6d9497c6ffbb4785174076c44718b45e9528492cc21140f`
MD5	`81e8762af7b37f9f4ddb0a1538b43fd3`
BLAKE2b-256	`bc5f110b505dd5b3475a908055ab68673a2c87e69ba56b98e70eff441b6f0f9b`
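
A downloaded sdist can be checked against the SHA256 value above with the standard library. In the sketch below the function names are hypothetical and the local file path passed to verify_sdist is whatever location the archive was saved to; the expected digest is copied from the table.

```python
import hashlib
import io

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "969a0c6e198bc688c6d9497c6ffbb4785174076c44718b45e9528492cc21140f"


def sha256_hexdigest(stream, chunk_size=1 << 20):
    """Hash a binary stream in fixed-size blocks and return the hex digest."""
    digest = hashlib.sha256()
    for block in iter(lambda: stream.read(chunk_size), b""):
        digest.update(block)
    return digest.hexdigest()


def verify_sdist(path, expected=EXPECTED_SHA256):
    """Return True when the file at path matches the expected SHA256 digest."""
    with open(path, "rb") as fh:
        return sha256_hexdigest(fh) == expected


# Sanity check: chunked hashing agrees with hashing the same bytes in one call.
assert sha256_hexdigest(io.BytesIO(b"abc"), chunk_size=2) == hashlib.sha256(b"abc").hexdigest()
```

pip can enforce the same check at install time through its hash-checking mode when hashes are pinned in a requirements file.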

Provenance

The following attestation bundles were made for mlx_audio_io-1.2.0.tar.gz:

Publisher: release-pypi.yml on ssmall256/mlx-audio-io

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mlx_audio_io-1.2.0.tar.gz
- Subject digest: 969a0c6e198bc688c6d9497c6ffbb4785174076c44718b45e9528492cc21140f
- Sigstore transparency entry: 976407748
- Sigstore integration time: Feb 22, 2026
Source repository:
- Permalink: ssmall256/mlx-audio-io@ad761a1d1900585c58d8f72bed0777216301b11f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/ssmall256
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release-pypi.yml@ad761a1d1900585c58d8f72bed0777216301b11f
- Trigger Event: workflow_dispatch
