audio-arrange
Declarative timeline-based multi-track audio mixing for voice, music, and SFX.
Built from the MusicVideoCreator/CineForge pipeline at Trollfabriken AITrix AB, where ffmpeg
amix filter graphs became unmaintainable past 5 tracks. The library decodes once with
soundfile, does all mixing as numpy operations on pre-aligned float32 arrays, and encodes
once at the end. It renders a 5-minute, 6-track project in under 1.5 seconds.
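To make the "decode once, mix in numpy, encode once" idea concrete, here is a minimal NumPy sketch of summing pre-aligned float32 clips into one preallocated buffer. It is an illustration under stated assumptions, not audio-arrange's mixer.py: the `(samples, start_seconds, gain_db)` event tuples and the `mix_events` / `db_to_linear` helpers are hypothetical.

```python
import numpy as np


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def mix_events(events, sample_rate: int, channels: int) -> np.ndarray:
    """Sum (samples, start_seconds, gain_db) events into one float32 buffer.

    Hypothetical helper: every `samples` array is assumed to be float32,
    shaped (frames, channels), and already at `sample_rate`.
    """
    # Convert start times to frame offsets first so the buffer length is known up front.
    placed = [(samples, int(round(at * sample_rate)), gain_db) for samples, at, gain_db in events]
    total = max((start + len(samples) for samples, start, _ in placed), default=0)
    buffer = np.zeros((total, channels), dtype=np.float32)
    for samples, start, gain_db in placed:
        # Accumulate in place: one pass over each clip, no intermediate full-length copies.
        buffer[start:start + len(samples)] += np.float32(db_to_linear(gain_db)) * samples
    return buffer
```

Because every clip is already at the timeline rate, placement is plain integer slicing, and the only full-length allocation is the output buffer itself.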
What it solves
| Previous problem | Solution |
|---|---|
| Mixing 6 tracks with pydub takes 25+ seconds | Single-pass numpy mix engine; same job in <1.5s |
| ffmpeg amix filter graphs unreadable past 5 tracks | Declarative timeline.add(clip, track, at=...) API |
| pydub crossfades click on non-zero-crossing boundaries | Equal-power crossfade with frame-aligned arithmetic |
| No clean way to do voice-over ducking in Python | First-class timeline.duck(target, trigger) |
| LUFS normalization requires a separate ffmpeg-normalize pass | timeline.normalize_lufs(-16) integrated into render |
| Clipping when summing many tracks | Auto-headroom + optional tanh soft clip |
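The equal-power crossfade in the table can be sketched as below. This is a hypothetical helper, not the library's effects/crossfade.py; it assumes both clips are float32 `(frames, channels)` arrays at the same sample rate and that the overlap length is given in frames, which keeps the arithmetic frame-aligned.

```python
import numpy as np


def equal_power_crossfade(a: np.ndarray, b: np.ndarray, fade_frames: int) -> np.ndarray:
    """Join `a` and `b`, overlapping the last `fade_frames` of `a` with the first of `b`."""
    if fade_frames <= 0:
        return np.concatenate([a, b]).astype(np.float32)
    if fade_frames > min(len(a), len(b)):
        raise ValueError("fade is longer than one of the clips")
    # Quarter-period curves: cos^2 + sin^2 == 1 at every frame of the overlap.
    theta = np.linspace(0.0, np.pi / 2, fade_frames, dtype=np.float32)[:, None]
    fade_out = np.cos(theta)  # 1 -> 0 on the outgoing clip
    fade_in = np.sin(theta)   # 0 -> 1 on the incoming clip
    overlap = a[-fade_frames:] * fade_out + b[:fade_frames] * fade_in
    return np.concatenate([a[:-fade_frames], overlap, b[fade_frames:]]).astype(np.float32)
```

Because cos²θ + sin²θ = 1, the summed power stays constant for uncorrelated material. For correlated material (the same signal on both sides) the amplitudes add instead, so the midpoint rises by about 3 dB (cos 45° + sin 45° ≈ 1.414); that is the usual reason to offer a linear curve as well.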
Installation
```bash
pip install audio-arrange
```
Optional extras:
```bash
pip install "audio-arrange[mp3]"    # MP3 decode via pedalboard (no ffmpeg subprocess)
pip install "audio-arrange[lufs]"   # LUFS normalization via pyloudnorm
pip install "audio-arrange[duck]"   # Production-grade voice ducking via voice-duck
pip install "audio-arrange[all]"    # Everything above
```
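The extras are optional imports. A common way to surface a missing extra, in the spirit of the "clear ImportError guard" described for lufs/normalize.py, is a small helper like the one below; `require_extra` is a hypothetical name, not part of audio-arrange's public API.

```python
import importlib


def require_extra(module_name: str, extra: str):
    """Import an optional dependency, or fail with an actionable install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # ModuleNotFoundError is a subclass of ImportError, so both land here.
        raise ImportError(
            f"{module_name} is required for this feature; "
            f'install it with: pip install "audio-arrange[{extra}]"'
        ) from exc
```

For example, `require_extra("pyloudnorm", "lufs")` would return the module when the extra is installed and otherwise point the user at the right `pip install` line.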
Quick start
```python
from audio_arrange import Timeline, Clip

tl = Timeline(sample_rate=48000, channels=2)

# Load clips from disk
voice = Clip("narration.wav")
music = Clip("bed.flac", start_offset=4.0)  # skip the 4 s intro
sfx = Clip("transition.wav")

# Place clips on named tracks
tl.add(voice, track="voice", at=0.0, fade_in=0.05, fade_out=0.1)
tl.add(music, track="music", at=0.0, gain_db=-6.0)
tl.add(sfx, track="sfx", at=12.5, gain_db=-3.0, fade_out=0.3)

# Duck music under voice; no separate pass needed
tl.duck(target="music", trigger="voice", reduction_db=-12.0)

# Normalize to a podcast loudness target
tl.normalize_lufs(-16.0)

# Render to file; returns the path written
out = tl.render("episode_01.wav", bit_depth=16)
print(out)  # pathlib.Path of the written episode_01.wav
```
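What `tl.duck(...)` does conceptually can be sketched as an RMS envelope follower: measure the trigger track's short-term level, and pull the target track's gain down while the trigger is active. This is a simplified illustration, not the library's effects/duck.py or voice-duck; `rms_duck_gain` and its `threshold_db`, `window_s` and `release_s` parameters are assumptions.

```python
import numpy as np


def rms_duck_gain(trigger, sample_rate, reduction_db=-12.0, threshold_db=-40.0,
                  window_s=0.05, release_s=0.3):
    """Return a per-frame linear gain for the target track, driven by the trigger's RMS.

    Hypothetical sketch: `trigger` is a float array shaped (frames,) or (frames, channels).
    """
    mono = trigger.mean(axis=1) if trigger.ndim == 2 else trigger
    # Clamp the window so np.convolve(mode="same") returns one value per frame.
    win = max(1, min(len(mono), int(window_s * sample_rate)))
    # Short-term mean power: squared signal convolved with a normalized box window.
    power = np.convolve(mono.astype(np.float64) ** 2, np.ones(win) / win, mode="same")
    level_db = 10.0 * np.log10(power + 1e-12)
    # Above threshold, aim for the reduced gain; elsewhere unity.
    target = np.where(level_db > threshold_db, 10.0 ** (reduction_db / 20.0), 1.0)
    # Instant attack, one-pole release so the bed swells back gradually.
    coeff = np.exp(-1.0 / (release_s * sample_rate))
    gain = np.empty_like(target)
    g = 1.0
    for i, t in enumerate(target):
        g = t if t < g else coeff * g + (1.0 - coeff) * t
        gain[i] = g
    return gain.astype(np.float32)
```

The returned gain is applied per frame, e.g. `music * gain[:, None]`. The per-sample Python loop keeps the sketch readable but is slow for long clips.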
The pipeline
```text
┌────────────────────────────────────────────────────────────────┐
│ Clip loading                                                   │
│ ① soundfile.read() → float32 (frames, channels)                │
│ ② resample_poly if sample_rate != timeline.sample_rate         │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│ Timeline.add()                                                 │
│ ③ Record Event(clip, track, at, gain_db, fade_in, fade_out)    │
│   Nothing rendered yet: fully declarative                      │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│ mixer.mix(): single-pass on .render() call                     │
│ ④ Allocate zero buffer at target length                        │
│ ⑤ For each event: buffer[start:end] += gain * samples          │
│ ⑥ Apply crossfades (opposing equal-power curves in-place)      │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│ Effects chain                                                  │
│ ⑦ Duck envelope follower (RMS or voice-duck if installed)      │
│ ⑧ LUFS normalization (pyloudnorm if installed)                 │
│ ⑨ Auto-headroom → optional tanh soft clip → TPDF dither        │
└───────────────────────────────┬────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────┐
│ Writer                                                         │
│ ⑩ soundfile.write() → WAV / FLAC / OGG at chosen bit depth     │
└────────────────────────────────────────────────────────────────┘
```
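Step ⑧ reduces to a single static gain once integrated loudness has been measured, because scaling a signal by g dB shifts its integrated loudness by very nearly g LU. The sketch below assumes pyloudnorm (the `[lufs]` extra) for measurement; `measure_lufs` and `normalize_to_lufs` are hypothetical helpers, not the library's lufs/normalize.py.

```python
import numpy as np


def measure_lufs(samples, sample_rate):
    """Integrated loudness via pyloudnorm (the [lufs] extra), imported lazily."""
    import pyloudnorm  # optional dependency, only needed when this is called

    return pyloudnorm.Meter(sample_rate).integrated_loudness(samples)


def normalize_to_lufs(samples, measured_lufs, target_lufs=-16.0):
    """Apply the static gain that moves integrated loudness from measured to target.

    Returns the scaled float32 samples and the applied gain in dB.
    """
    gain_db = target_lufs - measured_lufs
    scaled = samples * np.float32(10.0 ** (gain_db / 20.0))
    return scaled.astype(np.float32), gain_db
```

Since this gain can be positive, peaks may rise above full scale, which is one reason the headroom and soft-clip stage (⑨) runs afterwards.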
Configuration
```python
from audio_arrange import RenderConfig

config = RenderConfig(
    sample_rate=48000,  # output sample rate; clips are resampled on load
    channels=2,         # 1 = mono, 2 = stereo
    headroom_db=-1.0,   # sample-peak ceiling in dBFS before the final clip; leaves margin for intersample peaks
    soft_clip=True,     # tanh curve instead of a hard clip; rounds off peaks rather than truncating them
    dither=True,        # TPDF dither for bit-depth reduction (16-bit outputs)
    progress=False,     # show a tqdm bar during render (useful for long projects)
)

# tl is the Timeline built in the quick start
out = tl.render("episode.wav", bit_depth=16, config=config)
```
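The `headroom_db` / `soft_clip` pair can be illustrated as follows. This is a hypothetical `apply_headroom` helper assuming a float mix buffer; it converts the ceiling from dBFS to linear amplitude and either saturates with tanh or hard-clips at that ceiling. That is one plausible reading of step ⑨, not necessarily the library's exact behavior.

```python
import numpy as np


def apply_headroom(mix, headroom_db=-1.0, soft_clip=True):
    """Keep every sample inside a ceiling of `headroom_db` dBFS."""
    ceiling = 10.0 ** (headroom_db / 20.0)  # -1 dBFS -> about 0.891
    if soft_clip:
        # tanh saturates smoothly toward +/-ceiling; small signals pass almost unchanged.
        return (ceiling * np.tanh(mix / ceiling)).astype(np.float32)
    # Hard clip: anything beyond the ceiling is flattened to it.
    return np.clip(mix, -ceiling, ceiling).astype(np.float32)
```

tanh never quite reaches the ceiling, and it also compresses material somewhat below it (tanh(0.5) ≈ 0.46), so the soft option trades a little level near the top for the absence of flat-topped clipping.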
| Field | Type | Default | Description |
|---|---|---|---|
| sample_rate | int | 48000 | Output sample rate in Hz |
| channels | int | 2 | Channel count; mono clips are upmixed |
| headroom_db | float | -1.0 | Peak limit in dBFS applied before encode |
| soft_clip | bool | True | tanh soft clip instead of hard truncation |
| dither | bool | True | TPDF dither for 16/24-bit renders |
| progress | bool | False | tqdm progress bar during render pass |
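TPDF dither (the `dither` field) adds triangular-distribution noise spanning about ±1 LSB before rounding, which decorrelates the quantization error from the signal. A minimal 16-bit sketch, assuming float input in [-1, 1] and a 32767 scale factor; `quantize_16bit` is a hypothetical helper, not the library's io/writer.py.

```python
import numpy as np


def quantize_16bit(mix, dither=True, rng=None):
    """Convert float samples in [-1, 1] to int16, optionally with TPDF dither."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = mix.astype(np.float64) * 32767.0
    if dither:
        # Sum of two independent uniform(-0.5, 0.5) LSB sources -> triangular pdf over (-1, 1) LSB.
        scaled += rng.uniform(-0.5, 0.5, scaled.shape) + rng.uniform(-0.5, 0.5, scaled.shape)
    # Round to the nearest integer step and guard the int16 range.
    return np.clip(np.round(scaled), -32768, 32767).astype(np.int16)
```

For 24-bit output the same idea applies with a larger scale factor and a wider integer container.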
Output format
timeline.render(path) writes a single audio file and returns the resolved Path. Format
is inferred from the extension: .wav, .flac, .ogg are all supported natively via
soundfile. For .mp3 output, install the [mp3] extra.
There is no JSON sidecar, no metadata file, and no intermediate temp file. The output is
written in one soundfile.write() call after the entire mix buffer is assembled in memory.
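Extension-based format selection maps naturally onto soundfile's `format` and `subtype` strings. The sketch below is a hypothetical `choose_format` helper, assuming WAV and FLAC are written as integer PCM at the requested bit depth and Ogg as Vorbis; its result would then be passed to `soundfile.write(path, data, samplerate, subtype=..., format=...)`.

```python
from pathlib import Path

# Hypothetical mapping onto soundfile / libsndfile format and subtype names.
_FORMATS = {".wav": "WAV", ".flac": "FLAC", ".ogg": "OGG"}
_PCM_SUBTYPES = {16: "PCM_16", 24: "PCM_24", 32: "PCM_32"}


def choose_format(path, bit_depth=16):
    """Return (format, subtype) for an output path, inferred from its extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported output extension {suffix!r}")
    fmt = _FORMATS[suffix]
    if fmt == "OGG":
        return fmt, "VORBIS"  # lossy codec: bit depth does not apply
    if bit_depth not in _PCM_SUBTYPES:
        raise ValueError(f"unsupported bit depth {bit_depth}")
    if fmt == "FLAC" and bit_depth == 32:
        raise ValueError("FLAC output supports 16 or 24 bit")
    return fmt, _PCM_SUBTYPES[bit_depth]
```

MP3 is deliberately rejected here, mirroring the note above that .mp3 output needs the [mp3] extra rather than soundfile.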
Testing without files
All clips can be built from numpy arrays. No disk access required.
```python
import numpy as np
from audio_arrange import Timeline, Clip

SR = 48000
rng = np.random.default_rng(0)  # seeded so test runs are reproducible

# Synthesise 10 seconds of voice-like noise
voice_samples = (rng.standard_normal((SR * 10, 2)) * 0.3).astype(np.float32)

# Synthesise a 440 Hz music bed, duplicated to both channels
t = np.linspace(0, 10, SR * 10, endpoint=False)
music_samples = np.tile(np.sin(2 * np.pi * 440 * t)[:, None] * 0.2, (1, 2)).astype(np.float32)

voice_clip = Clip(voice_samples, sample_rate=SR)
music_clip = Clip(music_samples, sample_rate=SR)

tl = Timeline(sample_rate=SR, channels=2)
tl.add(voice_clip, track="voice", at=0.0)
tl.add(music_clip, track="music", at=0.0, gain_db=-6.0)
tl.duck(target="music", trigger="voice")

# Render to an array; no file I/O at all
samples, sr = tl.render_to_array()
assert samples.shape == (SR * 10, 2)
assert sr == SR
```
CLI
```bash
# Minimal voice + music render with auto-ducking and a podcast loudness target
# (MP3 input needs the [mp3] extra)
audio-arrange \
  --voice narration.wav \
  --music bed.mp3 \
  --duck \
  --target-lufs -16 \
  --output episode_01.wav

# Manifest-driven arrangement for complex multi-track projects
audio-arrange --manifest episode.toml --output episode_01.wav

# Manifest with a loudness override at render time
audio-arrange --manifest episode.toml --target-lufs -14 --output youtube_cut.wav

# Inspect a manifest without rendering (dry run)
audio-arrange --manifest episode.toml --dry-run

# Render to FLAC at 24-bit
audio-arrange --manifest episode.toml --output episode_01.flac --bit-depth 24
```
Package structure
```text
src/audio_arrange/
├── __init__.py      ← version + public re-exports (Timeline, Clip, RenderConfig)
├── timeline.py      ← Timeline class; orchestrates add/duck/crossfade/render
├── clip.py          ← Clip class; lazy soundfile load, numpy mmap when possible
├── models.py        ← Pydantic v2: Track, Event, RenderConfig
├── config.py        ← RenderConfig alias (convenience import)
├── mixer.py         ← pure-numpy single-pass mix engine
├── manifest.py      ← TOML manifest parser → Timeline
├── cli.py           ← argparse CLI entry point
├── utils.py         ← dB/linear conversion, frame helpers
├── effects/
│   ├── __init__.py  ← re-exports crossfade, duck, gain, pan
│   ├── gain.py      ← gain ramps and equal-power fade envelopes
│   ├── pan.py       ← equal-power stereo panning
│   ├── crossfade.py ← equal-power and linear crossfade curves
│   └── duck.py      ← fallback RMS envelope-follower ducker
├── io/
│   ├── __init__.py  ← re-exports reader, writer, resample
│   ├── reader.py    ← soundfile-backed loader; resamples on read
│   ├── writer.py    ← soundfile-backed writer; dither + format selection
│   └── resample.py  ← scipy.signal.resample_poly wrapper
└── lufs/
    ├── __init__.py  ← re-exports normalize
    └── normalize.py ← pyloudnorm delegation with clear ImportError guard
```
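The equal-power panning listed for effects/pan.py follows the same sine/cosine law as an equal-power crossfade: a pan position in [-1, 1] is mapped to an angle in [0, π/2], and the left/right gains are its cosine and sine, so a centered source sits about 3 dB down in each channel. A minimal sketch; `equal_power_pan` is a hypothetical name, not the module's API.

```python
import numpy as np


def equal_power_pan(mono, pan):
    """Place a mono (frames,) signal in stereo; pan in [-1, 1], -1 = hard left."""
    if not -1.0 <= pan <= 1.0:
        raise ValueError("pan must be in [-1, 1]")
    theta = (pan + 1.0) * np.pi / 4.0  # maps [-1, 1] onto [0, pi/2]
    left, right = np.cos(theta), np.sin(theta)  # left^2 + right^2 == 1 for any pan
    return np.stack([mono * left, mono * right], axis=1).astype(np.float32)
```

The same function also shows one way a mono clip can be upmixed to the timeline's two channels without a level jump.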
© Trollfabriken AITrix AB — MIT licensed