audio-arrange

Declarative timeline-based multi-track audio mixing for voice, music, and SFX.

Built from the MusicVideoCreator/CineForge pipeline at Trollfabriken AITrix AB, where ffmpeg amix filter graphs became unmaintainable past 5 tracks. The library decodes each clip once with soundfile, does all mixing as numpy operations on pre-aligned float32 arrays, and encodes once at the end. It renders a 5-minute, 6-track project in under 1.5 seconds.


What it solves

| Previous problem | Solution |
| --- | --- |
| Mixing 6 tracks with pydub takes 25+ seconds | Single-pass numpy mix engine; same job in <1.5s |
| ffmpeg amix filter graphs unreadable past 5 tracks | Declarative timeline.add(clip, track, at=...) API |
| pydub crossfades click on non-zero-crossing boundaries | Equal-power crossfade with frame-aligned arithmetic |
| No clean way to do voice-over ducking in Python | First-class timeline.duck(target, trigger) |
| LUFS normalization requires a separate ffmpeg-normalize pass | timeline.normalize_lufs(-16) integrated into render |
| Clipping when summing many tracks | Auto-headroom + optional tanh soft clip |
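
The equal-power crossfade mentioned in the table can be illustrated with plain numpy. This is a minimal sketch of the general technique, not the library's own code: the function name `equal_power_crossfade` and its `overlap` parameter are assumptions made for this example.

```python
import numpy as np


def equal_power_crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Join a and b, blending the last/first `overlap` frames (hypothetical helper)."""
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    # Quarter-cycle cosine/sine ramps: fade_out**2 + fade_in**2 == 1 on every frame,
    # so the summed power stays constant for uncorrelated material.
    theta = np.linspace(0.0, np.pi / 2, overlap, dtype=np.float32)
    fade_out, fade_in = np.cos(theta), np.sin(theta)
    if a.ndim == 2:  # broadcast the ramps across channels
        fade_out, fade_in = fade_out[:, None], fade_in[:, None]
    blended = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], blended, b[overlap:]])


a = np.ones((1000, 2), dtype=np.float32)
b = np.ones((1000, 2), dtype=np.float32)
out = equal_power_crossfade(a, b, overlap=100)
```

Because both ramps are computed per frame on arrays that share one sample grid, the blend never depends on where zero crossings happen to fall. Identical (fully correlated) inputs, as in the demo, peak at about 1.41 mid-fade, which is one reason headroom handling matters later in the chain.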

Installation

pip install audio-arrange

Optional extras:

pip install "audio-arrange[mp3]"      # MP3 decode via pedalboard (no ffmpeg subprocess)
pip install "audio-arrange[lufs]"     # LUFS normalization via pyloudnorm
pip install "audio-arrange[duck]"     # Production-grade voice ducking via voice-duck
pip install "audio-arrange[all]"      # Everything above

Quick start

from audio_arrange import Timeline, Clip, RenderConfig

tl = Timeline(sample_rate=48000, channels=2)

# Load clips from disk
voice = Clip("narration.wav")
music = Clip("bed.flac", start_offset=4.0)   # skip 4s intro
sfx   = Clip("transition.wav")

# Place clips on named tracks
tl.add(voice, track="voice", at=0.0, fade_in=0.05, fade_out=0.1)
tl.add(music, track="music", at=0.0, gain_db=-6.0)
tl.add(sfx,   track="sfx",   at=12.5, gain_db=-3.0, fade_out=0.3)

# Duck music under voice — no separate pass needed
tl.duck(target="music", trigger="voice", reduction_db=-12.0)

# Normalize to podcast loudness target
tl.normalize_lufs(-16.0)

# Render to file — returns the path written
out = tl.render("episode_01.wav", bit_depth=16)
print(out)  # path of the written file, e.g. episode_01.wav
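
To make the ducking step less of a black box, here is a rough numpy sketch of an RMS envelope follower that turns a trigger track into a gain curve for the target track. It is an illustration under stated assumptions, not the library's implementation: `duck_gain`, `moving_average`, `threshold_db`, and `window_s` are names invented for this example, and only `reduction_db` mirrors the call shown above.

```python
import numpy as np


def moving_average(x: np.ndarray, win: int) -> np.ndarray:
    """Causal moving average over the previous `win` samples (zero-padded start)."""
    padded = np.concatenate([np.zeros(win - 1), x])
    csum = np.cumsum(np.concatenate([[0.0], padded]))
    return (csum[win:] - csum[:-win]) / win


def duck_gain(trigger, sample_rate, reduction_db=-12.0,
              threshold_db=-40.0, window_s=0.05):
    """Per-frame linear gain for the ducked track (hypothetical helper)."""
    mono = trigger.mean(axis=1) if trigger.ndim == 2 else trigger
    win = max(1, int(window_s * sample_rate))
    # Windowed RMS of the trigger; clamp tiny negative rounding errors before sqrt.
    power = np.maximum(moving_average(mono.astype(np.float64) ** 2, win), 0.0)
    active = (np.sqrt(power) > 10.0 ** (threshold_db / 20.0)).astype(np.float64)
    # Averaging the on/off decision turns gain steps into short linear ramps in dB.
    amount = moving_average(active, win)
    return (10.0 ** (amount * reduction_db / 20.0)).astype(np.float32)


SR = 1000  # a small rate keeps the demo fast
voice = np.concatenate([np.zeros(SR), np.full(SR, 0.5)]).astype(np.float32)
music = np.full((2 * SR, 2), 0.2, dtype=np.float32)
gain = duck_gain(voice, SR)
ducked = music * gain[:, None]  # unity gain while silent, -12 dB while voice is active
```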

The pipeline

  ┌─────────────────────────────────────────────────────────────────┐
  │  Clip loading                                                   │
  │  ① soundfile.read() → float32 (frames, channels)                │
  │  ② resample_poly if sample_rate != timeline.sample_rate         │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Timeline.add()                                                 │
  │  ③ Record Event(clip, track, at, gain_db, fade_in, fade_out)    │
  │     Nothing rendered yet — fully declarative                    │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  mixer.mix()  — single-pass on .render() call                   │
  │  ④ Allocate zero buffer at target length                        │
  │  ⑤ For each event: buffer[start:end] += gain * samples          │
  │  ⑥ Apply crossfades (opposing equal-power curves in-place)      │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Effects chain                                                  │
  │  ⑦ Duck envelope follower (RMS or voice-duck if installed)      │
  │  ⑧ LUFS normalization (pyloudnorm if installed)                 │
  │  ⑨ Auto-headroom → optional tanh soft clip → TPDF dither        │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Writer                                                         │
  │  ⑩ soundfile.write() → WAV / FLAC / OGG at chosen bit depth     │
  └─────────────────────────────────────────────────────────────────┘
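
Steps ④ and ⑤ of the diagram can be sketched in a few lines of numpy. The `Event` dataclass and `mix` function below are simplified stand-ins written for this example (the real `Event` also carries track and fade fields), and clips are assumed to be already resampled to the timeline rate.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Event:  # hypothetical, reduced version of the recorded event
    samples: np.ndarray  # float32, shape (frames, channels)
    at: float            # start time in seconds
    gain_db: float = 0.0


def mix(events, sample_rate, channels=2):
    """Sum all events into one buffer in a single pass."""
    starts = [int(round(e.at * sample_rate)) for e in events]
    total = max((s + len(e.samples) for s, e in zip(starts, events)), default=0)
    buffer = np.zeros((total, channels), dtype=np.float32)  # step 4: zero buffer
    for start, e in zip(starts, events):
        gain = np.float32(10.0 ** (e.gain_db / 20.0))  # dB to linear
        buffer[start:start + len(e.samples)] += gain * e.samples  # step 5
    return buffer


SR = 1000
events = [
    Event(np.ones((100, 2), dtype=np.float32), at=0.0),
    Event(np.ones((50, 2), dtype=np.float32), at=0.05, gain_db=-20.0),
]
mixed = mix(events, SR)
```

Each clip is touched once and every operation is a slice-wise add on a preallocated array, which is where a single-pass design gets its speed compared with repeatedly overlaying whole segments.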

Configuration

from audio_arrange import RenderConfig

config = RenderConfig(
    sample_rate=48000,    # output sample rate; clips resampled on load
    channels=2,           # 1 = mono, 2 = stereo
    headroom_db=-1.0,     # peak ceiling before final clip; leaves margin for intersample peaks
    soft_clip=True,       # tanh curve instead of hard clip; preserves transient shape
    dither=True,          # TPDF dither for bit-depth reduction (16-bit outputs)
    progress=False,       # show tqdm bar during render (useful for long projects)
)

out = tl.render("episode.wav", bit_depth=16, config=config)

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| sample_rate | int | 48000 | Output sample rate in Hz |
| channels | int | 2 | Channel count; mono clips are upmixed |
| headroom_db | float | -1.0 | Peak limit applied before encode |
| soft_clip | bool | True | tanh soft clip instead of hard truncation |
| dither | bool | True | TPDF dither for 16/24-bit renders |
| progress | bool | False | tqdm progress bar during render pass |
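
For readers unfamiliar with the last three options, the sketch below shows what headroom scaling, tanh soft clipping, and TPDF dither each do to a float buffer. These are generic textbook versions under assumed names (`apply_headroom`, `soft_clip`, `tpdf_dither_to_int16`); how the library combines them internally is not documented here.

```python
import numpy as np


def apply_headroom(x: np.ndarray, headroom_db: float = -1.0) -> np.ndarray:
    """Scale the whole buffer down only if its peak exceeds the ceiling."""
    ceiling = 10.0 ** (headroom_db / 20.0)
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    return x * np.float32(ceiling / peak) if peak > ceiling else x


def soft_clip(x: np.ndarray, headroom_db: float = -1.0) -> np.ndarray:
    """tanh curve that approaches the ceiling smoothly instead of truncating."""
    ceiling = np.float32(10.0 ** (headroom_db / 20.0))
    return ceiling * np.tanh(x / ceiling)


def tpdf_dither_to_int16(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Quantize to 16-bit with triangular-PDF dither."""
    # Two independent uniforms of +/-0.5 LSB sum to a triangular PDF within +/-1 LSB.
    noise = rng.uniform(-0.5, 0.5, x.shape) + rng.uniform(-0.5, 0.5, x.shape)
    scaled = x.astype(np.float64) * 32767.0 + noise
    return np.clip(np.round(scaled), -32768, 32767).astype(np.int16)


hot = np.array([[0.0], [2.0], [-3.0]], dtype=np.float32)
scaled = apply_headroom(hot)   # peak lands exactly on the -1 dB ceiling
clipped = soft_clip(hot)       # peaks squashed to just under the ceiling
silence = tpdf_dither_to_int16(np.zeros((1000, 2), np.float32),
                               np.random.default_rng(0))
```

Headroom scaling preserves the waveform shape but lowers the whole mix, while soft clipping leaves quiet material nearly untouched and only bends the peaks; which trade-off is preferable depends on the material.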

Output format

timeline.render(path) writes a single audio file and returns the resolved Path. Format is inferred from the extension: .wav, .flac, .ogg are all supported natively via soundfile. For .mp3 output, install the [mp3] extra.

There is no JSON sidecar, no metadata file, and no intermediate temp file. The output is written in one soundfile.write() call after the entire mix buffer is assembled in memory.
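
Extension-based format selection is easy to picture as a small lookup. The following is a hedged sketch, not the library's writer: `infer_format` is a hypothetical helper, and the format and subtype strings follow the names soundfile uses for its `format` and `subtype` arguments.

```python
from pathlib import Path

# Container and subtype names as used by soundfile/libsndfile.
_FORMATS = {".wav": "WAV", ".flac": "FLAC", ".ogg": "OGG"}
_PCM_SUBTYPES = {16: "PCM_16", 24: "PCM_24"}


def infer_format(path, bit_depth=16):
    """Map an output path and bit depth to a (format, subtype) pair."""
    suffix = Path(path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    fmt = _FORMATS[suffix]
    if fmt == "OGG":
        return fmt, "VORBIS"  # lossy codec, so the PCM bit depth does not apply
    if bit_depth not in _PCM_SUBTYPES:
        raise ValueError(f"unsupported bit depth: {bit_depth}")
    return fmt, _PCM_SUBTYPES[bit_depth]


wav_choice = infer_format("episode_01.wav", bit_depth=16)
flac_choice = infer_format("episode_01.flac", bit_depth=24)
```

The resulting pair could then be passed to a single `soundfile.write(path, data, samplerate, format=..., subtype=...)` call once the mix buffer is complete.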


Testing without files

All clips can be built from numpy arrays. No disk access required.

import numpy as np
from audio_arrange import Timeline, Clip

SR = 48000

# Synthesise 10 seconds of voice-like noise
voice_samples = np.random.randn(SR * 10, 2).astype(np.float32) * 0.3

# Synthesise a 440 Hz music bed
t = np.linspace(0, 10, SR * 10, endpoint=False)
music_samples = (np.sin(2 * np.pi * 440 * t)[:, None] * np.ones((1, 2))).astype(np.float32) * 0.2

voice_clip = Clip(voice_samples, sample_rate=SR)
music_clip = Clip(music_samples, sample_rate=SR)

tl = Timeline(sample_rate=SR, channels=2)
tl.add(voice_clip, track="voice", at=0.0)
tl.add(music_clip, track="music", at=0.0, gain_db=-6.0)
tl.duck(target="music", trigger="voice")

# Render to array — no file I/O at all
samples, sr = tl.render_to_array()
assert samples.shape == (SR * 10, 2)
assert sr == SR
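
When testing rendered arrays, level checks are often more useful than shape checks alone. The helpers below are a small numpy-only sketch; `peak_dbfs` and `rms_dbfs` are names chosen for this example and are not part of the package.

```python
import numpy as np


def peak_dbfs(x: np.ndarray) -> float:
    """Sample peak in dB relative to full scale (1.0)."""
    peak = float(np.max(np.abs(x)))
    return 20.0 * float(np.log10(peak)) if peak > 0 else float("-inf")


def rms_dbfs(x: np.ndarray) -> float:
    """RMS level in dB relative to full scale, computed in float64."""
    rms = float(np.sqrt(np.mean(np.square(x, dtype=np.float64))))
    return 20.0 * float(np.log10(rms)) if rms > 0 else float("-inf")


SR = 48000
t = np.arange(SR) / SR
tone = (0.5 * np.sin(2 * np.pi * 440 * t)).astype(np.float32)

tone_peak = peak_dbfs(tone)  # close to -6.02 dBFS for amplitude 0.5
tone_rms = rms_dbfs(tone)    # close to -9.03 dBFS for a sine at that amplitude
```

In a test suite, a check such as `peak_dbfs(samples) <= -1.0 + 1e-3` on the array returned by `render_to_array()` would confirm that the configured headroom ceiling is respected, assuming the default `headroom_db`.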

CLI

# Minimal voice + music mix with auto-ducking and podcast loudness target
audio-arrange \
    --voice narration.wav \
    --music bed.mp3 \
    --duck \
    --target-lufs -16 \
    --output episode_01.wav

# Manifest-driven arrangement for complex multi-track projects
audio-arrange --manifest episode.toml --output episode_01.wav

# Manifest with loudness override at render time
audio-arrange --manifest episode.toml --target-lufs -14 --output youtube_cut.wav

# Inspect a manifest without rendering (dry run)
audio-arrange --manifest episode.toml --dry-run

# Render to FLAC at 24-bit
audio-arrange --manifest episode.toml --output episode_01.flac --bit-depth 24

Package structure

src/audio_arrange/
├── __init__.py             ← version + public re-exports (Timeline, Clip, RenderConfig)
├── timeline.py             ← Timeline class; orchestrates add/duck/crossfade/render
├── clip.py                 ← Clip class; lazy soundfile load, numpy mmap when possible
├── models.py               ← Pydantic v2: Track, Event, RenderConfig
├── config.py               ← RenderConfig alias (convenience import)
├── mixer.py                ← pure-numpy single-pass mix engine
├── manifest.py             ← TOML manifest parser → Timeline
├── cli.py                  ← argparse CLI entry point
├── utils.py                ← dB/linear conversion, frame helpers
├── effects/
│   ├── __init__.py         ← re-exports crossfade, duck, gain, pan
│   ├── gain.py             ← gain ramps and equal-power fade envelopes
│   ├── pan.py              ← equal-power stereo panning
│   ├── crossfade.py        ← equal-power and linear crossfade curves
│   └── duck.py             ← fallback RMS envelope-follower ducker
├── io/
│   ├── __init__.py         ← re-exports reader, writer, resample
│   ├── reader.py           ← soundfile-backed loader; resamples on read
│   ├── writer.py           ← soundfile-backed writer; dither + format selection
│   └── resample.py         ← scipy.signal.resample_poly wrapper
└── lufs/
    ├── __init__.py         ← re-exports normalize
    └── normalize.py        ← pyloudnorm delegation with clear ImportError guard

© Trollfabriken AITrix AB — MIT licensed
