Declarative timeline-based multi-track audio mixing for voice, music, and SFX.

audio-arrange

Built from the MusicVideoCreator/CineForge pipeline at Trollfabriken AITrix AB, where ffmpeg amix filter graphs became unmaintainable past 5 tracks. The library decodes each clip once with soundfile, does all mixing as numpy operations on pre-aligned float32 arrays, and encodes once at the end. It renders a 5-minute, 6-track project in under 1.5 seconds.

What it solves

Previous problem	Solution
Mixing 6 tracks with pydub takes 25+ seconds	Single-pass numpy mix engine; same job in <1.5s
ffmpeg `amix` filter graphs unreadable past 5 tracks	Declarative `timeline.add(clip, track, at=...)` API
pydub crossfades click on non-zero-crossing boundaries	Equal-power crossfade with frame-aligned arithmetic
No clean way to do voice-over ducking in Python	First-class `timeline.duck(target, trigger)`
LUFS normalization requires a separate ffmpeg-normalize pass	`timeline.normalize_lufs(-16)` integrated into render
Clipping when summing many tracks	Auto-headroom + optional tanh soft clip
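
To illustrate the equal-power crossfade row above, here is a minimal numpy sketch. It is not the library's own code: the function name `equal_power_crossfade` and its `overlap` parameter are assumptions made for this example. It uses quarter-cycle cosine and sine curves, whose squares sum to 1 at every frame, and works on whole frames so both channels share the same envelope.

```python
import numpy as np


def equal_power_crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
    """Join a and b, blending the last `overlap` frames of a into the first of b.

    Hypothetical helper for illustration; arrays are (frames,) or (frames, channels).
    """
    if overlap <= 0 or overlap > min(len(a), len(b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")

    # Quarter-cycle curves: fade_out**2 + fade_in**2 == 1 at every frame.
    theta = np.linspace(0.0, np.pi / 2, overlap, dtype=np.float32)
    fade_out = np.cos(theta)
    fade_in = np.sin(theta)

    # Broadcast the envelopes across channels for (frames, channels) input.
    if a.ndim == 2:
        fade_out = fade_out[:, None]
        fade_in = fade_in[:, None]

    blended = a[-overlap:] * fade_out + b[:overlap] * fade_in
    return np.concatenate([a[:-overlap], blended, b[overlap:]], axis=0)
```

For uncorrelated material the summed power stays roughly constant through the overlap. For identical (fully correlated) signals the midpoint rises by up to about 3 dB, which is the usual trade-off of equal-power curves compared with linear ones.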

Installation

pip install audio-arrange

Optional extras:

pip install "audio-arrange[mp3]"      # MP3 decode via pedalboard (no ffmpeg subprocess)
pip install "audio-arrange[lufs]"     # LUFS normalization via pyloudnorm
pip install "audio-arrange[duck]"     # Production-grade voice ducking via voice-duck
pip install "audio-arrange[all]"      # Everything above

Quick start

from audio_arrange import Timeline, Clip

tl = Timeline(sample_rate=48000, channels=2)

# Load clips from disk
voice = Clip("narration.wav")
music = Clip("bed.flac", start_offset=4.0)   # skip 4s intro
sfx   = Clip("transition.wav")

# Place clips on named tracks
tl.add(voice, track="voice", at=0.0, fade_in=0.05, fade_out=0.1)
tl.add(music, track="music", at=0.0, gain_db=-6.0)
tl.add(sfx,   track="sfx",   at=12.5, gain_db=-3.0, fade_out=0.3)

# Duck music under voice — no separate pass needed
tl.duck(target="music", trigger="voice", reduction_db=-12.0)

# Normalize to podcast loudness target
tl.normalize_lufs(-16.0)

# Render to file; returns the Path that was written
out = tl.render("episode_01.wav", bit_depth=16)
print(out)  # prints the output path, e.g. episode_01.wav
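
As a rough picture of what a ducking step like the `tl.duck(...)` call above involves, the sketch below computes a per-frame gain curve from the trigger track's sliding RMS level. It is an assumption-laden illustration, not the library's implementation: `duck_gain`, `threshold`, and `window_s` are hypothetical names and defaults chosen for this example.

```python
import numpy as np


def _moving_average(x: np.ndarray, window: int) -> np.ndarray:
    """Centered moving average, same length as x, using edge padding."""
    padded = np.pad(x, (window // 2, window - 1 - window // 2), mode="edge")
    csum = np.cumsum(np.insert(padded, 0, 0.0))
    return (csum[window:] - csum[:-window]) / window


def duck_gain(trigger: np.ndarray, sample_rate: int, reduction_db: float = -12.0,
              threshold: float = 0.02, window_s: float = 0.05) -> np.ndarray:
    """Per-frame linear gain for the target track (hypothetical helper).

    Gain is 1.0 while the trigger is quiet and drops to `reduction_db`
    while the trigger's sliding RMS exceeds `threshold`.
    """
    mono = trigger.mean(axis=1) if trigger.ndim == 2 else trigger
    window = max(1, int(window_s * sample_rate))

    # Sliding RMS envelope; clamp tiny negative rounding errors before sqrt.
    mean_square = _moving_average(mono.astype(np.float64) ** 2, window)
    rms = np.sqrt(np.maximum(mean_square, 0.0))

    ducked = 10.0 ** (reduction_db / 20.0)
    gain = np.where(rms > threshold, ducked, 1.0)

    # Smooth the gain curve so it ramps instead of stepping (avoids clicks).
    return _moving_average(gain, window).astype(np.float32)
```

With stereo music the curve would be applied as `music * gain[:, None]`. A production ducker would normally use separate attack and release times; the symmetric smoothing here only keeps the example short.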

The pipeline

  ┌─────────────────────────────────────────────────────────────────┐
  │  Clip loading                                                   │
  │  ① soundfile.read() → float32 (frames, channels)               │
  │  ② resample_poly if sample_rate != timeline.sample_rate         │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Timeline.add()                                                 │
  │  ③ Record Event(clip, track, at, gain_db, fade_in, fade_out)    │
  │     Nothing rendered yet — fully declarative                    │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  mixer.mix()  — single-pass on .render() call                   │
  │  ④ Allocate zero buffer at target length                        │
  │  ⑤ For each event: buffer[start:end] += gain * samples          │
  │  ⑥ Apply crossfades (opposing equal-power curves in-place)      │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Effects chain                                                  │
  │  ⑦ Duck envelope follower (RMS or voice-duck if installed)      │
  │  ⑧ LUFS normalization (pyloudnorm if installed)                 │
  │  ⑨ Auto-headroom → optional tanh soft clip → TPDF dither        │
  └────────────────────────────┬────────────────────────────────────┘
                               │
  ┌────────────────────────────▼────────────────────────────────────┐
  │  Writer                                                         │
  │  ⑩ soundfile.write() → WAV / FLAC / OGG at chosen bit depth    │
  └─────────────────────────────────────────────────────────────────┘
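
Steps ④ and ⑤ of the diagram can be sketched in a few lines of numpy. The `Event` dataclass and `mix` function below are simplified stand-ins written for this example (no fades, no crossfades, no track grouping) and are not the library's actual `models.Event` or `mixer.mix()`.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Event:
    """Hypothetical, simplified event: pre-decoded samples placed on a timeline."""
    samples: np.ndarray  # float32, shape (frames, channels) or (frames, 1)
    at: float            # start time in seconds
    gain_db: float = 0.0


def db_to_linear(db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def mix(events: list[Event], sample_rate: int, channels: int = 2) -> np.ndarray:
    """Sum all events into one zero-initialised float32 buffer."""
    starts = [int(round(e.at * sample_rate)) for e in events]
    if any(s < 0 for s in starts):
        raise ValueError("events cannot start before time zero")

    # Step 4: allocate the buffer once, sized to the latest event end.
    total = max((s + len(e.samples) for s, e in zip(starts, events)), default=0)
    buffer = np.zeros((total, channels), dtype=np.float32)

    # Step 5: in-place accumulate; (frames, 1) mono clips broadcast to all channels.
    for start, e in zip(starts, events):
        end = start + len(e.samples)
        buffer[start:end] += np.float32(db_to_linear(e.gain_db)) * e.samples
    return buffer
```

Because every clip is already at the timeline's sample rate and dtype, the inner loop is one slice-add per event, so cost scales with total clip length rather than with the number of tracks.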

Configuration

from audio_arrange import RenderConfig

config = RenderConfig(
    sample_rate=48000,    # output sample rate; clips resampled on load
    channels=2,           # 1 = mono, 2 = stereo
    headroom_db=-1.0,     # peak ceiling (dBFS) applied before the final clip; leaves margin for intersample peaks
    soft_clip=True,       # tanh curve instead of hard clip; preserves transient shape
    dither=True,          # TPDF dither for bit-depth reduction (16-bit outputs)
    progress=False,       # show tqdm bar during render (useful for long projects)
)

out = tl.render("episode.wav", bit_depth=16, config=config)

Field	Type	Default	Description
`sample_rate`	`int`	`48000`	Output sample rate in Hz
`channels`	`int`	`2`	Channel count; mono clips are upmixed
`headroom_db`	`float`	`-1.0`	Peak limit applied before encode
`soft_clip`	`bool`	`True`	tanh soft clip instead of hard truncation
`dither`	`bool`	`True`	TPDF dither for 16/24-bit renders
`progress`	`bool`	`False`	tqdm progress bar during render pass
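
The `headroom_db`, `soft_clip`, and `dither` fields correspond to three small signal operations. The sketch below shows one plausible form of each in numpy. The function names are hypothetical, and the exact order and scaling used inside the library may differ.

```python
import numpy as np


def apply_headroom(x: np.ndarray, headroom_db: float = -1.0) -> np.ndarray:
    """Scale the whole buffer down if its sample peak exceeds the ceiling."""
    ceiling = 10.0 ** (headroom_db / 20.0)
    peak = float(np.max(np.abs(x))) if x.size else 0.0
    return x * (ceiling / peak) if peak > ceiling else x


def soft_clip(x: np.ndarray, ceiling: float = 1.0) -> np.ndarray:
    """tanh soft clip: output magnitude approaches `ceiling` asymptotically.

    Note that tanh also compresses samples well below the ceiling slightly.
    """
    return ceiling * np.tanh(x / ceiling)


def tpdf_dither(x: np.ndarray, bit_depth: int = 16, rng=None) -> np.ndarray:
    """Add triangular-PDF noise spanning +/- 1 LSB of the target bit depth."""
    if rng is None:
        rng = np.random.default_rng()
    lsb = 1.0 / (2 ** (bit_depth - 1))
    # The difference of two uniform variables has a triangular distribution.
    noise = (rng.random(x.shape) - rng.random(x.shape)) * lsb
    return x + noise.astype(x.dtype)
```

Dither is added after all gain changes and immediately before quantisation, so the noise decorrelates the rounding error at the final bit depth.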

Output format

Timeline.render(path) writes a single audio file and returns the resolved Path. The format is inferred from the extension: .wav, .flac, and .ogg are supported natively via soundfile. For .mp3 output, install the [mp3] extra.

There is no JSON sidecar, no metadata file, and no intermediate temp file. The output is written in one soundfile.write() call after the entire mix buffer is assembled in memory.
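
Extension-based format selection of this kind can be sketched with pathlib alone. The mapping and function names below are assumptions made for illustration. The format and subtype strings are the names soundfile uses for libsndfile formats, but how the library itself chooses them is not documented here.

```python
from pathlib import Path

# Hypothetical lookup table: file suffix -> soundfile format name.
FORMATS = {".wav": "WAV", ".flac": "FLAC", ".ogg": "OGG"}


def infer_format(path: str) -> str:
    """Return the container format implied by the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unsupported output extension: {suffix!r}")
    return FORMATS[suffix]


def pcm_subtype(bit_depth: int) -> str:
    """Return the PCM subtype name for an integer bit depth (WAV/FLAC)."""
    if bit_depth not in (16, 24):
        raise ValueError("this sketch only handles 16- and 24-bit PCM")
    return f"PCM_{bit_depth}"
```

The two results would then be passed as the `format` and `subtype` arguments of `soundfile.write()`.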

Testing without files

All clips can be built from numpy arrays. No disk access required.

import numpy as np
from audio_arrange import Timeline, Clip

SR = 48000

# Synthesise 10 seconds of voice-like noise
voice_samples = np.random.randn(SR * 10, 2).astype(np.float32) * 0.3

# Synthesise a 440 Hz music bed
t = np.linspace(0, 10, SR * 10, endpoint=False)
music_samples = (np.sin(2 * np.pi * 440 * t)[:, None] * np.ones((1, 2))).astype(np.float32) * 0.2

voice_clip = Clip(voice_samples, sample_rate=SR)
music_clip = Clip(music_samples, sample_rate=SR)

tl = Timeline(sample_rate=SR, channels=2)
tl.add(voice_clip, track="voice", at=0.0)
tl.add(music_clip, track="music", at=0.0, gain_db=-6.0)
tl.duck(target="music", trigger="voice")

# Render to array — no file I/O at all
samples, sr = tl.render_to_array()
assert samples.shape == (SR * 10, 2)
assert sr == SR

CLI

# Minimum voice + music with auto-ducking and podcast loudness target
audio-arrange \
    --voice narration.wav \
    --music bed.mp3 \
    --duck \
    --target-lufs -16 \
    --output episode_01.wav

# Manifest-driven arrangement for complex multi-track projects
audio-arrange --manifest episode.toml --output episode_01.wav

# Manifest with loudness override at render time
audio-arrange --manifest episode.toml --target-lufs -14 --output youtube_cut.wav

# Inspect a manifest without rendering (dry run)
audio-arrange --manifest episode.toml --dry-run

# Render to FLAC at 24-bit
audio-arrange --manifest episode.toml --output episode_01.flac --bit-depth 24

Package structure

src/audio_arrange/
├── __init__.py             ← version + public re-exports (Timeline, Clip, RenderConfig)
├── timeline.py             ← Timeline class; orchestrates add/duck/crossfade/render
├── clip.py                 ← Clip class; lazy soundfile load, numpy mmap when possible
├── models.py               ← Pydantic v2: Track, Event, RenderConfig
├── config.py               ← RenderConfig alias (ergonomics import)
├── mixer.py                ← pure-numpy single-pass mix engine
├── manifest.py             ← TOML manifest parser → Timeline
├── cli.py                  ← argparse CLI entry point
├── utils.py                ← dB/linear conversion, frame helpers
├── effects/
│   ├── __init__.py         ← re-exports crossfade, duck, gain, pan
│   ├── gain.py             ← gain ramps and equal-power fade envelopes
│   ├── pan.py              ← equal-power stereo panning
│   ├── crossfade.py        ← equal-power and linear crossfade curves
│   └── duck.py             ← fallback RMS envelope-follower ducker
├── io/
│   ├── __init__.py         ← re-exports reader, writer, resample
│   ├── reader.py           ← soundfile-backed loader; resamples on read
│   ├── writer.py           ← soundfile-backed writer; dither + format selection
│   └── resample.py         ← scipy.signal.resample_poly wrapper
└── lufs/
    ├── __init__.py         ← re-exports normalize
    └── normalize.py        ← pyloudnorm delegation with clear ImportError guard
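
The "clear ImportError guard" noted for lufs/normalize.py is a common pattern for optional extras. A minimal sketch of the idea follows. The helper name `require_optional` is hypothetical and the library's real guard may be written differently.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency or fail with an actionable message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f'install it with: pip install "audio-arrange[{extra}]"'
        ) from exc
```

Called lazily, for example as `require_optional("pyloudnorm", "lufs")` at the point where normalization is requested, this keeps the base install free of the dependency while still telling the user which extra to add.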

Project details

Version 0.1.0, released May 20, 2026.

File	Type	Size
audio_arrange-0.1.0.tar.gz	Source distribution	23.3 kB
audio_arrange-0.1.0-py3-none-any.whl	Built distribution (Python 3)	33.3 kB

Both files were uploaded on May 20, 2026 using Trusted Publishing, via twine/6.1.0 CPython/3.13.12.

File hashes

File	Algorithm	Hash digest
audio_arrange-0.1.0.tar.gz	SHA256	`b8aa769e9f0d4b39f8eb7f194a120edab12b841a1f70c92e904310f59a8e7550`
audio_arrange-0.1.0.tar.gz	MD5	`5a33ba0484411ad70b3e50658a35a571`
audio_arrange-0.1.0.tar.gz	BLAKE2b-256	`b200c52850144383db86b14a8f33f7e095221bce426d207a2765c6a7d5a2167c`
audio_arrange-0.1.0-py3-none-any.whl	SHA256	`2cc5ee50b5dc0ec3c467bdb949f780510ebf36e6380ee46ba2e95f1f419b9607`
audio_arrange-0.1.0-py3-none-any.whl	MD5	`a3bf10b5f7762c5845bb72281a68f1dd`
audio_arrange-0.1.0-py3-none-any.whl	BLAKE2b-256	`2abbea4621dcce3d347eb57fed1bb10f622ad23525d6b2e02df9129c5f2c6b3a`

Provenance

Attestation bundles were made for both files. Publisher: release.yml on tomastimelock/audio-arrange. Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject digest: the SHA256 of each file, as listed above
- Sigstore transparency entry: 1578646702 (audio_arrange-0.1.0.tar.gz), 1578647018 (audio_arrange-0.1.0-py3-none-any.whl)
- Sigstore integration time: May 20, 2026
Source repository:
- Permalink: tomastimelock/audio-arrange@732824eb09be9fddef35cd73bf55814f8e21da5e
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/tomastimelock
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@732824eb09be9fddef35cd73bf55814f8e21da5e
- Trigger Event: push
