Turn any recording into a 7.1 surround mix: a model designs a per-song spatial placement and renders a lossless 8-channel FLAC.

Natural Perspective Spatial Audio

Turn any recording into a 7.1 surround mix. After a song is split into its instrument stems, a model invents a scene and decides where each instrument sits around you — then a deterministic renderer builds a lossless 8-channel (7.1) FLAC for your media server.

Natural Perspective — a different soundstage designed per song

One real mix: every stem placed on the stage, the crowd behind you.

See it — no install

Open examples/index.html in any browser, on any OS. It shows the scene, soundstage, routing, and stem levels of a finished mix, which is the actual output of the tool.

Run it

Easiest — install from PyPI with pipx, no clone needed. Needs Python 3.10+ and a system FFmpeg:

brew install ffmpeg                   # macOS; Linux: sudo apt install ffmpeg
pipx install 'natural-perspective-spatial-audio[full]'
spatial-standards-gui                # or:  spatial-standards song.flac

Or from a clone — one command (macOS/Linux) sets up a private virtualenv and installs everything; on a Mac with Homebrew it installs Python for you too:

./install.sh
./gui                                # the GUI   (or: .venv/bin/spatial-standards song.flac)

Manual / Windows

Needs Python 3.10+ (3.12 recommended — widest wheel coverage):

python3 -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install '.[full]'                # quote it — the [..] is a shell glob otherwise
spatial-standards song.flac          # also: a folder, or a URL
spatial-standards-gui                # or the GUI

The GUI uses Tkinter. Tkinter ships with the python.org installer (recommended on macOS); with Homebrew, add python-tk; on Debian/Ubuntu, run apt install python3-tk.
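
To confirm that the interpreter you plan to use can load Tkinter before launching the GUI, a small check like the one below may help. It is only a sketch: the helper name tkinter_available is hypothetical and is not part of this package.

```python
import importlib


def tkinter_available() -> bool:
    """Return True if this interpreter can import the tkinter module."""
    try:
        # Importing the module does not open a window or need a display.
        importlib.import_module("tkinter")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    print("Tkinter available:", tkinter_available())
```

If it prints False, install Tkinter as described above and rerun the check with the same interpreter.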

[full] brings everything as Python packages: FFmpeg + ffprobe (via static-ffmpeg), Demucs, the crowd model (audio-separator), and yt-dlp. A fresh machine works after one install, with no system setup. Processing runs on the CPU by default; for an NVIDIA GPU, install a CUDA build of PyTorch from pytorch.org and run pip install 'audio-separator[gpu]', which is much faster. The first run downloads model weights (a few hundred MB).

Prefer your own tools? pip install . has no required Python packages and just calls ffmpeg, demucs, audio-separator, and yt-dlp from your PATH.
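
If you go this route, it can be useful to check which of those four commands actually resolve on your PATH. The sketch below uses only the standard library; the tool names come from the paragraph above, while the helper names find_tools and missing_tools are hypothetical and not part of this package.

```python
import shutil

# Command names as listed in this README; the helpers below are illustrative.
EXTERNAL_TOOLS = ("ffmpeg", "demucs", "audio-separator", "yt-dlp")


def find_tools(names=EXTERNAL_TOOLS):
    """Map each command name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names=EXTERNAL_TOOLS):
    """Return the command names that could not be found on PATH."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    for name, path in find_tools().items():
        print(f"{name:16} {path or 'NOT FOUND'}")
```

An empty list from missing_tools() suggests the plain install has what it needs; otherwise install the missing commands or use the [full] extra.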

The model layer needs an ANTHROPIC_API_KEY; without one the tool falls back to a built-in mix and runs fully offline.
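
The fallback described above amounts to a check on one environment variable. The following is a minimal sketch of that decision, not the tool's actual code: the function name choose_design_mode and the labels "model" and "builtin" are assumptions made for illustration.

```python
import os


def choose_design_mode(env=None) -> str:
    """Return "model" when an API key is set, else "builtin" (offline).

    Both labels are hypothetical; only the variable name ANTHROPIC_API_KEY
    comes from the README.
    """
    env = os.environ if env is None else env
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    return "model" if key else "builtin"


if __name__ == "__main__":
    print("Design mode:", choose_design_mode())
```

Treating an empty or whitespace-only value as "no key" is a design choice in this sketch; the tool itself may behave differently.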

Output drops straight into Plex/Jellyfin/Kodi:

<Artist>/Natural Perspective Spatial Audio/<Title> [...].flac   (+ per-album index.html)
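
To make the layout concrete, here is a sketch that builds such a path with pathlib. The function names are hypothetical, and the bracketed part of the file name is passed in by the caller because the README elides it as [...]; nothing here reproduces how the tool names its files internally.

```python
from pathlib import Path

# Folder name taken from the layout shown above.
ALBUM_DIR = "Natural Perspective Spatial Audio"


def album_folder(library_root, artist):
    """Return <library_root>/<Artist>/Natural Perspective Spatial Audio."""
    return Path(library_root) / artist / ALBUM_DIR


def track_path(library_root, artist, title, suffix):
    """Return the FLAC path for one track.

    `suffix` stands in for the bracketed text the README shows as "[...]";
    its real contents are not specified there, so the caller supplies it.
    """
    return album_folder(library_root, artist) / f"{title} {suffix}.flac"


if __name__ == "__main__":
    print(track_path("music", "Artist", "Title", "[...]"))
```

The sketch does not sanitize artist or title strings; names containing path separators would need handling in real use.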

How it works

Separate stems → measure each stem's level → a model invents a scene and emits a full mix configuration → a safety-clamped renderer builds the 7.1 FLAC. No audio is uploaded — only metadata, cover art, and the measured stem levels inform the design. Every track saves its config and an index.html documenting the scene, routing, and the exact model prompt/response.
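
Two of these steps, measuring a stem's level and clamping the mix configuration before rendering, can be sketched in a few lines. Everything below is an assumption made for illustration: the README does not say how levels are measured (RMS in dBFS is used here), what the clamp limits are (-60 dB and +6 dB are placeholders), or what the configuration looks like (a stem-to-channel gain table is assumed). The channel order follows the standard 8-channel FLAC assignment.

```python
import math

# 8-channel FLAC order: front L/R, center, LFE, back L/R, side L/R.
CHANNELS_7_1 = ("FL", "FR", "FC", "LFE", "BL", "BR", "SL", "SR")


def rms_dbfs(samples):
    """RMS level of float samples in [-1, 1], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square)


def clamp_gain_db(gain_db, lo=-60.0, hi=6.0):
    """Clamp one gain to [lo, hi]; the default limits are illustrative."""
    return max(lo, min(hi, gain_db))


def clamp_routing(routing, lo=-60.0, hi=6.0):
    """Clamp a hypothetical {stem: {channel: gain_db}} table.

    Sends to channels outside the 7.1 layout are dropped.
    """
    return {
        stem: {ch: clamp_gain_db(g, lo, hi)
               for ch, g in sends.items() if ch in CHANNELS_7_1}
        for stem, sends in routing.items()
    }


if __name__ == "__main__":
    print(rms_dbfs([0.5, -0.5, 0.5, -0.5]))
    print(clamp_routing({"vocals": {"FC": 20.0, "XX": 0.0}}))
```

Clamping after the model responds means an out-of-range suggestion cannot push a channel past the chosen limits, which is the general idea behind a safety-clamped renderer.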

Your responsibility, and credits

By processing a file or URL you affirm you have the right to it. This tool hosts no content and ships no audio. See NOTICE for these terms and for full credit to the projects it builds on: FFmpeg, Demucs, audio-separator, yt-dlp, and the Mel-Band RoFormer crowd model.

License

Apache-2.0 — see LICENSE and NOTICE. Provided as is, without warranty of any kind.
