Turn any recording into a 7.1 surround mix: a model designs a per-song spatial placement, and a deterministic renderer builds a lossless 8-channel FLAC.
Natural Perspective Spatial Audio
Turn any recording into a 7.1 surround mix. After a song is split into its instrument stems, a model invents a scene and decides where each instrument sits around you — then a deterministic renderer builds a lossless 8-channel (7.1) FLAC for your media server.
One real mix: every stem placed on the stage, the crowd behind you.
See it — no install
Open examples/index.html in any browser, on any OS.
It shows a finished mix's scene, soundstage, routing, and stem levels: the actual
output of the tool.
Run it
Easiest — install from PyPI with pipx, no clone needed. Needs Python 3.10+ and a system FFmpeg:
brew install ffmpeg # macOS; Linux: sudo apt install ffmpeg
pipx install 'natural-perspective-spatial-audio[full]'
spatial-standards-gui # or: spatial-standards song.flac
Or from a clone — one command (macOS/Linux) sets up a private virtualenv and installs everything; on a Mac with Homebrew it installs Python for you too:
./install.sh
./gui # the GUI (or: .venv/bin/spatial-standards song.flac)
Manual / Windows
Needs Python 3.10+ (3.12 recommended — widest wheel coverage):
python3 -m venv .venv && source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install '.[full]' # quote it — the [..] is a shell glob otherwise
spatial-standards song.flac # also: a folder, or a URL
spatial-standards-gui # or the GUI
The GUI uses Tkinter. It ships with the python.org installer (recommended on
macOS); with Homebrew, add python-tk; on Debian/Ubuntu, run apt install python3-tk.
[full] brings everything as Python packages: FFmpeg + ffprobe (via
static-ffmpeg), Demucs, the crowd model (audio-separator), and yt-dlp,
so a fresh machine works after one install with no system setup. It runs on
the CPU by default; for an NVIDIA GPU, install a CUDA build of PyTorch from
pytorch.org and then pip install 'audio-separator[gpu]' (much faster). The
first run downloads model weights (a few hundred MB).
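To confirm that a CUDA build of PyTorch is actually the one being picked up, a quick check like the following may help. This is a minimal sketch, not part of the tool: the helper name cuda_available is hypothetical, and it only relies on torch.cuda.is_available() from PyTorch.

```python
def cuda_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch  # only present if PyTorch was installed
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    # False here means separation will run on the CPU.
    print("CUDA available:", cuda_available())
```

If this prints False after installing a CUDA build, the CPU-only wheel is probably still the one installed in the active environment.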
Prefer your own tools? pip install . has no required Python packages and
just calls ffmpeg, demucs, audio-separator, and yt-dlp from your PATH.
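When relying on your own tools, it can be useful to check up front that each one resolves on PATH. The sketch below uses the executable names listed above; the helper missing_tools is hypothetical and is not something the package is known to provide.

```python
import shutil

# Executable names taken from the text above; the helper itself is illustrative.
REQUIRED_TOOLS = ("ffmpeg", "demucs", "audio-separator", "yt-dlp")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing from PATH: " + ", ".join(missing))
    else:
        print("all external tools found")
```

The which parameter exists only so the lookup can be swapped out when testing.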
The model layer needs an ANTHROPIC_API_KEY; without one the tool falls back to
a built-in mix and runs fully offline.
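The fallback described above can be pictured roughly as follows. This is an assumption-laden sketch: the function name choose_design_mode and the labels "model" and "builtin" are hypothetical, and the tool's real check may differ. Only the environment variable name comes from the text.

```python
import os


def choose_design_mode(env=os.environ):
    """Pick the design path based on whether an API key is configured.

    Returns "model" when ANTHROPIC_API_KEY is set to a non-blank value,
    otherwise "builtin" (the offline, built-in mix). Both labels are
    illustrative names, not the tool's own.
    """
    key = env.get("ANTHROPIC_API_KEY", "").strip()
    return "model" if key else "builtin"


if __name__ == "__main__":
    print("design mode:", choose_design_mode())
```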
Output drops straight into Plex/Jellyfin/Kodi:
<Artist>/Natural Perspective Spatial Audio/<Title> [...].flac (+ per-album index.html)
How it works
Separate stems → measure each stem's level → a model invents a scene and emits a
full mix configuration → a safety-clamped renderer builds the
7.1 FLAC. No audio is uploaded — only metadata, cover art, and the measured
stem levels inform the design. Every track saves its config and an index.html
documenting the scene, routing, and the exact model prompt/response.
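The measure, clamp, and route steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's renderer: the function names, the gain bounds (-24 dB to +6 dB), and the use of RMS as the level measure are all hypothetical. The channel order is the usual 7.1 order for FLAC (FL, FR, FC, LFE, BL, BR, SL, SR).

```python
import math

# Standard 8-channel (7.1) order used by FLAC.
CHANNELS_7_1 = ("FL", "FR", "FC", "LFE", "BL", "BR", "SL", "SR")


def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], in dB relative to full scale."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


def clamp_gain_db(requested_db, floor_db=-24.0, ceiling_db=6.0):
    """Limit a model-proposed gain to a safe range (bounds are assumptions)."""
    return max(floor_db, min(ceiling_db, requested_db))


def route_sample(sample, gains_db):
    """Spread one mono sample across the 8 channels.

    gains_db maps channel names to requested gains in dB; each gain is
    clamped first, and channels not listed stay silent.
    """
    frame = []
    for name in CHANNELS_7_1:
        if name in gains_db:
            linear = 10.0 ** (clamp_gain_db(gains_db[name]) / 20.0)
            frame.append(sample * linear)
        else:
            frame.append(0.0)
    return frame
```

Clamping before rendering means that whatever configuration the model emits, no channel gain can leave the chosen range, which keeps the render step deterministic for a given saved config.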
Your responsibility, and credits
By processing a file or URL you affirm you have the right to it. This tool hosts
no content and ships no audio. See NOTICE for these terms and for full credit to
the projects it builds on: FFmpeg, Demucs, audio-separator, yt-dlp, and the
Mel-Band RoFormer crowd model.
License
Apache-2.0 — see LICENSE and NOTICE. Provided as is,
without warranty of any kind.
File details
Details for the file natural_perspective_spatial_audio-0.1.2.tar.gz.
File metadata
- Download URL: natural_perspective_spatial_audio-0.1.2.tar.gz
- Upload date:
- Size: 54.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54d50dd02e692e45a104ebf592d37d102e2bc1be3a8b1d470a5fccb07b2d35e0 |
| MD5 | fc75d14577b34055009ba36bad4f2a42 |
| BLAKE2b-256 | af5bfea24f278c11fdf7d58287bdab8cbc1c219c06ebd0e8fd16837e891911f3 |
Provenance
The following attestation bundles were made for natural_perspective_spatial_audio-0.1.2.tar.gz:
Publisher: publish.yml on mattluttrell/natural-perspective-spatial-audio
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: natural_perspective_spatial_audio-0.1.2.tar.gz
- Subject digest: 54d50dd02e692e45a104ebf592d37d102e2bc1be3a8b1d470a5fccb07b2d35e0
- Sigstore transparency entry: 1944531835
- Sigstore integration time:
- Permalink: mattluttrell/natural-perspective-spatial-audio@937175db88441b3918b149e23182fc73b95903f8
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/mattluttrell
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@937175db88441b3918b149e23182fc73b95903f8
- Trigger Event: push
File details
Details for the file natural_perspective_spatial_audio-0.1.2-py3-none-any.whl.
File metadata
- Download URL: natural_perspective_spatial_audio-0.1.2-py3-none-any.whl
- Upload date:
- Size: 61.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1b2ef9eb8bd8faeb1ed7031ddefed74597588c8a56dcd523f5a2227527c7cd7c |
| MD5 | f320a034f65f24309bc3f6bb15c274f1 |
| BLAKE2b-256 | 32cc3245e088e9b3ee12cf3a8d4a52bc5f6cd1eae8ef7712c6f8eac81a101dcf |
Provenance
The following attestation bundles were made for natural_perspective_spatial_audio-0.1.2-py3-none-any.whl:
Publisher: publish.yml on mattluttrell/natural-perspective-spatial-audio
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: natural_perspective_spatial_audio-0.1.2-py3-none-any.whl
- Subject digest: 1b2ef9eb8bd8faeb1ed7031ddefed74597588c8a56dcd523f5a2227527c7cd7c
- Sigstore transparency entry: 1944531941
- Sigstore integration time:
- Permalink: mattluttrell/natural-perspective-spatial-audio@937175db88441b3918b149e23182fc73b95903f8
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/mattluttrell
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@937175db88441b3918b149e23182fc73b95903f8
- Trigger Event: push