
Project description

LatentScore

Try in Colab · Listen to Demo

Generate ambient music from text. Locally. No GPU required.

import latentscore as ls

ls.render("warm sunset over water", model="fast_heavy").play()

That's it. One line. You get audio playing on your speakers.

⚠️ Alpha — under active development. API may change between versions. Read more about how it works.


Install

Requires Python 3.10–3.12. If you don't have it: brew install python@3.10 (macOS) or pyenv install 3.10.

pip install latentscore

Or with conda:

conda create -n latentscore python=3.10 -y
conda activate latentscore
pip install latentscore

CLI

latentscore doctor                       # check setup and model availability
latentscore demo                         # render and play a sample
latentscore demo --duration 30           # 30-second demo
latentscore demo --output ambient.wav    # save to file

Quick Start

Render and play

import latentscore as ls

audio = ls.render("warm sunset over water", model="fast_heavy", duration=10.0)
audio.play()              # plays on your speakers
audio.save("output.wav")  # save to WAV

Different vibes

ls.render("jazz cafe at midnight", model="fast_heavy").play()
ls.render("thunderstorm on a tin roof", model="fast_heavy").play()
ls.render("lo-fi study beats", model="fast_heavy").play()

Controlling the Sound

MusicConfig (full control)

Build a config directly with human-readable labels:

import latentscore as ls

config = ls.MusicConfig(
    tempo="slow",
    brightness="dark",
    space="vast",
    density=3,
    bass="drone",
    pad="ambient_drift",
    melody="contemplative",
    rhythm="minimal",
    texture="shimmer",
    echo="heavy",
    root="d",
    mode="minor",
)

ls.render(config, duration=10.0).play()

MusicConfigUpdate (tweak a vibe)

Start from a vibe and override specific parameters:

import latentscore as ls

audio = ls.render(
    "morning coffee shop",
    duration=10.0,
    update=ls.MusicConfigUpdate(
        brightness="very_bright",
        rhythm="electronic",
    ),
)
audio.play()

Relative steps

Step(+1) moves one level up a field's label scale; Step(-1) moves one down. Values saturate at the scale boundaries.

from latentscore.config import Step

audio = ls.render(
    "morning coffee shop",
    duration=10.0,
    update=ls.MusicConfigUpdate(
        brightness=Step(+2),   # two levels brighter
        space=Step(-1),        # one level less spacious
    ),
)
audio.play()
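Conceptually, a relative step just moves an index along a field's ordered label scale and clamps at the ends. A minimal sketch of that idea, not latentscore's internals (`apply_step` is a hypothetical helper; the scale matches the brightness labels from the config reference below):

```python
# Conceptual sketch of relative steps -- not latentscore's internals.
# Each field has an ordered label scale; a step moves along it and clamps.

BRIGHTNESS = ["very_dark", "dark", "medium", "bright", "very_bright"]

def apply_step(scale: list[str], current: str, step: int) -> str:
    """Move `step` levels along `scale`, saturating at the boundaries."""
    i = scale.index(current) + step
    return scale[max(0, min(i, len(scale) - 1))]

print(apply_step(BRIGHTNESS, "medium", +2))  # "very_bright"
print(apply_step(BRIGHTNESS, "bright", +2))  # saturates at "very_bright"
print(apply_step(BRIGHTNESS, "dark", -1))    # saturates at "very_dark"
```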

Streaming

Chain vibes together with smooth crossfade transitions:

import latentscore as ls

stream = ls.stream(
    "morning coffee",
    "afternoon focus",
    "evening wind-down",
    duration=60,       # 60 seconds per vibe
    transition=5.0,    # 5-second crossfade
)
stream.play()

# Or collect and save
stream.collect().save("session.wav")
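The crossfade itself can be pictured as an equal-power blend between the tail of one clip and the head of the next. A rough numpy sketch of that technique, not the library's actual implementation:

```python
import numpy as np

def crossfade(a: np.ndarray, b: np.ndarray, fade_len: int) -> np.ndarray:
    """Blend the last fade_len samples of `a` into the first fade_len
    samples of `b` with an equal-power curve, keeping the rest intact."""
    t = np.linspace(0.0, 1.0, fade_len, dtype=np.float32)
    fade_out = np.cos(t * np.pi / 2)  # 1 -> 0
    fade_in = np.sin(t * np.pi / 2)   # 0 -> 1
    overlap = a[-fade_len:] * fade_out + b[:fade_len] * fade_in
    return np.concatenate([a[:-fade_len], overlap, b[fade_len:]])

# Two toy 1-second clips at 44.1 kHz with a 0.1 s crossfade.
a = np.full(44100, 0.5, dtype=np.float32)
b = np.full(44100, -0.5, dtype=np.float32)
out = crossfade(a, b, fade_len=4410)
```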

Live Streaming

For dynamic, interactive use (games, installations, adaptive UIs), use a generator to feed vibes and steer the music in real time:

import asyncio
from collections.abc import AsyncIterator

import latentscore as ls
from latentscore.config import Step


async def my_set() -> AsyncIterator[str | ls.MusicConfigUpdate]:
    yield "warm jazz cafe at midnight"
    await asyncio.sleep(8)

    # Absolute override: switch to bright electronic
    yield ls.MusicConfigUpdate(tempo="fast", brightness="very_bright", rhythm="electronic")
    await asyncio.sleep(8)

    # Relative nudge: dial brightness back down, add more echo
    yield ls.MusicConfigUpdate(brightness=Step(-2), echo=Step(+1))


session = ls.live(my_set(), transition_seconds=2.0)
session.play(seconds=30)

Sync generators work too — use Iterator instead of AsyncIterator and time.sleep instead of await asyncio.sleep.


Async API

For web servers and async apps:

import asyncio
import latentscore as ls


async def main() -> None:
    audio = await ls.arender("neon city rain")
    audio.save("neon.wav")


asyncio.run(main())

Bring Your Own LLM

Use any LLM through LiteLLM — OpenAI, Anthropic, Google, Mistral, Groq, and 100+ others. LiteLLM is included with latentscore.

import latentscore as ls

# Gemini (free tier available)
ls.render("cyberpunk rain on neon streets", model="external:gemini/gemini-3-flash-preview").play()

# Claude
ls.render("cozy library with rain outside", model="external:anthropic/claude-sonnet-4-5-20250929").play()

# GPT
ls.render("space station ambient", model="external:openai/gpt-4o").play()

API keys are read from environment variables automatically (GEMINI_API_KEY, ANTHROPIC_API_KEY, OPENAI_API_KEY).

LLM Metadata

External models return rich metadata alongside audio:

audio = ls.render("cyberpunk rain", model="external:gemini/gemini-3-flash-preview")

if audio.metadata is not None:
    print(audio.metadata.title)      # e.g. "Neon Rain Drift"
    print(audio.metadata.thinking)   # the LLM's reasoning
    print(audio.metadata.config)     # the MusicConfig it chose
    for palette in audio.metadata.palettes:
        print([c.hex for c in palette.colors])

Note: LLM models are slower than fast_heavy (network round-trips) and can occasionally produce invalid configs. fast_heavy is recommended for production use.


How It Works

You give LatentScore a vibe (a short text description) and it generates ambient music that matches.

The recommended fast_heavy model uses LAION-CLAP audio embeddings: your vibe text is encoded with CLAP's text encoder and matched against pre-computed CLAP audio embeddings of 10,000+ rendered music configurations. This matches text directly against what configs actually sound like. The best-matching config drives a real-time audio synthesizer.
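In outline, that retrieval step is nearest-neighbour search in a shared embedding space: normalize the vectors, take cosine similarities, and pick the best match. A schematic numpy sketch with toy random vectors standing in for the real CLAP embeddings:

```python
import numpy as np

# Toy stand-ins: in the real pipeline these come from CLAP's text encoder
# and from pre-computed CLAP audio embeddings of rendered configs.
rng = np.random.default_rng(0)
config_embeddings = rng.normal(size=(10_000, 512)).astype(np.float32)

# Pretend the vibe text embeds very close to config #42.
text_embedding = config_embeddings[42] + 0.01 * rng.normal(size=512).astype(np.float32)

# Cosine similarity: normalize, then one matrix-vector product.
configs_n = config_embeddings / np.linalg.norm(config_embeddings, axis=1, keepdims=True)
text_n = text_embedding / np.linalg.norm(text_embedding)
best = int(np.argmax(configs_n @ text_n))  # index of the best-matching config
```

The winning index selects a pre-rendered music configuration, which then drives the synthesizer.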

The lighter fast model uses text-to-text retrieval instead (MiniLM sentence embeddings). It's marginally faster but scores 71% lower on audio-text alignment benchmarks.

Both approaches are fast (~2 s per render), always produce a valid config (no LLM hallucinations), and require no API keys. Our CLAP benchmarks showed that embedding retrieval outperforms Claude Opus 4.5 and Gemini 3 Flash at mapping vibes to music configurations, and that fast_heavy outperforms fast by 71%.


Audio Contract

All audio produced by LatentScore follows this contract:

  • Format: float32 mono
  • Sample rate: 44100 Hz
  • Range: [-1.0, 1.0]
  • Shape: (n,) numpy array

You can verify the contract directly:

import numpy as np
import latentscore as ls

audio = ls.render("deep ocean")
samples = np.asarray(audio)  # NDArray[np.float32], shape (n,)

assert samples.dtype == np.float32
assert samples.ndim == 1
assert float(np.abs(samples).max()) <= 1.0
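Because the contract guarantees float32 mono in [-1.0, 1.0], interop with other audio tooling is a simple scale-and-clip. A sketch of 16-bit PCM export using the stdlib wave module; save_pcm16 is a hypothetical helper, not a latentscore API:

```python
import wave
import numpy as np

def save_pcm16(samples: np.ndarray, path: str, rate: int = 44100) -> None:
    """Write float32 samples in [-1.0, 1.0] as a 16-bit mono WAV file."""
    clipped = np.clip(samples, -1.0, 1.0)
    pcm = (clipped * 32767.0).astype(np.int16)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)     # mono, per the contract
        f.setsampwidth(2)     # 16-bit samples
        f.setframerate(rate)  # 44100 Hz, per the contract
        f.writeframes(pcm.tobytes())

# A toy 440 Hz tone in place of rendered audio.
samples = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100).astype(np.float32)
save_pcm16(samples, "tone.wav")
```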

Additional Info

Config Reference

Every MusicConfig field uses human-readable labels. Full reference:

Field        Labels
tempo        very_slow · slow · medium · fast · very_fast
brightness   very_dark · dark · medium · bright · very_bright
space        dry · small · medium · large · vast
motion       static · slow · medium · fast · chaotic
stereo       mono · narrow · medium · wide · ultra_wide
echo         none · subtle · medium · heavy · infinite
human        robotic · tight · natural · loose · drunk
attack       soft · medium · sharp
grain        clean · warm · gritty
density      2 · 3 · 4 · 5 · 6
root         c · c# · d · ... · a# · b
mode         major · minor · dorian · mixolydian

Layer styles:

Layer     Styles
bass      drone · sustained · pulsing · walking · fifth_drone · sub_pulse · octave · arp_bass
pad       warm_slow · dark_sustained · cinematic · thin_high · ambient_drift · stacked_fifths · bright_open
melody    procedural · contemplative · rising · falling · minimal · ornamental · arp_melody · contemplative_minor · call_response · heroic
rhythm    none · minimal · heartbeat · soft_four · hats_only · electronic · kit_light · kit_medium · military · tabla_essence · brush
texture   none · shimmer · shimmer_slow · vinyl_crackle · breath · stars · glitch · noise_wash · crystal · pad_whisper
accent    none · bells · pluck · chime · bells_dense · blip · blip_random · brass_hit · wind · arp_accent · piano_note

Local LLM (Expressive Mode)

Not recommended. The default fast and fast_heavy models are faster, more reliable, and produce higher-quality results. Expressive mode exists for experimentation only.

Runs a 270M-parameter Gemma 3 LLM locally. On macOS Apple Silicon, inference uses MLX (~5–15s). On CPU-only Linux/Windows, it uses transformers (30–120s per render). The local model can produce invalid configs and our benchmarks showed it barely outperforms a random baseline.

pip install 'latentscore[expressive]'
latentscore download expressive

Then, in Python:

import latentscore as ls

ls.render("jazz cafe at midnight", model="expressive").play()

Research & Training Pipeline

The data_work/ folder contains the full research pipeline: data preparation, LLM-based config generation, SFT/GRPO training on Modal, CLAP benchmarking, and model export.

See data_work/README.md and docs/architecture.md for details.

Contributing

See CONTRIBUTE.md for environment setup and contribution guidelines.

See docs/coding-guidelines.md for code style requirements.

Download files

Download the file for your platform.

Source Distribution

latentscore-0.1.4.tar.gz (122.2 kB)


Built Distribution


latentscore-0.1.4-py3-none-any.whl (117.0 kB)


File details

Details for the file latentscore-0.1.4.tar.gz.

File metadata

  • Download URL: latentscore-0.1.4.tar.gz
  • Upload date:
  • Size: 122.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for latentscore-0.1.4.tar.gz:

  • SHA256: 8bb80fe4748677d008a93d4288cf03fdf56911c6a50c67a71fb9d6594f836901
  • MD5: 176ea8d8aa16a8449ff6c019c96c777e
  • BLAKE2b-256: 5878c926f286f0f41411062c64d072d6ea4cad92b611da7c7d7cba692d641c82


Provenance

The following attestation bundles were made for latentscore-0.1.4.tar.gz:

Publisher: workflow.yml on prabal-rje/latentscore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file latentscore-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: latentscore-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 117.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for latentscore-0.1.4-py3-none-any.whl:

  • SHA256: bcfbbbd21b1fea0860df29e4a628339f43c2f0cc2e14886f2f446185b159da22
  • MD5: 0d125c6608f2063aa3df6d328f221f18
  • BLAKE2b-256: 0d9f841af46c3e5ff335ebe925b897de4f7cbd55c6c004ecdf414d67ca07701a


Provenance

The following attestation bundles were made for latentscore-0.1.4-py3-none-any.whl:

Publisher: workflow.yml on prabal-rje/latentscore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
