Stable diffusion for real-time music generation.

🎸 Riffusion

Python 3.9 | 3.10 · MIT License

Riffusion is a library for real-time music and audio generation with stable diffusion.

Read about it at https://www.riffusion.com/about and try it at https://www.riffusion.com/.

This is the core repository for riffusion image and audio processing code.

  • Diffusion pipeline that performs prompt interpolation combined with image conditioning
  • Conversions between spectrogram images and audio clips
  • Command-line interface for common tasks
  • Interactive app using streamlit
  • Flask server to provide model inference via API
  • Various third-party integrations

Citation

If you build on this work, please cite it as follows:

@article{Forsgren_Martiros_2022,
  author = {Forsgren, Seth* and Martiros, Hayk*},
  title = {{Riffusion - Stable diffusion for real-time music generation}},
  url = {https://riffusion.com/about},
  year = {2022}
}

Install

Tested in CI with Python 3.9 and 3.10.

It's highly recommended to set up a virtual Python environment with conda or virtualenv:

conda create --name riffusion python=3.9
conda activate riffusion

Install the Python package:

pip install -U riffusion

or clone the repository and install from source:

git clone https://github.com/riffusion/riffusion.git
cd riffusion
python -m pip install --editable .

In order to use audio formats other than WAV, ffmpeg is required.

sudo apt-get install ffmpeg          # linux
brew install ffmpeg                  # mac
conda install -c conda-forge ffmpeg  # conda

If torchaudio has no backend, you may need to install libsndfile. See this issue.

If you run into problems, try upgrading diffusers. Riffusion has been tested with diffusers versions 0.9 through 0.11.

Backends

CPU

The cpu backend is supported but is quite slow.

CUDA

cuda is the recommended and most performant backend.

To use with CUDA, make sure you have torch and torchaudio installed with CUDA support. See the install guide or stable wheels.

To generate audio in real-time, you need a GPU that can run stable diffusion with approximately 50 steps in under five seconds, such as a 3090 or A10G.

Test availability with:

import torch
torch.cuda.is_available()

MPS

The mps backend on Apple Silicon is supported for inference but some operations fall back to CPU, particularly for audio processing. You may need to set PYTORCH_ENABLE_MPS_FALLBACK=1.

In addition, this backend is not deterministic.

Test availability with:

import torch
torch.backends.mps.is_available()
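
The availability checks for the three backends can be combined into one helper. The following is a minimal sketch, not part of riffusion itself: `pick_device` is a hypothetical name, and the preference order (cuda, then mps, then cpu) is an assumption based on the backend notes above. It also sets PYTORCH_ENABLE_MPS_FALLBACK before importing torch, which is the usual way to make sure the fallback setting is picked up.

```python
import os

# Set the fallback flag before torch is imported so it is in place when
# torch starts up. setdefault keeps any value already exported in the shell.
os.environ.setdefault("PYTORCH_ENABLE_MPS_FALLBACK", "1")


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu" (hypothetical helper, not a riffusion API)."""
    try:
        import torch
    except ImportError:
        # torch is not installed, so only "cpu" can be reported.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"
    # Older torch releases have no torch.backends.mps, hence the getattr guard.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using torch device: {device}")
```

The resulting string is the kind of value that can be passed to the --device argument of riffusion-server.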

Command-line interface

Riffusion comes with a command-line interface for performing common tasks.

See available commands:

riffusion -h

Get help for a specific command:

riffusion image-to-audio -h

Execute:

riffusion image-to-audio --image spectrogram_image.png --audio clip.wav
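
To convert many spectrogram images at once, the command above can be driven from Python. This is a rough sketch under stated assumptions: `build_commands` and `convert_all` are hypothetical helpers, the directory layout is made up for illustration, and only the `--image` and `--audio` flags shown above are used.

```python
import subprocess
from pathlib import Path


def build_commands(image_dir: str, audio_dir: str) -> list[list[str]]:
    """Build one `riffusion image-to-audio` command per PNG in image_dir."""
    commands = []
    for image in sorted(Path(image_dir).glob("*.png")):
        # Name each clip after its source image, e.g. beat.png -> beat.wav.
        audio = Path(audio_dir) / (image.stem + ".wav")
        commands.append(
            ["riffusion", "image-to-audio", "--image", str(image), "--audio", str(audio)]
        )
    return commands


def convert_all(image_dir: str, audio_dir: str) -> None:
    """Run every command in sequence, stopping at the first failure."""
    Path(audio_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(image_dir, audio_dir):
        subprocess.run(command, check=True)
```

Calling `convert_all("spectrograms", "clips")` would then write one WAV file per image, assuming the riffusion CLI is on the PATH.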

Riffusion Playground

Riffusion contains a streamlit app for interactive use and exploration.

Run with:

riffusion-playground

Then open http://127.0.0.1:8501/ in a browser.

Run the model server

Riffusion can be run as a Flask server that provides inference via an API. This server enables the web app to run locally.

Run with:

riffusion-server --host 127.0.0.1 --port 3013

You can specify --checkpoint with your own directory or a Hugging Face model ID in diffusers format.

Use the --device argument to specify the torch device to use.

The model endpoint is now available at http://127.0.0.1:3013/run_inference via POST request.

Example input (see InferenceInput for the API):

{
  "alpha": 0.75,
  "num_inference_steps": 50,
  "seed_image_id": "og_beat",

  "start": {
    "prompt": "church bells on sunday",
    "seed": 42,
    "denoising": 0.75,
    "guidance": 7.0
  },

  "end": {
    "prompt": "jazz with piano",
    "seed": 123,
    "denoising": 0.75,
    "guidance": 7.0
  }
}
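
The example input can be sent to the endpoint with nothing but the standard library. This is a sketch, assuming the server was started with the host and port shown above and accepts the JSON document as the request body; `build_request` and `run_inference` are hypothetical helper names, and the 120-second timeout is an arbitrary choice.

```python
import json
import urllib.request

# Address taken from the riffusion-server command above.
SERVER_URL = "http://127.0.0.1:3013/run_inference"

EXAMPLE_INPUT = {
    "alpha": 0.75,
    "num_inference_steps": 50,
    "seed_image_id": "og_beat",
    "start": {"prompt": "church bells on sunday", "seed": 42,
              "denoising": 0.75, "guidance": 7.0},
    "end": {"prompt": "jazz with piano", "seed": 123,
            "denoising": 0.75, "guidance": 7.0},
}


def build_request(payload: dict, url: str = SERVER_URL) -> urllib.request.Request:
    """Wrap the payload in a POST request with a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_inference(payload: dict, url: str = SERVER_URL, timeout: float = 120.0) -> dict:
    """Send the request and parse the JSON response (requires a running server)."""
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `run_inference(EXAMPLE_INPUT)` should return a dictionary with the output fields.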

Example output (see InferenceOutput for the API):

{
  "image": "< base64 encoded JPEG image >",
  "audio": "< base64 encoded MP3 clip >"
}
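
Both output fields are base64 strings, so they need decoding before they can be viewed or played. The sketch below writes them to disk. `decode_field` and `save_outputs` are hypothetical helpers, the output file names are arbitrary, and stripping a possible `data:...;base64,` prefix is a defensive assumption rather than something this page documents.

```python
import base64
from pathlib import Path


def decode_field(value: str) -> bytes:
    """Decode a base64 string, tolerating an optional data-URI prefix."""
    # Assumption: if the value looks like "data:<mime>;base64,<payload>",
    # keep only the payload after the first comma.
    if value.startswith("data:") and "," in value:
        value = value.split(",", 1)[1]
    return base64.b64decode(value)


def save_outputs(response: dict, out_dir: str = ".") -> tuple[Path, Path]:
    """Write the JPEG spectrogram and the MP3 clip, returning both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    image_path = out / "spectrogram.jpg"
    audio_path = out / "clip.mp3"
    image_path.write_bytes(decode_field(response["image"]))
    audio_path.write_bytes(decode_field(response["audio"]))
    return image_path, audio_path
```

Here `response` is the parsed JSON dictionary returned by the /run_inference endpoint.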

Tests

Tests live in the test/ directory and are implemented with unittest.

To run all tests:

python -m unittest test/*_test.py

To run a single test:

python -m unittest test.audio_to_image_test

To preserve temporary outputs for debugging, set RIFFUSION_TEST_DEBUG:

RIFFUSION_TEST_DEBUG=1 python -m unittest test.audio_to_image_test

To run a single test case within a test:

python -m unittest test.audio_to_image_test -k AudioToImageTest.test_stereo

To run tests using a specific torch device, set RIFFUSION_TEST_DEVICE. Tests should pass with cpu, cuda, and mps backends.

Development Guide

Install additional packages for dev with python -m pip install -r requirements-dev.txt.

  • Linters: ruff, flake8, pylint
  • Formatter: black
  • Type checker: mypy
  • Docstring checker: pydocstyle

These are configured in pyproject.toml.

For a PR to be accepted, mypy ., black ., and ruff . must all run clean.

CI is run through GitHub Actions from .github/workflows/ci.yml.

Contributions are welcome through pull requests.
