
A library for teaching and exploring generative latent flow matching


flocoder

This is a (Work In Progress!) teaching and research package for exploring latent generative flow matching models. (The name is inspired by "vocoder.")

This project started as a way to provide a lightweight, fast (and interpretable?) upgrade to the diffusion model system Pictures of MIDI for MIDI piano-roll images, but flocoder is intended to work on more general datasets too.

Quickstart

Head over to notebooks/SD_Flower_Flow.ipynb and run through it for a taste. It will run on Colab.

Overview

Check out the sets of slides linked from notebooks/README.md.

Architecture Overview

MIDI Flow Architecture

The above diagram illustrates the architecture of our intended model: a VQVAE compresses MIDI data into a quantized latent space, while a flow model learns to generate new samples in the corresponding continuous latent space.

We can also flow in the continuous latent space of a VAE such as Stable Diffusion's, which may be an easier starting point.
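Whichever latent space is used, the flow-matching training objective itself is compact. Here is a minimal NumPy sketch of the linear-path ("rectified flow" style) conditional flow-matching loss; all names are illustrative and this is not flocoder's implementation (a real model would be a neural network, not the zero linear map used as a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "data" x1 and noise x0 in a 2-D latent space
batch, dim = 64, 2
x0 = rng.standard_normal((batch, dim))          # noise samples
x1 = rng.standard_normal((batch, dim)) + 3.0    # "data" samples

# Linear path: x_t = (1 - t) * x0 + t * x1, with t ~ Uniform(0, 1)
t = rng.uniform(size=(batch, 1))
xt = (1.0 - t) * x0 + t * x1

# Conditional flow-matching target: the constant velocity x1 - x0
target_v = x1 - x0

# Stand-in "model": a fixed linear map over [x_t, t] (illustrative only)
W = np.zeros((dim + 1, dim))
pred_v = np.concatenate([xt, t], axis=1) @ W

# The training loss is just MSE between predicted and target velocity
loss = np.mean((pred_v - target_v) ** 2)
```

Training then amounts to minimizing this loss over the model parameters; at sampling time the learned velocity field is integrated from noise to data.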

Installation

# Clone the repository
git clone https://github.com/drscotthawley/flocoder.git
cd flocoder

# Install uv if not already installed
# On macOS/Linux:
# curl -LsSf https://astral.sh/uv/install.sh | sh
# On Windows PowerShell:
# irm https://astral.sh/uv/install.ps1 | iex

# Create a virtual environment with uv, specifying Python 3.10
uv venv --python=python3.10

# Activate the virtual environment
# On Linux/macOS:
source .venv/bin/activate
# On Windows:
# .venv\Scripts\activate

# Install the package in editable mode (See below if you get NATTEN errors!)
uv pip install -e .

# Recommended: Install development dependencies (jupyter, others...)
uv pip install -e ".[dev]"

# Recommended: install NATTEN separately with special flags
uv pip install natten --no-build-isolation
# if that fails, see NATTEN's install instructions (https://github.com/SHI-Labs/NATTEN/blob/main/docs/install.md)
# and specify exact version number, e.g.
# uv pip install natten==0.20.1+torch270cu128 -f https://whl.natten.org
# or build from the top of the source tree, e.g.:
# uv pip install --no-build-isolation git+https://github.com/SHI-Labs/NATTEN

Project Structure

The project is organized as follows:

  • flocoder/: Main package code
  • scripts/: Training and evaluation scripts
  • configs/: Configuration files for models and training
  • notebooks/: Jupyter notebooks for tutorials and examples
  • tests/: Unit tests

Training

The package includes multiple training scripts, located in the main (top-level) directory.

You can skip training the autoencoder/"codec" if you'd rather use the pretrained Stable Diffusion VAE; e.g., for what follows:

export CONFIG_FILE=flowers_sd.yaml

Optional: Training a VQGAN

You can use the Stable Diffusion VAE to get started quickly. (It will auto-download.) But if you want to train your own...

export CONFIG_FILE=flowers_vqgan.yaml 
#export CONFIG_FILE=midi.yaml 
./train_vqgan.py --config-name $CONFIG_FILE
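Under the hood, the "VQ" in VQGAN is a nearest-neighbor codebook lookup: each continuous encoder output is replaced by its closest codebook entry. A minimal NumPy sketch (illustrative names and sizes, not flocoder's internals):

```python
import numpy as np

rng = np.random.default_rng(0)

K, d = 8, 4                             # codebook size, latent dimension
codebook = rng.standard_normal((K, d))  # learned in real training; random here

# Encoder output: a batch of continuous latent vectors
z_e = rng.standard_normal((16, d))

# Quantize: replace each vector with its nearest codebook entry
dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (16, K)
indices = dists.argmin(axis=1)          # discrete codes, shape (16,)
z_q = codebook[indices]                 # quantized latents, shape (16, d)
```

In actual VQ training, gradients are passed through the quantization via a straight-through estimator, and commitment losses keep the encoder outputs close to the codebook.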

The autoencoder AKA "codec" (e.g. VQGAN) compresses piano-roll images into a quantized latent representation. This will save checkpoints in the checkpoints/ directory. Use that checkpoint to pre-encode your data like so...

Pre-Encoding Data (with frozen augmentations)

Takes about 20 minutes to run on a single GPU.

./preencode_data.py --config-name $CONFIG_FILE
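The idea of "frozen augmentations" is to apply a fixed number of augmented views to each image once, encode them all through the frozen codec, and save the latents to disk so the flow model never has to run the encoder during training. A toy NumPy sketch of that pattern (the encoder and augmentation here are stand-ins, not flocoder's):

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)

def encode(x):
    """Stand-in for the frozen codec encoder: 2x2 mean-pool downsampling."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def augment(x, rng):
    """Stand-in augmentation: a random roll along the time axis."""
    return np.roll(x, rng.integers(-4, 5), axis=1)

n_views = 4                          # fixed ("frozen") augmented views per image
image = rng.random((32, 32))         # pretend piano-roll image
latents = np.stack([encode(augment(image, rng)) for _ in range(n_views)])

# Save once; flow training can then load latents instead of re-encoding images
out_path = os.path.join(tempfile.gettempdir(), "preencoded_example.npz")
np.savez(out_path, latents=latents)
```

Trading disk space for compute this way is what makes the subsequent flow training fast: each epoch reads pre-computed latents rather than running the encoder.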

Training the Flow Model

./train_flow.py --config-name $CONFIG_FILE

The flow model operates in the latent space created by the autoencoder.

Generating Samples

# Generate new MIDI samples
./generate_samples.py --config-name $CONFIG_FILE
# or with optional gradio UI:
#./generate_samples.py --config-name $CONFIG_FILE +use_gradio=true

This generates new samples by sampling from the flow model and decoding them through the codec.
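Sampling means integrating the learned velocity field from noise (t = 0) to data (t = 1), e.g. with a forward-Euler ODE step. A toy NumPy sketch, using a hand-written velocity field as a stand-in for the trained flow model:

```python
import numpy as np

# Stand-in for the trained flow model's velocity field: it pushes latents
# toward a "data" mean at (3, 3). Purely illustrative.
def velocity(z, t):
    return np.array([3.0, 3.0]) - z

rng = np.random.default_rng(0)
z0 = rng.standard_normal(2)          # start from Gaussian noise at t = 0
z = z0.copy()
n_steps = 100
dt = 1.0 / n_steps
for i in range(n_steps):             # forward-Euler integration from t=0 to t=1
    z = z + dt * velocity(z, i * dt)
# z now approximates a latent sample; the codec's decoder would turn it
# into a piano-roll image
```

Higher-order integrators (such as the RK4(5) scheme mentioned in the TODO list) reduce the number of velocity-field evaluations needed for the same accuracy.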

Contributing

Contributions are VERY welcome! See Contributing.md. Thanks in advance.

Discussions

Discussions are open! Rather than starting some ad-hoc Discord server, let's share ideas, questions, insights, etc. using the Discussions tab.

TODO

  • Add Discussions area
  • Add Style Guide
  • Replace custom config/CLI arg system with Hydra or other package
  • Rename "vae"/"vqvae"/"vqgan" variable as just "codec"
  • Replace Class in preencode_data.py with functions as per Style Guide
  • Research: Figure out why conditioning fails for latent model
  • Add Standalone sampler script / Gradio demo?
  • Add metrics (to wandb out) to quantify flow training progress (sinkhorn, FID)
  • Add Contributing guidelines
  • Try variable size scheduler
  • Add audio example, e.g. using DAC
  • low-priority: Make RK4(5) integrator fully CUDA-compatible
  • Straighter/OT paths: Add ReFlow, Minibatch OT, Ray's Rays, Curvature penalty,...
  • Add jitter / diffusion for comparison
  • Add Documentation
  • Improve overall introduction/orientation
  • Fix "code smell" throughout -- repeated methods, hard-coded values, etc.
  • Research: Figure out how to accelerate training of flows!!
  • Research: Figure out how to accelerate training of vqgan
  • Research: improve output quality of midi-flow (and midi-vqgan)
  • Inference speedup: investigate model quantization / pruning (pytorch.ao?)
  • Ops: Add tests
  • Ops: Add CI
  • Investigate "Mean Flows for One-step Generative Modeling"

Acknowledgement

This project is generously supported by Hyperstate Music AI.

License

This project is licensed under the terms of the MIT license.
