Skip to main content

Diffusers

Project description



GitHub GitHub release Contributor Covenant

🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves as a modular toolbox for inference and training of diffusion models.

More precisely, 🤗 Diffusers offers:

  • State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see src/diffusers/pipelines).
  • Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see src/diffusers/schedulers).
  • Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see src/diffusers/models).
  • Training examples to show how to train the most popular diffusion models (see examples).

Quickstart

In order to get started, we recommend taking a look at two notebooks:

  • The Getting started with Diffusers notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines. Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library.
  • The Training a diffusers model notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your diffuser model on an image dataset, with explanatory graphics.

Examples

If you want to run the code yourself 💻, you can try out:

# !pip install diffusers transformers
from diffusers import DiffusionPipeline

model_id = "CompVis/ldm-text2im-large-256"

# load model and scheduler
ldm = DiffusionPipeline.from_pretrained(model_id)

# run pipeline in inference (sample random noise and denoise)
prompt = "A painting of a squirrel eating a burger"
images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"]

# save images
for idx, image in enumerate(images):
    image.save(f"squirrel-{idx}.png")
# !pip install diffusers
from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline

model_id = "google/ddpm-celebahq-256"

# load model and scheduler
ddpm = DDPMPipeline.from_pretrained(model_id)  # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference

# run pipeline in inference (sample random noise and denoise)
image = ddpm()["sample"]

# save image
image[0].save("ddpm_generated_image.png")

If you just want to play around with some web demos, you can try out the following 🚀 Spaces:

Model Hugging Face Spaces
Text-to-Image Latent Diffusion Hugging Face Spaces
Faces generator Hugging Face Spaces
DDPM with different schedulers Hugging Face Spaces

Definitions

Models: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to denoise a noisy input to an image. Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet


Figure from DDPM paper (https://arxiv.org/abs/2006.11239).

Schedulers: Algorithm class for both inference and training. The class provides functionality to compute previous image according to alpha, beta schedule as well as predict noise for training. Examples: DDPM, DDIM, PNDM, DEIS


Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239).

Diffusion Pipeline: End-to-end pipeline that includes multiple diffusion models, possible text encoders, ... Examples: Glide, Latent-Diffusion, Imagen, DALL-E 2


Figure from ImageGen (https://imagen.research.google/).

Philosophy

  • Readability and clarity is prefered over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. E.g., the provided schedulers are separated from the provided models and provide well-commented code that can be read alongside the original paper.
  • Diffusers is modality independent and focuses on providing pretrained models and tools to build systems that generate continous outputs, e.g. vision and audio.
  • Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of another library, such as text-encoders. Examples for diffusion pipelines are Glide and Latent Diffusion.

Installation

pip install diffusers  # should install diffusers 0.1.3

In the works

For the first release, 🤗 Diffusers focuses on text-to-image diffusion techniques. However, diffusers can be used for much more than that! Over the upcoming releases, we'll be focusing on:

A few pipeline components are already being worked on, namely:

  • BDDMPipeline for spectrogram-to-sound vocoding
  • GLIDEPipeline to support OpenAI's GLIDE model
  • Grad-TTS for text to audio generation / conditional audio generation

We want diffusers to be a toolbox useful for diffusers models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a GitHub issue mentioning what you would like to see.

Credits

This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today:

  • @CompVis' latent diffusion models library, available here
  • @hojonathanho original DDPM implementation, available here as well as the extremely useful translation into PyTorch by @pesser, available here
  • @ermongroup's DDIM implementation, available here.
  • @yang-song's Score-VE and Score-VP implementations, available here

We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available here as well as @crowsonkb and @rromb for useful discussions and insights.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

diffusers-0.1.3.tar.gz (76.6 kB view details)

Uploaded Source

Built Distribution

diffusers-0.1.3-py3-none-any.whl (95.8 kB view details)

Uploaded Python 3

File details

Details for the file diffusers-0.1.3.tar.gz.

File metadata

  • Download URL: diffusers-0.1.3.tar.gz
  • Upload date:
  • Size: 76.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.11.3 pkginfo/1.8.2 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.0 CPython/3.8.13

File hashes

Hashes for diffusers-0.1.3.tar.gz
Algorithm Hash digest
SHA256 ae56c0a8107191e0bf3e4fc51116afe685a4401fb78ec0fba4ad825f58476944
MD5 9ef8d716635153230cac655339809389
BLAKE2b-256 46b6447da32c8c0444984d9434324ab0697e0cf1e096f46e4e7a86a6c780fc87

See more details on using hashes here.

File details

Details for the file diffusers-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: diffusers-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 95.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.11.3 pkginfo/1.8.2 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.64.0 CPython/3.8.13

File hashes

Hashes for diffusers-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 f1258dc3b7b1372f0c05b36e42a4df7be3991c65dd7eeebb52aa2c0bdda644c8
MD5 a61cad34a87066e581d2e814dea811c2
BLAKE2b-256 b063b95287c77ce014b313764267cc01eff81594f8810fdf0431dc36e1652718

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page