Skip to main content

Diffusers

Project description



GitHub GitHub release Contributor Covenant

🤗 Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves as a modular toolbox for inference and training of diffusion models.

More precisely, 🤗 Diffusers offers:

  • State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see src/diffusers/pipelines).
  • Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see src/diffusers/schedulers).
  • Multiple types of models, such as UNet, that can be used as building blocks in an end-to-end diffusion system (see src/diffusers/models).
  • Training examples to show how to train the most popular diffusion models (see examples).

Definitions

Models: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to denoise a noisy input to an image. Examples: UNet, Conditioned UNet, 3D UNet, Transformer UNet


Figure from DDPM paper (https://arxiv.org/abs/2006.11239).

Schedulers: Algorithm class for both inference and training. The class provides functionality to compute previous image according to alpha, beta schedule as well as predict noise for training. Examples: DDPM, DDIM, PNDM, DEIS


Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239).

Diffusion Pipeline: End-to-end pipeline that includes multiple diffusion models, possible text encoders, ... Examples: Glide, Latent-Diffusion, Imagen, DALL-E 2


Figure from ImageGen (https://imagen.research.google/).

Philosophy

  • Readability and clarity is prefered over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. E.g., the provided schedulers are separated from the provided models and provide well-commented code that can be read alongside the original paper.
  • Diffusers is modality independent and focusses on providing pretrained models and tools to build systems that generate continous outputs, e.g. vision and audio.
  • Diffusion models and schedulers are provided as consise, elementary building blocks whereas diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of other library, such as text-encoders. Examples for diffusion pipelines are Glide and Latent Diffusion.

Quickstart

Check out this notebook: https://colab.research.google.com/drive/1nMfF04cIxg6FujxsNYi9kiTRrzj4_eZU?usp=sharing

Installation

pip install diffusers  # should install diffusers 0.0.4

1. diffusers as a toolbox for schedulers and models

diffusers is more modularized than transformers. The idea is that researchers and engineers can use only parts of the library easily for the own use cases. It could become a central place for all kinds of models, schedulers, training utils and processors that one can mix and match for one's own use case. Both models and schedulers should be load- and saveable from the Hub.

For more examples see schedulers and models

Example for Unconditonal Image generation DDPM:

import torch
from diffusers import UNet2DModel, DDIMScheduler
import PIL.Image
import numpy as np
import tqdm

torch_device = "cuda" if torch.cuda.is_available() else "cpu"

# 1. Load models
scheduler = DDIMScheduler.from_config("fusing/ddpm-celeba-hq", tensor_format="pt")
unet = UNet2DModel.from_pretrained("fusing/ddpm-celeba-hq", ddpm=True).to(torch_device)

# 2. Sample gaussian noise
generator = torch.manual_seed(23)
unet.image_size = unet.resolution
image = torch.randn(
   (1, unet.in_channels, unet.image_size, unet.image_size),
   generator=generator,
)
image = image.to(torch_device)

# 3. Denoise
num_inference_steps = 50
eta = 0.0  # <- deterministic sampling
scheduler.set_timesteps(num_inference_steps)

for t in tqdm.tqdm(scheduler.timesteps):
    # 1. predict noise residual
    with torch.no_grad():
        residual = unet(image, t)["sample"]

    prev_image = scheduler.step(residual, t, image, eta)["prev_sample"]

    # 3. set current image to prev_image: x_t -> x_t-1
    image = prev_image

# 4. process image to PIL
image_processed = image.cpu().permute(0, 2, 3, 1)
image_processed = (image_processed + 1.0) * 127.5
image_processed = image_processed.numpy().astype(np.uint8)
image_pil = PIL.Image.fromarray(image_processed[0])

# 5. save image
image_pil.save("generated_image.png")

Example for Unconditonal Image generation LDM:

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

diffusers-0.1.0.tar.gz (73.6 kB view details)

Uploaded Source

Built Distribution

diffusers-0.1.0-py3-none-any.whl (143.1 kB view details)

Uploaded Python 3

File details

Details for the file diffusers-0.1.0.tar.gz.

File metadata

  • Download URL: diffusers-0.1.0.tar.gz
  • Upload date:
  • Size: 73.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.7

File hashes

Hashes for diffusers-0.1.0.tar.gz
Algorithm Hash digest
SHA256 b58ab5157955916135f1175c3811fecfa3fc0479d0e3ebd1a294dc386fd34725
MD5 8a6f95870b8c781a28c92d8b58aee31b
BLAKE2b-256 e501389b11b5ff49da5ba708cdc30542afaa35bd22d86e3d83bc2cd1050a8e91

See more details on using hashes here.

File details

Details for the file diffusers-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: diffusers-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 143.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.0 CPython/3.9.7

File hashes

Hashes for diffusers-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4ae7d389bcd516f15ac6430d1973e70f797d39565e2a5ecafa536f050d54ca72
MD5 4715658255bdaae2f6faca76c8a6cc09
BLAKE2b-256 d21fae22e7cff2897c22d64949470f48379413e6883d3ef5ea5894c07b7695b5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page