Paint-with-Words, Implemented with Stable Diffusion using the Diffusers pipeline

CoRR preprint: arXiv:2211.01324

Unofficial 🤗 huggingface/diffusers-based implementation of Paint-with-Words proposed by the paper eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers. This implementation is based on cloneofsimo/paint-with-words-sd.

Subtle Control of the Image Generation

Notice how without PwW the cloud is missing.

Notice how, without PwW, the abandoned city is missing and the road turns purple as well.

Shift the object: same seed, only the object's position in the segmentation map differs

"A digital painting of a half-frozen lake near mountains under a full moon and aurora. A boat is in the middle of the lake. Highly detailed."

Notice how nearly all of the composition remains the same, other than the position of the moon.


Recently, researchers from NVIDIA proposed eDiff-I. In the paper, they suggested a method that allows "painting with words". Essentially, this is like Make-A-Scene, but it works only by adjusting the cross-attention scores. You can see the results and the detailed method in the paper.

The method from their paper was not open-sourced. Yet paint-with-words can be implemented with Stable Diffusion, since both share a common cross-attention module. So, I implemented it with Stable Diffusion.
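To make "adjusted cross-attention score" concrete, here is a minimal pure-Python sketch for a single image position. It assumes the formulation I understand the eDiff-I paper to use, softmax((QK^T + wA) / sqrt(d)), where A marks which prompt tokens are painted onto that position. The helper name `adjusted_attention` and the toy numbers are illustrative only and are not part of this package.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_attention(scores, mask, w, dim):
    """Attention probabilities for one image position.

    scores: raw query-key scores, one per prompt token
    mask:   1.0 where the token is painted onto this position, else 0.0
    w:      scalar weight added to the scores of the painted tokens
    dim:    key dimension used for the usual 1/sqrt(dim) scaling
    """
    shifted = [(s + w * a) / math.sqrt(dim) for s, a in zip(scores, mask)]
    return softmax(shifted)


# Toy example: three prompt tokens; this position is painted with token 2.
scores = [2.0, 1.0, 0.5]
mask = [0.0, 0.0, 1.0]

plain = adjusted_attention(scores, mask, w=0.0, dim=4)    # ordinary attention
boosted = adjusted_attention(scores, mask, w=4.0, dim=4)  # painted token boosted
```

With `w=0.0` this reduces to ordinary attention; a positive `w` shifts probability mass toward the painted token and away from the others.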

Installation

pip install paint-with-words-pipeline

Basic Usage

Prepare a segmentation map and a mapping from map colors to tag labels, such as the one below. Keys are colors in (R, G, B) format, and values are tag labels.

{
    (0, 0, 0): "cat,1.0",
    (255, 255, 255): "dog,1.0",
    (13, 255, 0): "tree,1.5",
    (90, 206, 255): "sky,0.2",
    (74, 18, 1): "ground,0.2",
}

Each value needs to be in the format "{label},{strength}", where strength is an additional weight applied to the attention score during generation; a higher strength gives the label more effect.
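The "{label},{strength}" convention can be illustrated with a small parser and a sanity check on the colors. This is only a sketch of how such values might be handled, not the package's internal code; `parse_tag` and `missing_colors` are hypothetical helper names, and the pixel list stands in for colors read from a real segmentation map.

```python
def parse_tag(tag):
    """Split a "{label},{strength}" string into (label, strength)."""
    # Split on the last comma so that the label itself may contain commas.
    label, _, strength = tag.rpartition(",")
    if not label:
        raise ValueError(f"expected '{{label}},{{strength}}', got {tag!r}")
    return label.strip(), float(strength)


def missing_colors(pixels, color_context):
    """Return colors that occur in the map but have no entry in color_context."""
    return sorted(set(pixels) - set(color_context))


color_context = {
    (0, 0, 0): "cat,1.0",
    (255, 255, 255): "dog,1.0",
    (13, 255, 0): "tree,1.5",
}

parsed = {color: parse_tag(tag) for color, tag in color_context.items()}

# A toy "image" given as a flat list of RGB tuples; (1, 2, 3) has no label.
pixels = [(0, 0, 0), (13, 255, 0), (1, 2, 3), (0, 0, 0)]
unlabeled = missing_colors(pixels, color_context)
```

Checking for unlabeled colors before generation can catch anti-aliased edges or slightly wrong RGB values in a hand-painted map.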

import torch
from PIL import Image

from paint_with_words.pipelines import PaintWithWordsPipeline

settings = {
    "color_context": {
        (0, 0, 0): "cat,1.0",
        (255, 255, 255): "dog,1.0",
        (13, 255, 0): "tree,1.5",
        (90, 206, 255): "sky,0.2",
        (74, 18, 1): "ground,0.2",
    },
    "color_map_img_path": "contents/example_input.png",
    "input_prompt": "realistic photo of a dog, cat, tree, with beautiful sky, on sandy ground",
    "output_img_path": "contents/output_cat_dog.png",
}

color_map_image_path = settings["color_map_img_path"]
color_context = settings["color_context"]
input_prompt = settings["input_prompt"]

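# Stable Diffusion checkpoint to load; this model ID is an assumption, substitute your own
model_name = "CompVis/stable-diffusion-v1-4"

# load pre-trained weights into the Paint-with-Words pipeline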
pipe = PaintWithWordsPipeline.from_pretrained(
    model_name,
    revision="fp16",
    torch_dtype=torch.float16,
)
pipe.safety_checker = None  # disable the safety checker
pipe.to("cuda")

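# load color map image
color_map_image = Image.open(color_map_image_path).convert("RGB")

# fixed initial latents for reproducibility
# (assumed shape: 1x4x64x64, which corresponds to a 512x512 output)
latents = torch.randn((1, 4, 64, 64), generator=torch.manual_seed(0))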

with torch.autocast("cuda"):
    image = pipe(
        prompt=input_prompt,
        color_context=color_context,
        color_map_image=color_map_image,
        latents=latents,
        num_inference_steps=30,
    ).images[0]

image.save(settings["output_img_path"])

Weight Scaling

In the paper, they used $w \log (1 + \sigma) \max (Q^T K)$ to scale the attention weight appropriately. However, after a few tests, CookiePPP found that this wasn't optimal. You can check out the effect of the functions below:

$w' = w \log (1 + \sigma) \, \mathrm{std} (Q^T K)$

$w' = w \log (1 + \sigma) \max (Q^T K)$

$w' = w \log (1 + \sigma^2) \, \mathrm{std} (Q^T K)$
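As a rough numeric illustration of how these three rules differ, the sketch below evaluates each one on a toy list of attention scores using only the standard library. The values of `w`, `sigma` and `qk` are made up, and the function names are hypothetical. `statistics.stdev` is the sample standard deviation, which matches the default behaviour of `torch.Tensor.std`.

```python
import math
import statistics


def scale_std(w, sigma, qk):
    """w' = w * log(1 + sigma) * std(QK)"""
    return w * math.log(1 + sigma) * statistics.stdev(qk)


def scale_max(w, sigma, qk):
    """w' = w * log(1 + sigma) * max(QK), the rule quoted from the paper"""
    return w * math.log(1 + sigma) * max(qk)


def scale_std_sigma_sq(w, sigma, qk):
    """w' = w * log(1 + sigma^2) * std(QK)"""
    return w * math.log(1 + sigma ** 2) * statistics.stdev(qk)


qk = [0.5, 1.0, 2.0, 8.0]  # toy attention scores
w = 1.0                    # strength taken from a "{label},{strength}" tag

for sigma in (0.1, 1.0, 10.0):  # low, medium and high noise levels
    print(
        f"sigma={sigma:5.1f}",
        f"std={scale_std(w, sigma, qk):.3f}",
        f"max={scale_max(w, sigma, qk):.3f}",
        f"std_sigma_sq={scale_std_sigma_sq(w, sigma, qk):.3f}",
    )
```

The two std-based rules coincide at sigma = 1. Below that, the sigma-squared variant gives a smaller weight, and above it a larger one, so it concentrates the extra attention on the noisier, early steps.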

You can define your own weight function and further tweak the configuration by passing a weight_function argument when calling the PaintWithWordsPipeline.

Example:

import math
import torch

def weight_function(
    w: torch.Tensor,
    sigma: torch.Tensor,
    qk: torch.Tensor,
) -> torch.Tensor:
    # w' = 0.4 * w * log(1 + sigma^2) * std(Q^T K)
    return 0.4 * w * math.log(sigma ** 2 + 1) * qk.std()

with torch.autocast("cuda"):
    image = pipe(
        prompt=input_prompt,
        color_context=color_context,
        color_map_image=color_map_image,
        latents=latents,
        num_inference_steps=30,
        #
        # set the weight function here:
        weight_function=weight_function,
        #
    ).images[0]

More on the weight function (with higher strength)

$w' = w \log (1 + \sigma) \, \mathrm{std} (Q^T K)$

$w' = w \log (1 + \sigma) \max (Q^T K)$

$w' = w \log (1 + \sigma^2) \, \mathrm{std} (Q^T K)$

Example Notebooks

You can view the minimal working notebook here or open it in Colab.


Acknowledgements
