
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models


GitHub: https://github.com/tencent-ailab/IP-Adapter


Introduction

We present IP-Adapter, an effective and lightweight adapter that adds image prompt capability to pre-trained text-to-image diffusion models. An IP-Adapter with only 22M parameters achieves performance comparable to, or even better than, a fine-tuned image prompt model. IP-Adapter generalizes not only to other custom models fine-tuned from the same base model, but also to controllable generation with existing controllable tools. Moreover, the image prompt works well together with the text prompt to accomplish multimodal image generation.

[Figure: IP-Adapter architecture]

Release

  • [2023/12/20] 🔥 Add an experimental version of IP-Adapter-FaceID, more information can be found here.
  • [2023/11/22] IP-Adapter is available in Diffusers thanks to Diffusers Team.
  • [2023/11/10] 🔥 Add an updated version of IP-Adapter-Face. The demo is here.
  • [2023/11/05] 🔥 Add text-to-image demo with IP-Adapter and Kandinsky 2.2 Prior
  • [2023/11/02] Support safetensors
  • [2023/9/08] 🔥 Update a new version of IP-Adapter with SDXL_1.0. More information can be found here.
  • [2023/9/05] 🔥🔥🔥 IP-Adapter is supported in WebUI and ComfyUI (or ComfyUI_IPAdapter_plus).
  • [2023/8/30] 🔥 Add an IP-Adapter with face image as prompt. The demo is here.
  • [2023/8/29] 🔥 Release the training code.
  • [2023/8/23] 🔥 Add code and models of IP-Adapter with fine-grained features. The demo is here.
  • [2023/8/18] 🔥 Add code and models for SDXL 1.0. The demo is here.
  • [2023/8/16] 🔥 We release the code and models.

Installation

# install latest diffusers
pip install diffusers==0.22.1

# install ip-adapter
pip install git+https://github.com/tencent-ailab/IP-Adapter.git

# download the models
cd IP-Adapter
git lfs install
git clone https://huggingface.co/h94/IP-Adapter
mv IP-Adapter/models models
mv IP-Adapter/sdxl_models sdxl_models

# then you can use the demo notebooks

Download Models

You can download the models from https://huggingface.co/h94/IP-Adapter. To run the demos, you also need the base diffusion models that each notebook relies on (e.g. runwayml/stable-diffusion-v1-5).

How to Use

SD_1.5

  • ip_adapter_demo: image variations, image-to-image, and inpainting with image prompt (see the sketch after this list).
    [Figures: image variations, image-to-image, inpainting]
  • ip_adapter_controlnet_demo: structural generation with image prompt.
    [Figures: structural_cond, structural_cond2]
  • ip_adapter_multimodal_prompts_demo: generation with multimodal prompts.
    [Figure: multi_prompts]
  • ip_adapter-plus_demo: IP-Adapter with fine-grained features.
    [Figures: ip_adapter_plus_image_variations, ip_adapter_plus_multi]
  • ip_adapter-plus-face_demo: generation with a face image as prompt.
    [Figure: ip_adapter_plus_face]
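For reference, here is a minimal image-variations sketch in the style of the repo's notebooks. The IPAdapter class and its generate method come from this package; the model paths and input image file are placeholders matching the download step above.

import torch
from diffusers import StableDiffusionPipeline
from PIL import Image
from ip_adapter import IPAdapter

# paths assume the layout from the installation step above
base_model_path = "runwayml/stable-diffusion-v1-5"
image_encoder_path = "models/image_encoder/"
ip_ckpt = "models/ip-adapter_sd15.bin"
device = "cuda"

# load the base SD 1.5 pipeline in half precision
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path, torch_dtype=torch.float16, safety_checker=None
)

# the image prompt (any PIL image)
image = Image.open("your_image.png")  # placeholder file name

# wrap the pipeline with the adapter and generate image variations
ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device)
images = ip_model.generate(pil_image=image, num_samples=4, num_inference_steps=50, seed=42)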

Best Practice

  • If you use only the image prompt, set scale=1.0 and text_prompt="" (or a generic text prompt such as "best quality"; you can also use any negative text prompt). Lowering the scale produces more diverse images, but they may be less consistent with the image prompt.
  • For multimodal prompts, adjust the scale to get the best results; in most cases, scale=0.5 works well (see the example below). For SD 1.5, we recommend community fine-tuned models for generating good images.
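For example, a multimodal call in the same style as the sketch above (the prompt text and scale value are illustrative):

# combine the image prompt with a text prompt; scale balances the two
images = ip_model.generate(
    pil_image=image,
    prompt="best quality, high quality, wearing a hat on the beach",
    scale=0.5,
    num_samples=4,
    num_inference_steps=50,
    seed=42,
)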

IP-Adapter for non-square images

Because CLIP's default image processor center-crops its input, IP-Adapter works best on square images; for non-square images, information outside the center crop is lost. A simple workaround is to resize non-square images to 224x224 before encoding, as in the sketch below.
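A one-line version of that workaround with PIL (the file name is a placeholder):

from PIL import Image

# resize up front so CLIP's center crop does not discard the edges
image = Image.open("non_square.png").resize((224, 224))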

SDXL_1.0

The comparison of IP-Adapter_XL with Reimagine XL is shown as follows:

[Figure: sdxl_demo — comparison of IP-Adapter_XL with Reimagine XL]
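A minimal SDXL sketch, analogous to the SD 1.5 one above; IPAdapterXL comes from this package, and the paths are assumptions based on the sdxl_models folder from the installation step:

import torch
from diffusers import StableDiffusionXLPipeline
from PIL import Image
from ip_adapter import IPAdapterXL

base_model_path = "stabilityai/stable-diffusion-xl-base-1.0"
image_encoder_path = "sdxl_models/image_encoder/"
ip_ckpt = "sdxl_models/ip-adapter_sdxl.bin"
device = "cuda"

pipe = StableDiffusionXLPipeline.from_pretrained(
    base_model_path, torch_dtype=torch.float16
)

# wrap the SDXL pipeline and generate from an image prompt
ip_model = IPAdapterXL(pipe, image_encoder_path, ip_ckpt, device)
images = ip_model.generate(pil_image=Image.open("your_image.png"), num_samples=4, num_inference_steps=30)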

Improvements in new version (2023.9.8):

  • Switch to CLIP-ViT-H: we trained the new IP-Adapter with OpenCLIP-ViT-H-14 instead of OpenCLIP-ViT-bigG-14. Although ViT-bigG is much larger than ViT-H, our experiments found no significant difference in quality, and the smaller model reduces memory usage at inference time.
  • A faster and better training recipe: in our previous version, training directly at a resolution of 1024x1024 proved highly inefficient. In the new version, we use a more effective two-stage strategy: first, pre-training at a resolution of 512x512; then, fine-tuning with a multi-scale strategy. (This strategy may also speed up the training of ControlNet.)

How to Train

For training, you should install accelerate and prepare your own dataset as a JSON file (see the sketch below).
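The JSON file is a list of records pairing an image file (relative to --data_root_path) with its caption. A minimal sketch; the exact key names ("image_file", "text") are assumptions based on the repo's tutorial_train.py, and the file names and captions are placeholders:

[
    {"image_file": "1.png", "text": "a photo of a dog"},
    {"image_file": "2.png", "text": "a photo of a cat"}
]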

accelerate launch --num_processes 8 --multi_gpu --mixed_precision "fp16" \
  tutorial_train.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5/" \
  --image_encoder_path="{image_encoder_path}" \
  --data_json_file="{data.json}" \
  --data_root_path="{image_path}" \
  --mixed_precision="fp16" \
  --resolution=512 \
  --train_batch_size=8 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="{output_dir}" \
  --save_steps=10000

Once training is complete, you can convert the weights with the following code:

import torch

# split the trained checkpoint into the two parts IP-Adapter expects at inference
ckpt = "checkpoint-50000/pytorch_model.bin"
sd = torch.load(ckpt, map_location="cpu")
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("unet"):
        # base UNet weights are not part of the adapter; skip them
        pass
    elif k.startswith("image_proj_model"):
        # projection network that maps CLIP image embeddings to prompt tokens
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif k.startswith("adapter_modules"):
        # decoupled cross-attention layers injected into the UNet
        ip_sd[k.replace("adapter_modules.", "")] = sd[k]

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "ip_adapter.bin")
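The resulting ip_adapter.bin has the same layout as the released checkpoints, so it can be passed as the ip_ckpt path in the usage sketches above.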

Third-party Usage

As noted in the release notes above, IP-Adapter is also supported in community tools such as the Stable Diffusion WebUI and ComfyUI (ComfyUI_IPAdapter_plus).

Disclaimer

This project strives to positively impact the domain of AI-driven image generation. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it in a responsible manner. The developers do not assume any responsibility for potential misuse by users.

Citation

If you find IP-Adapter useful for your research and applications, please cite using this BibTeX:

@article{ye2023ip-adapter,
  title={IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models},
  author={Ye, Hu and Zhang, Jun and Liu, Sibo and Han, Xiao and Yang, Wei},
  journal={arXiv preprint arXiv:2308.06721},
  year={2023}
}
