PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

[Paper]   [Project Page]   [Model Card]

[🤗 Demo (Realistic)]   [🤗 Demo (Stylization)]

[Replicate Demo (Realistic)]   [Replicate Demo (Stylization)]

If the ID fidelity is not high enough for you, please try our stylization application; you may be pleasantly surprised.


Official implementation of PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding.

🌠 Key Features:

  1. Rapid customization within seconds, with no additional LoRA training.
  2. Ensures impressive ID fidelity, offering diversity, promising text controllability, and high-quality generation.
  3. Can serve as an adapter that works with other base models and with LoRA modules from the community.

❗❗ Note: If you know of any PhotoMaker-based resources or applications, please leave them in the discussion and we will list them in the Related Resources section of the README. We currently know of implementations for Replicate, Windows, ComfyUI, and WebUI. Thank you all!

🚩 New Features/Updates

  • ✅ Jan. 20, 2024. An important note: for GPUs that do not support bfloat16, please change this line to torch_dtype = torch.float16. This greatly improves speed (about 1 min/img before vs. 14 s/img after on a V100). The minimum GPU memory requirement for PhotoMaker is 15 GB.
  • ✅ Jan. 15, 2024. We release PhotoMaker.
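
The dtype choice described in the Jan. 20 note can be made automatically. The following is a minimal sketch, not part of PhotoMaker: it assumes that native bfloat16 support starts at CUDA compute capability 8.0 (Ampere), so a V100 (capability 7.0) falls back to float16. Both `choose_dtype_name` and `detect_capability_major` are hypothetical helper names.

```python
def choose_dtype_name(capability_major: int) -> str:
    """Return the name of the torch dtype to use for a CUDA compute capability.

    Assumption: GPUs with major capability >= 8 (Ampere or newer) handle
    bfloat16 natively; older GPUs such as the V100 (7.0) should use float16.
    """
    return "bfloat16" if capability_major >= 8 else "float16"


def detect_capability_major() -> int:
    """Return the major compute capability of the current CUDA device.

    Returns 0 when torch is not installed or no CUDA device is available,
    which makes choose_dtype_name fall back to float16.
    """
    try:
        import torch  # imported lazily so the helpers load without torch
    except ImportError:
        return 0
    if not torch.cuda.is_available():
        return 0
    return torch.cuda.get_device_capability()[0]
```

With torch imported, the result can be passed to the pipeline as `torch_dtype=getattr(torch, choose_dtype_name(detect_capability_major()))`.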

🔥 Examples

Realistic generation

Stylization generation

Note: for better stylization, only the base model is changed and LoRA modules are added.

🔧 Dependencies and Installation

conda create --name photomaker python=3.10
conda activate photomaker
pip install -U pip

# Install requirements
pip install -r requirements.txt

# Install photomaker
pip install git+https://github.com/TencentARC/PhotoMaker.git

Then you can import the pipeline in Python:

from photomaker import PhotoMakerStableDiffusionXLPipeline

⏬ Download Models

The model will be automatically downloaded through the following two lines:

from huggingface_hub import hf_hub_download
photomaker_path = hf_hub_download(repo_id="TencentARC/PhotoMaker", filename="photomaker-v1.bin", repo_type="model")

You can also choose to download manually from this url.
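
Whether the checkpoint is downloaded automatically or manually, the loading code further down needs its directory and its file name as two separate arguments. A minimal sketch of that split, using only the standard library; `split_checkpoint_path` is a hypothetical helper and the path in the usage note is a made-up example:

```python
import os


def split_checkpoint_path(checkpoint_path: str) -> tuple[str, str]:
    """Split a checkpoint path into (directory, file name).

    The two parts correspond to the directory and weight_name arguments
    that the adapter-loading call expects.
    """
    return os.path.dirname(checkpoint_path), os.path.basename(checkpoint_path)
```

For a manually downloaded file at the hypothetical location `/models/photomaker/photomaker-v1.bin`, the helper returns the directory `/models/photomaker` and the weight name `photomaker-v1.bin`.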

💻 How to Test

Use it like diffusers

  • Dependency
import torch
import os
from diffusers.utils import load_image
from diffusers import EulerDiscreteScheduler
from photomaker import PhotoMakerStableDiffusionXLPipeline

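### Load base model
# Assumed example values, not fixed by this page: substitute your own
# SDXL-based checkpoint and target device
base_model_path = "SG161222/RealVisXL_V3.0"
device = "cuda"

pipe = PhotoMakerStableDiffusionXLPipeline.from_pretrained(
    base_model_path,  # can change to any base model based on SDXL
    torch_dtype=torch.bfloat16,  # use torch.float16 on GPUs without bfloat16 support
    use_safetensors=True,
    variant="fp16"
).to(device)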

### Load PhotoMaker checkpoint
# photomaker_path is the value returned by hf_hub_download in the Download Models step
pipe.load_photomaker_adapter(
    os.path.dirname(photomaker_path),
    subfolder="",
    weight_name=os.path.basename(photomaker_path),
    trigger_word="img"  # define the trigger word
)

pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

### Can also be combined with other LoRA modules
# pipe.load_lora_weights(os.path.dirname(lora_path), weight_name=lora_model_name, adapter_name="xl_more_art-full")
# pipe.set_adapters(["photomaker", "xl_more_art-full"], adapter_weights=[1.0, 0.5])

pipe.fuse_lora()
  • Input ID Images
### define the input ID images
input_folder_name = './examples/newton_man'
image_basename_list = os.listdir(input_folder_name)
image_path_list = sorted([os.path.join(input_folder_name, basename) for basename in image_basename_list])

input_id_images = []
for image_path in image_path_list:
    input_id_images.append(load_image(image_path))

  • Generation
# Note that the trigger word `img` must follow the class word for personalization
prompt = "a half-body portrait of a man img wearing the sunglasses in Iron man suit, best quality"
negative_prompt = "(asymmetry, worst quality, low quality, illustration, 3d, 2d, painting, cartoons, sketch), open mouth, grayscale"
num_steps = 50  # assumed number of sampling steps; adjust as needed
generator = torch.Generator(device=device).manual_seed(42)
# Take the first (and only) generated image
image = pipe(
    prompt=prompt,
    input_id_images=input_id_images,
    negative_prompt=negative_prompt,
    num_images_per_prompt=1,
    num_inference_steps=num_steps,
    start_merge_step=10,
    generator=generator,
).images[0]
image.save('out_photomaker.png')
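
The Input ID Images step above passes every entry returned by `os.listdir` to `load_image`, so a stray non-image file in the folder (for example a hidden system file) would cause an error. A minimal sketch of a stricter listing, assuming the ID images use common file extensions; `list_image_paths` and `IMAGE_EXTENSIONS` are hypothetical names, not part of PhotoMaker:

```python
import os

# Assumed set of extensions; extend it if your ID images use other formats
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def list_image_paths(folder: str) -> list[str]:
    """Return the sorted paths of regular files in folder that look like images."""
    paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # Skip sub-directories and files without a known image extension
        if os.path.isfile(path) and extension in IMAGE_EXTENSIONS:
            paths.append(path)
    return sorted(paths)
```

The result can replace `image_path_list` in the snippet above, leaving the `load_image` loop unchanged.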

Start a local gradio demo

Run the following command:

python gradio_demo/app.py

You can customize the demo by editing this file.

To run it on a Mac, follow this instruction and then run app.py.

Usage Tips:

  • Upload more photos of the person to be customized to improve ID fidelity. If the input shows Asian face(s), consider adding 'Asian' before the class word, e.g., Asian woman img.
  • When stylizing, does the generated face look too realistic? Set the Style strength to 30-50. The larger the number, the lower the ID fidelity, but the stronger the stylization. You can also try other base models or LoRAs with good stylization effects.
  • Reduce the number of generated images and sampling steps for faster generation. Keep in mind, however, that reducing the sampling steps may compromise ID fidelity.
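
Because the trigger word `img` has to follow the class word (as in `Asian woman img`), it can help to check a prompt before generating. A minimal sketch, not part of PhotoMaker: `check_trigger_word` is a hypothetical helper, and the requirement that the trigger word appears exactly once is an assumption made here, not something this page states.

```python
def check_trigger_word(prompt: str, trigger_word: str = "img") -> None:
    """Raise ValueError if the trigger word is missing, repeated, or comes first.

    Assumption: a prompt should contain the trigger word exactly once, and
    it must be preceded by at least one word (the class word, e.g. "woman").
    """
    # Treat commas as separators so "woman img, best quality" tokenizes cleanly
    tokens = prompt.replace(",", " ").split()
    positions = [i for i, token in enumerate(tokens) if token == trigger_word]
    if len(positions) != 1:
        raise ValueError(
            f"expected exactly one '{trigger_word}' in the prompt, found {len(positions)}"
        )
    if positions[0] == 0:
        raise ValueError(f"'{trigger_word}' must follow a class word such as 'man' or 'woman'")
```

Calling `check_trigger_word("a photo of an Asian woman img, best quality")` returns without error, while a prompt without `img` raises `ValueError`.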

Related Resources

Replicate demo of PhotoMaker:

  1. Demo link, run PhotoMaker on replicate.
  2. Demo link (style version).

Windows version of PhotoMaker:

  1. bmaltais/PhotoMaker by @bmaltais, easy to deploy PhotoMaker on Windows. The description can be found in this link.
  2. sdbds/PhotoMaker-for-windows by @sdbds.

ComfyUI:

  1. https://github.com/ZHO-ZHO-ZHO/ComfyUI-PhotoMaker
  2. https://github.com/StartHua/Comfyui-Mine-PhotoMaker
  3. https://github.com/shiimizu/ComfyUI-PhotoMaker

Gradio demo in 45 lines

Provided by @Gradio

🤗 Acknowledgements

  • PhotoMaker is co-hosted by Tencent ARC Lab and Nankai University MCG-NKU.
  • Inspired by many excellent demos and repos, including IP-Adapter, multimodalart/Ip-Adapter-FaceID, FastComposer, and T2I-Adapter. Thanks for their great work!
  • Thanks to the Venus team in Tencent PCG for their feedback and suggestions.
  • Thanks to the HuggingFace team for their generous support!

Disclaimer

This project strives to impact the domain of AI-driven image generation positively. Users are granted the freedom to create images using this tool, but they are expected to comply with local laws and utilize it responsibly. The developers do not assume any responsibility for potential misuse by users.

BibTeX

If you find PhotoMaker useful for your research and applications, please cite using this BibTeX:

@article{li2023photomaker,
  title={PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding},
  author={Li, Zhen and Cao, Mingdeng and Wang, Xintao and Qi, Zhongang and Cheng, Ming-Ming and Shan, Ying},
  journal={arXiv preprint arXiv:2312.04461},
  year={2023}
}
