onediff extensions for diffusers

OneDiffX (for HF diffusers)

OneDiffX is a OneDiff extension for Hugging Face diffusers. It provides acceleration utilities such as DeepCache and fast LoRA loading.

Install and setup

  1. Follow the steps here to install onediff.

  2. Install onediffx from source:

    git clone https://github.com/siliconflow/onediff.git
    cd onediff/onediff_diffusers_extensions && python3 -m pip install -e .
    

compile_pipe

Compile a diffusers pipeline with compile_pipe.

import torch
from diffusers import StableDiffusionXLPipeline

from onediffx import compile_pipe

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
pipe.to("cuda")

pipe = compile_pipe(pipe)
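
The DeepCache examples in the next section run one warmup call before the real generation, presumably because the first call to a compiled pipeline includes compilation time. A minimal sketch of a timing helper that keeps warmup separate from the measured run is shown here. The name timed_call and the stand-in workload are hypothetical and are not part of onediffx.

```python
import time
from typing import Any, Callable, Tuple


def timed_call(fn: Callable[[], Any], warmup: int = 1) -> Tuple[Any, float]:
    """Run `fn` `warmup` times untimed, then once more and time that call.

    Hypothetical helper for illustration; not part of onediffx.
    """
    for _ in range(warmup):
        fn()  # warmup: result is discarded
    start = time.perf_counter()
    result = fn()
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch runs anywhere. With a real pipeline this
# could be something like `lambda: pipe(prompt).images[0]`.
result, seconds = timed_call(lambda: sum(range(100_000)), warmup=1)
print(f"result={result}, elapsed={seconds:.4f}s")
```

Passing a zero-argument callable keeps the helper independent of any particular pipeline signature.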

DeepCache speedup

Run Stable Diffusion XL with OneDiffX

import torch

from onediffx import compile_pipe
from onediffx.deep_cache import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

prompt = "A photo of a cat. Focus light and create sharp, defined edges."
# Warmup
for i in range(1):
    deepcache_output = pipe(
        prompt, 
        cache_interval=3, cache_layer_id=0, cache_block_id=0,
        output_type='pil'
    ).images[0]

deepcache_output = pipe(
    prompt, 
    cache_interval=3, cache_layer_id=0, cache_block_id=0,
    output_type='pil'
).images[0]
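
To make the effect of cache_interval concrete, here is a small sketch of a uniform caching schedule, in which the full UNet runs on every cache_interval-th step and the steps in between reuse cached features. This schedule is an assumption based on how DeepCache is usually described. The helper full_forward_steps is hypothetical and is not an onediffx function.

```python
from typing import List


def full_forward_steps(num_inference_steps: int, cache_interval: int) -> List[int]:
    """Return the step indices that run the full UNet under a uniform schedule.

    Assumption: step i runs the full network when i % cache_interval == 0,
    and all other steps reuse the cached features.
    """
    if cache_interval < 1:
        raise ValueError("cache_interval must be >= 1")
    return [
        step for step in range(num_inference_steps) if step % cache_interval == 0
    ]


steps = full_forward_steps(num_inference_steps=30, cache_interval=3)
print(f"{len(steps)} of 30 steps run the full UNet: {steps}")
```

Under this assumption, cache_interval=3 means roughly one third of the steps pay the full UNet cost, which is where the speedup would come from.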

Run Stable Diffusion 1.5 with OneDiffX

import torch

from onediffx import compile_pipe
from onediffx.deep_cache import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

prompt = "a photo of an astronaut on a moon"
# Warmup
for i in range(1):
    deepcache_output = pipe(
        prompt, 
        cache_interval=3, cache_layer_id=0, cache_block_id=0,
        output_type='pil'
    ).images[0]

deepcache_output = pipe(
    prompt, 
    cache_interval=3, cache_layer_id=0, cache_block_id=0,
    output_type='pil'
).images[0]

Run Stable Video Diffusion with OneDiffX

import torch

from diffusers.utils import load_image, export_to_video
from onediffx import compile_pipe, compiler_config
from onediffx.deep_cache import StableVideoDiffusionPipeline

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True
)
pipe.to("cuda")

compiler_config.attention_allow_half_precision_score_accumulation_max_m = 0
pipe = compile_pipe(pipe)

input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
input_image = input_image.resize((1024, 576))

# Warmup
for i in range(1):
    deepcache_output = pipe(
        input_image, 
        decode_chunk_size=5,
        cache_interval=3, cache_branch=0,
    ).frames[0]

deepcache_output = pipe(
    input_image, 
    decode_chunk_size=5,
    cache_interval=3, cache_branch=0,
).frames[0]

export_to_video(deepcache_output, "generated.mp4", fps=7)
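
The decode_chunk_size argument controls how many frames are decoded at a time. The sketch below shows how a frame count would be split into chunks, and how long the exported clip runs at a given fps. It assumes 25 generated frames, which is the usual default for the img2vid-xt checkpoint but is not stated in this document. The helper decode_chunks is hypothetical.

```python
from typing import List


def decode_chunks(num_frames: int, decode_chunk_size: int) -> List[range]:
    """Split frame indices into consecutive chunks of at most `decode_chunk_size`."""
    if decode_chunk_size < 1:
        raise ValueError("decode_chunk_size must be >= 1")
    return [
        range(start, min(start + decode_chunk_size, num_frames))
        for start in range(0, num_frames, decode_chunk_size)
    ]


NUM_FRAMES = 25  # assumption: default frame count for img2vid-xt
FPS = 7          # matches the export_to_video call in the example

chunks = decode_chunks(NUM_FRAMES, decode_chunk_size=5)
print(f"{len(chunks)} decode passes: {[list(c) for c in chunks]}")
print(f"clip length: {NUM_FRAMES / FPS:.2f} s")
```

A smaller chunk size means more decode passes with less memory per pass.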

Fast LoRA loading and switching

OneDiff provides a more efficient implementation of LoRA loading. Call load_and_fuse_lora to load a LoRA and fuse it into the pipeline, and call unfuse_lora to restore the weights of the base model.
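
Conceptually, fusing a LoRA adds a scaled low-rank update to a base weight, and unfusing subtracts it again. The sketch below shows that arithmetic on plain Python lists. It illustrates the standard LoRA merge formula, not the onediffx implementation, and the names matmul, fuse, and unfuse are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of `a` (n x k) and `b` (k x m)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def fuse(weight: Matrix, lora_up: Matrix, lora_down: Matrix, scale: float = 1.0) -> Matrix:
    """Return weight + scale * (lora_up @ lora_down)."""
    delta = matmul(lora_up, lora_down)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


def unfuse(fused: Matrix, lora_up: Matrix, lora_down: Matrix, scale: float = 1.0) -> Matrix:
    """Undo `fuse` by subtracting the same scaled update."""
    return fuse(fused, lora_up, lora_down, scale=-scale)


base = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]    # shape (2, 1): a rank-1 update
down = [[0.5, 0.25]]   # shape (1, 2)

fused = fuse(base, up, down, scale=1.0)
restored = unfuse(fused, up, down, scale=1.0)
print("fused:", fused)
print("restored:", restored)
```

The values here were chosen so the round trip is exact. In general, floating-point subtraction does not always restore the original weights bit for bit.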

API

onediffx.lora.load_and_fuse_lora(pipeline: LoraLoaderMixin, pretrained_model_name_or_path_or_dict: Union[str, Path, Dict[str, torch.Tensor]], adapter_name: Optional[str] = None, *, lora_scale: float = 1.0, offload_device="cpu", offload_weight="lora", use_cache=False, **kwargs):

  • pipeline (LoraLoaderMixin): The pipeline that will load and fuse LoRA weight.

  • pretrained_model_name_or_path_or_dict (str or os.PathLike or dict): Can be either:

    • A string, the model id (for example google/ddpm-celebahq-256) of a pretrained model hosted on the Hub.

    • A path to a directory containing the model weights saved with ModelMixin.save_pretrained().

    • A torch state dict.

  • adapter_name (str, optional): Adapter name used to reference the loaded adapter model. If not specified, default_{i} is used, where i is the total number of adapters being loaded. Not currently supported.

  • lora_scale (float, defaults to 1.0): Controls how much to influence the outputs with the LoRA parameters.

  • offload_device (str, must be either "cpu" or "cuda"): The device to which the LoRA or model weights are offloaded.

  • offload_weight (str, must be one of "lora" and "weight"): The weight type to offload. If set to "lora", the weight of LoRA will be offloaded to offload_device, and if set to "weight", the weight of Linear or Conv2d will be offloaded.

  • use_cache (bool, optional): Whether to save LoRA to cache. If set to True, loaded LoRA will be cached in memory.

  • kwargs (dict, optional): See lora_state_dict().

onediffx.lora.unfuse_lora(pipeline: LoraLoaderMixin) -> None:

  • pipeline (LoraLoaderMixin): The pipeline that will unfuse LoRA weight.
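
One way to picture use_cache is as an in-memory dictionary keyed by the LoRA path, so that a second load of the same file skips the disk read. The sketch below is an assumption about the general idea, not the onediffx implementation. The names load_lora_state, fake_loader, and cache are hypothetical.

```python
from typing import Any, Callable, Dict


def load_lora_state(
    path: str,
    loader: Callable[[str], Dict[str, Any]],
    cache: Dict[str, Dict[str, Any]],
    use_cache: bool = False,
) -> Dict[str, Any]:
    """Return the LoRA state dict for `path`, reusing a cached copy if allowed."""
    if use_cache and path in cache:
        return cache[path]  # cache hit: `loader` is not called
    state = loader(path)
    if use_cache:
        cache[path] = state  # keep it in memory for later switches
    return state


reads = []  # records every simulated disk read


def fake_loader(path: str) -> Dict[str, Any]:
    """Stand-in for reading a .safetensors file from disk."""
    reads.append(path)
    return {"lora_up.weight": [1.0], "lora_down.weight": [2.0]}


cache: Dict[str, Dict[str, Any]] = {}
load_lora_state("a.safetensors", fake_loader, cache, use_cache=True)
load_lora_state("a.safetensors", fake_loader, cache, use_cache=True)
print(f"disk reads: {len(reads)}")
```

The trade-off in such a design is memory: every cached LoRA stays resident until the cache is cleared.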

Example

import torch
from diffusers import DiffusionPipeline
from onediffx import compile_pipe
from onediffx.lora import load_and_fuse_lora, unfuse_lora

MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(MODEL_ID, variant="fp16", torch_dtype=torch.float16).to("cuda")

LORA_MODEL_ID = "hf-internal-testing/sdxl-1.0-lora"
LORA_FILENAME = "sd_xl_offset_example-lora_1.0.safetensors"

pipe = compile_pipe(pipe)

# use onediff load_and_fuse_lora
load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME, lora_scale=1.0)
images_fusion = pipe(
    "masterpiece, best quality, mountain",
    height=1024,
    width=1024,
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora.png")

# before loading another LoRA, you need to
# unload LoRA weights and restore base model
unfuse_lora(pipe)
load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME, lora_scale=1.0)
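
The unfuse-then-load order in the example can be wrapped in a small helper that remembers whether a LoRA is currently fused. The sketch below is hypothetical: LoraSwitcher is not part of onediffx, and the load and unfuse functions are injected so that the code runs without a GPU. In real use they would be load_and_fuse_lora and unfuse_lora.

```python
from typing import Any, Callable, Optional


class LoraSwitcher:
    """Unfuses the current LoRA, if any, before fusing the next one.

    Hypothetical helper for illustration; not part of onediffx.
    """

    def __init__(
        self,
        pipe: Any,
        load_fn: Callable[..., Any],
        unfuse_fn: Callable[[Any], Any],
    ) -> None:
        self.pipe = pipe
        self.load_fn = load_fn
        self.unfuse_fn = unfuse_fn
        self.current: Optional[str] = None  # id of the fused LoRA, if any

    def switch(self, lora_id: str, **kwargs: Any) -> None:
        if self.current is not None:
            self.unfuse_fn(self.pipe)  # restore the base model weights first
        self.load_fn(self.pipe, lora_id, **kwargs)
        self.current = lora_id


log = []  # records the order of calls made by the stand-in functions
switcher = LoraSwitcher(
    pipe="fake-pipe",
    load_fn=lambda pipe, lora_id, **kw: log.append(("load", lora_id)),
    unfuse_fn=lambda pipe: log.append(("unfuse",)),
)
switcher.switch("lora-a", lora_scale=1.0)
switcher.switch("lora-b", lora_scale=1.0)
print(log)
```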

Benchmark

We chose 5 LoRAs to profile the loading and switching speed of 3 different APIs:

  1. load_lora_weights, which loads quickly but gives slower inference

  2. load_lora_weights + fuse_lora, which gives fast inference but loads slowly

  3. onediffx.lora.load_and_fuse_lora, which both loads quickly and gives fast inference

The results are shown below.

LoRA name                                Size  load_lora_weights  load_lora_weights + fuse_lora  onediffx load_and_fuse_lora  Src link
SDXL-Emoji-Lora-r4.safetensors           28M   1.69 s             2.34 s                         0.78 s                       Link
sdxl_metal_lora.safetensors              23M   0.97 s             1.73 s                         0.19 s
simple_drawing_xl_b1-000012.safetensors  55M   1.67 s             2.57 s                         0.77 s                       Link
texta.safetensors                        270M  1.72 s             2.86 s                         0.97 s                       Link
watercolor_v1_sdxl_lora.safetensors      12M   1.54 s             2.01 s                         0.35 s
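
The relative speedups follow directly from the table. The short script below recomputes them from the reported times. The numbers are copied from the table, and the helper speedup is a hypothetical name.

```python
# Times in seconds, copied from the benchmark table:
# (load only, load + fuse, onediffx load_and_fuse_lora)
timings = {
    "SDXL-Emoji-Lora-r4.safetensors": (1.69, 2.34, 0.78),
    "sdxl_metal_lora.safetensors": (0.97, 1.73, 0.19),
    "simple_drawing_xl_b1-000012.safetensors": (1.67, 2.57, 0.77),
    "texta.safetensors": (1.72, 2.86, 0.97),
    "watercolor_v1_sdxl_lora.safetensors": (1.54, 2.01, 0.35),
}


def speedup(baseline_s: float, candidate_s: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


for name, (load_only, load_fuse, onediffx_s) in timings.items():
    print(
        f"{name}: {speedup(load_only, onediffx_s):.2f}x vs load only, "
        f"{speedup(load_fuse, onediffx_s):.2f}x vs load + fuse"
    )
```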

Note

  1. The OneDiff LoRA extension does not currently support PEFT, and it requires diffusers version 0.21.0 or later.

  2. Without PEFT, diffusers can load only one LoRA at a time, so onediffx is likewise limited to a single LoRA. We are developing a PEFT-compatible version of onediffx that will be able to load multiple LoRAs.
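
A script can check the minimum diffusers version up front. The sketch below compares release numbers using only the standard library. The helpers parse_release and supports_onediffx_lora are hypothetical, and the parsing is a simplification that ignores pre-release suffixes such as .dev0.

```python
import re
from typing import Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract the leading numeric release, padded to three components.

    Simplification: suffixes such as '.dev0' or 'rc1' are ignored, so
    '0.21.0.dev0' is treated the same as '0.21.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version: {version!r}")
    parts = [int(part) for part in match.group().split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.21' to (0, 21, 0)
    return tuple(parts)


def supports_onediffx_lora(diffusers_version: str, minimum: str = "0.21.0") -> bool:
    """True if `diffusers_version` is at least `minimum`."""
    return parse_release(diffusers_version) >= parse_release(minimum)


# In a real environment the installed version could be read with
# importlib.metadata.version("diffusers"); fixed strings are used here
# so the sketch runs without diffusers installed.
for candidate in ("0.20.2", "0.21.0", "0.25.1"):
    print(candidate, supports_onediffx_lora(candidate))
```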

Quantization

Note: Quantization feature is only supported by OneDiff Enterprise.

OneDiff Enterprise offers a quantization method that reduces memory usage and increases speed while preserving output quality.

If you have a OneDiff Enterprise license key, you can find instructions on OneDiff quantization and related models at Hugging Face/siliconflow. Alternatively, you can contact us to ask about purchasing a OneDiff Enterprise license.

Contact

For users of OneDiff Community, please visit GitHub Issues for bug reports and feature requests.

For users of OneDiff Enterprise, please email contact@siliconflow.com for commercial support.

Feel free to join our Discord community for discussions and to receive the latest updates.
