onediff extensions for diffusers
OneDiffX (for HF diffusers)
OneDiffX is a OneDiff extension for Hugging Face (HF) diffusers. It provides acceleration utilities such as DeepCache.
- Install and Setup
- Compile, save and load pipeline
- Acceleration for state-of-the-art Models
- DeepCache Speedup
- Fast LoRA loading and switching
- Quantization
- Contact
Install and setup
- Follow the steps here to install onediff.
- Install onediffx by following these steps:

```bash
git clone https://github.com/siliconflow/onediff.git
cd onediff/onediff_diffusers_extensions && python3 -m pip install -e .
```
Compile, save and load pipeline
A complete example that compiles, saves, and loads the pipeline: pipe_compile_save_load.py.
Compile diffusers pipeline with compile_pipe
```python
import torch
from diffusers import StableDiffusionXLPipeline
from onediffx import compile_pipe

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

# run once to trigger compilation
image = pipe(
    prompt="street style, detailed, raw photo, woman, face, shot on CineStill 800T",
    height=512,
    width=512,
    num_inference_steps=30,
    output_type="pil",
).images

image[0].save("test_image.png")
```
Save compiled pipeline with save_pipe
```python
import torch
from diffusers import StableDiffusionXLPipeline
from onediffx import compile_pipe, save_pipe

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

# run once to trigger compilation
image = pipe(
    prompt="street style, detailed, raw photo, woman, face, shot on CineStill 800T",
    height=512,
    width=512,
    num_inference_steps=30,
    output_type="pil",
).images

image[0].save("test_image.png")

# save the compiled pipe
save_pipe(pipe, dir="cached_pipe")
```
Load compiled pipeline with load_pipe
```python
import torch
from diffusers import StableDiffusionXLPipeline
from onediffx import compile_pipe, load_pipe

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

# load the compiled pipe
load_pipe(pipe, dir="cached_pipe")

# no compilation now
image = pipe(
    prompt="street style, detailed, raw photo, woman, face, shot on CineStill 800T",
    height=512,
    width=512,
    num_inference_steps=30,
    output_type="pil",
).images

image[0].save("test_image.png")
```
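Since `load_pipe` only skips compilation when a saved cache exists, a script often needs to decide between loading and compiling. The helper below is a minimal sketch of that pattern. `compile_or_load` is a hypothetical name; it assumes `save` and `load` are onediffx's `save_pipe` and `load_pipe`, called with the same `dir=` keyword as in the examples above, that `pipe` has already gone through `compile_pipe`, and that an existing directory holds a complete cache from an earlier run.

```python
import os
from typing import Any, Callable


def compile_or_load(
    pipe: Any,
    cache_dir: str,
    warmup: Callable[[Any], Any],
    save: Callable[..., Any],
    load: Callable[..., Any],
) -> str:
    """Reuse a saved compiled pipeline if cache_dir exists; otherwise compile and save it.

    Assumption: `save`/`load` are onediffx's save_pipe/load_pipe and `pipe`
    has already been passed through compile_pipe.
    """
    if os.path.isdir(cache_dir):
        load(pipe, dir=cache_dir)  # the next call runs without compiling
        return "loaded"
    warmup(pipe)  # the first call triggers compilation
    save(pipe, dir=cache_dir)
    return "compiled"


# Hypothetical usage, after `pipe = compile_pipe(pipe)`:
# from onediffx import save_pipe, load_pipe
# status = compile_or_load(
#     pipe, "cached_pipe",
#     warmup=lambda p: p(prompt="a photo of a cat", height=512, width=512, num_inference_steps=30),
#     save=save_pipe, load=load_pipe,
# )
```

Keeping the real `save_pipe`/`load_pipe` as arguments, rather than importing them inside the helper, keeps the decision logic testable without a GPU.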
DeepCache speedup
Run Stable Diffusion XL with OneDiffX
```python
import torch
from onediffx import compile_pipe
from onediffx.deep_cache import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

prompt = "A photo of a cat. Focus light and create sharp, defined edges."

# Warmup
for i in range(1):
    deepcache_output = pipe(
        prompt,
        cache_interval=3, cache_layer_id=0, cache_block_id=0,
        output_type="pil",
    ).images[0]

deepcache_output = pipe(
    prompt,
    cache_interval=3, cache_layer_id=0, cache_block_id=0,
    output_type="pil",
).images[0]
```
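The examples pass `cache_interval=3` without explaining it. Under DeepCache's uniform caching strategy, the full UNet runs only on every `cache_interval`-th denoising step; the steps in between reuse cached high-level features and recompute only the shallow layers. The sketch below computes that schedule under this assumption. `deepcache_schedule` is a hypothetical helper, not part of onediffx, and it ignores `cache_layer_id` and `cache_block_id` (and `cache_branch` in the video example), which select the UNet skip branch whose features are cached.

```python
from typing import List


def deepcache_schedule(num_inference_steps: int, cache_interval: int) -> List[bool]:
    """For each denoising step: True = full UNet pass that refreshes the cache,
    False = reuse the cached deep features (assumed uniform schedule)."""
    if cache_interval < 1:
        raise ValueError("cache_interval must be >= 1")
    return [step % cache_interval == 0 for step in range(num_inference_steps)]


schedule = deepcache_schedule(num_inference_steps=30, cache_interval=3)
print(f"{sum(schedule)} of {len(schedule)} steps run the full UNet")
# 10 of 30 steps run the full UNet
```

With `cache_interval=1` every step is a full pass, so larger intervals skip more computation, typically at some cost in output quality.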
Run Stable Diffusion 1.5 with OneDiffX
```python
import torch
from onediffx import compile_pipe
from onediffx.deep_cache import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

pipe = compile_pipe(pipe)

prompt = "a photo of an astronaut on a moon"

# Warmup
for i in range(1):
    deepcache_output = pipe(
        prompt,
        cache_interval=3, cache_layer_id=0, cache_block_id=0,
        output_type="pil",
    ).images[0]

deepcache_output = pipe(
    prompt,
    cache_interval=3, cache_layer_id=0, cache_block_id=0,
    output_type="pil",
).images[0]
```
Run Stable Video Diffusion with OneDiffX
```python
import torch
from diffusers.utils import load_image, export_to_video
from onediffx import compile_pipe, compiler_config
from onediffx.deep_cache import StableVideoDiffusionPipeline

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
pipe.to("cuda")

compiler_config.attention_allow_half_precision_score_accumulation_max_m = 0
pipe = compile_pipe(pipe)

input_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/svd/rocket.png?download=true")
input_image = input_image.resize((1024, 576))

# Warmup
for i in range(1):
    deepcache_output = pipe(
        input_image,
        decode_chunk_size=5,
        cache_interval=3, cache_branch=0,
    ).frames[0]

deepcache_output = pipe(
    input_image,
    decode_chunk_size=5,
    cache_interval=3, cache_branch=0,
).frames[0]

export_to_video(deepcache_output, "generated.mp4", fps=7)
```
Fast LoRA loading and switching
OneDiff provides a more efficient implementation of LoRA loading: invoking `load_and_fuse_lora` loads a LoRA and fuses it into the pipeline, and invoking `unfuse_lora` restores the weights of the base model.
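Fusing folds the LoRA update into the base weights, so inference runs on a plain layer with no extra LoRA branch; unfusing subtracts the same update again. The plain-Python sketch below shows that arithmetic on a tiny matrix, assuming the standard LoRA form W' = W + scale · (up · down). The helper names are hypothetical, and this is not OneDiffX's implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def add_scaled(w: Matrix, delta: Matrix, scale: float) -> Matrix:
    """Element-wise w + scale * delta."""
    return [[x + scale * d for x, d in zip(w_row, d_row)] for w_row, d_row in zip(w, delta)]


def fuse_lora_delta(w: Matrix, up: Matrix, down: Matrix, scale: float = 1.0) -> Matrix:
    # W' = W + scale * (up @ down): the low-rank update is folded into the base weight
    return add_scaled(w, matmul(up, down), scale)


def unfuse_lora_delta(w_fused: Matrix, up: Matrix, down: Matrix, scale: float = 1.0) -> Matrix:
    # undo the fusion by subtracting the same update
    return add_scaled(w_fused, matmul(up, down), -scale)


W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 base weight
up = [[1.0], [2.0]]           # 2x1 factor
down = [[0.5, 0.5]]           # 1x2 factor, so up @ down is a rank-1 2x2 update
fused = fuse_lora_delta(W, up, down, scale=1.0)
print(fused)                                          # [[1.5, 0.5], [1.0, 2.0]]
print(unfuse_lora_delta(fused, up, down, scale=1.0) == W)  # True
```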
API
onediffx.lora.load_and_fuse_lora
onediffx.lora.load_and_fuse_lora(pipeline: LoraLoaderMixin, pretrained_model_name_or_path_or_dict: Union[str, Path, Dict[str, torch.Tensor]], adapter_name: Optional[str] = None, *, lora_scale: float = 1.0, offload_device="cpu", offload_weight="lora", use_cache=False, **kwargs)
Parameters:
- `pipeline` (`LoraLoaderMixin`): The pipeline that will load and fuse the LoRA weights.
- `pretrained_model_name_or_path_or_dict` (`str`, `os.PathLike`, or `dict`): Can be one of:
  - A string, the model id (for example `google/ddpm-celebahq-256`) of a pretrained model hosted on the Hub.
  - A path to a directory containing the model weights saved with `ModelMixin.save_pretrained()`.
  - A state dict mapping names to `torch.Tensor` weights.
- `adapter_name` (`str`, optional): The name used to reference the loaded adapter. If not specified, `default_{i}` is used, where `i` is the total number of adapters being loaded. Not supported now.
- `lora_scale` (`float`, defaults to 1.0): Controls how strongly the LoRA parameters influence the outputs.
- `offload_device` (`str`, one of `"cpu"` or `"cuda"`, defaults to `"cpu"`): The device to which the LoRA or model weights are offloaded.
- `offload_weight` (`str`, one of `"lora"` or `"weight"`, defaults to `"lora"`): Which weights to offload. If set to `"lora"`, the LoRA weights are offloaded to `offload_device`; if set to `"weight"`, the weights of the `Linear` or `Conv2d` modules are offloaded.
- `use_cache` (`bool`, optional, defaults to `False`): Whether to cache the LoRA. If `True`, the loaded LoRA is cached in memory.
- `kwargs` (`dict`, optional): See `lora_state_dict()`.
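The `offload_weight` option decides what is kept so that the fused weights can be restored later. The sketch below illustrates the two strategies as we read the parameter description: keep the LoRA update and subtract it on unfuse (`"lora"`), or keep a copy of the original weight and restore it verbatim (`"weight"`). The function names and record format are hypothetical, the offload device itself is not modeled, and for brevity the full update matrix is stored where a real LoRA would keep its much smaller low-rank factors.

```python
import copy
from typing import Any, Dict, List, Tuple

Matrix = List[List[float]]


def fuse_with_offload(weight: Matrix, delta: Matrix, scale: float,
                      offload_weight: str = "lora") -> Tuple[Matrix, Dict[str, Any]]:
    """Return (fused weight, record needed to undo the fusion)."""
    if offload_weight not in ("lora", "weight"):
        raise ValueError('offload_weight must be "lora" or "weight"')
    fused = [[w + scale * d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(weight, delta)]
    if offload_weight == "lora":
        # keep the LoRA update (a real LoRA would keep its low-rank factors)
        record = {"kind": "lora", "delta": copy.deepcopy(delta), "scale": scale}
    else:
        # keep a full copy of the original weight
        record = {"kind": "weight", "weight": copy.deepcopy(weight)}
    return fused, record


def unfuse_with_record(fused: Matrix, record: Dict[str, Any]) -> Matrix:
    if record["kind"] == "weight":
        return copy.deepcopy(record["weight"])  # restore the backup verbatim
    s = record["scale"]
    # subtract the same update that was added during fusion
    return [[f - s * d for f, d in zip(f_row, d_row)]
            for f_row, d_row in zip(fused, record["delta"])]


W = [[1.0, 2.0], [3.0, 4.0]]
delta = [[0.25, 0.0], [0.0, 0.25]]
for mode in ("lora", "weight"):
    fused, record = fuse_with_offload(W, delta, scale=2.0, offload_weight=mode)
    print(mode, fused, unfuse_with_record(fused, record) == W)
```

Keeping only the LoRA update is cheaper in memory; keeping the original weight costs a full copy of each layer but restores it exactly even when low-precision rounding makes `(W + d) - d` differ slightly from `W`.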
onediffx.lora.unfuse_lora
onediffx.lora.unfuse_lora(pipeline: LoraLoaderMixin) -> None
Parameters:
- `pipeline` (`LoraLoaderMixin`): The pipeline whose LoRA weights will be unfused.
onediffx.lora.set_and_fuse_adapters
onediffx.lora.set_and_fuse_adapters(pipeline: LoraLoaderMixin, adapter_names: Union[List[str], str], adapter_weights: Optional[List[float]] = None)
Sets the LoRA layers of `adapter_names` for the UNet and text encoder(s) with the corresponding `adapter_weights`.
- `pipeline` (`LoraLoaderMixin`): The pipeline that will set the adapters.
- `adapter_names` (`str` or `List[str]`): The adapter name(s) of the LoRA(s) to set for the pipeline. Each name must have been passed as the `adapter_name` parameter of `load_and_fuse_lora`; otherwise it is ignored.
- `adapter_weights` (`float` or `List[float]`, optional): The weight(s) of the adapter(s). If `None`, the weights are set to 1.0.
onediffx.lora.delete_adapters
onediffx.lora.delete_adapters(pipeline: LoraLoaderMixin, adapter_names: Union[List[str], str])
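Deletes the LoRA layers of `adapter_names` for the UNet and text encoder(s).
- `pipeline` (`LoraLoaderMixin`): The pipeline from which the adapters will be deleted.
- `adapter_names` (`str` or `List[str]`): The name(s) of the adapter(s) to delete. Can be a single string or a list of strings.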
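The adapter functions above manage a set of named LoRAs, each with its own weight. The hypothetical `AdapterRegistry` below models those semantics on a single weight vector, following the parameter descriptions above: loading adds an active adapter, `set_and_fuse` replaces the active set (names that were never loaded are ignored, and weights default to 1.0), and deleting an adapter re-fuses whatever remains. It is an illustration, not OneDiffX's implementation; broadcasting a single float weight to several names is an assumption.

```python
from typing import Dict, List, Optional, Union


class AdapterRegistry:
    """Hypothetical model of named LoRA adapters fused into one weight vector."""

    def __init__(self, base: List[float]):
        self.base = list(base)                    # unmodified base weight
        self.deltas: Dict[str, List[float]] = {}  # adapter name -> LoRA update
        self.active: Dict[str, float] = {}        # adapter name -> fused strength
        self.weight = list(base)                  # currently fused weight

    def load_and_fuse(self, name: str, delta: List[float], scale: float = 1.0) -> None:
        # loading adds the adapter on top of whatever is already fused
        self.deltas[name] = list(delta)
        self.active[name] = scale
        self._refuse()

    def set_and_fuse(self, names: Union[str, List[str]],
                     weights: Optional[Union[float, List[float]]] = None) -> None:
        names = [names] if isinstance(names, str) else list(names)
        if weights is None:
            weights = [1.0] * len(names)             # default strength is 1.0
        elif isinstance(weights, (int, float)):
            weights = [float(weights)] * len(names)  # assumption: broadcast one weight
        if len(weights) != len(names):
            raise ValueError("adapter_names and adapter_weights differ in length")
        # only the listed adapters stay active; names never loaded are ignored
        self.active = {n: w for n, w in zip(names, weights) if n in self.deltas}
        self._refuse()

    def delete(self, names: Union[str, List[str]]) -> None:
        for n in ([names] if isinstance(names, str) else names):
            self.deltas.pop(n, None)
            self.active.pop(n, None)
        self._refuse()

    def _refuse(self) -> None:
        # W = W_base + sum over active adapters of strength * delta
        self.weight = [
            b + sum(s * self.deltas[n][k] for n, s in self.active.items())
            for k, b in enumerate(self.base)
        ]


reg = AdapterRegistry([0.0])
reg.load_and_fuse("yarn", [1.0])
reg.load_and_fuse("watercolor", [10.0])
reg.set_and_fuse(["yarn", "watercolor"], [0.8, 0.2])
reg.delete("yarn")
print(reg.active)  # {'watercolor': 0.2}
```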
Example
```python
import torch
from diffusers import DiffusionPipeline
from onediffx import compile_pipe
from onediffx.lora import load_and_fuse_lora, set_and_fuse_adapters, delete_adapters

MODEL_ID = "stabilityai/stable-diffusion-xl-base-1.0"
pipe = DiffusionPipeline.from_pretrained(MODEL_ID, variant="fp16", torch_dtype=torch.float16).to("cuda")
pipe = compile_pipe(pipe)

# use onediff load_and_fuse_lora
LORA_MODEL_ID = "Norod78/SDXL-YarnArtStyle-LoRA"
LORA_FILENAME = "SDXL_Yarn_Art_Style.safetensors"
load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME, lora_scale=1.0, adapter_name="SDXL_Yarn_Art_Style")
images_fusion = pipe(
    "a cat",
    height=1024,
    width=1024,
    generator=torch.manual_seed(0),
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora_SDXL_Yarn_Art_Style.png")

# load another LoRA; now the pipe has two LoRA models
LORA_MODEL_ID = "ostris/watercolor_style_lora_sdxl"
LORA_FILENAME = "watercolor_v1_sdxl.safetensors"
load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME, lora_scale=1.0, adapter_name="watercolor")
images_fusion = pipe(
    "a cat",
    height=1024,
    width=1024,
    generator=torch.manual_seed(0),
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora_SDXL_Yarn_Art_Style_watercolor.png")

# set LoRA 'SDXL_Yarn_Art_Style' with strength = 0.5; now the pipe has only 'SDXL_Yarn_Art_Style' with strength = 0.5
set_and_fuse_adapters(pipe, adapter_names="SDXL_Yarn_Art_Style", adapter_weights=0.5)
images_fusion = pipe(
    "a cat",
    height=1024,
    width=1024,
    generator=torch.manual_seed(0),
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora_SDXL_Yarn_Art_Style_05.png")

# set 'SDXL_Yarn_Art_Style' with strength = 0.8 and 'watercolor' with strength = 0.2; now the pipe has two LoRAs
set_and_fuse_adapters(pipe, adapter_names=["SDXL_Yarn_Art_Style", "watercolor"], adapter_weights=[0.8, 0.2])
images_fusion = pipe(
    "a cat",
    height=1024,
    width=1024,
    generator=torch.manual_seed(0),
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora_SDXL_Yarn_Art_Style_08_watercolor_02.png")

# delete LoRA 'SDXL_Yarn_Art_Style'; now the pipe has only 'watercolor' left, with strength = 0.2
delete_adapters(pipe, "SDXL_Yarn_Art_Style")
images_fusion = pipe(
    "a cat",
    height=1024,
    width=1024,
    generator=torch.manual_seed(0),
    num_inference_steps=30,
).images[0]
images_fusion.save("test_sdxl_lora_watercolor_02.png")
```
Benchmark
We chose five LoRAs to profile the loading speed of three APIs and the switching speed of two APIs, testing with and without the PEFT backend. The results are shown below.
LoRA loading
- `load_lora_weights`, which loads quickly but gives lower inference performance
- `load_lora_weights` + `fuse_lora`, which gives high inference performance but loads slowly
- `onediffx.lora.load_and_fuse_lora`, which both loads quickly and gives high inference performance
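The README does not describe how these timings were taken. If you want to run comparable measurements on your own LoRAs, a small harness like the one below is one option. It times a zero-argument callable with `time.perf_counter`, runs an untimed `setup` before each repetition (for example, unfusing the previous LoRA so every load starts from the same state), and calls an optional `sync` such as `torch.cuda.synchronize` so queued GPU work is counted. `time_call` and its parameters are hypothetical.

```python
import statistics
import time
from typing import Callable, Optional


def time_call(
    fn: Callable[[], object],
    repeats: int = 3,
    warmup: int = 1,
    setup: Optional[Callable[[], None]] = None,
    sync: Optional[Callable[[], None]] = None,
) -> float:
    """Median wall-clock seconds of fn() over `repeats` timed runs."""
    def prepare() -> None:
        if setup is not None:
            setup()  # untimed, e.g. restore the base weights between loads
        if sync is not None:
            sync()   # make sure setup work has finished before timing starts

    for _ in range(warmup):
        prepare()
        fn()
    timings = []
    for _ in range(repeats):
        prepare()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()   # include asynchronous GPU work in the measurement
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Hypothetical usage with the onediffx API documented above, assuming
# unfuse_lora is safe to call before each load:
# t = time_call(
#     lambda: load_and_fuse_lora(pipe, LORA_MODEL_ID, weight_name=LORA_FILENAME),
#     setup=lambda: unfuse_lora(pipe),
#     sync=torch.cuda.synchronize,
# )
```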
Without PEFT backend
LoRA name | size | HF load_lora_weights | HF load_lora_weights + fuse_lora | OneDiffX load_and_fuse_lora | src link |
---|---|---|---|---|---|
SDXL-Emoji-Lora-r4 | 28M | 1.69 s | 2.34 s | 0.78 s | Link |
sdxl_metal_lora | 23M | 0.97 s | 1.73 s | 0.19 s | |
simple_drawing_xl_b1-000012 | 55M | 1.67 s | 2.57 s | 0.77 s | Link |
texta | 270M | 1.72 s | 2.86 s | 0.97 s | Link |
watercolor_v1_sdxl_lora | 12M | 1.54 s | 2.01 s | 0.35 s |
With PEFT backend
LoRA name | size | HF load_lora_weights | HF load_lora_weights + fuse_lora | OneDiffX load_and_fuse_lora | src link |
---|---|---|---|---|---|
SDXL-Emoji-Lora-r4 | 28M | 5.25 s | 6.21 s | 0.78 s | Link |
sdxl_metal_lora | 23M | 2.44 s | 3.80 s | 0.24 s | |
simple_drawing_xl_b1-000012 | 55M | 4.09 s | 5.79 s | 0.81 s | Link |
texta | 270M | 109.13 s | 109.71 s | 1.07 s | Link |
watercolor_v1_sdxl_lora | 12M | 3.08 s | 4.04 s | 0.40 s |
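The loading tables are easier to compare as ratios. Because both `load_lora_weights` + `fuse_lora` and `load_and_fuse_lora` end with the LoRA fused into the model, the snippet below divides one by the other, using only the "With PEFT backend" numbers copied from the table above; it adds no new measurements. The resulting speedups range from about 7x (simple_drawing_xl_b1-000012) to about 100x (texta).

```python
# (LoRA name, HF load_lora_weights + fuse_lora [s], OneDiffX load_and_fuse_lora [s]),
# copied from the "With PEFT backend" table
WITH_PEFT = [
    ("SDXL-Emoji-Lora-r4", 6.21, 0.78),
    ("sdxl_metal_lora", 3.80, 0.24),
    ("simple_drawing_xl_b1-000012", 5.79, 0.81),
    ("texta", 109.71, 1.07),
    ("watercolor_v1_sdxl_lora", 4.04, 0.40),
]

speedups = {name: hf / onediffx for name, hf, onediffx in WITH_PEFT}
for name, ratio in sorted(speedups.items(), key=lambda kv: kv[1]):
    print(f"{name:30s} {ratio:6.1f}x")
```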
LoRA switching
We tested the performance of `set_adapters`, again using the five LoRA models listed above. The numbers 1-5 denote the models 'SDXL-Emoji-Lora-r4', 'sdxl_metal_lora', 'simple_drawing_xl_b1-000012', 'texta', and 'watercolor_v1_sdxl_lora', respectively.
- PEFT `set_adapters` + `fuse_lora`
- OneDiffX `set_and_fuse_adapters`, which has the same effect as PEFT `set_adapters` + `fuse_lora`
LoRA names | PEFT set_adapters + fuse_lora | OneDiffX set_and_fuse_adapters |
---|---|---|
[1] | 0.47 s | 0.28 s |
[1, 2] | 0.52 s | 0.34 s |
[1, 2, 3] | 0.71 s | 0.55 s |
[1, 2, 3, 4] | 2.02 s | 0.73 s |
[1, 2, 3, 4, 5] | 1.00 s | 0.80 s |
Note
- OneDiffX's LoRA extensions currently support only a limited set of PEFT APIs, and require diffusers 0.21.0 or later.
- If your LoRA model contains weights only for Linear modules, you can use OneDiffX directly without any modifications. If it also includes weights for Conv modules (such as LyCORIS), you need to disable the constant-folding optimization using one of the methods below (this may reduce performance by around 4.4%); otherwise the Conv weights may not be loaded into the model.
  - Set the environment variable `ONEFLOW_MLIR_ENABLE_INFERENCE_OPTIMIZATION` to `0`.
  - Set `compiler_config.mlir_enable_inference_optimization` to `0` before invoking `oneflow_compile`, as in the code below:

```python
from onediff.infer_compiler import oneflow_compile
from onediffx import compiler_config

compiler_config.mlir_enable_inference_optimization = 0
...
pipe.unet = oneflow_compile(pipe.unet)
...
```
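For the environment-variable route, the variable has to be in place before OneDiff compiles the model. A minimal sketch, assuming the variable is read no earlier than import time, so setting it at the very top of the script (or exporting it in the shell that launches Python) is the safe choice:

```python
import os

# Disable the MLIR inference optimization (constant folding) so that Conv-module
# LoRA weights, e.g. from LyCORIS, can be loaded. Assumed ordering: set this before
# onediff/onediffx is imported or any pipeline is compiled.
os.environ["ONEFLOW_MLIR_ENABLE_INFERENCE_OPTIMIZATION"] = "0"

# Only now import and compile, e.g.:
# from onediffx import compile_pipe
# pipe = compile_pipe(pipe)
```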
Optimization
- When the PEFT backend is not used, diffusers replaces each module targeted by LoRA with a `LoRACompatible` module, which adds parameter-initialization overhead. OneDiffX fuses the LoRA parameters directly into the model and skips the module replacement, reducing this overhead.
- When the PEFT backend is used, PEFT likewise replaces each targeted module with a corresponding `BaseTunerLayer`, which also adds overhead. OneDiffX bypasses this step as well by operating directly on the original model.
- While traversing the model's submodules, we observed that `getattr` on OneDiff's `DeployableModule` is slow. Because the parameters of a `DeployableModule` share the same memory as the PyTorch module it wraps, we traverse `DeployableModule._torch_module` instead, which greatly improves traversal efficiency.
Quantization
Note: The quantization feature is only available in OneDiff Enterprise.
OneDiff Enterprise offers a quantization method that reduces memory usage and increases speed while maintaining quality without any loss.
If you have a OneDiff Enterprise license key, you can find instructions on OneDiff quantization and the related models at Hugging Face/siliconflow. Alternatively, you can contact us about purchasing a OneDiff Enterprise license.
Contact
For users of OneDiff Community, please visit GitHub Issues for bug reports and feature requests.
OneDiff Enterprise users can contact contact@siliconflow.com for commercial support.
Feel free to join our Discord community for discussions and to receive the latest updates.
File details
Details for the file onediffx-0.13.0.dev202403200123.tar.gz.
File metadata
- Download URL: onediffx-0.13.0.dev202403200123.tar.gz
- Upload date:
- Size: 57.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | b69d7a544dc87a19d3f470a7f99ec0222a940dfdba44027716c323c98dd27c67 |
MD5 | 8e9d60980bc1bdf1ab53c821aa4de6f3 |
BLAKE2b-256 | c6a23f06918822204d6c56ed159d07a0a00eead18c0474010302fcdcefda40f8 |
File details
Details for the file onediffx-0.13.0.dev202403200123-py3-none-any.whl.
File metadata
- Download URL: onediffx-0.13.0.dev202403200123-py3-none-any.whl
- Upload date:
- Size: 63.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.9.18
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | e068dd2c777cc3efbc82554a8cfacd57a7fd12953409f347f7da4535d7c2171a |
MD5 | c52fbb531ddc5c34c0e02c7fae9554bb |
BLAKE2b-256 | f61545ba48f21eb99fc12439ec632643f38e050b2a22f425400892e8414a628a |