AI Model Dynamic Offloader for ComfyUI
AI Model Dynamic Offloader
This project is a PyTorch VRAM allocator that implements on-demand offloading of model weights when the primary PyTorch VRAM allocator comes under pressure.
Support:
- NVIDIA GPUs only
- PyTorch 2.6+
- CUDA 12.8+
- Windows 11+ / Linux as per Python manylinux support
How it works:
- The PyTorch application creates a Virtual Base Address Register (VBAR) for a model. Creating a VBAR doesn't cost any VRAM, only GPU virtual address space (which is pretty much free).
- The PyTorch application allocates tensors for model weights within the VBAR. These tensors are initially unallocated and will segfault if touched.
- The PyTorch application faults in the tensors using the fault() API at the time the tensor is needed. This is where VRAM actually gets allocated.
If the fault() is successful (sufficient VRAM for this tensor):
- If the fault() resultant signature is changed or unknown:
  - The application uses tensor::_copy() to populate the weight data on the GPU.
  - The application saves the returned signature against this weight for future comparison.
- The layer uses the weight tensor.
- The application calls unpin() on the tensor to allow it to be freed under pressure later if needed.
If the fault() is unsuccessful (offloaded weight):
- The application allocates a temporary regular GPU tensor.
- Uses _copy to populate weight data on the GPU.
- The layer uses the temporary as the weight.
- PyTorch garbage collects the temp when the layer is finished.
See examples/example.py.
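The per-layer flow above can be sketched as a small pure-Python simulation. This is not the comfy_aimdo API: ToyVBAR, run_layer, simulate, and the log entries are hypothetical names made up for illustration, sizes are arbitrary units, and no GPU memory is touched. It only shows the decision the application makes on each fault() result.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVBAR:
    """Hypothetical stand-in for a VBAR with a fixed VRAM budget (arbitrary units)."""
    budget: int
    used: int = 0
    resident: dict = field(default_factory=dict)  # weight name -> signature
    next_signature: int = 1

    def fault(self, name, size):
        """Return a signature if the weight is resident or fits, else None."""
        if name in self.resident:
            return self.resident[name]
        if self.used + size > self.budget:
            return None  # not enough VRAM: the weight stays offloaded
        self.used += size
        self.resident[name] = self.next_signature
        self.next_signature += 1
        return self.resident[name]

    def unpin(self, name):
        """No-op in this toy; the real call lets the weight be evicted later."""


def run_layer(vbar, name, size, saved, log):
    """Apply the fault / copy / use / unpin steps for one layer's weight."""
    signature = vbar.fault(name, size)
    if signature is not None:
        # Only re-populate the weight if the signature is new or has changed.
        if saved.get(name) != signature:
            log.append(("copy_to_vbar", name))
            saved[name] = signature
        log.append(("use_resident", name))
        vbar.unpin(name)
    else:
        # Offloaded weight: stage it in a temporary for this layer only.
        log.append(("copy_to_temp", name))
        log.append(("use_temp", name))


def simulate(iterations=2):
    """Run three equally sized weights through a VBAR that only fits two."""
    vbar = ToyVBAR(budget=8)
    weights = [("w0", 4), ("w1", 4), ("w2", 4)]
    saved, log = {}, []
    for _ in range(iterations):
        for name, size in weights:
            run_layer(vbar, name, size, saved, log)
    return log


log = simulate()
```

With these toy numbers, w0 and w1 are copied into the VBAR once and reused on the second iteration, while w2 never fits and is copied into a temporary on every iteration.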
Priorities:
- The most recent VBARs are the highest priority and lower addresses in the VBAR take priority over higher addresses.
- Applications should order their tensor allocations in the VBAR in load-priority order with the lowest addresses for the highest priority weights.
- Calling fault() on a weight that is higher priority than other weights will cause those lower priority weights to get freed to make space.
- Having a weight evicted sets that VBAR's watermark to that weight's level. Any weights in the same VBAR above the watermark automatically fail the fault() API. This avoids constantly faulting in all weights each model iteration while allowing the application to just blindly call fault() every layer and check the results. There is no need for the application to manage any VRAM quotas or watermarks.
- Existing VBARs can be pushed to top priority with the prioritize() API. This allows use of an already loaded or partially loaded model (e.g. using the same model twice in a complex workflow). Using prioritize() resets the offload watermark of that model to no offloading, giving its weights priority over any other currently loaded models.
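The priority and watermark rules above can be modelled in a few lines of plain Python. This is a toy under stated assumptions, not the allocator's implementation: ToyVBAR, ToyPool, and demo are hypothetical names, every weight costs one unit of VRAM, a weight's index stands in for its address within the VBAR, and the eviction order (oldest VBAR first, highest index first) is one reading of the rules described here.

```python
class ToyVBAR:
    """Hypothetical VBAR: weights are indexed by load priority (0 is highest)."""

    def __init__(self, n_weights):
        self.n = n_weights
        self.resident = set()       # indices of weights currently in VRAM
        self.watermark = n_weights  # indices at or above this fail fault()


class ToyPool:
    """Hypothetical pool of VBARs sharing a budget of unit-sized weights."""

    def __init__(self, budget):
        self.budget = budget
        self.vbars = []  # lowest priority first, most recent last

    def create(self, n_weights):
        vbar = ToyVBAR(n_weights)
        self.vbars.append(vbar)
        return vbar

    def prioritize(self, vbar):
        """Move a VBAR to top priority and reset its offload watermark."""
        self.vbars.remove(vbar)
        self.vbars.append(vbar)
        vbar.watermark = vbar.n

    def _used(self):
        return sum(len(v.resident) for v in self.vbars)

    def _evict_one(self, vbar, index):
        """Evict the lowest priority weight ranked below (vbar, index), if any."""
        for victim in self.vbars:
            own = victim is vbar
            # In the requesting VBAR, only higher indices are lower priority.
            candidates = [i for i in victim.resident if not own or i > index]
            if candidates:
                evicted = max(candidates)
                victim.resident.remove(evicted)
                victim.watermark = evicted
                return True
            if own:
                break  # VBARs after this one are higher priority
        return False

    def fault(self, vbar, index):
        if index in vbar.resident:
            return True
        if index >= vbar.watermark:
            return False  # at or above the watermark: fail without trying
        while self._used() >= self.budget:
            if not self._evict_one(vbar, index):
                return False
        vbar.resident.add(index)
        return True


def demo():
    pool = ToyPool(budget=4)
    a = pool.create(3)
    b = pool.create(3)  # more recent, so higher priority than a
    first_a = [pool.fault(a, i) for i in range(3)]
    all_b = [pool.fault(b, i) for i in range(3)]     # evicts a's weights 2 and 1
    second_a = [pool.fault(a, i) for i in range(3)]  # watermark blocks 1 and 2
    pool.prioritize(a)
    third_a = [pool.fault(a, i) for i in range(3)]   # now evicts from b instead
    return first_a, all_b, second_a, third_a, b.watermark
```

In this run, the second pass over model a gets one resident weight and two failed faults without evicting anything from b, and only after prioritize() does a reclaim space from b.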
Backend:
- VBAR allocation is done with cuMemAddressReserve(), faulting with cuMemCreate() and cuMemMap(), and all frees are done with the appropriate converse APIs.
- For consistency with VBAR memory management, the main PyTorch allocator plugin is also implemented with cuMemAddressReserve -> cuMemCreate -> cuMemMap. This also behaves a lot better on Windows systems with System Memory fallback.
- This allocator is incompatible with the PyTorch cudaMallocAsync backend or expandable segments backends (as the plugin interface does not exist on these backends as of this writing).
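One way an application might guard against the incompatible backends is to inspect PyTorch's PYTORCH_CUDA_ALLOC_CONF environment variable before loading this allocator. The sketch below assumes those backends were selected through that variable's backend:cudaMallocAsync and expandable_segments:True options; the helper name incompatible_allocator_options is hypothetical and is not part of this project.

```python
import os


def incompatible_allocator_options(conf=None):
    """Return the PYTORCH_CUDA_ALLOC_CONF entries that select an incompatible backend.

    Hypothetical helper: `conf` defaults to the current environment value.
    """
    if conf is None:
        conf = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
    problems = []
    # The variable is a comma-separated list of option:value pairs.
    for item in conf.split(","):
        key, _, value = item.strip().partition(":")
        key, value = key.strip(), value.strip()
        if key == "backend" and value == "cudaMallocAsync":
            problems.append(item.strip())
        elif key == "expandable_segments" and value.lower() == "true":
            problems.append(item.strip())
    return problems


if __name__ == "__main__":
    found = incompatible_allocator_options()
    if found:
        print("Incompatible allocator options set:", ", ".join(found))
```

An application could call this at startup and refuse to enable offloading, or warn the user, when the returned list is not empty.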
Caveats:
- There is no real way for this allocator to tell the difference between high usage and bad fragmentation in the PyTorch caching allocator. Because we always return success to the PyTorch caching allocator, it experiences no pressure while weights are being offloaded, which means it can run in an extremely fragmented mode. The assumption is that model weight access patterns are reasonably regular over blocks or iterations, so the caching allocator finds a good set of sizes to cache. What you should generally do, though, is completely flush the PyTorch caching allocator before each new model run, which prevents completely unused reservations from taking priority over the next model's weights.
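A minimal sketch of the flush recommended above, assuming that "flush" corresponds to torch.cuda.empty_cache() and that it behaves as usual with this allocator installed. The helper name flush_caching_allocator is hypothetical; it degrades to a no-op when PyTorch or CUDA is not available.

```python
import gc


def flush_caching_allocator():
    """Release cached, unused blocks held by PyTorch's CUDA caching allocator.

    Returns True if a flush was performed, False if PyTorch or CUDA is unavailable.
    """
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    gc.collect()              # drop unreachable tensors so their blocks become free
    torch.cuda.empty_cache()  # hand unused cached blocks back to the underlying allocator
    return True
```

Calling this just before starting a new model run keeps stale reservations from the previous run from competing with the next model's weights.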