AI Model Dynamic Offloader for ComfyUI

AI Model Dynamic Offloader

This project is a PyTorch VRAM allocator that offloads model weights on demand when the primary PyTorch VRAM allocator comes under pressure.

Support:

  • NVIDIA GPUs only
  • PyTorch 2.8+
  • CUDA 12.8+
  • Windows 11+ / Linux, as per Python manylinux support

How it works:

  • The PyTorch application creates a Virtual Base Address Register (VBAR) for a model. Creating a VBAR costs no VRAM, only GPU virtual address space, which is effectively free.
  • The PyTorch application allocates tensors for the model weights within the VBAR. These tensors initially have no VRAM behind them and will segfault if touched.
  • The PyTorch application faults in each tensor with the fault() API at the point the tensor is needed. This is where VRAM actually gets allocated.
If the fault() succeeds (there is sufficient VRAM for this tensor):
  1. If the signature returned by fault() has changed or is unknown:
    • The application uses Tensor.copy_() to populate the weight data on the GPU.
    • The application saves the returned signature against this weight for future comparison.
  2. The layer uses the weight tensor.
  3. The application calls unpin() on the tensor so that it can be freed later under pressure if needed.
If the fault() fails (the weight stays offloaded):
  1. The application allocates a temporary regular GPU tensor.
  2. The application uses Tensor.copy_() to populate the weight data on the GPU.
  3. The layer uses the temporary as the weight.
  4. PyTorch garbage collects the temporary when the layer is finished.
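
The per-layer flow above can be sketched in plain Python. This is only an illustrative model: ToyVBAR, run_layer and the byte-count budget are hypothetical names invented here, not the library's API, and the "copies" are log entries rather than real GPU transfers. It shows the one property the steps rely on: a resident weight is copied once and reused while its signature is unchanged, whereas an offloaded weight is copied into a temporary on every use.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyVBAR:
    """Hypothetical stand-in for a VBAR: tracks which weights are resident."""
    vram_budget: int                               # bytes of pretend VRAM
    used: int = 0
    resident: dict = field(default_factory=dict)   # weight name -> signature
    next_signature: int = 1

    def fault(self, name: str, size: int) -> Optional[int]:
        """Return a signature if the weight is or becomes resident, else None."""
        if name in self.resident:
            return self.resident[name]             # still mapped: same signature
        if self.used + size > self.vram_budget:
            return None                            # no room: weight stays offloaded
        self.used += size
        self.resident[name] = self.next_signature
        self.next_signature += 1
        return self.resident[name]

    def unpin(self, name: str) -> None:
        """No-op in this toy; the real allocator may now evict the weight."""


def run_layer(vbar, name, host_weight, known_signatures, log):
    signature = vbar.fault(name, len(host_weight))
    if signature is not None:
        if known_signatures.get(name) != signature:
            log.append(f"copy {name} into VBAR")   # stands in for the GPU copy
            known_signatures[name] = signature
        log.append(f"use {name} from VBAR")
        vbar.unpin(name)
    else:
        log.append(f"copy {name} into temporary")  # temporary regular GPU tensor
        log.append(f"use {name} from temporary")   # temporary is dropped afterwards


weights = {"layer0": bytes(4), "layer1": bytes(4), "layer2": bytes(4)}
vbar = ToyVBAR(vram_budget=8)                      # room for two of three weights
signatures, log = {}, []
for _ in range(2):                                 # two model iterations
    for name, data in weights.items():
        run_layer(vbar, name, data, signatures, log)
```

In this run layer0 and layer1 are copied once and reused on the second iteration, while layer2 never fits and is copied into a temporary both times.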

See examples/example.py.


Priorities:

  • The most recently created VBARs have the highest priority, and within a VBAR lower addresses take priority over higher addresses.
  • Applications should order their tensor allocations in the VBAR in load-priority order with the lowest addresses for the highest priority weights.
  • Calling fault() on a weight that is higher priority than other weights will cause those lower priority weights to get freed to make space.
  • Having a weight evicted sets that VBAR's watermark to that weight's level. Any weights in the same VBAR above the watermark automatically fail the fault() API. This avoids constantly faulting in all weights on each model iteration, while still allowing the application to call fault() blindly on every layer and check the result. There is no need for the application to manage any VRAM quotas or watermarks.
  • Existing VBARs can be pushed to top priority with the prioritize() API. This allows reuse of an already loaded or partially loaded model (e.g. using the same model twice in a complex workflow). Using prioritize() resets the offload watermark of that model to no offloading, giving its weights priority over those of any other currently loaded models.
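
The priority and watermark rules can be modelled with a small toy. ToyPool below is a hypothetical sketch, not the library's implementation: VRAM is counted in whole weight slots, weights are identified by an integer index standing in for their address order, and it assumes that a weight at the watermark level fails as well as those above it.

```python
class ToyPool:
    """Hypothetical model of the priority rules; not the library's API."""

    def __init__(self, budget):
        self.budget = budget    # pretend VRAM, counted in weight slots
        self.vbars = []         # lowest priority first, newest (highest) last
        self.resident = {}      # VBAR name -> set of resident weight indices
        self.watermark = {}     # VBAR name -> first index that stays offloaded

    def create(self, name):
        self.vbars.append(name)                 # newest VBAR ranks highest
        self.resident[name] = set()
        self.watermark[name] = float("inf")     # no offloading yet

    def prioritize(self, name):
        self.vbars.remove(name)
        self.vbars.append(name)                 # push to top priority
        self.watermark[name] = float("inf")     # reset the offload watermark

    def _used(self):
        return sum(len(r) for r in self.resident.values())

    def _evict_below(self, name, index):
        """Evict one resident weight ranked below (name, index), if any."""
        for victim in self.vbars:               # lowest-priority VBAR first
            if self.resident[victim]:
                top = max(self.resident[victim])    # highest address goes first
                if victim == name and top <= index:
                    return False                # nothing lower priority is left
                self.resident[victim].remove(top)
                self.watermark[victim] = top    # eviction sets the watermark
                return True
            if victim == name:
                return False
        return False

    def fault(self, name, index):
        if index >= self.watermark[name]:
            return False                        # at or above watermark: fail fast
        if index in self.resident[name]:
            return True
        while self._used() >= self.budget:
            if not self._evict_below(name, index):
                return False
        self.resident[name].add(index)
        return True


pool = ToyPool(budget=4)
pool.create("A")
first_pass = [pool.fault("A", i) for i in range(4)]    # all four weights fit
pool.create("B")                                       # newer, so higher priority
b_pass = [pool.fault("B", i) for i in range(2)]        # evicts A3, then A2
second_pass = [pool.fault("A", i) for i in range(4)]   # A2 and A3 hit the watermark
pool.prioritize("A")
third_pass = [pool.fault("A", i) for i in range(4)]    # A reclaims VRAM from B
```

After model B evicts two of model A's weights, A's second pass fails fast on exactly those weights without disturbing B. Calling prioritize("A") clears the watermark, so the third pass reloads them at B's expense.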

Backend:

  • VBAR allocation is done with cuMemAddressReserve(), faulting is done with cuMemCreate() and cuMemMap(), and all frees are done with the corresponding converse APIs.
  • For consistency with VBAR memory management, the main PyTorch allocator plugin is also implemented as cuMemAddressReserve() -> cuMemCreate() -> cuMemMap(). This also behaves much better on Windows systems with System Memory fallback.
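
To illustrate why reserving address space is free while faulting is not, here is a toy model of the reserve, create and map sequence. ToyDevice and its methods are hypothetical and only mimic the bookkeeping. The 2 MiB granularity is an assumption; real code would query it from the driver, for example with cuMemGetAllocationGranularity().

```python
def round_up(size, granularity):
    """Round size up to the next multiple of granularity."""
    return -(-size // granularity) * granularity


class ToyDevice:
    """Hypothetical model: virtual reservations are free, mappings cost VRAM."""

    def __init__(self, vram_bytes, granularity=2 << 20):
        self.vram_free = vram_bytes
        self.granularity = granularity      # assumed 2 MiB for illustration
        self.reservations = {}              # base address -> reserved size
        self.mappings = {}                  # (base, offset) -> mapped size
        self._next_base = 1 << 40           # arbitrary pretend virtual address

    def address_reserve(self, size):
        """Models cuMemAddressReserve(): consumes address space only."""
        size = round_up(size, self.granularity)
        base = self._next_base
        self._next_base += size
        self.reservations[base] = size
        return base

    def create_and_map(self, base, offset, size):
        """Models cuMemCreate() + cuMemMap(): this is where VRAM is spent."""
        size = round_up(size, self.granularity)
        if size > self.vram_free:
            return False                    # a fault would fail here
        self.vram_free -= size
        self.mappings[(base, offset)] = size
        return True

    def unmap_and_release(self, base, offset):
        """Models the converse calls: the VRAM returns, the range stays reserved."""
        self.vram_free += self.mappings.pop((base, offset))


MiB = 1 << 20
dev = ToyDevice(vram_bytes=8 * MiB)
base = dev.address_reserve(64 * MiB)             # far larger than physical VRAM
free_after_reserve = dev.vram_free               # unchanged by the reservation
ok_small = dev.create_and_map(base, 0, 3 * MiB)  # rounded up to 4 MiB
ok_large = dev.create_and_map(base, 4 * MiB, 6 * MiB)   # does not fit
dev.unmap_and_release(base, 0)
```

The 64 MiB reservation succeeds on an 8 MiB device because only the later mappings consume physical memory.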

Caveats:

  • There is no real way for this allocator to tell high usage apart from bad fragmentation in the PyTorch caching allocator. Because this allocator always returns success to the PyTorch caching allocator, the caching allocator experiences no pressure while weights are being offloaded, which means it can run in an extremely fragmented mode. The assumption is that model weight access patterns are reasonably regular over blocks or iterations, so the caching allocator settles on a good set of sizes to cache. In general, though, you should completely flush the PyTorch caching allocator before each new model run, which prevents completely unused reservations from taking priority over the next model's weights.
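
A minimal sketch of flushing the caching allocator between model runs follows. It assumes that torch.cuda.empty_cache() is the intended way to flush; the helper name flush_caching_allocator is hypothetical. The import is guarded so the snippet also runs on machines without PyTorch or without a CUDA device.

```python
import gc


def flush_caching_allocator():
    """Release cached, unused blocks held by PyTorch's CUDA caching allocator.

    Returns True if a flush was issued, False if PyTorch or CUDA is unavailable.
    """
    gc.collect()                     # drop unreachable tensors first
    try:
        import torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()         # hand unused cached blocks back
    return True


# Call this before starting each new model run.
flushed = flush_caching_allocator()
```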

Download files

Source Distributions

No source distribution files available for this release.

Built Distributions

comfy_aimdo-0.4.7-py3-none-any.whl (22.2 kB)

Uploaded Python 3

comfy_aimdo-0.4.7-cp39-abi3-win_arm64.whl (224.9 kB)

Uploaded CPython 3.9+, Windows ARM64

comfy_aimdo-0.4.7-cp39-abi3-win_amd64.whl (255.0 kB)

Uploaded CPython 3.9+, Windows x86-64

comfy_aimdo-0.4.7-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl (564.2 kB)

Uploaded CPython 3.9+, manylinux: glibc 2.17+ ARM64

comfy_aimdo-0.4.7-cp39-abi3-manylinux2010_x86_64.manylinux2014_x86_64.manylinux_2_12_x86_64.manylinux_2_17_x86_64.whl (339.8 kB)

Uploaded CPython 3.9+, manylinux: glibc 2.12+ x86-64, manylinux: glibc 2.17+ x86-64

File details

Details for the file comfy_aimdo-0.4.7-py3-none-any.whl.

File metadata

  • Download URL: comfy_aimdo-0.4.7-py3-none-any.whl
  • Upload date:
  • Size: 22.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for comfy_aimdo-0.4.7-py3-none-any.whl
Algorithm Hash digest
SHA256 b265e8f40943c74cf52ba2f78c2d79297663c0d834b18e8df278e2e7cfc14ea1
MD5 1c9650129fe1d91fbe6bd2d1867b7672
BLAKE2b-256 cc5145cc0c8b5c4b40e00280f0631978ea26fb83061079ef84b4d5fa72f17360

Provenance

The following attestation bundles were made for comfy_aimdo-0.4.7-py3-none-any.whl:

Publisher: build-wheels.yml on Comfy-Org/comfy-aimdo

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file comfy_aimdo-0.4.7-cp39-abi3-win_arm64.whl.

File metadata

  • Download URL: comfy_aimdo-0.4.7-cp39-abi3-win_arm64.whl
  • Upload date:
  • Size: 224.9 kB
  • Tags: CPython 3.9+, Windows ARM64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for comfy_aimdo-0.4.7-cp39-abi3-win_arm64.whl
Algorithm Hash digest
SHA256 67fbbc9e0af4843befc3cdd1f3d00d0a951857bff4faaa9ec887cf0f6ed9d4b4
MD5 ea97aefb3b0b64116fc61575b64b43b8
BLAKE2b-256 0038e193c38e02f2e42f6d79afc5f00029c29d949af8080d3a0930c0201bcfa8

Provenance

The following attestation bundles were made for comfy_aimdo-0.4.7-cp39-abi3-win_arm64.whl:

Publisher: build-wheels.yml on Comfy-Org/comfy-aimdo

File details

Details for the file comfy_aimdo-0.4.7-cp39-abi3-win_amd64.whl.

File metadata

  • Download URL: comfy_aimdo-0.4.7-cp39-abi3-win_amd64.whl
  • Upload date:
  • Size: 255.0 kB
  • Tags: CPython 3.9+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for comfy_aimdo-0.4.7-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 d2055975ebf11dbf231ad48ae5c76bb4f098a5f866ccc0e1c3ea304dbfe99179
MD5 e5989126b5e7548171f67b751edb2147
BLAKE2b-256 49d81aa69a321e515ea2a36e821e01b6d4a54242a7952f60556d3f8dfa2c1268

Provenance

The following attestation bundles were made for comfy_aimdo-0.4.7-cp39-abi3-win_amd64.whl:

Publisher: build-wheels.yml on Comfy-Org/comfy-aimdo

File details

Details for the file comfy_aimdo-0.4.7-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl.

File hashes

Hashes for comfy_aimdo-0.4.7-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl
Algorithm Hash digest
SHA256 1b0c4c5e328582f07e5d563e7307452e3c13d75ff9d44facf18bfc68badf15d9
MD5 7fffb9442985c040c9c18adf08a7f113
BLAKE2b-256 6e341b92796a3d2495765686698ef4684e1b55564fb6131e62bce67bdec253db

Provenance

The following attestation bundles were made for comfy_aimdo-0.4.7-cp39-abi3-manylinux2014_aarch64.manylinux_2_17_aarch64.whl:

Publisher: build-wheels.yml on Comfy-Org/comfy-aimdo

File details

Details for the file comfy_aimdo-0.4.7-cp39-abi3-manylinux2010_x86_64.manylinux2014_x86_64.manylinux_2_12_x86_64.manylinux_2_17_x86_64.whl.

File hashes

Hashes for comfy_aimdo-0.4.7-cp39-abi3-manylinux2010_x86_64.manylinux2014_x86_64.manylinux_2_12_x86_64.manylinux_2_17_x86_64.whl
Algorithm Hash digest
SHA256 7c1b5694209751b85a313c714d80b514cd7663fed112f192b2772a21eed204a9
MD5 f3024e9015ef4631105f485f8991ba99
BLAKE2b-256 a6ecbb2779b223e3495f768c53f645aaf131b601f0d6bc927c3a5d380afccb0d

Provenance

The following attestation bundles were made for comfy_aimdo-0.4.7-cp39-abi3-manylinux2010_x86_64.manylinux2014_x86_64.manylinux_2_12_x86_64.manylinux_2_17_x86_64.whl:

Publisher: build-wheels.yml on Comfy-Org/comfy-aimdo
