PyTorch helpers for cjm-substrate capabilities: GPU memory release, typed CUDA-OOM handling, and device selection.

cjm-substrate-torch-utils

Install

pip install cjm_substrate_torch_utils

Project Structure

nbs/
├── device.ipynb # Resolve a device spec ("auto" / "cpu" / "cuda" / "cuda:N") to a concrete torch device string.
├── memory.ipynb # Robust move-to-CPU + drop-references + gc + CUDA-cache cleanup for releasing models, factored out of the per-capability reimplementations.
└── oom.ipynb    # Convert torch CUDA out-of-memory exceptions into the substrate's typed `CapabilityResourceError` (SG-47 Track B) so CR-7 reactive retry can evict and reload.

Total: 3 notebooks

Module Dependencies

graph LR
    device["device<br/>Device resolution"]
    memory["memory<br/>GPU model release"]
    oom["oom<br/>CUDA OOM handling"]

No cross-module dependencies detected.

CLI Reference

No CLI commands found in this project.

Module Overview

Detailed documentation for each module in the project:

Device resolution (`device.ipynb`)

Resolve a device spec ("auto" / "cpu" / "cuda" / "cuda:N") to a concrete torch device string.

Import

from cjm_substrate_torch_utils.device import (
    resolve_torch_device
)

Functions

def resolve_torch_device(
    spec: str = "auto",  # Requested device: "auto", "cpu", "cuda", or "cuda:N"
) -> str:                # Concrete device string
    """
    Resolve a device spec to a concrete torch device string.
    
    `"auto"` resolves to `"cuda"` when CUDA is available, else `"cpu"`. Any
    explicit spec (`"cpu"`, `"cuda"`, `"cuda:0"`, ...) is returned unchanged.
    """

GPU model release (`memory.ipynb`)

Robust move-to-CPU + drop-references + gc + CUDA-cache cleanup for releasing models, factored out of the per-capability reimplementations.

Import

from cjm_substrate_torch_utils.memory import (
    release_model
)

Functions

def release_model(
    obj: Any,                     # The capability instance holding the model attribute(s)
    model_attr_names: List[str],  # Names of the attributes to release, in release order
    device: str = "cuda",         # Device the model is on; gates the CUDA-specific cleanup
    *,
    logger: logging.Logger,       # Logger for best-effort failure reporting
) -> None:
    """
    Release one or more model objects: move to CPU, drop references, gc, free CUDA cache.
    
    For each name in `model_attr_names`, if `obj` has a non-None attribute:
      1. when on CUDA, best-effort `.to('cpu')` (frees GPU tensors; skipped for
         objects without a `.to` method, e.g. processors/tokenizers),
      2. `setattr(obj, name, None)` and drop the local reference.
    Then a single `gc.collect()` and — on CUDA — `empty_cache()` + `synchronize()`.
    
    Best-effort throughout: failures are logged and swallowed. Missing or
    already-None attributes are skipped, so the call is idempotent.
    """

CUDA OOM handling (`oom.ipynb`)

Convert torch CUDA out-of-memory exceptions into the substrate's typed `CapabilityResourceError` (SG-47 Track B) so CR-7 reactive retry can evict and reload.

Import

from cjm_substrate_torch_utils.oom import (
    cuda_oom_to_capability_resource_error
)

Functions

def cuda_oom_to_capability_resource_error(
    exc: BaseException,          # The caught CUDA OOM exception (e.g. torch.cuda.OutOfMemoryError)
    *,
    label: str,                  # Context for the message, e.g. "loading model 'X'" or "inference"
    headroom_mb: float = 100.0,  # Best-effort margin added to `available` to estimate `needed`
) -> CapabilityResourceError:        # Typed error for the substrate's CR-7 reactive-retry path
    """
    Convert a CUDA out-of-memory exception into a substrate-typed `CapabilityResourceError`.
    
    SG-47 Track B: a capability's GPU inference / model-load site catches
    `torch.cuda.OutOfMemoryError` and re-raises the result of this helper so the
    substrate sees a typed resource error (evict + reload + retry via CR-7)
    instead of an opaque crash.
    
    `needed` is a best-effort estimate (`available + headroom_mb`): the true
    required VRAM is unknowable from the exception, and CR-7 triggers eviction
    regardless of magnitude, so an approximation above `available` is sufficient.
    
    The caller raises the returned error, preserving the original cause:
    
        try:
            model = Model.from_pretrained(repo_id, ...)
        except torch.cuda.OutOfMemoryError as e:
            raise cuda_oom_to_capability_resource_error(e, label=f"loading {repo_id!r}") from e
    """

Release history

0.0.12 (this version), Jun 21, 2026

0.0.11, Jun 20, 2026

